ZiyaZa opened a new pull request, #52922:
URL: https://github.com/apache/spark/pull/52922

   ### What changes were proposed in this pull request?
   
   This PR adds support for reading/writing NullType columns in Parquet files 
via the `UNKNOWN` logical type annotation. Notable changes are:
   - Changing `ParquetFileFormat.supportDataType` to support NullType
   - Changing `ParquetToSparkSchemaConverter` to infer NullType when a primitive 
type carries the `UNKNOWN` type annotation and there is no Spark-provided 
expected type
   - Changing `SparkToParquetSchemaConverter` to convert NullType into a 
Parquet Boolean physical type with the `UNKNOWN` annotation (the choice of 
physical type here is arbitrary, since every value is null)
   - An optimization in `OnHeapColumnVector`/`OffHeapColumnVector` to skip 
memory allocation when the whole vector is guaranteed to hold only nulls, 
which is the case for NullType columns and for columns that are missing from 
the file.
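
   For illustration only, the annotation these converter changes rely on can be 
observed from the Arrow side, since Arrow's Parquet writer also annotates its 
null type this way. This is a hedged sketch using pyarrow with a made-up file 
path, not Spark code:

   ```python
   import os
   import tempfile
   import pyarrow as pa
   import pyarrow.parquet as pq

   # Write a single all-null column. Arrow's null type carries the
   # Parquet UNKNOWN/Null logical type annotation, mirroring what the
   # SparkToParquetSchemaConverter change produces for NullType.
   path = os.path.join(tempfile.mkdtemp(), "null_col.parquet")
   pq.write_table(pa.table({"n": pa.array([None] * 3, type=pa.null())}), path)

   # The logical type annotation, not the (arbitrary) physical type,
   # is what lets a reader infer a null column from the file schema.
   col = pq.ParquetFile(path).schema.column(0)
   print(col.physical_type, col.logical_type)
   ```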
   
   ### Why are the changes needed?
   
   To support reading/writing NullType columns in Parquet files.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Previously, trying to read or write a Parquet file whose schema 
contained a NullType column would throw an error. Now, no error is thrown: the 
data is written and read as expected, using the `UNKNOWN` type annotation.
   
   ### How was this patch tested?
   
   Manually verified that the Parquet files Spark writes are readable by 
Apache Arrow, and that Spark can read a simple Parquet file with a NullType 
column written by Apache Arrow. Also added a new unit test and fixed existing 
tests.
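
   That interop check can be sketched with pyarrow (a hypothetical round trip, 
not the unit test added in this PR; the file path is made up):

   ```python
   import os
   import tempfile
   import pyarrow as pa
   import pyarrow.parquet as pq

   # Round-trip a table containing an all-null column through Parquet.
   path = os.path.join(tempfile.mkdtemp(), "nulls.parquet")
   table = pa.table({"id": pa.array([1, 2]), "n": pa.nulls(2)})
   pq.write_table(table, path)

   # The null-typed column survives the round trip with its type intact.
   back = pq.read_table(path)
   assert back.schema.field("n").type == pa.null()
   assert back.column("n").to_pylist() == [None, None]
   ```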
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

