Hi,

Spark documentation says:
"When writing Parquet files, all columns are automatically converted to be
nullable for compatibility reasons."

Could you elaborate on the reasons for this choice?

Is this for a similar reason to Protobuf dropping "required" fields in
version 3, given that both Protobuf and Parquet trace back to the Dremel paper?

What risks motivate such a decision?

Nullability seems like a validation constraint, and I am still not convinced
that enforcing it is the responsibility of the Parquet schema. Having too
many constraints would make parsing and compression less efficient; imagine
if we had dozens of numerical types.

I cannot find an answer on this mailing list, on SO, or via Google.
If this question has already been answered, feel free to redirect me to it.


Thank you.

Julien.
