Hi,

The Spark documentation says: "When writing Parquet files, all columns are automatically converted to be nullable for compatibility reasons."
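For context, here is a minimal sketch of the behavior I am referring to (assuming Spark 3.x in local mode; the output path is just a placeholder for the example). A column declared non-nullable in the DataFrame schema comes back as nullable after a Parquet round trip:

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    object NullabilityRoundTrip {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("nullability").getOrCreate()

        // Declare "id" as non-nullable in the DataFrame schema.
        val schema = StructType(Seq(
          StructField("id", IntegerType, nullable = false),
          StructField("name", StringType, nullable = true)
        ))
        val rows = Seq(Row(1, "a"), Row(2, "b"))
        val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

        df.printSchema()  // id: integer (nullable = false)

        // Hypothetical path used only for this example.
        val path = "/tmp/nullability_demo.parquet"
        df.write.mode("overwrite").parquet(path)

        // After the round trip through Parquet, "id" is reported as nullable.
        spark.read.parquet(path).printSchema()  // id: integer (nullable = true)

        spark.stop()
      }
    }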
Could you elaborate on the reasons for this choice? Is it for a similar reason that Protobuf dropped "required" fields in proto3, given that both Protobuf and Parquet descend from the Dremel paper? What risks led to such a decision?

Nullability seems like a validation constraint, and I am still not convinced that enforcing it is the responsibility of the Parquet schema. Having too many constraints would make parsing and compression less efficient; imagine if we had dozens of numerical types.

I could not find an answer on this mailing list, on Stack Overflow, or via Google. If this question has already been answered, feel free to redirect me to it.

Thank you,
Julien