Takuya Ueshin created SPARK-3036:
------------------------------------

             Summary: Add MapType containing null value support to Parquet.
                 Key: SPARK-3036
                 URL: https://issues.apache.org/jira/browse/SPARK-3036
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Takuya Ueshin
            Priority: Blocker

The current Parquet schema for {{MapType}} is as follows, regardless of {{valueContainsNull}}:

{noformat}
message root {
  optional group a (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required int32 key;
      required int32 value;
    }
  }
}
{noformat}

If the map contains a {{null}} value, a runtime exception is thrown.

To handle a {{MapType}} containing {{null}} values, the schema should be as follows when {{valueContainsNull}} is {{true}}:

{noformat}
message root {
  optional group a (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required int32 key;
      optional int32 value;
    }
  }
}
{noformat}

FYI: Hive's Parquet writer *always* uses the latter schema, but its reader can read both schemas.

NOTICE: This change will break backward compatibility when the schema is read from Parquet metadata ({{"org.apache.spark.sql.parquet.row.metadata"}}).
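
The proposed rule (the value field is {{optional}} only when {{valueContainsNull}} is {{true}}, while the key stays {{required}}) can be sketched as a small standalone Python helper. This is only an illustration, not Spark's actual converter code: the function name {{map_schema}} and its parameters are hypothetical, and the key and value types default to {{int32}} simply to match the example schemas.

```python
def map_schema(field_name: str,
               value_contains_null: bool,
               key_type: str = "int32",
               value_type: str = "int32") -> str:
    """Render a Parquet message schema for a single map column.

    Hypothetical helper for illustration only; it mirrors the two
    schemas shown in this issue and is not part of Spark.
    """
    # Keys are always required; only the repetition of the value
    # field depends on valueContainsNull.
    value_repetition = "optional" if value_contains_null else "required"
    return "\n".join([
        "message root {",
        f"  optional group {field_name} (MAP) {{",
        "    repeated group map (MAP_KEY_VALUE) {",
        f"      required {key_type} key;",
        f"      {value_repetition} {value_type} value;",
        "    }",
        "  }",
        "}",
    ])


if __name__ == "__main__":
    # Schema for a map column "a" whose values may be null.
    print(map_schema("a", value_contains_null=True))
```

Calling {{map_schema("a", value_contains_null=False)}} yields the current schema with a {{required}} value field, so the two cases differ in exactly one line.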