Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/4729#issuecomment-76295694
@liancheng Unlike the `ParquetConversions` issue, I think the array
insertion issue may not be Hive-specific. The problem is that when we
create a Parquet table that includes an array (or map, or struct), by default
we use a schema with `containsNull` set to true. But when we later insert
data, the data's schema may have `containsNull` as either true or false. Hive
seems to solve this by only supporting fields that may contain null elements:
whether or not the inserted data actually contains nulls, its schema is set to
have `containsNull` as true before it is written to the Parquet file. Since
I think we don't want to explicitly change the data schema and affect other
parts, doing it in `RowWriteSupport` should be ok, unless you have other
concerns.
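
To illustrate the idea, here is a minimal sketch of the kind of write-side normalization described above. It does not use Spark itself: the `ArrayType`/`MapType`/`StructType` case classes below only mimic the names of Catalyst's types, and `relaxNullability` is a hypothetical helper showing how a schema could be recursively rewritten to `containsNull = true` before writing, as `RowWriteSupport` would need to do.

```scala
// Tiny stand-in for Catalyst data types (names mirror Spark's, but this
// is an illustrative model, not Spark code).
sealed trait DataType
case object IntType extends DataType
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
case class MapType(keyType: DataType, valueType: DataType,
                   valueContainsNull: Boolean) extends DataType
case class StructField(name: String, dataType: DataType, nullable: Boolean)
case class StructType(fields: Seq[StructField]) extends DataType

// Recursively relax nullability so the schema used on the write side always
// has containsNull = true, matching the table schema created by default.
def relaxNullability(dt: DataType): DataType = dt match {
  case ArrayType(et, _)   => ArrayType(relaxNullability(et), containsNull = true)
  case MapType(kt, vt, _) => MapType(relaxNullability(kt), relaxNullability(vt),
                                     valueContainsNull = true)
  case StructType(fs)     => StructType(fs.map(f =>
                               f.copy(dataType = relaxNullability(f.dataType),
                                      nullable = true)))
  case other              => other
}

// A data schema with containsNull = false is rewritten to containsNull = true,
// so it matches the table schema regardless of whether the data has nulls.
val dataSchema = ArrayType(IntType, containsNull = false)
println(relaxNullability(dataSchema)) // ArrayType(IntType, true)
```

Because the relaxation happens only on the copy of the schema handed to the writer, the original data schema seen by other parts of the query plan is untouched, which is the point of doing this inside `RowWriteSupport`.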