GitHub user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/4806#issuecomment-76731541
  
    Yeah, we introduced it for potential optimizations, but it seems to be 
causing more trouble than it's worth. We decided to ignore nullability in the 
Parquet and JSON data sources because that makes more sense for most 
scenarios, especially when dealing with "dirty" datasets.

    However, completely ignoring nullability in Spark SQL also means that we 
lose part of the schema information, which affects data sources like Avro, 
Protocol Buffers, and Thrift. Not quite sure whether this is a good idea for 
now...
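    To illustrate the trade-off being discussed, here is a minimal 
plain-Python sketch (a hypothetical model, not the real `pyspark.sql.types` 
API): "ignoring nullability" amounts to forcing every field to nullable when 
a schema is read, which tolerates dirty data but discards the 
required/optional distinction that sources like Avro encode.

    ```python
    from dataclasses import dataclass, replace

    # Hypothetical stand-in for a schema field: name, type, and a
    # nullability flag (loosely modeled on Spark SQL's StructField).
    @dataclass(frozen=True)
    class Field:
        name: str
        data_type: str
        nullable: bool

    def relax_nullability(schema):
        """Force every field to nullable=True, as a data source that
        ignores nullability effectively does when reading a schema."""
        return [replace(f, nullable=True) for f in schema]

    # An Avro-style schema where `id` is declared required (non-nullable).
    avro_schema = [
        Field("id", "long", nullable=False),
        Field("name", "string", nullable=True),
    ]

    relaxed = relax_nullability(avro_schema)
    # After relaxing, the fact that `id` was required is gone: the schema
    # can no longer round-trip back to Avro without losing information.
    print([f.nullable for f in relaxed])  # [True, True]
    ```

    The sketch shows why the choice is scenario-dependent: dirty JSON or 
Parquet benefits from the relaxed schema, while Avro, Protocol Buffers, and 
Thrift lose a piece of their declared contract.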

