[GitHub] spark issue #15155: [SPARK-17477][SQL] SparkSQL cannot handle schema evoluti...

HyukjinKwon Mon, 19 Sep 2016 17:38:07 -0700

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/15155
  
    -1: As far as I know, we pick up a single Parquet file to read the
Spark-side schema. In this case, it is ambiguous which schema is "new" and
which is "old". So, depending on which file is picked, the read sometimes fails
(reading long as int) and sometimes succeeds (reading int as long).
    
    I guess we first need to enable schema merging so that the schema is
inferred from the Parquet files, but we do not support merging schemas with
upcasting yet ([SPARK-15516](https://issues.apache.org/jira/browse/SPARK-15516)).
So, IMHO, [SPARK-15516](https://issues.apache.org/jira/browse/SPARK-15516) blocks this.
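    A minimal sketch of why merging needs upcasting, again as a Python toy model
rather than Spark code. In Spark, schema merging is turned on with the
`mergeSchema` read option (or `spark.sql.parquet.mergeSchema`); the
`merge_types` helper and its `allow_upcast` flag below are hypothetical and only
illustrate the two behaviors.

```python
# Toy model, NOT Spark internals: merge the per-file types of one column.
INT, LONG = "int", "long"


def merge_types(a, b, allow_upcast):
    """Merge two column types; optionally widen an int/long pair to long."""
    if a == b:
        return a
    if allow_upcast and {a, b} == {INT, LONG}:
        return LONG
    raise ValueError(f"cannot merge {a} with {b}")


def merge_schemas(file_types, allow_upcast):
    """Fold merge_types over the types declared by every file."""
    merged = file_types[0]
    for t in file_types[1:]:
        merged = merge_types(merged, t, allow_upcast)
    return merged


# With upcasting the result no longer depends on file order.
print(merge_schemas([INT, LONG], allow_upcast=True))
print(merge_schemas([LONG, INT], allow_upcast=True))
```

Without upcasting, the same files make `merge_schemas` raise, which is why
merging alone does not help until upcasting during merge is supported.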
    
    If we are talking about explicitly setting the schema in this case, then it
turns into a subset of
[SPARK-16544](https://issues.apache.org/jira/browse/SPARK-16544). For that, I
already submitted a PR, https://github.com/apache/spark/pull/14215, but I
decided to close it in favor of a better approach. If this looks good, I'd like
to bring it back and reopen my old PR.
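    With an explicit schema, the requested type comes from the user instead of
from whichever file happens to be picked, so the outcome becomes deterministic.
The Python sketch below is a toy model under that assumption; the
`read_with_schema` helper is hypothetical and is not the approach taken in the
PR referenced above.

```python
# Toy model, NOT Spark internals: every file is checked against a
# user-supplied type rather than an inferred one.
INT, LONG = "int", "long"
SAFE_WIDENING = {(INT, LONG)}  # assumed: only int -> long is allowed


def read_with_schema(files, requested):
    """Read each file's column as `requested`, widening where it is safe."""
    out = []
    for physical in files:
        if physical == requested or (physical, requested) in SAFE_WIDENING:
            out.append(requested)
        else:
            raise ValueError(f"cannot read {physical} column as {requested}")
    return out


# Requesting long works regardless of file order.
print(read_with_schema([INT, LONG], LONG))
print(read_with_schema([LONG, INT], LONG))
```

Requesting int for the same files fails in both orders, so the user sees one
consistent result instead of one that depends on file selection.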



