[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

wgtmac Mon, 12 Sep 2016 10:05:31 -0700

Github user wgtmac commented on the issue:

    https://github.com/apache/spark/pull/15035
  
    @HyukjinKwon This is not Parquet-specific; it applies to other data sources 
as well.
    1. Change the reading path for parquet: This does not solve the problem, 
because some queries need to read all of the parquet files.
    2. Make changes in row: Yes, I have to change it per row because some 
parquet files store the column as int while others store it as long, so we 
can't know in advance which rows are fine and which are problematic.
    3. Vectorized parquet reader: This is a good catch. I haven't considered 
this yet.
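
A minimal sketch of the int-versus-long situation described in point 2, written in plain Python rather than against Spark's API. All names here (`WIDENING_RANK`, `merge_schemas`, `widen_row`) are hypothetical and are not taken from the pull request. The sketch assumes that a merged schema keeps the widest type seen for each column and that every row is then checked against that merged type.

```python
# Hypothetical type ranks: a higher rank can hold every value of a lower one.
WIDENING_RANK = {"int": 0, "long": 1}

# Value ranges of 32-bit and 64-bit signed integers.
RANGES = {
    "int": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
}


def merge_schemas(schemas):
    """Merge per-file schemas, keeping the widest type seen for each column."""
    merged = {}
    for schema in schemas:
        for column, type_name in schema.items():
            current = merged.get(column)
            if current is None or WIDENING_RANK[type_name] > WIDENING_RANK[current]:
                merged[column] = type_name
    return merged


def widen_row(row, file_schema, merged_schema):
    """Validate one row against the merged schema.

    Python ints are unbounded, so no real cast happens here; in a JVM
    reader this step would be where an int is converted to a long.
    """
    out = {}
    for column, value in row.items():
        source = file_schema[column]
        target = merged_schema[column]
        if WIDENING_RANK[source] > WIDENING_RANK[target]:
            raise ValueError(f"cannot narrow {column} from {source} to {target}")
        low, high = RANGES[source]
        if not low <= value <= high:
            raise ValueError(f"{column}={value} does not fit in {source}")
        out[column] = value
    return out


# Two files that disagree on the type of the same column.
schemas = [{"id": "int"}, {"id": "long"}]
merged = merge_schemas(schemas)
print(merged)  # {'id': 'long'}
```

In this sketch the file schema is passed in alongside each row; the comment above describes a case where that information is not available at the point of conversion, which is why the change is made per row.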
    
    It would be great if you could come up with other good ideas and continue 
to work on it. Feedback and discussion are welcome. Thanks!



