[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-15 Thread wgtmac
Github user wgtmac commented on the issue: https://github.com/apache/spark/pull/15035 Just confirmed that this also doesn't work with the vectorized reader. What I did is as follows: 1. Created a flat Hive table with schema "name: String, id: Long". But the Parquet file which

[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-14 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/15035 For our vectorized parquet reader, we try to take care of these type conversions here:

[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15035 We definitely shouldn't change SpecificMutableRow to do this upcast; otherwise we might introduce subtle bugs with type mismatches in the future. cc @sameeragarwal to see if there is a better

[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-12 Thread wgtmac
Github user wgtmac commented on the issue: https://github.com/apache/spark/pull/15035 @HyukjinKwon Yup, that makes sense. Do you have any idea where the best place to fix this is?

[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15035 Hm.. are you sure this is a problem in all data sources? IIUC, JSON and CSV kind of allow permissive upcasting whereas ORC and Parquet do not - so this would rather be ORC- and Parquet-specific

[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-12 Thread wgtmac
Github user wgtmac commented on the issue: https://github.com/apache/spark/pull/15035 @JoshRosen Yes, it may risk masking overflow. This conversion happens when the user-provided schema or the Hive metastore schema has Long but the Parquet files have Int as the schema. We cannot avoid this

[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-12 Thread wgtmac
Github user wgtmac commented on the issue: https://github.com/apache/spark/pull/15035 @HyukjinKwon This is not Parquet-specific; it applies to other data sources as well. 1. Change the reading path for Parquet: It does not solve the problem. Some queries need to read all Parquet

[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15035 Do you mind if I ask whether this works with the vectorized Parquet reader too? I know the normal Parquet reader uses `SpecificMutableRow` but IIRC, the vectorized Parquet reader relies on `ColumnarBatch`

[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15035 Shouldn't we change the reading path for Parquet rather than changing the target row, to avoid per-record type dispatch? Also, it seems to be a Parquet-specific issue, but I wonder whether making changes in

[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-09 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/15035 +1 on adding a test, otherwise this risks regressing in future refactorings. Also, I'm not sure whether `SpecificMutableRow` itself is necessarily the right place to be performing this type

[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-09 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15035 Would it maybe make sense to add an automated test for this?

[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15035 Can one of the admins verify this patch?