Re: Spark data quality bug when reading parquet files from hive metastore

2018-09-07 Thread Long, Andrew
Thanks Fokko, I will definitely take a look at this.
Cheers
Andrew

From: "Driesprong, Fokko"
Date: Friday, August 24, 2018 at 2:39 AM
To: "reubensaw...@hotmail.com"
Cc: "dev@spark.apache.org"
Subject: Re: Spark data quality bug when reading parquet files from hive metastore

Re: Spark data quality bug when reading parquet files from hive metastore

2018-08-24 Thread Driesprong, Fokko
Hi Andrew,

This blog gives an idea of how the schema is resolved: https://blog.godatadriven.com/multiformat-spark-partition
There is some optimisation going on when reading Parquet using Spark. Hope this helps.

Cheers,
Fokko

On Wed 22 Aug 2018 at 23:59, t4 wrote:

Re: Spark data quality bug when reading parquet files from hive metastore

2018-08-22 Thread t4
https://issues.apache.org/jira/browse/SPARK-23576 ?

Spark data quality bug when reading parquet files from hive metastore

2018-08-22 Thread Long, Andrew
Hello Friends,

I’ve encountered a bug where Spark silently corrupts data when reading from a Parquet Hive table whose table schema does not match the file schema. I’d like to take a shot at adding some extra validation to the code to handle this corner case, and I was wondering if anyone
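To make the proposed validation concrete, here is a minimal, hypothetical sketch of the kind of check that could turn a silent mismatch into an explicit error. It does not use Spark's API: schemas are modeled as plain lists of (column name, type name) pairs, and the names `schema_mismatches` and `validate_or_raise` are illustrative only. It assumes columns are matched by name, case-insensitively, and that any type difference between the metastore table schema and the file schema should be reported rather than coerced.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a schema is an ordered list of (column name, type name) pairs.
Schema = List[Tuple[str, str]]


def schema_mismatches(table_schema: Schema, file_schema: Schema) -> List[str]:
    """Describe every place where the file schema disagrees with the table schema."""
    problems: List[str] = []
    # Assumption: columns are resolved by name, ignoring case.
    file_types: Dict[str, str] = {name.lower(): typ for name, typ in file_schema}
    for name, table_type in table_schema:
        file_type = file_types.get(name.lower())
        if file_type is None:
            # A name-based reader would typically fill this column with nulls; flag it instead.
            problems.append(f"column '{name}' missing from file")
        elif file_type != table_type:
            problems.append(
                f"column '{name}': table type {table_type}, file type {file_type}"
            )
    return problems


def validate_or_raise(table_schema: Schema, file_schema: Schema, path: str) -> None:
    """Fail loudly instead of returning silently wrong data for this file."""
    problems = schema_mismatches(table_schema, file_schema)
    if problems:
        raise ValueError(f"schema mismatch in {path}: " + "; ".join(problems))


if __name__ == "__main__":
    table = [("id", "bigint"), ("amount", "decimal(10,2)")]
    print(schema_mismatches(table, [("id", "int")]))
```

Whether such a check should raise, warn, or be gated behind a config flag is a design choice; raising is shown here because the concern in this thread is that the corruption is silent.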