GitHub user wgtmac commented on the issue:

    https://github.com/apache/spark/pull/15035

@HyukjinKwon This is not Parquet-specific; it applies to other data sources as well.

1. Change the reading path for Parquet: This does not solve the problem. Some queries need to read all Parquet files.
2. Make changes in row: Yes, I have to change it per row, because some Parquet files have int while others have long. We can't know in advance which rows are fine and which are problematic.
3. Vectorized Parquet reader: This is a good catch. I haven't considered it yet.

It would be great if you could come up with other good ideas and continue to work on it. Feedback and discussion are welcome. Thanks!