[GitHub] [spark] LuciferYang opened a new pull request #30652: [SPARK-33673][SQL] Avoid push down partition filters to ParquetScan for DataSourceV2

GitBox Mon, 07 Dec 2020 08:01:13 -0800


LuciferYang opened a new pull request #30652:
URL: https://github.com/apache/spark/pull/30652



   ### What changes were proposed in this pull request?
   As described in SPARK-33673, some tests in 
`ParquetV2SchemaPruningSuite` fail when `parquet.version` is set to 1.11.1, 
because since PARQUET-1765 Parquet returns empty results for non-existent 
columns.
   
   This PR uses `dataSchema` instead of `schema` to build 
`pushedParquetFilters` in `ParquetScanBuilder`, which avoids pushing partition 
filters down to `ParquetScan` for `DataSourceV2`.
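
To illustrate the idea, here is a minimal toy model in Python. It is not Spark's actual Scala code: `EqualTo`, `pushable_filters`, and the column names are hypothetical stand-ins. It assumes that partition values are not stored in the Parquet files themselves, so a filter on a partition column refers to a column the file does not contain, and that a filter is only converted when its column appears in the schema used for the conversion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EqualTo:
    """Toy stand-in for a source filter of the form `column = value`."""
    column: str
    value: object


def pushable_filters(filters, schema_columns):
    """Keep only the filters whose column appears in the given schema."""
    known = set(schema_columns)
    return [f for f in filters if f.column in known]


# Hypothetical table layout: two columns stored in the files, one partition column.
data_schema = ["id", "name"]
partition_schema = ["p"]
full_schema = data_schema + partition_schema

filters = [EqualTo("name", "a"), EqualTo("p", 1)]

# Building from the full schema lets the partition filter through to the file reader.
pushed_with_full_schema = pushable_filters(filters, full_schema)

# Building from the data schema drops it, so only filters on stored columns remain.
pushed_with_data_schema = pushable_filters(filters, data_schema)

print(pushed_with_full_schema)
print(pushed_with_data_schema)
```

In this sketch, only the `name` filter survives when the data schema is used, which mirrors the intent of the change: the file-level reader never sees a predicate on a column it does not store.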
   
   ### Why are the changes needed?
   Prepare for upgrading to Parquet 1.11.1.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   **Manual test** 
   
   ```
   mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.execution.datasources.parquet.ParquetV2SchemaPruningSuite -Dparquet.version=1.11.1 test -pl sql/core -am
   ```
   
   **Before**
   
   **After**
   
   ```
   Run completed in 3 minutes, 46 seconds.
   Total number of tests run: 134
   Suites: completed 2, aborted 0
   Tests: succeeded 134, failed 0, canceled 0, ignored 0, pending 0
   All tests passed.
   
   ```
   




