[GitHub] [spark] JiJiTang opened a new pull request #28319: [SPARK-31364][SQL]Benchmark Parquet Nested Field Predicate Pushdown

GitBox Thu, 23 Apr 2020 12:58:21 -0700


JiJiTang opened a new pull request #28319:
URL: https://github.com/apache/spark/pull/28319



   [SPARK-31364][SQL] Benchmark Parquet Nested Field Predicate Pushdown
   
   ### What changes were proposed in this pull request?
   Add a benchmark suite for nested predicate pushdown on Parquet files.
   
   The suite compares performance with nested predicate pushdown disabled vs. 
enabled in the following query scenarios:
   
   1. When the predicate is pushed down, the Parquet reader can filter out all 
the row groups without loading them.
   
   2. When the predicate is pushed down, the Parquet reader loads only one of 
the row groups.
   
   3. When the predicate is pushed down, the Parquet reader cannot filter out 
any row group. This scenario checks whether enabling nested predicate pushdown 
introduces too much overhead.
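
   The three scenarios above can be illustrated with a minimal, self-contained sketch. It is not the benchmark code from this PR and does not use Spark or Parquet: `RowGroup`, `make_row_groups`, `scan_range`, and the `nested.value` field are hypothetical names, and the in-memory min/max check only approximates how a reader might use row-group statistics to skip data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a row group with min/max stats on nested.value."""
    min_value: int
    max_value: int
    rows: List[dict]


def make_row_groups(num_groups: int, rows_per_group: int) -> List[RowGroup]:
    """Build row groups holding consecutive, non-overlapping integer ranges."""
    groups = []
    for g in range(num_groups):
        start = g * rows_per_group
        end = start + rows_per_group
        rows = [{"nested": {"value": v}} for v in range(start, end)]
        groups.append(RowGroup(start, end - 1, rows))
    return groups


def scan_range(groups: List[RowGroup], lo: int, hi: int,
               pushdown: bool) -> Tuple[List[dict], int]:
    """Return (matching rows, row groups loaded) for lo <= nested.value <= hi."""
    loaded = 0
    matches = []
    for group in groups:
        # With pushdown, the statistics alone can prove that no row matches,
        # so the group is skipped without reading its rows.
        if pushdown and (group.max_value < lo or group.min_value > hi):
            continue
        loaded += 1
        matches.extend(
            row for row in group.rows if lo <= row["nested"]["value"] <= hi
        )
    return matches, loaded


groups = make_row_groups(num_groups=4, rows_per_group=10)

# One predicate per scenario: skip every group, load one group, skip nothing.
scenarios = {
    "1. all row groups filtered": (100, 100),
    "2. one row group loaded": (15, 15),
    "3. no row group filtered": (0, 39),
}

for name, (lo, hi) in scenarios.items():
    for pushdown in (False, True):
        matches, loaded = scan_range(groups, lo, hi, pushdown)
        print(f"{name} | pushdown={pushdown} | loaded={loaded} | rows={len(matches)}")
```

   In this sketch the pushdown path never changes the result set, only the number of row groups that are read. In the third scenario the statistics check is still evaluated for every group without skipping any, which is the overhead case the benchmark is meant to measure.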
   
   ### Why are the changes needed?
   No benchmark currently exists for evaluating the performance of nested 
field predicate pushdown.
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   Ran the benchmark and reported the results.




