[GitHub] [spark] stczwd commented on pull request #35669: [SPARK-38041][SQL]DataFilter pushed down with PartitionFilter

GitBox Sun, 27 Feb 2022 17:37:37 -0800


stczwd commented on pull request #35669:
URL: https://github.com/apache/spark/pull/35669#issuecomment-1053776419



   > @stczwd Thanks for working on this. I haven't looked at your 
implementation yet. Based on the PR description, this PR will push down both 
data filters and partition filters, but it seems to me that we only need to 
push down data filters, because the partition filters have already been handled 
by partition pruning.
   
   Thanks for your attention. Yes, partition filters have already been handled 
by partition pruning. However, the data filters that get pushed down contain 
the conditions for every partition combined, which means each partition is 
scanned with all of those conditions rather than only the ones that apply to it.
   Some simple examples:
   1. If the condition is `(a > 0 and c = 0) or c=2`, then no data filter will be 
pushed down, and we will scan all data of partition(c=0) and partition(c=2);
   2. If the condition is `(a > 10 and c=0) or (a <=10 and c=2)`, then the pushed 
filter will be `a>10 or a<=10`, so we will also scan all data of partition(c=0) 
and partition(c=2).
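To make the two examples concrete, here is a toy model in Python. It is not Spark's code: the tuple-based expression format and the helpers `data_filter`, `per_partition_filter`, `cmp` and `show` are hypothetical, and `NOT` is assumed not to appear. `data_filter` approximates the single pushed data filter by treating every partition predicate as TRUE, while `per_partition_filter` first substitutes one partition's value, which is roughly what pushing the data filter down together with the partition filter could allow.

```python
import operator

# Toy predicate trees (hypothetical format, not Spark's Expression API):
#   ("cmp", column, op, value), ("and", l, r), ("or", l, r), TRUE, FALSE
# NOT is deliberately unsupported: replacing a predicate with TRUE only
# yields a safe (weaker) filter when the predicate is in a positive position.
TRUE = ("true",)
FALSE = ("false",)
OPS = {"=": operator.eq, ">": operator.gt, "<=": operator.le}


def simplify(e):
    """Fold TRUE/FALSE constants out of AND/OR nodes."""
    kind = e[0]
    if kind not in ("and", "or"):
        return e
    l, r = simplify(e[1]), simplify(e[2])
    if kind == "and":
        if FALSE in (l, r):
            return FALSE
        if l == TRUE:
            return r
        if r == TRUE:
            return l
    else:
        if TRUE in (l, r):
            return TRUE
        if l == FALSE:
            return r
        if r == FALSE:
            return l
    return (kind, l, r)


def data_filter(e, partition_cols):
    """One filter shared by all partitions: partition predicates become TRUE."""
    if e[0] == "cmp":
        return TRUE if e[1] in partition_cols else e
    if e[0] in ("and", "or"):
        return simplify((e[0], data_filter(e[1], partition_cols),
                         data_filter(e[2], partition_cols)))
    return e


def per_partition_filter(e, partition_values):
    """Filter for one partition: partition predicates are evaluated exactly."""
    if e[0] == "cmp":
        _, col, op, val = e
        if col in partition_values:
            return TRUE if OPS[op](partition_values[col], val) else FALSE
        return e
    if e[0] in ("and", "or"):
        return simplify((e[0], per_partition_filter(e[1], partition_values),
                         per_partition_filter(e[2], partition_values)))
    return e


def show(e):
    """Render a predicate tree as text; TRUE means 'no data filter'."""
    if e in (TRUE, FALSE):
        return e[0].upper()
    if e[0] == "cmp":
        return f"{e[1]} {e[2]} {e[3]}"
    return f"({show(e[1])} {e[0]} {show(e[2])})"


def cmp(col, op, val):
    return ("cmp", col, op, val)


# Example 1: (a > 0 and c = 0) or c = 2
ex1 = ("or", ("and", cmp("a", ">", 0), cmp("c", "=", 0)), cmp("c", "=", 2))
# Example 2: (a > 10 and c = 0) or (a <= 10 and c = 2)
ex2 = ("or", ("and", cmp("a", ">", 10), cmp("c", "=", 0)),
       ("and", cmp("a", "<=", 10), cmp("c", "=", 2)))

if __name__ == "__main__":
    for name, ex in (("example 1", ex1), ("example 2", ex2)):
        print(name, "global:", show(data_filter(ex, {"c"})))
        for c in (0, 2):
            print(f"  c={c}:", show(per_partition_filter(ex, {"c": c})))
```

In this model, example 1's shared filter collapses to `TRUE` (nothing to push), while partition c=0 could be read with `a > 0`. In example 2, partition c=0 gets `a > 10` and partition c=2 gets `a <= 10`, instead of both being read with the tautology `a > 10 or a <= 10`.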

