cloud-fan commented on PR #35395:
URL: https://github.com/apache/spark/pull/35395#issuecomment-1090542094

   I'm fine with the current change but still want to put one concern on the 
table: shall we apply filter pushdown twice for simple DELETE execution? That 
is, we first push down the DELETE condition to identify the files we need to 
replace, then we push down the negated DELETE condition to prune the Parquet 
row groups while rewriting those files.
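
   To make the two phases concrete, here is a minimal sketch using Spark's public `Filter` API. The `findAffectedFiles` and `rewriteKeptRows` helpers are hypothetical placeholders for the planning and rewrite steps, not real Spark APIs:

```scala
import org.apache.spark.sql.sources.{Filter, Not}

// Hypothetical sketch of the two pushdown phases; the two helpers are
// placeholders, not real Spark APIs.
object SimpleDeletePushdown {
  // Phase 1: push the DELETE condition to find files containing at least one
  // matching row; only those files need to be replaced.
  def findAffectedFiles(deleteCond: Filter): Seq[String] =
    sys.error("placeholder: file-level pruning with the DELETE condition")

  // Phase 2: rewrite the affected files, scanning them with the negated
  // DELETE condition pushed down so only the rows to keep are read back.
  def rewriteKeptRows(files: Seq[String], pushed: Filter): Unit =
    sys.error("placeholder: scan + rewrite with the negated condition")

  def planDelete(deleteCond: Filter): Unit = {
    val affected = findAffectedFiles(deleteCond)
    rewriteKeptRows(affected, Not(deleteCond))
  }
}
```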
   
   For example, suppose the DELETE condition is `col > 10`, and a Parquet file 
has two row groups: group 1 has values 0 to 10, and group 2 has values 11 to 
20. This file will be identified as affected after we push down the DELETE 
condition `col > 10`, and we can still push down the negated condition 
`col <= 10` to skip reading row group 2 at runtime.
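
   As a toy illustration of why the negated condition prunes group 2, here is a self-contained sketch of min/max row-group pruning. `RowGroupStats` and the skip check are simplified stand-ins for the column statistics a real Parquet reader consults:

```scala
// Simplified stand-in for Parquet row-group (column chunk) statistics.
case class RowGroupStats(min: Int, max: Int)

object RowGroupPruningDemo extends App {
  // For a pushed filter `col <= bound`, a row group can be skipped when no
  // value in [min, max] can satisfy it, i.e. when min > bound.
  def canSkip(stats: RowGroupStats, bound: Int): Boolean = stats.min > bound

  val group1 = RowGroupStats(min = 0, max = 10)   // values 0 to 10
  val group2 = RowGroupStats(min = 11, max = 20)  // values 11 to 20

  // Pushing the negated DELETE condition `col <= 10`:
  println(canSkip(group1, bound = 10)) // false: group 1 must still be read
  println(canSkip(group2, bound = 10)) // true:  group 2 is skipped entirely
}
```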

