cloud-fan commented on PR #35395: URL: https://github.com/apache/spark/pull/35395#issuecomment-1090542094
I'm fine with the current change, but I still want to put one concern on the table: shall we apply filter pushdown twice for simple DELETE execution? That is, we first push down the DELETE condition to identify the files we need to replace, and then push down the negated DELETE condition to prune the Parquet row groups. For example, suppose the DELETE condition is `col > 10` and a Parquet file has two row groups: group 1 holds values 0 to 10, and group 2 holds values 11 to 20. This file is identified as an affected file after we push down the DELETE condition `col > 10`, and we can still push down the negated condition `col <= 10` to skip reading row group 2 at runtime, since every row in it matches the DELETE condition.
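To make the two pushdown steps concrete, here is a minimal, self-contained sketch. `RowGroup`, `ParquetFile`, and the min/max-stat predicates are hypothetical illustrations of row-group pruning, not Spark's actual classes or this PR's implementation:

```scala
// Hypothetical stand-ins for Parquet row-group statistics.
case class RowGroup(min: Int, max: Int)
case class ParquetFile(name: String, groups: Seq[RowGroup])

// Step 1: push down the DELETE condition (`col > 10`) to find files that
// may contain matching rows and therefore must be rewritten.
def mayMatchDelete(g: RowGroup): Boolean = g.max > 10

// Step 2: while rewriting, push down the negated condition (`col <= 10`)
// so row groups whose rows are all deleted are never read.
def mayMatchKeep(g: RowGroup): Boolean = g.min <= 10

val file = ParquetFile("part-00000.parquet",
  Seq(RowGroup(0, 10), RowGroup(11, 20)))

val affected     = file.groups.exists(mayMatchDelete)  // true: rewrite this file
val groupsToRead = file.groups.filter(mayMatchKeep)    // only RowGroup(0, 10)

println(s"affected = $affected, groupsToRead = $groupsToRead")
```

With these assumptions, group 2 (values 11 to 20) is skipped entirely during the rewrite even though it is what made the file "affected" in the first place.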
