dbtsai commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields
URL: https://github.com/apache/spark/pull/27155#issuecomment-574826262

Hello @emaynardigs,

Thank you for your contribution; I value your work a lot. In fact, at Apple, we are still using an updated version of https://github.com/apache/spark/pull/22535, which is critical to our production jobs. As far as I know, Databricks's runtime also has an implementation that takes a similar approach to this issue.

The reason I have been inactive on my previous PR is that I feel adding nested support to the current filter API is a short-term solution, since its design doesn't consider these complex use cases. For a better long-term solution, I would like to create a new set of FilterV2 APIs in the DSv2 framework that treats nested columns as first-class citizens.

+ @cloud-fan @rdblue @viirya for feedback on this.

I have already started working on the FilterV2 API, and here is the WIP code: https://github.com/dbtsai/spark/pull/10/files . The change is bigger than I thought, and now I am debating whether we actually need a new FilterV2 framework. Feedback and ideas are welcome.

Thanks.
