LorenzoMartini commented on PR #36918: URL: https://github.com/apache/spark/pull/36918#issuecomment-1377123165
Hi @huaxingao. We are trying to use Spark DataSource V2 and noticed that the built-in v2 data sources (e.g. the Parquet one, looking at `ParquetScan`) implement neither `SupportsRuntimeFiltering` nor `SupportsRuntimeV2Filtering` by default. This creates a large performance difference between the v1 and v2 data sources out of the box. Is there a plan to have them support this? It would be really beneficial for the file scans to be able to do this, and given that they already benefit from some pushdowns, we were wondering why runtime filtering is not implemented. Or maybe I am missing something? In that case it would be great to understand how to have the Spark file sources take advantage of dynamic partition pruning (DPP). Thanks!
