rdblue commented on issue #25955: [SPARK-29277][SQL] Add early DSv2 filter and projection pushdown
URL: https://github.com/apache/spark/pull/25955#issuecomment-543415656

@cloud-fan, I rebased and updated this if you want to have another look.

I updated this as you suggested so that `optimizedPlan` will always contain `DataSourceV2ScanRelation`. That change allows us to remove quite a few cases.

I also updated this to solve the problem where DDL commands would have other rules run on the relation, including early push-down. As we discussed, I removed the relation from `children` for DDL commands so that rules are not run automatically, and added cases to `ResolveTables` for those plans. Right now, that is done for `DescribeTable` and `AlterTable`. Other DDL commands create tables and don't have a relation.

I should point out that I didn't change `DeleteFromTable` or `UpdateTable`. Those aren't DDL commands because they modify data. Those plans also rely on the relation as a child to resolve references in the delete and update expressions. Because some rules need to run, I think it should be okay if all of the rules run. This still works fine because the plans only rely on the output of the table and it doesn't matter if the underlying relation is converted to `DataSourceV2ScanRelation`.

If we want to avoid the relation underneath `DeleteFromTable` getting converted, we could avoid early push-down when there is no filter or projection. But this strategy would cause `DataSourceV2Relation` to show up in optimized plans again and require adding back all the cases I just removed. I don't have a strong opinion here and could go either way.

Last, I made a small change to `Analyzer` while I was updating the `ResolveTables` rule. The cases for `UnresolvedRelation` and `InsertIntoStatement` used `lookupV2RelationAndCatalog`, which has been removed elsewhere. I removed those last uses so we could get rid of that method. Now, `UnresolvedRelation` and `InsertIntoStatement` are resolved in `ResolveCatalogs` to use `UnresolvedV2Relation` and then matched in `ResolveTables`.
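
To illustrate why taking the relation out of `children` keeps rules from touching it, here is a minimal toy sketch. It does not use Spark's classes: `Plan`, `Relation`, `ScanRelation`, `Describe`, `Delete`, and `earlyPushDown` are all hypothetical stand-ins that only mimic the shape of a tree whose rewrite rules walk `children`. It assumes, as the comment above describes, that a rule reaches a node only through `children`.

```scala
// Toy model only: these types imitate the shape of a plan tree, not Spark's API.
sealed trait Plan {
  def children: Seq[Plan]
  def withNewChildren(newChildren: Seq[Plan]): Plan

  // Bottom-up rewrite that can only reach nodes exposed through `children`.
  def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
    val rewritten = withNewChildren(children.map(_.transformUp(rule)))
    if (rule.isDefinedAt(rewritten)) rule(rewritten) else rewritten
  }
}

// Stand-in for an unconverted relation (hypothetical).
case class Relation(name: String) extends Plan {
  def children: Seq[Plan] = Nil
  def withNewChildren(newChildren: Seq[Plan]): Plan = this
}

// Stand-in for the relation after early push-down (hypothetical).
case class ScanRelation(name: String) extends Plan {
  def children: Seq[Plan] = Nil
  def withNewChildren(newChildren: Seq[Plan]): Plan = this
}

// DDL-style command: keeps the relation as a plain field, not as a child,
// so tree rewrites never descend into it.
case class Describe(table: Plan) extends Plan {
  def children: Seq[Plan] = Nil
  def withNewChildren(newChildren: Seq[Plan]): Plan = this
}

// DML-style command: exposes the relation as a child, so every rule runs on it.
case class Delete(table: Plan, condition: String) extends Plan {
  def children: Seq[Plan] = Seq(table)
  def withNewChildren(newChildren: Seq[Plan]): Plan = copy(table = newChildren.head)
}

object Demo {
  // Stand-in for an early push-down rule that converts every relation it sees.
  val earlyPushDown: PartialFunction[Plan, Plan] = {
    case Relation(name) => ScanRelation(name)
  }

  val describe: Plan = Describe(Relation("t")).transformUp(earlyPushDown)
  val delete: Plan = Delete(Relation("t"), "id = 1").transformUp(earlyPushDown)
}
```

In this toy, `Demo.describe` still holds the original `Relation`, while `Demo.delete` ends up with a `ScanRelation` underneath. That mirrors the trade-off described above: the DML-style plan tolerates the conversion because it only depends on the table's output.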
