rdblue commented on issue #25955: [SPARK-29277][SQL] Add early DSv2 filter and 
projection pushdown
URL: https://github.com/apache/spark/pull/25955#issuecomment-543415656
 
 
   @cloud-fan, I rebased and updated this if you want to have another look.
   
   I updated this as you suggested so that `optimizedPlan` will always contain 
`DataSourceV2ScanRelation`. That change allows us to remove quite a few cases.
   
   I also updated this to solve the problem where DDL commands would have other 
rules run on the relation, including early push-down. As we discussed, I 
removed the relation from `children` for DDL commands so that rules are not run 
automatically, and added cases to `ResolveTables` for those plans. Right now, 
that is done for `DescribeTable` and `AlterTable`. Other DDL commands create 
tables and don't have a relation.
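
   A minimal sketch of why removing the relation from `children` keeps rules away from it. This is a toy model, not Spark's classes: `Plan`, `Relation`, `ScanRelation`, `Filter`, and `EarlyPushDown` are hypothetical stand-ins, and `transformUp` only imitates the general shape of a tree-rewriting rule.

```scala
// Toy plan tree. Rules only reach nodes that are exposed through `children`.
sealed trait Plan {
  def children: Seq[Plan]
  def withNewChildren(newChildren: Seq[Plan]): Plan

  // Apply `rule` bottom-up to every node reachable through `children`.
  def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
    val rewritten = withNewChildren(children.map(_.transformUp(rule)))
    rule.applyOrElse(rewritten, identity[Plan])
  }
}

// Stand-in for the relation before push-down.
case class Relation(table: String) extends Plan {
  def children: Seq[Plan] = Nil
  def withNewChildren(newChildren: Seq[Plan]): Plan = this
}

// Stand-in for the relation after push-down.
case class ScanRelation(table: String) extends Plan {
  def children: Seq[Plan] = Nil
  def withNewChildren(newChildren: Seq[Plan]): Plan = this
}

// A query node exposes its relation as a child, so rules are applied to it.
case class Filter(condition: String, child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def withNewChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
}

// A DDL-style node keeps the relation as a plain field and reports no children,
// so tree-rewriting rules never visit the relation.
case class DescribeTable(table: Plan) extends Plan {
  def children: Seq[Plan] = Nil
  def withNewChildren(newChildren: Seq[Plan]): Plan = this
}

object EarlyPushDown {
  // Hypothetical rule: convert every reachable Relation into a ScanRelation.
  val rule: PartialFunction[Plan, Plan] = {
    case Relation(table) => ScanRelation(table)
  }
}
```

   In this model, `Filter("id > 1", Relation("t")).transformUp(EarlyPushDown.rule)` rewrites the relation, while `DescribeTable(Relation("t")).transformUp(EarlyPushDown.rule)` returns the plan unchanged. Such a node then needs an explicit case in whichever rule should still handle it.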
   
   I should point out that I didn't change `DeleteFromTable` or `UpdateTable`. 
Those aren't DDL commands because they modify data. Those plans also rely on 
the relation as a child to resolve references in the delete and update 
expressions. Because some rules need to run on the relation, I think it should 
be okay if all of the rules run, including early push-down. This still works 
fine because the plans only rely on the output of the table, so it doesn't 
matter if the underlying relation is converted to `DataSourceV2ScanRelation`.
   
   If we want to avoid the relation underneath `DeleteFromTable` getting 
converted, we could avoid early push-down when there is no filter or 
projection. But this strategy would cause `DataSourceV2Relation` to show up in 
optimized plans again and require adding back all the cases I just removed. I 
don't have a strong opinion here and could go either way.
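
   The trade-off can be sketched with a toy model. Nothing here is Spark's actual code: the classes are hypothetical stand-ins, the `onlyWhenFilterPresent` flag is invented for illustration, filters are folded into the scan for brevity, and projection is left out because it would follow the same pattern.

```scala
// Toy plan nodes. Relation stands in for the pre-push-down relation and
// ScanRelation for the post-push-down one.
sealed trait Plan
case class Relation(table: String) extends Plan
case class ScanRelation(table: String, pushedFilters: Seq[String]) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class DeleteFromTable(table: Plan, condition: String) extends Plan

object PushDown {
  // onlyWhenFilterPresent = false: every Relation is converted.
  // onlyWhenFilterPresent = true:  a bare Relation is left alone, so it can
  //                                still appear in the resulting plan.
  def convert(plan: Plan, onlyWhenFilterPresent: Boolean): Plan = plan match {
    case Filter(condition, Relation(table)) =>
      ScanRelation(table, Seq(condition))
    case Relation(table) if !onlyWhenFilterPresent =>
      ScanRelation(table, Nil)
    case Filter(condition, child) =>
      Filter(condition, convert(child, onlyWhenFilterPresent))
    case DeleteFromTable(table, condition) =>
      DeleteFromTable(convert(table, onlyWhenFilterPresent), condition)
    case other =>
      other
  }
}
```

   With `onlyWhenFilterPresent = true`, the relation under `DeleteFromTable` stays a `Relation`, which is the situation that would require later code to handle both node types again.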
   
   Last, I made a small change to `Analyzer` while I was updating the 
`ResolveTables` rule. The cases for `UnresolvedRelation` and 
`InsertIntoStatement` used `lookupV2RelationAndCatalog`, which has been removed 
elsewhere. I removed those last uses so we could get rid of that method. Now, 
`UnresolvedRelation` and `InsertIntoStatement` are resolved in 
`ResolveCatalogs` to use `UnresolvedV2Relation` and then matched in 
`ResolveTables`.
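
   The two-step resolution described above can be sketched as follows. This is a toy model under stated assumptions: the case classes only borrow the names from the comment, the `catalogs` map is a hypothetical registry, and what the second step produces (`Relation` here) is an assumption made for illustration.

```scala
// Toy nodes named after the plans mentioned above; not Spark's definitions.
sealed trait Plan
case class UnresolvedRelation(nameParts: Seq[String]) extends Plan
case class UnresolvedV2Relation(catalog: String, ident: String) extends Plan
case class Relation(catalog: String, ident: String) extends Plan

object Resolution {
  // Hypothetical registry: catalog name -> tables known to that catalog.
  val catalogs: Map[String, Set[String]] = Map("testcat" -> Set("db.events"))

  // Step 1: pick the catalog from the first name part, keep the table unresolved.
  def resolveCatalogs(plan: Plan): Plan = plan match {
    case UnresolvedRelation(catalog +: rest) if catalogs.contains(catalog) && rest.nonEmpty =>
      UnresolvedV2Relation(catalog, rest.mkString("."))
    case other => other
  }

  // Step 2: match the intermediate node and look the table up in its catalog.
  def resolveTables(plan: Plan): Plan = plan match {
    case UnresolvedV2Relation(catalog, ident) if catalogs.get(catalog).exists(_.contains(ident)) =>
      Relation(catalog, ident)
    case other => other
  }
}
```

   Because the second step only matches the intermediate node, it needs no catalog lookup helper of its own, which mirrors why the old lookup method could be dropped.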
