cloud-fan commented on a change in pull request #25955: [SPARK-29277][SQL] Add
early DSv2 filter and projection pushdown
URL: https://github.com/apache/spark/pull/25955#discussion_r337318929
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
##########
@@ -243,17 +247,36 @@ class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan]
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
     case i @ InsertIntoStatement(UnresolvedCatalogRelation(tableMeta), _, _, _, _)
         if DDLUtils.isDatasourceTable(tableMeta) =>
-      i.copy(table = readDataSourceTable(tableMeta))
+      if (DataSource.isV2Provider(tableMeta.provider.get, sparkSession.sessionState.conf)) {
Review comment:
I see the problem now. The table lookup for SELECT/INSERT is more
complicated than I thought:
1. Try to look up a temp view first.
2. Look up the table/view. If it's a table from the session catalog, we should
create a v1 relation if the table provider is v1, and a v2 relation otherwise.
In fact, we rely on the order of `ResolveTables` and `ResolveRelations`,
which is pretty bad and violates the design of Catalyst: the rules in one batch
should be order-insensitive.
This fix does resolve the problem: even if we mistakenly resolve to a v1
relation, we still have a chance to correct it to a v2 relation. But I think
it's better to fix the root cause: `ResolveTables` and `ResolveRelations` are
order-sensitive.
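
To make the lookup order above concrete, here is a minimal sketch that does both steps in a single function, so the result does not depend on which rule happens to run first. Everything in it (`LookupSketch`, `Catalog`, `TableMeta`, `TempView`, `V1Relation`, `V2Relation`, the `v2Providers` set) is a hypothetical stand-in for illustration, not Spark's actual classes or the change proposed in this PR.

```scala
// Hypothetical, simplified model of the lookup; these are NOT Spark's real types.
object LookupSketch {
  sealed trait Relation
  final case class TempView(name: String) extends Relation
  final case class V1Relation(name: String, provider: String) extends Relation
  final case class V2Relation(name: String, provider: String) extends Relation

  // Stand-in for a session catalog entry (assumed shape).
  final case class TableMeta(name: String, provider: String)

  final case class Catalog(
      tempViews: Set[String],          // names of registered temp views
      tables: Map[String, TableMeta],  // session catalog tables by name
      v2Providers: Set[String]) {      // providers treated as v2 (assumption)

    // Step 1: temp views win. Step 2: fall back to the session catalog and
    // pick v1 or v2 from the table's provider. Both steps live in one place,
    // so no ordering between separate rules is needed.
    def resolve(name: String): Option[Relation] =
      if (tempViews.contains(name)) {
        Some(TempView(name))
      } else {
        tables.get(name).map { meta =>
          if (v2Providers.contains(meta.provider)) V2Relation(meta.name, meta.provider)
          else V1Relation(meta.name, meta.provider)
        }
      }
  }
}
```

With this shape, a name that is both a temp view and a catalog table resolves to the temp view, and a catalog table never needs to be "corrected" from v1 to v2 after the fact.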