aokolnychyi commented on a change in pull request #35395:
URL: https://github.com/apache/spark/pull/35395#discussion_r802877435
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala
##########
@@ -55,3 +55,9 @@ trait AnalysisOnlyCommand extends Command {
// on the `AnalysisContext`
def markAsAnalyzed(analysisContext: AnalysisContext): LogicalPlan
}
+
+/**
+ * A command that is nested within another command after the analysis and does not have to be
+ * executed eagerly. Such commands will be either removed or made top-level in the optimizer.
+ */
+trait NestedCommand extends Command
Review comment:
`ReplaceData` is a `NestedCommand`. Here is an example of how it is handled.
```
sql(s"DELETE FROM $tableNameAsString WHERE id <= 1")
```
```
== Parsed Logical Plan ==
'DeleteFromTable ('id <= 1)
+- 'UnresolvedRelation [cat, ns1, test_table], [], false

== Analyzed Logical Plan ==
DeleteFromTable (id#88 <= 1)
:- RelationV2[id#88, dep#89] cat.ns1.test_table
+- ReplaceData RelationV2[id#88, dep#89] cat.ns1.test_table
   +- Filter NOT ((id#88 <= 1) <=> true)
      +- RelationV2[id#88, dep#89, _partition#91] cat.ns1.test_table

== Optimized Logical Plan ==
ReplaceData RelationV2[id#88, dep#89] cat.ns1.test_table, org.apache.spark.sql.connector.catalog.InMemoryRowLevelOperationTable$PartitionBasedOperation$$anon$2$$anon$3@bc5bbcd
+- Project [id#88, dep#89]
   +- Sort [_partition#91 ASC NULLS FIRST], false
      +- RepartitionByExpression [_partition#91], 5
         +- Filter NOT ((id#88 <= 1) <=> true)
            +- RelationV2[id#88, dep#89, _partition#91] cat.ns1.test_table

== Physical Plan ==
ReplaceData org.apache.spark.sql.connector.catalog.InMemoryRowLevelOperationTable$PartitionBasedOperation$$anon$2$$anon$3@bc5bbcd
+- AdaptiveSparkPlan isFinalPlan=false
   +- Project [id#88, dep#89]
      +- Sort [_partition#91 ASC NULLS FIRST], false, 0
         +- Exchange hashpartitioning(_partition#91, 5), REPARTITION_BY_NUM, [id=#182]
            +- Project [id#88, dep#89, _partition#91]
               +- Filter NOT ((id#88 <= 1) <=> true)
                  +- BatchScan[id#88, dep#89, _partition#91] class org.apache.spark.sql.connector.catalog.InMemoryTable$InMemoryBatchScan RuntimeFilters: []
```
Originally, `ReplaceData` is nested inside `DeleteFromTable`. We need to execute that nested plan only if the table does not support DELETEs with filters. Currently, `ReplaceData` becomes a top-level node in the optimizer, but I will try to move that to physical planning (i.e. `DataSourceV2Strategy`).
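To make that decision concrete, here is a minimal, self-contained sketch of the idea. These are stand-in types, not the actual classes or rule added by this PR; `canDeleteWhere` and `DeleteWithFilters` are hypothetical stand-ins for the source's filter-based delete capability.
```
object NestedCommandSketch {
  // Stand-in model only; shapes are simplified for illustration.
  sealed trait Plan
  case class Relation(name: String, canDeleteWhere: Boolean) extends Plan
  // Group-based rewrite: overwrites the affected groups with the query's output.
  case class ReplaceData(table: Relation, query: Plan) extends Plan
  // DELETE executed by pushing filters down to the source (hypothetical node).
  case class DeleteWithFilters(table: Relation, cond: String) extends Plan
  // After analysis, the DELETE keeps its rewrite plan nested so the rewrite
  // is resolved and optimized together with the rest of the query.
  case class DeleteFromTable(table: Relation, cond: String, rewrite: ReplaceData) extends Plan

  // The choice described above: drop the nested ReplaceData when the source
  // can delete by filters alone, otherwise promote it to the top level.
  def optimizeDelete(plan: Plan): Plan = plan match {
    case DeleteFromTable(table, cond, _) if table.canDeleteWhere =>
      DeleteWithFilters(table, cond)
    case DeleteFromTable(_, _, rewrite) =>
      rewrite
    case other =>
      other
  }
}
```
With these stand-ins, a DELETE on a source with `canDeleteWhere = false` yields the nested `ReplaceData` as the new root, which mirrors the optimized plan above.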