aokolnychyi commented on a change in pull request #35395:
URL: https://github.com/apache/spark/pull/35395#discussion_r802877435
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala
##########
@@ -55,3 +55,9 @@ trait AnalysisOnlyCommand extends Command {
// on the `AnalysisContext`
def markAsAnalyzed(analysisContext: AnalysisContext): LogicalPlan
}
+
+/**
+ * A command that is nested within another command after the analysis and does not have to be
+ * executed eagerly. Such commands will be either removed or made top-level in the optimizer.
+ */
+trait NestedCommand extends Command
Review comment:
`ReplaceData` is a `NestedCommand`. Here is an example of how it is handled.
```
sql(s"DELETE FROM $tableNameAsString WHERE id <= 1")
```
```
== Parsed Logical Plan ==
'DeleteFromTable ('id <= 1)
+- 'UnresolvedRelation [cat, ns1, test_table], [], false

== Analyzed Logical Plan ==
DeleteFromTable (id#88 <= 1)
:- RelationV2[id#88, dep#89] cat.ns1.test_table
+- ReplaceData RelationV2[id#88, dep#89] cat.ns1.test_table
   +- Filter NOT ((id#88 <= 1) <=> true)
      +- RelationV2[id#88, dep#89, _partition#91] cat.ns1.test_table

== Optimized Logical Plan ==
ReplaceData RelationV2[id#88, dep#89] cat.ns1.test_table, org.apache.spark.sql.connector.catalog.InMemoryRowLevelOperationTable$PartitionBasedOperation$$anon$2$$anon$3@bc5bbcd
+- Project [id#88, dep#89]
   +- Sort [_partition#91 ASC NULLS FIRST], false
      +- RepartitionByExpression [_partition#91], 5
         +- Filter NOT ((id#88 <= 1) <=> true)
            +- RelationV2[id#88, dep#89, _partition#91] cat.ns1.test_table

== Physical Plan ==
ReplaceData org.apache.spark.sql.connector.catalog.InMemoryRowLevelOperationTable$PartitionBasedOperation$$anon$2$$anon$3@bc5bbcd
+- AdaptiveSparkPlan isFinalPlan=false
   +- Project [id#88, dep#89]
      +- Sort [_partition#91 ASC NULLS FIRST], false, 0
         +- Exchange hashpartitioning(_partition#91, 5), REPARTITION_BY_NUM, [id=#182]
            +- Project [id#88, dep#89, _partition#91]
               +- Filter NOT ((id#88 <= 1) <=> true)
                  +- BatchScan[id#88, dep#89, _partition#91] class org.apache.spark.sql.connector.catalog.InMemoryTable$InMemoryBatchScan RuntimeFilters: []
```
Originally, `ReplaceData` is nested inside `DeleteFromTable`. We need to execute that nested plan only if the table does not support DELETEs with filters. Currently, `ReplaceData` becomes a top-level node in the optimizer, but I will try to move that to physical planning (i.e. `DataSourceV2Strategy`).
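To make that decision concrete, here is a minimal, self-contained sketch of the idea. These are stand-in types, not the actual classes or rule added by this PR; `canDeleteWhere` and `DeleteWithFilters` are hypothetical stand-ins for the source's filter-based delete capability.
```
object NestedCommandSketch {
  // Stand-in model only; shapes are simplified for illustration.
  sealed trait Plan
  case class Relation(name: String, canDeleteWhere: Boolean) extends Plan
  // Group-based rewrite: overwrites the affected groups with the query's output.
  case class ReplaceData(table: Relation, query: Plan) extends Plan
  // DELETE executed by pushing filters down to the source (hypothetical node).
  case class DeleteWithFilters(table: Relation, cond: String) extends Plan
  // After analysis, the DELETE keeps its rewrite plan nested so the rewrite
  // is resolved and optimized together with the rest of the query.
  case class DeleteFromTable(table: Relation, cond: String, rewrite: ReplaceData) extends Plan

  // The choice described above: drop the nested ReplaceData when the source
  // can delete by filters alone, otherwise promote it to the top level.
  def optimizeDelete(plan: Plan): Plan = plan match {
    case DeleteFromTable(table, cond, _) if table.canDeleteWhere =>
      DeleteWithFilters(table, cond)
    case DeleteFromTable(_, _, rewrite) =>
      rewrite
    case other =>
      other
  }
}
```
With these stand-ins, a DELETE on a source with `canDeleteWhere = false` yields the nested `ReplaceData` as the new root, which mirrors the optimized plan above.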