HeartSaVioR commented on code in PR #39082:
URL: https://github.com/apache/spark/pull/39082#discussion_r1053715659
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala:
##########
@@ -183,16 +172,80 @@ object LogicalRDD {
}
}
+ val logicalPlan = originDataset.logicalPlan
val optimizedPlan = originDataset.queryExecution.optimizedPlan
val executedPlan = originDataset.queryExecution.executedPlan
+ val (stats, constraints) = rewriteStatsAndConstraints(logicalPlan,
optimizedPlan)
+
LogicalRDD(
originDataset.logicalPlan.output,
rdd,
firstLeafPartitioning(executedPlan.outputPartitioning),
executedPlan.outputOrdering,
isStreaming
- )(originDataset.sparkSession, Some(optimizedPlan.stats),
Some(optimizedPlan.constraints))
Review Comment:
We tried this before and found that it can break existing use cases where
someone checkpoints a "subtree" of a logical plan. Since we know the exprIds
can differ, expressions in the node(s) above the subtree would break.
One real example is the source materialization in Delta Lake's MERGE INTO.
It performs a join between the source DF and the target table, using the merge
condition as the join condition (here the condition is built from the logical
plan), and the source DF can be checkpointed and replaced with a LogicalRDD.
The LogicalRDD should produce the same output so that the join condition does
not break.
https://github.com/delta-io/delta/blob/4e51a9969708080b9ac002462f20f64000288978/core/src/main/scala/org/apache/spark/sql/delta/commands/MergeIntoCommand.scala#L458-L472
https://github.com/delta-io/delta/blob/master/core/src/main/scala/org/apache/spark/sql/delta/commands/merge/MergeIntoMaterializeSource.scala#L245-L251
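To illustrate the concern, here is a minimal sketch in plain Scala. It is a toy
model, not Spark's Catalyst classes: `Attr`, `EqualTo`, and `resolves` are
hypothetical names introduced only for this example, and the assumption is
simply that a condition refers to its inputs by exprId rather than by name.

```scala
// Toy model only: these are NOT Spark's Catalyst classes. All names are hypothetical.
object ExprIdSketch {
  // An attribute is identified by its exprId, not by its column name.
  final case class Attr(name: String, exprId: Long)

  // A join condition "left = right" that refers to its inputs by exprId.
  final case class EqualTo(left: Attr, right: Attr) {
    def references: Set[Long] = Set(left.exprId, right.exprId)
  }

  // The condition resolves only if every referenced exprId is produced by a child.
  def resolves(cond: EqualTo, leftOut: Seq[Attr], rightOut: Seq[Attr]): Boolean = {
    val available = (leftOut ++ rightOut).map(_.exprId).toSet
    cond.references.subsetOf(available)
  }

  def main(args: Array[String]): Unit = {
    val sourceOut = Seq(Attr("key", 1L), Attr("value", 2L))
    val targetOut = Seq(Attr("key", 10L), Attr("value", 11L))

    // The condition is built against the original source subtree.
    val cond = EqualTo(sourceOut.head, targetOut.head)

    // Replacement subtree that keeps the original output attributes.
    val sameIds = sourceOut
    // Replacement subtree that mints fresh exprIds for the same column names.
    val freshIds = sourceOut.map(a => a.copy(exprId = a.exprId + 100L))

    println(resolves(cond, sameIds, targetOut))  // true
    println(resolves(cond, freshIds, targetOut)) // false
  }
}
```

In this sketch the column names are identical in both replacements, yet the
condition built earlier only resolves when the replacement keeps the original
exprIds.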
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]