HeartSaVioR commented on code in PR #37187:
URL: https://github.com/apache/spark/pull/37187#discussion_r920774002


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala:
##########
@@ -116,10 +116,17 @@ case class LogicalRDD(
       case e: Attribute => rewrite.getOrElse(e, e)
     }.asInstanceOf[SortOrder])
 
+    val rewrittenOriginLogicalPlan = originLogicalPlan.map { plan =>
+      val projectList = output.map { attr =>
+        Alias(attr, attr.name)(exprId = rewrite.getOrElse(attr, attr).exprId)

Review Comment:
   This is mostly for the sake of defensive programming - if there is a bug 
that makes the two sets of columns go out of sync, we simply let them stay out 
of sync rather than failing the query, given that the impact of the two sets 
of columns being out of sync is not that serious, e.g. column stats won't be 
available.
   
   Conversely, I'm also in favor of failing fast: set the precondition that 
"the two sets of columns should be in sync", and assert that precondition on 
initialization of the class. After that we can safely assume the precondition 
is respected, and it'd be safe to just use `rewrite(attr)` here.
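   
   To make the two options concrete, here is a minimal sketch. It does not use 
the real Catalyst classes: `Attr`, `RewriteSketch`, `lenientExprIds` and 
`strictExprIds` are hypothetical stand-ins, and a plain `Map` plays the role of 
`rewrite`. I'm assuming `require` for the check; `assert` would work as well.

```scala
// Hypothetical, simplified stand-in for a Catalyst attribute (not Spark's real class).
final case class Attr(name: String, exprId: Long)

object RewriteSketch {
  // Option 1 (defensive): fall back to the original attribute when the
  // mapping has no entry, so an out-of-sync column never fails the query.
  def lenientExprIds(output: Seq[Attr], rewrite: Map[Attr, Attr]): Seq[Long] =
    output.map(attr => rewrite.getOrElse(attr, attr).exprId)

  // Option 2 (fail-fast): check the "columns are in sync" precondition once,
  // then use the direct lookup, which can no longer miss.
  def strictExprIds(output: Seq[Attr], rewrite: Map[Attr, Attr]): Seq[Long] = {
    val missing = output.filterNot(rewrite.contains)
    require(missing.isEmpty,
      s"Columns out of sync, no rewrite for: ${missing.map(_.name).mkString(", ")}")
    output.map(attr => rewrite(attr).exprId)
  }
}
```

   With a complete mapping both variants return the same ids. They only differ 
when an attribute is missing: the first silently keeps the old `exprId`, the 
second throws at the check.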
   
   I'm fine either way. WDYT? cc. @cloud-fan as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

