Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21184#discussion_r202334300
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
    @@ -284,6 +288,80 @@ class Analyzer(
         }
       }
     
    +  /**
    +   * Replaces [[Alias]] with the same exprId but different references with 
[[Alias]] having
    +   * different exprIds. This is a rare situation which can cause incorrect 
results.
    +   */
    +  object DeduplicateAliases extends Rule[LogicalPlan] {
    --- End diff --
    
    Yes, that is also true. But in many places in the codebase we just compare 
attributes using `semanticEquals` or in some other cases, even `equals`. Well, 
if we admit that different attributes can have the same `exprId`, all these 
places should be checked in order to be sure that the same problem cannot 
happen there too. Moreover (this is more a nit), the `semanticEquals` or 
`sameRef` method itself would be wrong according to its semantic, as it may 
return `true` even when two attributes don't have the same reference. This is 
the reason why I opted for this solution, which seems to me cleaner as it 
solves the root cause of the problem. What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to