shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1453913043
@srowen Please ignore that change; it was work in progress to check a few things.

The reason we get the ambiguous-reference error in the scenario below, and why that error is incorrect, is that attribute resolution returns two matches that are actually the same attribute. Since both matches refer to the same attribute, no ambiguity error should be thrown.

```scala
val df1 = sc.parallelize(List((1, 2, 3, 4, 5), (1, 2, 3, 4, 5)))
  .toDF("id", "col2", "col3", "col4", "col5")
val op_cols_mixed_case = List("id", "col2", "col3", "col4", "col5", "ID")
val df3 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*)

df3.select("id").show()
// org.apache.spark.sql.AnalysisException: Reference 'id' is ambiguous, could be: id, id.

df3.explain()
// == Physical Plan ==
// *(1) Project [_1#6 AS id#17, _2#7 AS col2#18, _3#8 AS col3#19, _4#9 AS col4#20, _5#10 AS col5#21, _1#6 AS ID#17]
```

Before the fix, the attributes matched were:

```
attributes: Vector(id#17, id#17)
```

so the ambiguous-reference error is thrown. If we instead consider only the unique matches, resolution returns the correct result:

```
unique attributes: Vector(id#17)
```

The map used for lookups:

```scala
/** Map to use for direct case insensitive attribute lookups. */
@transient private lazy val direct: Map[String, Seq[Attribute]] = {
  unique(attrs.groupBy(_.name.toLowerCase(Locale.ROOT)))
}
```

has the value Vector(id#17, col2#18, col3#19, col4#20, col5#21, **ID**#17), but it should be Vector(id#17, col2#18, col3#19, col4#20, col5#21, **id**#17). The key used for the lookup is treated as case insensitive, but the values themselves remain case sensitive.
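To make the idea concrete, here is a minimal, Spark-free sketch of the dedup-before-ambiguity-check behavior described above. `Attr`, `Lookup`, and `resolve` are hypothetical stand-ins (Spark's real `Attribute`/`AttributeSeq` types differ); the point is only that two matches sharing the same underlying `exprId` should collapse to one instead of triggering an ambiguity error.

```scala
// Hypothetical stand-in for Spark's Attribute: a name plus a stable exprId.
case class Attr(name: String, exprId: Long)

object Lookup {
  // Group case-insensitively, then keep one attribute per exprId, so that
  // "id" and "ID" aliasing the same column count as a single match.
  def direct(attrs: Seq[Attr]): Map[String, Seq[Attr]] =
    attrs.groupBy(_.name.toLowerCase).map {
      case (key, matches) => key -> matches.distinctBy(_.exprId)
    }

  // Resolve a name: exactly one unique match succeeds; zero or several
  // distinct exprIds are genuine errors.
  def resolve(attrs: Seq[Attr], name: String): Either[String, Attr] =
    direct(attrs).getOrElse(name.toLowerCase, Nil) match {
      case Seq(a) => Right(a)
      case Seq()  => Left(s"cannot resolve '$name'")
      case many   => Left(s"Reference '$name' is ambiguous, could be: " +
                          many.map(_.name).mkString(", "))
    }
}
```

With this sketch, `resolve(Seq(Attr("id", 17), Attr("ID", 17)), "id")` returns the single attribute rather than an ambiguity error, while two columns with genuinely different exprIds still fail as ambiguous.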