shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1453913043
@srowen Please ignore that change; it was work in progress to check a few things.

The reason we get the ambiguous-reference error in the scenario below, and why that error is incorrect, is that attribute resolution returns two matches that are actually the same attribute. Since both matches refer to the same attribute, no ambiguity error should be thrown.

```scala
val df1 = sc.parallelize(List((1, 2, 3, 4, 5), (1, 2, 3, 4, 5)))
  .toDF("id", "col2", "col3", "col4", "col5")
val op_cols_mixed_case = List("id", "col2", "col3", "col4", "col5", "ID")
val df3 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*)

df3.select("id").show()
// org.apache.spark.sql.AnalysisException: Reference 'id' is ambiguous, could be: id, id.

df3.explain()
// == Physical Plan ==
// *(1) Project [_1#6 AS id#17, _2#7 AS col2#18, _3#8 AS col3#19, _4#9 AS col4#20, _5#10 AS col5#21, _1#6 AS ID#17]
```

Before the fix, the attributes matched were:

```
attributes: Vector(id#17, id#17)
```

so the ambiguous-reference error is thrown. If we instead consider only the unique matches, resolution returns the correct result:

```
unique attributes: Vector(id#17)
```

The map used for lookups:

```scala
/** Map to use for direct case insensitive attribute lookups. */
@transient private lazy val direct: Map[String, Seq[Attribute]] = {
  unique(attrs.groupBy(_.name.toLowerCase(Locale.ROOT)))
}
```

has the value Vector(id#17, col2#18, col3#19, col4#20, col5#21, **ID**#17), but it should be Vector(id#17, col2#18, col3#19, col4#20, col5#21, **id**#17). The key used for the lookup is treated as case insensitive, but the values themselves remain case sensitive.
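To make the idea concrete, here is a minimal, Spark-free sketch of the dedup-before-ambiguity-check behavior described above. `Attr`, `Lookup`, and `resolve` are hypothetical stand-ins (Spark's real `Attribute`/`AttributeSeq` types differ); the point is only that two matches sharing the same underlying `exprId` should collapse to one instead of triggering an ambiguity error.

```scala
// Hypothetical stand-in for Spark's Attribute: a name plus a stable exprId.
case class Attr(name: String, exprId: Long)

object Lookup {
  // Group case-insensitively, then keep one attribute per exprId, so that
  // "id" and "ID" aliasing the same column count as a single match.
  def direct(attrs: Seq[Attr]): Map[String, Seq[Attr]] =
    attrs.groupBy(_.name.toLowerCase).map {
      case (key, matches) => key -> matches.distinctBy(_.exprId)
    }

  // Resolve a name: exactly one unique match succeeds; zero or several
  // distinct exprIds are genuine errors.
  def resolve(attrs: Seq[Attr], name: String): Either[String, Attr] =
    direct(attrs).getOrElse(name.toLowerCase, Nil) match {
      case Seq(a) => Right(a)
      case Seq()  => Left(s"cannot resolve '$name'")
      case many   => Left(s"Reference '$name' is ambiguous, could be: " +
                          many.map(_.name).mkString(", "))
    }
}
```

With this sketch, `resolve(Seq(Attr("id", 17), Attr("ID", 17)), "id")` returns the single attribute rather than an ambiguity error, while two columns with genuinely different exprIds still fail as ambiguous.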