[ https://issues.apache.org/jira/browse/SPARK-43841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726980#comment-17726980 ]
Bruce Robbins commented on SPARK-43841:
---------------------------------------

PR at https://github.com/apache/spark/pull/41353

> Non-existent column in projection of full outer join with USING results in
> StringIndexOutOfBoundsException
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-43841
>                 URL: https://issues.apache.org/jira/browse/SPARK-43841
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Bruce Robbins
>            Priority: Minor
>
> The following query throws a {{StringIndexOutOfBoundsException}}:
> {noformat}
> with v1 as (
>  select * from values (1, 2) as (c1, c2)
> ),
> v2 as (
>  select * from values (2, 3) as (c1, c2)
> )
> select v1.c1, v1.c2, v2.c1, v2.c2, b
> from v1
> full outer join v2
> using (c1);
> {noformat}
> The query should fail anyway, since {{b}} refers to a non-existent column.
> But it should fail with a helpful error message, not with a
> {{StringIndexOutOfBoundsException}}.
> The issue seems to be in
> {{StringUtils#orderSuggestedIdentifiersBySimilarity}}.
> {{orderSuggestedIdentifiersBySimilarity}} assumes that a list of candidate
> attributes with a mix of prefixes will never have an attribute name with an
> empty prefix.
> But in this case it does ({{c1}} from the {{coalesce}} has no
> prefix, since it is not associated with any relation or subquery):
> {noformat}
> +- 'Project [c1#5, c2#6, c1#7, c2#8, 'b]
>    +- Project [coalesce(c1#5, c1#7) AS c1#9, c2#6, c2#8] <== c1#9 has no prefix, unlike c2#6 (v1.c2) or c2#8 (v2.c2)
>       +- Join FullOuter, (c1#5 = c1#7)
>          :- SubqueryAlias v1
>          :  +- CTERelationRef 0, true, [c1#5, c2#6]
>          +- SubqueryAlias v2
>             +- CTERelationRef 1, true, [c1#7, c2#8]
> {noformat}
> Because of this, {{orderSuggestedIdentifiersBySimilarity}} returns a sorted
> list of suggestions like this:
> {noformat}
> ArrayBuffer(.c1, v1.c2, v2.c2)
> {noformat}
> {{UnresolvedAttribute.parseAttributeName}} chokes on an attribute name that
> starts with a namespace separator ('.').
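
The failure mode described in the issue can be sketched outside Spark. The Python below is only an illustration, not Spark's code: it assumes suggestions are built by joining a prefix and a column name with '.', and the names `naive_suggestions`, `guarded_suggestions`, and `parse_name` are hypothetical. The toy parser raises a `ValueError` on an empty name part; it is a stand-in and does not reproduce the behavior of {{UnresolvedAttribute.parseAttributeName}}. The guard shown is one possible approach and is not necessarily what the PR does.

```python
from typing import List, Tuple

SEPARATOR = "."

# (prefix, column name) pairs modeled on the plan in the issue. The coalesced
# c1 has an empty prefix because it belongs to no relation or subquery.
CANDIDATES: List[Tuple[str, str]] = [("v1", "c2"), ("v2", "c2"), ("", "c1")]


def naive_suggestions(candidates: List[Tuple[str, str]]) -> List[str]:
    # Assumed flaw: always join prefix and name, even when the prefix is empty.
    return sorted(prefix + SEPARATOR + name for prefix, name in candidates)


def guarded_suggestions(candidates: List[Tuple[str, str]]) -> List[str]:
    # One possible guard: emit the bare name when there is no prefix.
    return sorted(
        (prefix + SEPARATOR + name) if prefix else name
        for prefix, name in candidates
    )


def parse_name(name: str) -> List[str]:
    # Toy stand-in for an attribute-name parser: every part must be non-empty.
    parts = name.split(SEPARATOR)
    if any(part == "" for part in parts):
        raise ValueError(f"malformed attribute name: {name!r}")
    return parts


if __name__ == "__main__":
    print(naive_suggestions(CANDIDATES))    # ['.c1', 'v1.c2', 'v2.c2']
    print(guarded_suggestions(CANDIDATES))  # ['c1', 'v1.c2', 'v2.c2']
```

With the naive join, the first suggestion starts with the separator, which matches the `ArrayBuffer(.c1, v1.c2, v2.c2)` output quoted in the issue. With the guard, every suggestion parses cleanly in this sketch.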