bersprockets opened a new pull request, #41353:
URL: https://github.com/apache/spark/pull/41353
### What changes were proposed in this pull request?
In `StringUtils#orderSuggestedIdentifiersBySimilarity`, handle the case
where the candidate attributes have a mix of empty and non-empty prefixes.
### Why are the changes needed?
The following query throws a `StringIndexOutOfBoundsException`:
```
with v1 as (
  select * from values (1, 2) as (c1, c2)
),
v2 as (
  select * from values (2, 3) as (c1, c2)
)
select v1.c1, v1.c2, v2.c1, v2.c2, b
from v1
full outer join v2
using (c1);
```
The query should fail anyway, since `b` refers to a non-existent column. But
it should fail with a helpful error message, not with a
`StringIndexOutOfBoundsException`.
`StringUtils#orderSuggestedIdentifiersBySimilarity` assumes that a list of
suggested attributes with a mix of prefixes will never have an attribute name
with an empty prefix. But in this case it does (`c1` from the `coalesce` has no
prefix, since it is not associated with any relation or subquery):
```
+- 'Project [c1#5, c2#6, c1#7, c2#8, 'b]
   +- Project [coalesce(c1#5, c1#7) AS c1#9, c2#6, c2#8] <== c1#9 has no prefix, unlike c2#6 (v1.c2) or c2#8 (v2.c2)
      +- Join FullOuter, (c1#5 = c1#7)
         :- SubqueryAlias v1
         :  +- CTERelationRef 0, true, [c1#5, c2#6]
         +- SubqueryAlias v2
            +- CTERelationRef 1, true, [c1#7, c2#8]
```
Because of this, `orderSuggestedIdentifiersBySimilarity` returns a sorted
list of suggestions like this:
```
ArrayBuffer(.c1, v1.c2, v2.c2)
```
`UnresolvedAttribute.parseAttributeName` chokes on an attribute name that
starts with a '.'.
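To illustrate how the leading '.' can arise, here is a minimal sketch. It is not the Spark implementation: `Candidate`, `renderNaive`, and `renderGuarded` are hypothetical names, and the sketch assumes each suggestion has already been split into a (possibly empty) prefix and a column name.

```scala
object PrefixSketch {
  // Hypothetical stand-in for a suggested attribute: a qualifier (possibly empty) and a name.
  final case class Candidate(prefix: String, name: String)

  // Naive rendering: always joins prefix and name, so an empty prefix produces ".c1".
  def renderNaive(c: Candidate): String = s"${c.prefix}.${c.name}"

  // Guarded rendering: emits the bare name when the prefix is empty.
  def renderGuarded(c: Candidate): String =
    if (c.prefix.isEmpty) c.name else s"${c.prefix}.${c.name}"

  // The candidates from the query above: the coalesced join key has no qualifier.
  val candidates: Seq[Candidate] = Seq(
    Candidate("", "c1"),
    Candidate("v1", "c2"),
    Candidate("v2", "c2"))

  val naive: Seq[String] = candidates.map(renderNaive)
  val guarded: Seq[String] = candidates.map(renderGuarded)
}
```

With the naive rendering, the first suggestion comes out as `.c1`, matching the list shown above. The guarded variant yields `c1` for that entry and leaves the qualified names unchanged.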
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit tests.