bersprockets opened a new pull request, #41353:
URL: https://github.com/apache/spark/pull/41353

   ### What changes were proposed in this pull request?
   
   In `StringUtils#orderSuggestedIdentifiersBySimilarity`, handle the case 
where the candidate attributes have a mix of empty and non-empty prefixes.
   
   ### Why are the changes needed?
   
   The following query throws a `StringIndexOutOfBoundsException`:
   ```
   with v1 as (
     select * from values (1, 2) as (c1, c2)
   ),
   v2 as (
     select * from values (2, 3) as (c1, c2)
   )
   select v1.c1, v1.c2, v2.c1, v2.c2, b
   from v1
   full outer join v2
   using (c1);
   ```
   The query should fail anyway, since `b` refers to a non-existent column. But 
it should fail with a helpful error message, not with a 
`StringIndexOutOfBoundsException`.
   
   `StringUtils#orderSuggestedIdentifiersBySimilarity` assumes that a list of 
suggested attributes with a mix of prefixes will never have an attribute name 
with an empty prefix. But in this case it does (`c1` from the `coalesce` has no 
prefix, since it is not associated with any relation or subquery):
   ```
   +- 'Project [c1#5, c2#6, c1#7, c2#8, 'b]
      +- Project [coalesce(c1#5, c1#7) AS c1#9, c2#6, c2#8] <== c1#9 has no prefix, unlike c2#6 (v1.c2) or c2#8 (v2.c2)
         +- Join FullOuter, (c1#5 = c1#7)
            :- SubqueryAlias v1
            :  +- CTERelationRef 0, true, [c1#5, c2#6]
            +- SubqueryAlias v2
               +- CTERelationRef 1, true, [c1#7, c2#8]
   ```
   Because of this, `orderSuggestedIdentifiersBySimilarity` returns a sorted 
list of suggestions like this:
   ```
   ArrayBuffer(.c1, v1.c2, v2.c2)
   ```
   `UnresolvedAttribute.parseAttributeName` cannot parse an attribute name that 
starts with a `.` (here, `.c1`), which is what triggers the exception.
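   
   As a rough illustration of how a suggestion like `.c1` can arise, here is a 
minimal sketch. It is not the actual Spark code: `PrefixJoinSketch`, 
`naiveJoin`, `guardedJoin`, and the `(prefix, name)` pairs are hypothetical 
names, and the sketch assumes the suggestion is rebuilt by joining a prefix and 
a name with a dot.
   ```scala
   object PrefixJoinSketch {
     // Hypothetical (prefix, name) pairs mirroring the candidates in the plan:
     // c1 comes from the coalesce and has no prefix; the c2 columns do.
     val candidates: Seq[(String, String)] =
       Seq(("", "c1"), ("v1", "c2"), ("v2", "c2"))

     // Naive join: always inserts a dot, so an empty prefix yields ".c1".
     def naiveJoin(prefix: String, name: String): String =
       prefix + "." + name

     // Guarded join: only adds the dot when there is a prefix to separate.
     def guardedJoin(prefix: String, name: String): String =
       if (prefix.isEmpty) name else prefix + "." + name

     def main(args: Array[String]): Unit = {
       // The first list contains the malformed ".c1"; the second does not.
       println(candidates.map { case (p, n) => naiveJoin(p, n) })
       println(candidates.map { case (p, n) => guardedJoin(p, n) })
     }
   }
   ```
   The guarded variant is only one way to handle a mix of empty and non-empty 
prefixes; the fix in this PR may take a different approach.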
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit tests.
   

