AngersZhuuuu commented on pull request #33183:
URL: https://github.com/apache/spark/pull/33183#issuecomment-874514198


   Putting this in a comment to make the issue easier to understand.
   
   In our production environment, we hit a failing case when a plan matches NestedColumnAliasing.
   
   For the input plan:
   ```
   Project [struct_data#6.search_params AS value#7, struct_data#6.search_params.col1 AS col1#8, struct_data#6.search_params.col2 AS col2#9]
   +- Repartition 100, true
      +- LocalRelation <empty>, [struct_data#6]
   ```
   
   In the first loop we get the following nestedFieldReferences:
   ```
   struct_data.search_params
   struct_data.search_params.col1
   struct_data.search_params.col2
   ```
   Then, after handling duplicates, we get the alias map:
   ```
   struct_data -> struct_data.search_params 
   ```
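
   To make the deduplication step concrete, here is a minimal sketch in Scala. It is not Spark's code: field references are modelled as plain paths, and `DedupSketch`, `FieldPath`, and `dedup` are hypothetical names. It assumes the rule is that a reference is dropped whenever one of its ancestors is also referenced.

   ```scala
   object DedupSketch {
     // A nested field reference modelled as a path: root attribute first,
     // then the field names. Paths are assumed to be non-empty.
     type FieldPath = Vector[String]

     // True when `p` is a proper ancestor of `q`, e.g. a.b is an ancestor of a.b.c.
     def isStrictPrefix(p: FieldPath, q: FieldPath): Boolean =
       p.length < q.length && q.startsWith(p)

     // Keep only references that have no ancestor in the set, grouped by root attribute.
     def dedup(refs: Seq[FieldPath]): Map[String, Seq[FieldPath]] = {
       val distinctRefs = refs.distinct
       val kept = distinctRefs.filterNot(q => distinctRefs.exists(p => isStrictPrefix(p, q)))
       kept.groupBy(_.head)
     }

     // The three references from the first loop.
     val refs: Seq[FieldPath] = Seq(
       Vector("struct_data", "search_params"),
       Vector("struct_data", "search_params", "col1"),
       Vector("struct_data", "search_params", "col2"))

     val aliasMap: Map[String, Seq[FieldPath]] = dedup(refs)
   }
   ```

   With the three references above, only `struct_data.search_params` survives under the key `struct_data`, which matches the alias map shown.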
   
   nestedFieldToAlias is:
   ```
   struct_data.search_params -> (struct_data.search_params as extract_search_params)
   ```
   
   attrToAliases is:
   ```
   struct_data -> [struct_data.search_params as extract_search_params]
   ```
   But when replaceToAliases calls getNewProjectList to replace `struct_data.search_params.col1` and `struct_data.search_params.col2`, the `struct_data.search_params` inside them is not matched: the lookup uses `map.contains`, and the two expressions were not constructed the same way, so they do not compare as equal.
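
   The lookup failure can be illustrated with a small stand-alone Scala sketch. It does not use Spark's classes: `Attr`, `GetField`, `canonical`, and the cosmetic `label` argument are all hypothetical, and the sketch assumes the mismatch is of the kind where two expressions denote the same field but were built with different constructor arguments.

   ```scala
   object LookupSketch {
     sealed trait Expr
     final case class Attr(name: String) extends Expr
     // `label` is purely cosmetic (an assumption of this sketch): two nodes with the
     // same child and ordinal denote the same field even when their labels differ.
     final case class GetField(child: Expr, ordinal: Int, label: Option[String]) extends Expr

     // Strip cosmetic labels so that equality reflects meaning only.
     def canonical(e: Expr): Expr = e match {
       case GetField(c, i, _) => GetField(canonical(c), i, None)
       case a: Attr           => a
     }

     // The map key was built one way (with a label) ...
     val aliased: Expr = GetField(Attr("struct_data"), 0, Some("search_params"))
     val aliases: Map[Expr, String] = Map(aliased -> "_extract_search_params")

     // ... while the parent inside `struct_data.search_params.col1` was built without it.
     val parentInCol1: Expr = GetField(Attr("struct_data"), 0, None)

     // Plain case-class equality: the key is not found.
     val foundByEquals: Boolean = aliases.contains(parentInCol1)

     // Comparing canonical forms: the key is found.
     val foundByCanonical: Option[String] =
       aliases.collectFirst { case (k, v) if canonical(k) == canonical(parentInCol1) => v }
   }
   ```

   In this toy model `foundByEquals` is false while `foundByCanonical` returns the alias, which is the difference between a structural `map.contains` lookup and a comparison that ignores how the expression was constructed.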
    
   So we get the resulting plan:
   ```
   Project [_extract_search_params#12 AS value#7, struct_data#6.search_params.col1 AS col1#8, struct_data#6.search_params.col2 AS col2#9]
   +- Repartition 100, true
      +- Project [struct_data#6.search_params AS _extract_search_params#12]
         +- LocalRelation <empty>, [struct_data#6]
   ```
   
   Then in the second loop we get the following nestedFieldReferences:
   ```
   struct_data.search_params.col1
   struct_data.search_params.col2
   ```
   
   but we generate aliasMap as:
   ```
   struct_data -> [struct_data.search_params.col1, struct_data.search_params.col2]
   ```
   nestedFieldToAlias is:
   ```
   struct_data.search_params.col1 -> extract_col1
   struct_data.search_params.col2  -> extract_col2
   ```
   
   attrToAliases is:
   ```
   struct_data -> [extract_col1, extract_col2]
   ```
   The aliases are keyed by the reference `struct_data`, but the output of the current Project's child no longer contains the `struct_data` column, because the first loop already replaced it with `_extract_search_params`.
   The final plan is then:
   ```
   !Project [_extract_search_params#16 AS value#7, _extract_col1#17 AS col1#8, _extract_col2#17 AS col2#9]
   +- Repartition 100, true
      +- Project [struct_data#6.search_params AS _extract_search_params#16]
         +- LocalRelation <empty>, [struct_data#6]
   ```
   
   At execution time this throws an exception like:

   ```
   Couldn't find extract_col1#14 in [_extract_search_params#16]
   ```
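
   Why the final plan is invalid can be sketched as a simple set difference. This is only an illustration, not Spark's resolution logic: attributes are modelled as plain `name#id` strings copied from the plan above, and `MissingInputSketch` and `missingInput` are hypothetical names.

   ```scala
   object MissingInputSketch {
     // Attributes referenced by a node that its child does not produce.
     def missingInput(references: Set[String], childOutput: Set[String]): Set[String] =
       references -- childOutput

     // References of the top Project in the final plan.
     val projectRefs: Set[String] =
       Set("_extract_search_params#16", "_extract_col1#17", "_extract_col2#17")

     // Output of its child: only the alias created in the first loop.
     val childOutput: Set[String] = Set("_extract_search_params#16")

     val missing: Set[String] = missingInput(projectRefs, childOutput)
   }
   ```

   Here `missing` is non-empty: the two `_extract_col*` attributes are referenced by the top Project but never produced below it, so binding them at execution time cannot succeed.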
    
    
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


