AngersZhuuuu commented on pull request #33183:
URL: https://github.com/apache/spark/pull/33183#issuecomment-874514198
Putting this in a comment to make the issue easier to understand.
In our production environment, we hit the following case when matching NestedColumnAliasing.
For the input plan
```
Project [struct_data#6.search_params AS value#7, struct_data#6.search_params.col1 AS col1#8, struct_data#6.search_params.col2 AS col2#9]
+- Repartition 100, true
   +- LocalRelation <empty>, [struct_data#6]
```
In the first loop, we get the following nestedFieldReferences:
```
struct_data.search_params
struct_data.search_params.col1
struct_data.search_params.col2
```
Then, after deduplication, we get the alias map
```
struct_data -> struct_data.search_params
```
nestedFieldToAlias as
```
struct_data.search_params -> (struct_data.search_params as extract_search_params)
```
attrToAliases as
```
struct_data -> [struct_data.search_params as extract_search_params]
```
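The deduplication and grouping steps above can be sketched with a toy expression tree. This is only an illustration: `Attr`, `GetField`, `dedup` and `aliasMap` are hypothetical names, not Catalyst classes, and the real rule may differ in detail.

```scala
object DedupSketch {
  // Toy expression tree; these are NOT Spark's Catalyst classes.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class GetField(child: Expr, field: String) extends Expr

  // Root attribute that a nested field access is extracted from.
  def root(e: Expr): Attr = e match {
    case a: Attr        => a
    case GetField(c, _) => root(c)
  }

  // All expressions that e is (transitively) extracted from.
  def ancestors(e: Expr): List[Expr] = e match {
    case GetField(c, _) => c :: ancestors(c)
    case _: Attr        => Nil
  }

  // Drop references that are extracted from another collected reference.
  def dedup(refs: Seq[Expr]): Seq[Expr] =
    refs.filter(r => !ancestors(r).exists(a => refs.contains(a)))

  // Group the surviving references by their root attribute.
  def aliasMap(refs: Seq[Expr]): Map[Attr, Seq[Expr]] =
    dedup(refs).groupBy(root)

  def main(args: Array[String]): Unit = {
    val sp = GetField(Attr("struct_data"), "search_params")
    val refs = Seq(sp, GetField(sp, "col1"), GetField(sp, "col2"))
    println(aliasMap(refs))
  }
}
```

With the three references from the first loop, only `struct_data.search_params` survives, because the `col1` and `col2` accesses are extracted from it.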
But in replaceToAliases -> getNewProjectList, when we try to replace
`struct_data.search_params.col1` and `struct_data.search_params.col2`, they
don't match `struct_data.search_params`: the lookup uses `map.contains`, and
the expressions are not constructed the same way, so they do not compare equal.
So the resulting plan is
```
Project [_extract_search_params#12 AS value#7, struct_data#6.search_params.col1 AS col1#8, struct_data#6.search_params.col2 AS col2#9]
+- Repartition 100, true
   +- Project [struct_data#6.search_params AS _extract_search_params#12]
      +- LocalRelation <empty>, [struct_data#6]
```
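The failed lookup in the first loop can be reproduced with the same kind of toy expression tree. In this sketch, `replaceExact` mimics a whole-expression map lookup, and `replaceNested` is one possible subtree-aware alternative. All names are hypothetical, and `replaceNested` is not necessarily the fix this PR applies.

```scala
object LookupSketch {
  // Toy expression tree; these are NOT Spark's Catalyst classes.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class GetField(child: Expr, field: String) extends Expr

  // Exact-match replacement: rewrites only when the whole expression is a key.
  def replaceExact(e: Expr, aliases: Map[Expr, Attr]): Expr =
    aliases.getOrElse(e, e)

  // Subtree-aware replacement: also rewrites a key found deeper in the tree.
  def replaceNested(e: Expr, aliases: Map[Expr, Attr]): Expr =
    aliases.get(e) match {
      case Some(alias) => alias
      case None =>
        e match {
          case GetField(c, f) => GetField(replaceNested(c, aliases), f)
          case a: Attr        => a
        }
    }

  def main(args: Array[String]): Unit = {
    val sp = GetField(Attr("struct_data"), "search_params")
    val col1 = GetField(sp, "col1")
    val aliases: Map[Expr, Attr] = Map(sp -> Attr("_extract_search_params"))

    println(replaceExact(sp, aliases))    // the key itself is replaced
    println(replaceExact(col1, aliases))  // left untouched: col1 is not a key
    println(replaceNested(col1, aliases)) // the inner search_params access is replaced
  }
}
```

The exact lookup leaves `struct_data.search_params.col1` untouched because the map key is only the inner `struct_data.search_params` expression, which is what produces the half-rewritten Project above.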
Then, in the second loop, we get the following nestedFieldReferences:
```
struct_data.search_params.col1
struct_data.search_params.col2
```
but we generate the alias map as
```
struct_data -> [struct_data.search_params.col1, struct_data.search_params.col2]
```
nestedFieldToAlias as
```
struct_data.search_params.col1 -> extract_col1
struct_data.search_params.col2 -> extract_col2
```
attrToAliases as
```
struct_data -> [extract_col1, extract_col2]
```
These aliases reference `struct_data`, but the output of the current Project's
child no longer contains the `struct_data` column, because the first loop
already replaced it with `_extract_search_params`.
The final plan is then
```
!Project [_extract_search_params#16 AS value#7, _extract_col1#17 AS col1#8, _extract_col2#17 AS col2#9]
+- Repartition 100, true
   +- Project [struct_data#6.search_params AS _extract_search_params#16]
      +- LocalRelation <empty>, [struct_data#6]
```
Executing it throws an exception like
```
Couldn't find extract_col1#14 in [_extract_search_params#16]
```
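The broken final plan can be detected by comparing what the top Project references with what its child outputs. The sketch below uses a hypothetical, simplified `ProjectNode` keyed by attribute names. Catalyst's `QueryPlan.missingInput` performs a similar check on real attributes, and the leading `!` in the plan above appears to mark exactly this condition.

```scala
object MissingInputSketch {
  // Hypothetical, simplified plan node: the attribute names the node
  // references and the attribute names its child actually outputs.
  case class ProjectNode(references: Set[String], childOutput: Set[String]) {
    // Attributes the node needs but the child does not provide.
    def missingInput: Set[String] = references -- childOutput
  }

  def main(args: Array[String]): Unit = {
    val finalProject = ProjectNode(
      references = Set("_extract_search_params", "_extract_col1", "_extract_col2"),
      childOutput = Set("_extract_search_params")
    )
    // A non-empty result means the plan cannot be bound at execution time.
    println(finalProject.missingInput)
  }
}
```

Here the two `_extract_col*` aliases are reported as missing, which corresponds to the binding failure in the exception above.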