GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21999
[WIP][SQL] Flattening nested structures
## What changes were proposed in this pull request?
In this PR, I propose a new unary expression, `StructFlatten`, for flattening
nested structures. For example, a dataset with the schema:
```
root
 |-- st: struct (nullable = false)
 |    |-- col1: long (nullable = false)
 |    |-- col2: struct (nullable = false)
 |    |    |-- col3: long (nullable = false)
```
is transformed by applying `struct_flatten(st)` to:
```
root
 |-- structflatten(st): struct (nullable = false)
 |    |-- col1: long (nullable = false)
 |    |-- col2_col3: long (nullable = false)
```
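The renaming rule in the example (nested field `col3` under `col2` becomes `col2_col3`) can be illustrated with a small sketch. This is plain Python over a dict-based schema model, not the Catalyst expression from this PR. The function name `flatten_schema` and the meaning given to its `delimiter` and `depth` parameters are assumptions for illustration only; the PR states that depth and delimiter are parameters but does not describe their exact semantics here.

```python
from typing import Any, Dict, Optional


def flatten_schema(
    schema: Dict[str, Any],
    delimiter: str = "_",          # assumed: joins parent and child field names
    depth: Optional[int] = None,   # assumed: nesting levels to flatten, None = all
    prefix: str = "",
) -> Dict[str, Any]:
    """Flatten a schema modeled as {field name: type name or nested dict}."""
    flat: Dict[str, Any] = {}
    for name, dtype in schema.items():
        full_name = f"{prefix}{delimiter}{name}" if prefix else name
        if isinstance(dtype, dict) and (depth is None or depth > 0):
            # Nested struct: recurse, carrying the parent name as a prefix.
            next_depth = None if depth is None else depth - 1
            flat.update(flatten_schema(dtype, delimiter, next_depth, full_name))
        else:
            # Leaf field, or a struct below the depth limit: keep it as-is.
            flat[full_name] = dtype
    return flat


# Fields of the `st` struct from the example above.
st = {"col1": "long", "col2": {"col3": "long"}}
print(flatten_schema(st))  # {'col1': 'long', 'col2_col3': 'long'}
```

Under these assumptions, `depth=0` leaves the struct unchanged, and a different `delimiter` only changes how the parent and child names are joined.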
## How was this patch tested?
Added new tests to `CollectionExpressionsSuite` that check flattening of 2-3
nested structures, and negative tests to make sure that `struct_flatten` doesn't
affect other types.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 struct_flatten
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21999.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21999
----
commit 5603918ae963f78aafb2d1f4f2bd9d566870495b
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-04T13:38:08Z
Initial implementation
commit 0be0d059b8bf571068226c515888a64093468cff
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-04T16:07:45Z
Making the depth and delimiter as parameters
commit 5666ec372a4b79f6161120584abc0c312b111bfb
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-04T18:04:23Z
Test for depth = 0
commit cd88a2125ba6932ba1fdceca1a24d57124a23afa
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-04T18:21:19Z
Test for depth = 1
commit b0da02d37ac6db38f63bac95dc295ac37fe4a692
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-04T18:30:18Z
Renaming st to struct
commit ec361791b83d71f29823157a2c2b49162ddb5901
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-04T19:24:37Z
Negative tests
commit ced63d7f093c168e2bc9457b6c08b87bfe6c0751
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-04T20:10:00Z
Register struct_flatten
commit 5b568c67951f6f620cd0d549fdbd0c25f819fe43
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-08-04T20:42:00Z
Merge remote-tracking branch 'origin/master' into struct_flatten
# Conflicts:
# sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
----