GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21699
[SPARK-24722][SQL] pivot() with Column type argument
## What changes were proposed in this pull request?
In the PR, I propose a column-based API for the `pivot()` function. It allows
nested columns to be used as the pivot column, and it makes `pivot()` consistent
with how `groupBy()` works.
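
To illustrate what pivoting on a nested column means, here is a minimal plain-Python sketch. It is not Spark code and not the implementation in this patch. The sample rows, the helpers `get_path` and `pivot_sum`, and the dotted path `course.name` are all hypothetical, chosen only to mirror a struct column whose inner field serves as the pivot column.

```python
from collections import defaultdict

# Hypothetical sample rows; "course" is a nested record, mirroring a struct column.
rows = [
    {"year": 2012, "course": {"name": "dotNET"}, "earnings": 10000},
    {"year": 2012, "course": {"name": "Java"}, "earnings": 20000},
    {"year": 2012, "course": {"name": "dotNET"}, "earnings": 5000},
    {"year": 2013, "course": {"name": "dotNET"}, "earnings": 48000},
    {"year": 2013, "course": {"name": "Java"}, "earnings": 30000},
]


def get_path(row, path):
    """Resolve a dotted path such as "course.name" against a nested dict."""
    value = row
    for part in path.split("."):
        value = value[part]
    return value


def pivot_sum(rows, group_path, pivot_path, value_path):
    """Group by group_path, turn distinct pivot_path values into columns,
    and sum value_path in each cell."""
    pivot_values = sorted({get_path(r, pivot_path) for r in rows})
    sums = defaultdict(lambda: defaultdict(int))
    for r in rows:
        group_key = get_path(r, group_path)
        pivot_key = get_path(r, pivot_path)
        sums[group_key][pivot_key] += get_path(r, value_path)
    # A group/pivot combination with no rows yields None, like a null cell.
    return [
        {group_path: key, **{pv: cells.get(pv) for pv in pivot_values}}
        for key, cells in sorted(sums.items())
    ]


result = pivot_sum(rows, "year", "course.name", "earnings")
for line in result:
    print(line)
```

The point of the sketch is that the pivot key is addressed by a path into a nested record rather than by a top-level column name, which is the case a string-only `pivot()` argument cannot express directly.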
## How was this patch tested?
I added new tests to `DataFramePivotSuite` and updated PySpark examples for
the `pivot()` function.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 pivot-column
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21699.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21699
----
commit 889e9223510c821f359ee3ce5bec6ce2f746a027
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-07-02T17:09:19Z
Adding pivot() which takes Column as its argument
commit f736ea2bca27fee37281bcadb333e7b6bbcd6124
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-07-02T17:21:07Z
Tests for new function
commit 5e6822650f4781c343d477589bf252c37b8453c4
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-07-02T18:24:18Z
the since tag is updated
commit c82c3979065aba48536a743ebf3384f3c95b570c
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-07-02T19:05:48Z
Test for nested columns
commit 7d0d2261cef4c66226cd59635603391faabf0046
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-07-02T20:38:02Z
Python test for nested columns
commit 0fdd11ff26b4f4ca3b79bdd116aaf1c558643698
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-07-02T21:02:58Z
Adding ticket number to test's title
----