Re: Spark DataFrame UNPIVOT feature

2018-08-22 Thread Maciej Szymkiewicz
Given the popularity of related SO questions:


   - https://stackoverflow.com/q/41670103/1560062
   - https://stackoverflow.com/q/42465568/1560062

it is probably more "nobody thought about asking" than "it is not used
often".

On Wed, 22 Aug 2018 at 00:07, Reynold Xin wrote:

> Probably just because it is not used that often and nobody has submitted a
> patch for it. I've used pivot probably on average once a week (primarily in
> spreadsheets), but I've never used unpivot ...
>
>
> On Tue, Aug 21, 2018 at 3:06 PM Ivan Gozali wrote:
>
>> Hi there,
>>
>> I was looking into why the UNPIVOT feature isn't implemented, given that
>> Spark already has PIVOT implemented natively in the DataFrame/Dataset API.
>>
>> Came across this JIRA which
>> talks about implementing PIVOT in Spark 1.6, but no mention whatsoever
>> regarding UNPIVOT, even though the JIRA curiously references a blog post
>> that talks about both PIVOT and UNPIVOT :)
>>
>> Is this because UNPIVOT is just simply generating multiple slim tables by
>> selecting each column, and making a union out of all of them?
>>
>> Thank you!
>>
>> --
>> Regards,
>>
>>
>> Ivan Gozali
>> Lecida
>> Email: i...@lecida.com
>>
>


Re: Spark DataFrame UNPIVOT feature

2018-08-22 Thread Mike Hynes
Hi Reynold/Ivan,

People familiar with pandas and R dataframes will likely have used the
dataframe "melt" idiom, which is the functionality I believe you are
referring to:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html

I have had to write this function myself in my own work in Spark SQL, as it
is a common step in data wrangling when you do not control the structure of
the input dataframes you are working with in your pipelines.
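
For illustration, here is a minimal sketch of what such a melt does, written
in plain Python rather than Spark. The function name `melt` and its
parameters are hypothetical (modeled loosely on pandas.melt), and rows are
assumed to be plain dicts. It builds one slim table per value column and
concatenates them, which is the union-of-selects approach Ivan describes:

```python
from typing import Any, Dict, Iterable, List


def melt(
    rows: Iterable[Dict[str, Any]],
    id_vars: List[str],
    value_vars: List[str],
    var_name: str = "variable",
    value_name: str = "value",
) -> List[Dict[str, Any]]:
    """Unpivot wide rows into long rows (hypothetical helper, not a Spark API)."""
    rows = list(rows)  # materialize so the input can be scanned once per column
    long_rows: List[Dict[str, Any]] = []
    for column in value_vars:
        # One "slim table": the id columns plus a single value column.
        for row in rows:
            slim = {key: row[key] for key in id_vars}
            slim[var_name] = column
            slim[value_name] = row[column]
            long_rows.append(slim)  # appending plays the role of the union
    return long_rows


wide = [
    {"id": 1, "x": 10, "y": 20},
    {"id": 2, "x": 30, "y": 40},
]
long = melt(wide, id_vars=["id"], value_vars=["x", "y"])
for row in long:
    print(row)
```

Note that this scans the input once per value column, just as a union of
per-column selects would. In Spark the same long shape can usually be
produced in a single pass instead, for example by exploding an array of
(name, value) structs built from the value columns.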

I would hence second Ivan that adding it as a native dataframe method would
no doubt be helpful (and for what it's worth, so would other concepts from
the pandas API, such as named indexing & multilevel indexing).

Cheers,
Mike




On Tue, Aug 21, 2018, 5:07 PM Reynold Xin wrote:

> Probably just because it is not used that often and nobody has submitted a
> patch for it. I've used pivot probably on average once a week (primarily in
> spreadsheets), but I've never used unpivot ...
>
>
> On Tue, Aug 21, 2018 at 3:06 PM Ivan Gozali wrote:
>
>> Hi there,
>>
>> I was looking into why the UNPIVOT feature isn't implemented, given that
>> Spark already has PIVOT implemented natively in the DataFrame/Dataset API.
>>
>> Came across this JIRA which
>> talks about implementing PIVOT in Spark 1.6, but no mention whatsoever
>> regarding UNPIVOT, even though the JIRA curiously references a blog post
>> that talks about both PIVOT and UNPIVOT :)
>>
>> Is this because UNPIVOT is just simply generating multiple slim tables by
>> selecting each column, and making a union out of all of them?
>>
>> Thank you!
>>
>> --
>> Regards,
>>
>>
>> Ivan Gozali
>> Lecida
>> Email: i...@lecida.com
>>
>


Re: Spark DataFrame UNPIVOT feature

2018-08-21 Thread Reynold Xin
Probably just because it is not used that often and nobody has submitted a
patch for it. I've used pivot probably on average once a week (primarily in
spreadsheets), but I've never used unpivot ...


On Tue, Aug 21, 2018 at 3:06 PM Ivan Gozali wrote:

> Hi there,
>
> I was looking into why the UNPIVOT feature isn't implemented, given that
> Spark already has PIVOT implemented natively in the DataFrame/Dataset API.
>
> Came across this JIRA which
> talks about implementing PIVOT in Spark 1.6, but no mention whatsoever
> regarding UNPIVOT, even though the JIRA curiously references a blog post
> that talks about both PIVOT and UNPIVOT :)
>
> Is this because UNPIVOT is just simply generating multiple slim tables by
> selecting each column, and making a union out of all of them?
>
> Thank you!
>
> --
> Regards,
>
>
> Ivan Gozali
> Lecida
> Email: i...@lecida.com
>