It's actually a bit tougher, as you'll first need all the years. I'm also not
sure how you would represent your "columns", given they are dynamic based on
the input data.

Depending on your downstream processing, I'd probably try to emulate it with a
hash map with the years as keys instead of columns.
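
Something like this in the plain RDD API is what I have in mind (a rough
sketch for the spark-shell, where sc is predefined; the file path and the
parsing are assumptions):

// Assumed input: text lines like "A, 2015, 4" (the path is hypothetical).
val rows = sc.textFile("data.csv").map { line =>
  val Array(name, year, value) = line.split(",").map(_.trim)
  (name, (year.toInt, value.toInt))
}

// One hash map per key, with the years playing the role of columns.
val byKey = rows.aggregateByKey(Map.empty[Int, Int])(
  (m, yv) => m + yv,    // fold one (year, value) pair into the map
  (m1, m2) => m1 ++ m2  // merge partial maps across partitions
)
// ("A", Map(2015 -> 4, 2014 -> 12, 2013 -> 1))
// ("B", Map(2015 -> 24, 2013 -> 4))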

There is probably a nicer solution using the DataFrame API, but I'm not that
familiar with it.
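
If I recall correctly, a pivot() function on grouped DataFrames is slated for
Spark 1.6, which would do this directly. Untested sketch; the column names
("name", "year", "value") are just illustrative:

// In the spark-shell, where sqlContext is predefined.
import sqlContext.implicits._

case class Record(name: String, year: Int, value: Int)

val df = sc.textFile("data.csv").map { line =>
  val Array(n, y, v) = line.split(",").map(_.trim)
  Record(n, y.toInt, v.toInt)
}.toDF()

// pivot() turns the distinct years into columns; missing cells become null.
df.groupBy("name").pivot("year").sum("value").show()
// +----+----+----+----+
// |name|2013|2014|2015|
// +----+----+----+----+
// |   A|   1|  12|   4|
// |   B|   4|null|  24|
// +----+----+----+----+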

If you actually need vectors, I think this article I saw recently on the
Databricks blog will highlight some options (look for the gather encoder):
https://databricks.com/blog/2015/10/20/audience-modeling-with-spark-ml-pipelines.html

-adrian

From: Deng Ching-Mallete
Date: Friday, October 30, 2015 at 4:35 AM
To: Ascot Moss
Cc: User
Subject: Re: Pivot Data in Spark and Scala

Hi,

You could transform it into a pair RDD then use the combineByKey function.
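
For example, a rough sketch, assuming the rows are already parsed into
(name, (year, value)) pairs:

val pairs = sc.parallelize(Seq(
  ("A", (2015, 4)), ("A", (2014, 12)), ("A", (2013, 1)),
  ("B", (2015, 24)), ("B", (2013, 4))
))

// combineByKey collects each name's (year, value) pairs into a map.
val pivoted = pairs.combineByKey(
  (yv: (Int, Int)) => Map(yv),                        // first value seen for a key
  (m: Map[Int, Int], yv: (Int, Int)) => m + yv,       // add a value to the combiner
  (m1: Map[Int, Int], m2: Map[Int, Int]) => m1 ++ m2  // merge combiners
)

// Render each row against the full (descending) year range, blank if missing.
val years = pairs.map(_._2._1).distinct().collect().sorted.reverse
pivoted.map { case (name, m) =>
  (name +: years.map(y => m.get(y).map(_.toString).getOrElse(""))).mkString(", ")
}.collect().foreach(println)
// A, 4, 12, 1
// B, 24, , 4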

HTH,
Deng

On Thu, Oct 29, 2015 at 7:29 PM, Ascot Moss <ascot.m...@gmail.com> wrote:
Hi,

I have data as follows:

A, 2015, 4
A, 2014, 12
A, 2013, 1
B, 2015, 24
B, 2013, 4


I need to convert the data to a new format:
A,  4, 12,  1
B, 24,   ,  4

Any idea how to do this in Spark/Scala?

Thanks

