Hi,
Imagine you have a structure like this:
val events = sqlContext.createDataFrame(
  Seq(
    ("a", Map("a" -> 1, "b" -> 1)),
    ("b", Map("b" -> 1, "c" -> 1)),
    ("c", Map("a" -> 1, "c" -> 1))
  )
).toDF("id", "map")
What I want is to have the map values as separate columns, one per map key.
Basically I want to achieve this:
+---+----+----+----+
| id|   a|   b|   c|
+---+----+----+----+
|  a|   1|   1|null|
|  b|null|   1|   1|
|  c|   1|null|   1|
+---+----+----+----+
I managed to get this with an explode-pivot combo, but for a large dataset
with around 1000 distinct map keys I imagine this will
be prohibitively expensive. I reckon there must be a much easier way to
achieve this than:
import org.apache.spark.sql.functions.{col, explode}
val exploded = events
  .select(col("id"), explode(col("map")))
  .groupBy("id").pivot("key").sum("value")
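One alternative I can think of, as a minimal sketch only: collect the distinct map keys once, then project each key straight out of the map with Column.getItem, so the widening itself needs no groupBy or pivot. The names MapToColumns and widen are made up for illustration, and the SparkSession setup assumes Spark 2.x or later (on 1.6 the existing sqlContext would take its place). I also assume the set of keys is small enough to collect to the driver, and that missing keys come back as null, which I believe is the default behaviour.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical helper, not part of Spark: turns a map column into one column per key.
object MapToColumns {
  def widen(events: DataFrame, idCol: String = "id", mapCol: String = "map"): DataFrame = {
    // One pass over the data to find the distinct keys.
    // Exploding a map column yields the columns "key" and "value".
    val keys = events
      .select(explode(col(mapCol)))
      .select("key")
      .distinct()
      .collect()
      .map(_.getString(0))
      .sorted

    // A single projection: one getItem lookup per key.
    val keyCols = keys.map(k => col(mapCol).getItem(k).as(k))
    events.select(col(idCol) +: keyCols: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-to-columns")
      .getOrCreate()
    import spark.implicits._

    val events = Seq(
      ("a", Map("a" -> 1, "b" -> 1)),
      ("b", Map("b" -> 1, "c" -> 1)),
      ("c", Map("a" -> 1, "c" -> 1))
    ).toDF("id", "map")

    widen(events).show()
    spark.stop()
  }
}
```

If the keys are already known up front, the distinct-keys pass can be skipped and the list passed in directly. I have not measured whether this actually beats the pivot at around 1000 keys.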
Any help would be appreciated. :)