This may seem contrived, but suppose I wanted to create a collection of "single column" RDDs that contain calculated values, which I would cache to avoid recalculating them.

i.e.

rdd1 = {Names}
rdd2 = {Star Sign}
rdd3 = {Age}

Then I want to create a new virtual RDD that combines these RDDs into a "multi-column" RDD:

rddA = {Names, Age}
rddB = {Names, Star Sign}

I saw that rdd.union() merges rows, but is there anything that can combine columns?
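
To make the question concrete, here is a rough PySpark sketch of what I'm after. It leans on RDD.zip, which as far as I can tell pairs elements by position and expects both RDDs to have the same number of partitions and the same number of elements in each partition. The names combine_columns and demo and the sample values are made up for illustration, and I'm not sure this is the recommended approach.

```python
def combine_columns(left, right):
    """Pair up two single-column RDDs element by element.

    Assumption: both RDDs have the same number of partitions and the same
    number of elements in each partition, which RDD.zip requires.
    """
    return left.zip(right)


def demo():
    # Imported here so the helper above can be read without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "zip-columns-sketch")
    try:
        # Made-up sample data standing in for the calculated columns;
        # each column is cached so it is not recomputed per combination.
        names = sc.parallelize(["Ann", "Bob", "Cy"], 2).cache()
        star_signs = sc.parallelize(["Leo", "Aries", "Virgo"], 2).cache()
        ages = sc.parallelize([31, 45, 27], 2).cache()

        rdd_a = combine_columns(names, ages)        # {Names, Age}
        rdd_b = combine_columns(names, star_signs)  # {Names, Star Sign}
        return rdd_a.collect(), rdd_b.collect()
    finally:
        sc.stop()
```

Calling demo() on a machine with pyspark installed should return the two combined "tables" as lists of tuples. If the columns did not line up partition for partition, I assume a keyed join would be needed instead.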

Cheers
- Ian
