Partition RDD by key to create DataFrames

Mohamed Nadjib MAMI Tue, 15 Mar 2016 10:34:43 -0700

Hi,

I have a pair RDD of the form: (mykey, (value1, value2))

How can I create a DataFrame having the schema [V1 String, V2 String] to store [value1, value2] and save it into a Parquet table named "mykey"?

The /createDataFrame()/ method takes an RDD and a schema (StructType) as parameters. The schema is known up front ([V1 String, V2 String]), but getting an RDD by partitioning the original RDD based on the key is what I can't get my head around so far.
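
To make the setup concrete, here is a minimal sketch of one naive approach: filter the pair RDD once per distinct key, build a DataFrame from the filtered values with the known schema, and write it out as Parquet. It is only an illustration under several assumptions: it uses SparkSession (Spark 2.0 or later; on 1.x the same createDataFrame call is available on SQLContext), the sample data and the output directory /tmp/tables are made up, the number of distinct keys is assumed small enough to collect to the driver, and each key is written to a Parquet directory named after it rather than registered as a table.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object SplitByKey {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .appName("split-by-key")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample data in the shape (mykey, (value1, value2)).
    val pairs: RDD[(String, (String, String))] =
      spark.sparkContext.parallelize(Seq(
        ("keyA", ("a1", "a2")),
        ("keyA", ("a3", "a4")),
        ("keyB", ("b1", "b2"))
      ))

    // The schema known up front: [V1 String, V2 String].
    val schema = StructType(Seq(
      StructField("V1", StringType, nullable = true),
      StructField("V2", StringType, nullable = true)
    ))

    // The RDD is scanned once per key below, so keep it cached.
    pairs.cache()

    // Assumption: few enough distinct keys to collect to the driver.
    val keys: Array[String] = pairs.keys.distinct().collect()

    keys.foreach { key =>
      // Keep only this key's records and drop the key itself.
      val rows: RDD[Row] = pairs
        .filter { case (k, _) => k == key }
        .map { case (_, (v1, v2)) => Row(v1, v2) }

      val df = spark.createDataFrame(rows, schema)

      // Hypothetical output location: one Parquet directory per key.
      df.write.mode("overwrite").parquet(s"/tmp/tables/$key")
    }

    spark.stop()
  }
}
```

This makes one pass over the data per key, so it is only reasonable for a modest number of keys. A single-pass alternative would be to keep the key as a third column and write with partitionBy on that column, at the cost of getting key=... subdirectories instead of one table per key.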

Similar questions have been around (like http://stackoverflow.com/questions/25046199/apache-spark-splitting-pair-rdd-into-multiple-rdds-by-key-to-save-values) but they do not use DataFrames.


Thanks in advance!
