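I think you could create a DataFrame with the schema (mykey, value1,
value2), then partition it by mykey when saving as Parquet. That writes one
subdirectory per key, path/mykey=<value>/, and the files inside it hold only
the value1 and value2 columns, which gets you close to one Parquet table per key: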
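import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}
val schema = StructType(Seq("mykey", "value1", "value2").map(StructField(_, StringType)))
val r2 = rdd.map { case (k, (v1, v2)) => Row(k, v1, v2) }
val df = sqlContext.createDataFrame(r2, schema)
df.write.partitionBy("mykey").parquet(path)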
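To picture what ends up on disk, here is a plain-Scala model (no Spark needed) of the layout that partitionBy("mykey") produces. It assumes string keys and values, ignores the escaping Spark applies to special characters in partition values, and does not write any actual Parquet files. The helper name layout and the sample path are hypothetical.

```scala
// Plain-Scala model (no Spark) of the output layout of
// df.write.partitionBy("mykey").parquet(path).
object PartitionLayout {
  // Hypothetical helper: groups (mykey, (value1, value2)) pairs by key and
  // names each group the way Spark names partition directories,
  // <path>/mykey=<value>. Each group keeps only (value1, value2), because
  // the partition column lives in the directory name, not in the files.
  def layout(pairs: Seq[(String, (String, String))],
             path: String): Map[String, Seq[(String, String)]] =
    pairs
      .groupBy { case (k, _) => k }
      .map { case (k, rows) => s"$path/mykey=$k" -> rows.map { case (_, v) => v } }
}

val pairs = Seq(("a", ("1", "x")), ("b", ("2", "y")), ("a", ("3", "z")))
val dirs = PartitionLayout.layout(pairs, "/data/out")
```

Here dirs maps "/data/out/mykey=a" to the two (value1, value2) rows for key "a" and "/data/out/mykey=b" to the single row for key "b". In Spark itself you can read one key back either by loading that subdirectory directly or by loading path and filtering on mykey.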
On Tue, Mar 15, 2016 at 10:33 AM, Moham
Hi,
I have a pair RDD of the form: (mykey, (value1, value2))
How can I create a DataFrame having the schema [V1 String, V2 String] to
store [value1, value2] and save it into a Parquet table named "mykey"?
The createDataFrame() method takes an RDD and a schema (StructType) as
parameters. The sc