Hey,

My application requires the use of “classical” RDD methods like `distinct` and `subtract` on SchemaRDDs. What is the preferred way to turn the resulting regular RDD[org.apache.spark.sql.Row] back into a SchemaRDD? Calling toSchemaRDD does not work, as the schema information seems to be lost already. To make matters even more complicated, the contents of a Row are typed as Any.
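To make this concrete, here is a minimal sketch of the situation (the `Person` case class and the local setup are made up purely for illustration, and I am assuming a Spark version in which SchemaRDD does not override `subtract`):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, SchemaRDD}

// made-up case class, just for illustration
case class Person(name: String, age: Int)

val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local"))
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD // implicit RDD[case class] -> SchemaRDD

val current: SchemaRDD =
  sc.parallelize(Seq(Person("Alice", 30), Person("Bob", 25)))
val removed: SchemaRDD =
  sc.parallelize(Seq(Person("Bob", 25)))

// subtract comes from the plain RDD API, so the result is a plain
// RDD[Row]; the schema information is no longer attached to it
val remaining: RDD[Row] = current.subtract(removed)
```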
So to make this work, one has to map over the result RDD, call `asInstanceOf` on the contents, and then put the values back into case classes (roughly as sketched below), which seems like overkill to me. Is there a better way, maybe one that employs some smart casting or reuse of the schema information?
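Continuing the sketch above, the workaround looks roughly like this:

```scala
// the map-and-cast workaround: pull every field out of the untyped
// Row, cast it back to its concrete type, and rebuild the case class
// so the implicit createSchemaRDD conversion can re-derive the schema
val remainingPeople: SchemaRDD = remaining.map { row =>
  Person(row(0).asInstanceOf[String], row(1).asInstanceOf[Int])
}

remainingPeople.printSchema() // the schema is available again
```

All the best,
Jan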