Hey,

My application requires the use of “classical” RDD methods like `distinct` and `subtract` on SchemaRDDs. What is the preferred way to turn the resulting regular RDD[org.apache.spark.sql.Row] back into a SchemaRDD? Calling toSchemaRDD does not work, as the schema information seems to be lost already. To make matters even more complicated, the contents of a Row are typed as Any.
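To make this concrete, here is a minimal sketch of the situation (the `Person` case class and the local setup are made up purely for illustration, and I am assuming a Spark version in which SchemaRDD does not override `subtract`):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, SchemaRDD}

// made-up case class, just for illustration
case class Person(name: String, age: Int)

val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local"))
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD // implicit RDD[case class] -> SchemaRDD

val current: SchemaRDD =
  sc.parallelize(Seq(Person("Alice", 30), Person("Bob", 25)))
val removed: SchemaRDD =
  sc.parallelize(Seq(Person("Bob", 25)))

// subtract comes from the plain RDD API, so the result is a plain
// RDD[Row]; the schema information is no longer attached to it
val remaining: RDD[Row] = current.subtract(removed)
```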
So to make this work, one has to map over the result RDD, call `asInstanceOf` on the contents, and then put the values back into case classes (roughly as sketched below), which seems like overkill to me. Is there a better way, maybe one that employs some smart casting or reuse of the schema information?
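Continuing the sketch above, the workaround looks roughly like this:

```scala
// the map-and-cast workaround: pull every field out of the untyped
// Row, cast it back to its concrete type, and rebuild the case class
// so the implicit createSchemaRDD conversion can re-derive the schema
val remainingPeople: SchemaRDD = remaining.map { row =>
  Person(row(0).asInstanceOf[String], row(1).asInstanceOf[Int])
}

remainingPeople.printSchema() // the schema is available again
```

All the best,
Jan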