Hi, I have a Spark job with many transformations (a sequence of maps and mapPartitions) and only one action at the end (DataFrame.write()). The transformations return an RDD<Row>, so I need to create a DataFrame from it. To use sqlContext.createDataFrame() I need to know the schema of the Rows, but to get it I have to trigger the transformations: the schema varies with the user's arguments, so I can't hardcode it. That results in two actions, which seems like a waste of resources. Is there a way to solve this with only one action?
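For concreteness, here is a plain-Python analogy of the problem (no Spark involved; all names are made up). The pipeline is lazy, its output columns depend on a user argument, and discovering the "schema" forces the pipeline to run once before the real write runs it a second time:

```python
def source():
    # Stand-in for the input RDD; yields raw records lazily.
    for i in range(3):
        yield {"id": i}

def transform(records, extra_cols):
    # Stand-in for the map/mapPartitions chain; the output columns
    # depend on a user argument (extra_cols), so they can't be hardcoded.
    for r in records:
        row = dict(r)
        for c in extra_cols:
            row[c] = f"{c}-{r['id']}"
        yield row

# "Action" 1: pull one row just to learn the schema ...
schema = sorted(next(transform(source(), ["name"])).keys())

# ... which means the whole pipeline runs AGAIN for the actual output
# ("action" 2, analogous to the DataFrame.write()).
rows = list(transform(source(), ["name"]))

print(schema)     # ['id', 'name']
print(len(rows))  # 3
```

This is exactly the duplication I'd like to avoid in the Spark job: one pass only to infer the schema for createDataFrame(), and a second pass for the write.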
Regards, Zsolt