Hi,

I have a Spark job with many transformations (a sequence of maps and
mapPartitions) and only one action at the end (DataFrame.write()). The
transformations return an RDD<Row>, so I need to create a DataFrame.
To call sqlContext.createDataFrame() I need to know the schema of the
Rows, but to determine it I have to trigger the transformations (the
schema can differ depending on the user's arguments, so I can't hardcode
it). This results in two actions, which seems like a waste of resources.
Is there a way to solve this with only one action?

Regards,
Zsolt
