Hello All,
In Spark I am creating custom partitions with a custom RDD, and each partition has a different schema. In the transformation step we need to get the schema and run some DataFrame SQL queries per partition, because the data in each partition has a different schema. How can I get a DataFrame for each partition of an RDD?

As of now I am calling foreachPartition on the RDD, converting the Iterable<Row> to a List, and converting that List to a DataFrame. The problem is that converting the Iterable to a List brings all of the partition's data into memory, which might crash the process.

Is there a known way to do this? Or is there a way to handle custom partitions in DataFrames instead of using an RDD? I am using Spark version 1.6.2. Any pointers would be helpful. Thanks in advance.
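
To make the memory concern concrete, here is a rough sketch in plain Java (no Spark dependency) of consuming a partition's iterator in bounded batches instead of copying everything into one List. The class name BatchedPartition, the method forEachBatch, and the batch size are all hypothetical names chosen for illustration, not Spark APIs. It only caps how many rows are held at once; it would not help for queries that need the whole partition at the same time (for example a global aggregate or sort), so treat it as an assumption-laden illustration rather than a solution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchedPartition {

    // Hypothetical helper: walks the iterator once and hands the rows to
    // `handler` in lists of at most `batchSize` elements, so only one batch
    // is held in memory at a time. Returns the number of batches handled.
    public static <T> int forEachBatch(Iterator<T> rows, int batchSize,
                                       Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                // Start a fresh list so the handler may keep the old one.
                batch = new ArrayList<>(batchSize);
                batches++;
            }
        }
        // Flush the final, possibly smaller, batch.
        if (!batch.isEmpty()) {
            handler.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for the rows of one partition.
        Iterator<Integer> rows = Arrays.asList(1, 2, 3, 4, 5).iterator();
        // Prints [1, 2] then [3, 4] then [5], one batch per line.
        forEachBatch(rows, 2, batch -> System.out.println(batch));
    }
}
```

In the real job the handler would be the place where a batch of rows is turned into a DataFrame and queried, but whether creating many small DataFrames inside foreachPartition is workable on 1.6.2 is exactly the part I am unsure about.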