Hi Manjunath,
why not creating 10 DataFrames loading the different tables in the first
place?
Enrico
Am 27.02.20 um 14:53 schrieb Manjunath Shetty H:
Hi Vinodh,
ThanksĀ for the quick response. Didn't got what you meant exactly, any
reference or snippetĀ will be helpful.
To explain the problem more,
* I have 10 partitions , each partition loads the data from
different table and different SQL shard.
* Most of the partitions will have different schema.
* Before persisting the data i want to do some column level
manipulation using data frame.
So thats why i want to create 10 (based on partitions ) dataframes
that maps to 10 different table/shard from a RDD.
Regards
Manjunath
------------------------------------------------------------------------
*From:* Charles vinodh <mig.flan...@gmail.com>
*Sent:* Thursday, February 27, 2020 7:04 PM
*To:* manjunathshe...@live.com <manjunathshe...@live.com>
*Cc:* user <user@spark.apache.org>
*Subject:* Re: Convert each partition of RDD to Dataframe
Just split the single rdd into multiple individual rdds using a filter
operation and then convert each individual rdds to it's respective
dataframe..
On Thu, Feb 27, 2020, 7:29 AM Manjunath Shetty H
<manjunathshe...@live.com <mailto:manjunathshe...@live.com>> wrote:
Hello All,
In spark i am creating the custom partitions with Custom RDD, each
partition will have different schema. Now in the transformation
step we need to get the schema and run some Dataframe SQL queries
per partition, because each partition data has different schema.
How to get the Dataframe's per partition of a RDD?.
As of now i am doing|foreachPartition|on RDD and
converting|Iterable<Row>|to|List|and converting that to Dataframe.
But the problem is converting|Iterable|to|List|will bring all the
data to memory and it might crash the process.
Is there any known way to do this ? or is there any way to handle
Custom Partitions in|Dataframes|instead of using|RDD|?
I am using Spark version|1.6.2|.
Any pointers would be helpful. Thanks in advance