Re: How to change Dataframe schema

2020-05-16 Thread Adi Polak
Hi Manjunath,
Can you share the data example?
From the information shared above, it seems that you will need to apply a
mapping with custom logic to the rows in your RDD, so that they are
consistent, before you can apply the schema.
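
A minimal sketch of what such a per-row mapping could look like, in plain
Java so that it runs without Spark. The class and method names (RowAligner,
alignRow) are hypothetical and not part of any Spark API, and filling
missing columns with null is an assumption about the desired behaviour:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowAligner {
    /**
     * Reorders one row's values from the source column order into the
     * target column order, matching columns by name instead of by index.
     * Target columns missing from the source become null (an assumption);
     * source columns missing from the target are dropped.
     */
    public static Object[] alignRow(List<String> sourceColumns,
                                    List<String> targetColumns,
                                    Object[] values) {
        // Look up each source column's position by its name.
        Map<String, Integer> indexByName = new HashMap<>();
        for (int i = 0; i < sourceColumns.size(); i++) {
            indexByName.put(sourceColumns.get(i), i);
        }
        Object[] aligned = new Object[targetColumns.size()];
        for (int i = 0; i < targetColumns.size(); i++) {
            Integer src = indexByName.get(targetColumns.get(i));
            aligned[i] = (src == null) ? null : values[src];
        }
        return aligned;
    }
}
```

In a Spark job this logic would presumably run inside the map over the row
RDD, with the source names taken from df.columns(), the target names from
the StructType's fieldNames(), and the result passed to RowFactory.create
before calling createDataFrame with the target schema.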

I recommend reading about the mapping functionality here:
https://data-flair.training/blogs/apache-spark-map-vs-flatmap/

I hope it helps!

-Adi

On Sat, 16 May 2020 at 17:50, Manjunath Shetty H wrote:

> Hi,
>
> I have a DataFrame whose columns and data are fetched over JDBC. Because I
> have to keep the schema in the ORC file consistent, I need to apply a
> different schema to that DataFrame. The column names will be the same, but
> either the data or the schema may contain some extra columns.
>
> Is there any way I can apply the schema on top of the existing DataFrame?
> In most cases the schema will just reorder the columns.
>
> I have tried this "
>
> DataFrame dfNew = hc.createDataFrame(df.rdd(), ((StructType) 
> DataType.fromJson(schema)));
>
> "
>
> But this maps the columns by index, so it fails when the columns are
> reordered.
>
> Any pointers will be helpful.
>
> Thanks and Regards
> Manjunath Shetty
>


How to change Dataframe schema

2020-05-16 Thread Manjunath Shetty H
Hi,

I have a DataFrame whose columns and data are fetched over JDBC. Because I 
have to keep the schema in the ORC file consistent, I need to apply a 
different schema to that DataFrame. The column names will be the same, but 
either the data or the schema may contain some extra columns.

Is there any way I can apply the schema on top of the existing DataFrame? In 
most cases the schema will just reorder the columns.

I have tried this "

DataFrame dfNew = hc.createDataFrame(df.rdd(), ((StructType) 
DataType.fromJson(schema)));

"

But this maps the columns by index, so it fails when the columns are 
reordered.

Any pointers will be helpful.

Thanks and Regards
Manjunath Shetty
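
One possible way to avoid the index-based mapping described above is to
select the columns by name in the target order. The sketch below only
builds the list of select expressions, in plain Java so that it runs
without Spark. The names SchemaAligner and selectExpressions are
hypothetical, the SQL type names are assumed to be supplied by the caller,
and column names are assumed to be simple identifiers that need no quoting:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SchemaAligner {
    /**
     * Builds one select expression per target column, in target order.
     * A column that already exists is selected by name; a column that is
     * missing becomes a typed NULL (an assumption about what is wanted).
     * Existing columns that are not in the target are simply not selected.
     *
     * @param existingColumns column names of the current DataFrame
     * @param targetFields    target column name to SQL type name, in the
     *                        desired order (for example a LinkedHashMap)
     */
    public static List<String> selectExpressions(List<String> existingColumns,
                                                 Map<String, String> targetFields) {
        Set<String> existing = new HashSet<>(existingColumns);
        List<String> expressions = new ArrayList<>();
        for (Map.Entry<String, String> field : targetFields.entrySet()) {
            String name = field.getKey();
            if (existing.contains(name)) {
                expressions.add(name);
            } else {
                expressions.add("CAST(NULL AS " + field.getValue() + ") AS " + name);
            }
        }
        return expressions;
    }
}
```

The resulting list could then be passed to
df.selectExpr(expressions.toArray(new String[0])), which resolves columns
by name, so the reordered DataFrame matches the target column order before
it is written to ORC.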