subject:"Union of multiple data frames"

Re: Union of multiple data frames

2018-04-06 Thread Alessandro Solimando

Hello Cesar,
Can you add some details, such as the number of columns, the average number
of rows in the DFs, the time spent computing the plan with all the unions,
and the time needed to perform the action?
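
One way to collect the two timings asked for here is a small wall-clock helper. This is only a sketch: the name `timed` is hypothetical and not part of Spark, and a single measurement like this is coarse (JVM warm-up and caching will affect it).

```scala
// Hypothetical helper (not part of Spark): evaluate `body` once and return
// its result together with the elapsed wall-clock time in milliseconds.
def timed[A](body: => A): (A, Double) = {
  val start = System.nanoTime()
  val result = body
  (result, (System.nanoTime() - start) / 1e6)
}

// Plain-Scala demonstration, so the block runs without a Spark session.
val (sum, elapsedMs) = timed((1 to 1000).sum)
println(f"sum=$sum, elapsed=$elapsedMs%.3f ms")
```

With the `dfUnion` from the original question, `timed(dfUnion.queryExecution.optimizedPlan)` would approximate the planning cost and `timed(dfUnion.count())` the action. Because each `union` call already analyses the plan, it is probably worth wrapping the `reduce(_ union _)` expression itself in `timed` as well.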

Thanks,
Alessandro

On 5 April 2018 at 23:22, Cesar <ces...@gmail.com> wrote:

> Thanks for your answers.
>
> The suggested method works when the number of Data Frames is small.
>
> However, I am trying to union >30 Data Frames, and the time to create the
> plan is taking longer than the execution, which should not be the case.
>
> Thanks!
> --
> Cesar
>
> On Thu, Apr 5, 2018 at 1:29 PM, Andy Davidson <
> a...@santacruzintegration.com> wrote:
>
>>
>> Hi Cesar
>>
>> I have used Brandon's approach in the past without any problem
>>
>> Andy
>> From: Brandon Geise <brandonge...@gmail.com>
>> Date: Thursday, April 5, 2018 at 11:23 AM
>> To: Cesar <ces...@gmail.com>, "user @spark" <user@spark.apache.org>
>> Subject: Re: Union of multiple data frames
>>
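>> Maybe something like
>>
>> // Start from the first DataFrame: emptyDataFrame has no columns,
>> // so union() against a DataFrame with columns fails analysis.
>> var finalDF = dfs.head
>> for (df <- dfs.tail) {
>>   finalDF = finalDF.union(df)
>> }
>>
>> Where dfs is a non-empty Seq of dataframes.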
>>
>>
>>
>> *From: *Cesar <ces...@gmail.com>
>> *Date: *Thursday, April 5, 2018 at 2:17 PM
>> *To: *user <user@spark.apache.org>
>> *Subject: *Union of multiple data frames
>>
>>
>>
>>
>>
>> The following code works for small n, but not for large n (>20):
>>
>>
>>
>> val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)
>>
>> dfUnion.show()
>>
>>
>>
>> By not working, I mean that Spark takes a lot of time to create the
>> execution plan.
>>
>>
>>
>> *Is there a more optimal way to perform a union of multiple data frames?*
>>
>>
>>
>>
>> thanks
>>
>> --
>>
>> Cesar Flores
>>
>>
>
>
> --
> Cesar Flores
>

Re: Union of multiple data frames

2018-04-05 Thread Cesar

Thanks for your answers.

The suggested method works when the number of Data Frames is small.

However, I am trying to union >30 Data Frames, and the time to create the
plan is taking longer than the execution, which should not be the case.

Thanks!
--
Cesar

On Thu, Apr 5, 2018 at 1:29 PM, Andy Davidson <a...@santacruzintegration.com> wrote:

>
> Hi Cesar
>
> I have used Brandon's approach in the past without any problem
>
> Andy
> From: Brandon Geise <brandonge...@gmail.com>
> Date: Thursday, April 5, 2018 at 11:23 AM
> To: Cesar <ces...@gmail.com>, "user @spark" <user@spark.apache.org>
> Subject: Re: Union of multiple data frames
>
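> Maybe something like
>
> // Start from the first DataFrame: emptyDataFrame has no columns,
> // so union() against a DataFrame with columns fails analysis.
> var finalDF = dfs.head
> for (df <- dfs.tail) {
>   finalDF = finalDF.union(df)
> }
>
> Where dfs is a non-empty Seq of dataframes.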
>
>
>
> *From: *Cesar <ces...@gmail.com>
> *Date: *Thursday, April 5, 2018 at 2:17 PM
> *To: *user <user@spark.apache.org>
> *Subject: *Union of multiple data frames
>
>
>
>
>
> The following code works for small n, but not for large n (>20):
>
>
>
> val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)
>
> dfUnion.show()
>
>
>
> By not working, I mean that Spark takes a lot of time to create the
> execution plan.
>
>
>
> *Is there a more optimal way to perform a union of multiple data frames?*
>
>
>
>
> thanks
>
> --
>
> Cesar Flores
>
>


-- 
Cesar Flores

Re: Union of multiple data frames

2018-04-05 Thread Andy Davidson


Hi Cesar

I have used Brandon's approach in the past without any problem

Andy
From:  Brandon Geise <brandonge...@gmail.com>
Date:  Thursday, April 5, 2018 at 11:23 AM
To:  Cesar <ces...@gmail.com>, "user @spark" <user@spark.apache.org>
Subject:  Re: Union of multiple data frames

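> Maybe something like
>
> // Start from the first DataFrame: emptyDataFrame has no columns,
> // so union() against a DataFrame with columns fails analysis.
> var finalDF = dfs.head
> for (df <- dfs.tail) {
>   finalDF = finalDF.union(df)
> }
>
> Where dfs is a non-empty Seq of dataframes.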
>  
> 
> From: Cesar <ces...@gmail.com>
> Date: Thursday, April 5, 2018 at 2:17 PM
> To: user <user@spark.apache.org>
> Subject: Union of multiple data frames
> 
>  
> 
>  
> 
> The following code works for small n, but not for large n (>20):
> 
>  
> 
> val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)
> 
> dfUnion.show()
> 
>  
> 
> By not working, I mean that Spark takes a lot of time to create the execution
> plan.
> 
>  
> 
> Is there a more optimal way to perform a union of multiple data frames?
> 
>  
> 
> thanks
> -- 
> 
> Cesar Flores

Re: Union of multiple data frames

2018-04-05 Thread Brandon Geise

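Maybe something like

// Start from the first DataFrame: emptyDataFrame has no columns,
// so union() against a DataFrame with columns fails analysis.
var finalDF = dfs.head
for (df <- dfs.tail) {
    finalDF = finalDF.union(df)
}

Where dfs is a non-empty Seq of dataframes.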

 

From: Cesar <ces...@gmail.com>
Date: Thursday, April 5, 2018 at 2:17 PM
To: user <user@spark.apache.org>
Subject: Union of multiple data frames

The following code works for small n, but not for large n (>20):

val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)
dfUnion.show()

By not working, I mean that Spark takes a lot of time to create the execution
plan.

Is there a more optimal way to perform a union of multiple data frames?

thanks

-- 
Cesar Flores

Union of multiple data frames

2018-04-05 Thread Cesar

The following code works for small n, but not for large n (>20):

val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)
dfUnion.show()

By not working, I mean that Spark takes a lot of time to create the
execution plan.

*Is there a more optimal way to perform a union of multiple data frames?*
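
One option that is sometimes tried for this kind of problem, offered here only as a sketch and not as something proposed in this thread, is to combine the inputs pairwise instead of left to right. `reduce(_ union _)` builds a chain in which each intermediate result wraps all the previous ones. A balanced reduction keeps the nesting depth near log2(n), which may reduce the time spent re-analysing intermediate plans. The helper name `balancedReduce` is hypothetical, and whether it helps depends on the Spark version and the schemas involved.

```scala
// Hypothetical helper (not part of Spark): reduce a sequence pairwise, level
// by level, so the resulting expression tree has depth about log2(n)
// rather than n - 1. Element order is preserved.
def balancedReduce[A](items: Seq[A])(op: (A, A) => A): A = {
  require(items.nonEmpty, "balancedReduce needs at least one element")

  @scala.annotation.tailrec
  def loop(level: Vector[A]): A =
    if (level.size == 1) level.head
    else loop(level.grouped(2).map(_.reduce(op)).toVector) // odd leftover is carried up unchanged

  loop(items.toVector)
}

// Plain-Scala demonstration: the parentheses show the shape of the tree.
val shape = balancedReduce(Seq("a", "b", "c", "d"))((x, y) => s"($x$y)")
println(shape) // ((ab)(cd))
```

With DataFrames the call would look like `balancedReduce(Seq(df1, df2, df3))(_ union _)`, assuming all inputs share the same schema.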


thanks
-- 
Cesar Flores
