Re: Behind the scene of RDD to DataFrame

2016-02-21 Thread Weiwei Zhang
Thanks a lot!

Best Regards,
Weiwei

On Sat, Feb 20, 2016 at 11:53 PM, Hemant Bhanawat 
wrote:

> toDF internally calls sqlContext.createDataFrame, which transforms the
> RDD into an RDD[InternalRow]. This RDD[InternalRow] is then mapped to a
> DataFrame.
>
> Type conversions (from Scala types to Catalyst types) are involved, but
> no shuffling.
>
> Hemant Bhanawat 
> www.snappydata.io
>
> On Sun, Feb 21, 2016 at 11:48 AM, Weiwei Zhang 
> wrote:
>
>> Hi there,
>>
>> Could someone explain what happens behind the scenes of rdd.toDF()?
>> More importantly, will this step involve a lot of shuffles and cause a
>> surge in the size of intermediate files? Thank you.
>>
>> Best Regards,
>> Vivian
>>
>
>


Re: Behind the scene of RDD to DataFrame

2016-02-20 Thread Hemant Bhanawat
toDF internally calls sqlContext.createDataFrame, which transforms the
RDD into an RDD[InternalRow]. This RDD[InternalRow] is then mapped to a
DataFrame.

Type conversions (from Scala types to Catalyst types) are involved, but
no shuffling.
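
If you want to check this on your own data, here is a minimal sketch. It
assumes a newer Spark release where SparkSession takes the place of
sqlContext, a local master, and a made-up Person case class; none of these
come from your job, so adapt them. It compares the partition count before
and after toDF and returns the lineage, where a shuffle would normally
show up as a shuffled RDD stage.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object ToDFNoShuffle {
  // Returns (partitions before toDF, partitions after toDF, lineage string).
  def check(): (Int, Int, String) = {
    // Assumption: local mode with 2 threads, just for experimenting.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("toDF-no-shuffle")
      .getOrCreate()
    import spark.implicits._ // brings rdd.toDF() into scope

    // A small RDD of Scala objects spread over 4 partitions.
    val rdd = spark.sparkContext.parallelize(
      Seq(Person("a", 30), Person("b", 25), Person("c", 41)),
      numSlices = 4)

    // The conversion under discussion: Scala objects to Catalyst rows.
    val df = rdd.toDF()

    val before = rdd.getNumPartitions
    val after = df.rdd.getNumPartitions
    // Lineage of the RDD backing the DataFrame, for manual inspection.
    val lineage = df.rdd.toDebugString

    spark.stop()
    (before, after, lineage)
  }

  def main(args: Array[String]): Unit = {
    val (before, after, lineage) = check()
    println(s"partitions before toDF: $before, after toDF: $after")
    println(lineage)
  }
}
```

If the two partition counts match and the lineage lists only map-style
stages on top of the source RDD, the conversion stayed within each
partition and wrote no shuffle files.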

Hemant Bhanawat 
www.snappydata.io

On Sun, Feb 21, 2016 at 11:48 AM, Weiwei Zhang 
wrote:

> Hi there,
>
> Could someone explain what happens behind the scenes of rdd.toDF()?
> More importantly, will this step involve a lot of shuffles and cause a
> surge in the size of intermediate files? Thank you.
>
> Best Regards,
> Vivian
>


Behind the scene of RDD to DataFrame

2016-02-20 Thread Weiwei Zhang
Hi there,

Could someone explain what happens behind the scenes of rdd.toDF()?
More importantly, will this step involve a lot of shuffles and cause a
surge in the size of intermediate files? Thank you.

Best Regards,
Vivian