Re: Save a spark RDD to disk
Can you increase the number of partitions and also the number of executors? This should improve the parallelization, though you may become disk-I/O bound.

On Nov 8, 2016, at 4:08 PM, Elf Of Lothlorein wrote:
> Hi,
> I am trying to save an RDD to disk and I am using saveAsNewAPIHadoopFile for that. I am seeing that it takes almost 20 minutes for about 900 GB of data. Is there any parameter that I can tune to make this saving faster?
> I am running about 45 executors with 5 cores each on 5 Spark worker nodes, using Spark on YARN.
> Thanks for your help.
> C
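A rough sketch of the arithmetic behind this advice, using the figures from the thread. The 128 MB target partition size below is an assumption (the common HDFS block size), not something stated in the thread:

```python
# Concurrency available for the write stage, per the thread's figures.
executors = 45
cores_per_executor = 5
concurrent_tasks = executors * cores_per_executor
print(concurrent_tasks)  # 225 write tasks can run at once

# One output partition = one write task, so fewer than 225 partitions
# leaves cores idle. Assuming a ~128 MB target partition size (a common
# HDFS block size; an assumption, not from the thread), 900 GB needs:
data_mb = 900 * 1000
partitions_for_128mb = data_mb // 128
print(partitions_for_128mb)  # 7031 partitions of ~128 MB each
```

In practice the partition count would be set with something like `rdd.repartition(n)` before calling saveAsNewAPIHadoopFile; whether that helps here depends on whether the job is actually CPU-bound rather than disk-bound.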
Re: Save a spark RDD to disk
That's around 750 MB/s, which seems quite respectable even in this day and age! How many and what kind of disks do you have attached to your nodes? What are you expecting?

On Tue, Nov 8, 2016 at 11:08 PM, Elf Of Lothlorein wrote:
> Hi,
> I am trying to save an RDD to disk and I am using saveAsNewAPIHadoopFile for that. I am seeing that it takes almost 20 minutes for about 900 GB of data. Is there any parameter that I can tune to make this saving faster?
> I am running about 45 executors with 5 cores each on 5 Spark worker nodes, using Spark on YARN.
> Thanks for your help.
> C

--
Otter Networks UG
http://otternetworks.de
Gotenstraße 17
10829 Berlin
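A quick check of the throughput figure, from the numbers given in the thread (900 GB in roughly 20 minutes, spread over 5 worker nodes):

```python
# Aggregate write throughput: ~900 GB in ~20 minutes.
data_gb = 900
seconds = 20 * 60
aggregate_mb_s = data_gb * 1000 / seconds  # decimal units: 1 GB = 1000 MB
print(aggregate_mb_s)  # 750.0 MB/s across the cluster

# Per-node share across the 5 worker nodes:
per_node_mb_s = aggregate_mb_s / 5
print(per_node_mb_s)  # 150.0 MB/s per node
```

At roughly 150 MB/s per node, a single spinning disk per node would already be near its sequential-write limit, which is why the question about the attached disks matters.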
Save a spark RDD to disk
Hi,
I am trying to save an RDD to disk and I am using saveAsNewAPIHadoopFile for that. I am seeing that it takes almost 20 minutes for about 900 GB of data. Is there any parameter that I can tune to make this saving faster?
I am running about 45 executors with 5 cores each on 5 Spark worker nodes, using Spark on YARN.
Thanks for your help.
C