Re: Queue independent jobs

2015-01-09 Thread Anders Arpteg
Awesome, it actually seems to work. Amazing how simple it can be
sometimes...

Thanks Sean!

On Fri, Jan 9, 2015 at 12:42 PM, Sean Owen  wrote:

> You can parallelize on the driver side. The way to do it is almost
> exactly what you have here, where you're iterating over a local Scala
> collection of dates and invoking a Spark operation for each. Simply
> write "dateList.par.map(...)" to make the local map proceed in
> parallel. It should invoke the Spark jobs simultaneously.
>
> On Fri, Jan 9, 2015 at 10:46 AM, Anders Arpteg  wrote:
> > Hey,
> >
> > Let's say we have multiple independent jobs that each transform some data
> > and store in distinct HDFS locations. Is there a nice way to run them in
> > parallel? See the following pseudocode snippet:
> >
> > dateList.map(date =>
> > sc.hdfsFile(date).map(transform).saveAsHadoopFile(date))
> >
> > It's unfortunate if they run in sequence, since the executors are not
> > used efficiently. What's the best way to parallelize the execution of
> > these jobs?
> >
> > Thanks,
> > Anders
>


Re: Queue independent jobs

2015-01-09 Thread Sean Owen
You can parallelize on the driver side. The way to do it is almost
exactly what you have here, where you're iterating over a local Scala
collection of dates and invoking a Spark operation for each. Simply
write "dateList.par.map(...)" to make the local map proceed in
parallel. It should invoke the Spark jobs simultaneously.
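Here is a minimal sketch of that approach (the input/output paths, date values,
and the transform function are placeholders, since the original snippet is
pseudocode; textFile/saveAsTextFile stand in for hdfsFile/saveAsHadoopFile, and
foreach is used rather than map because nothing needs to be collected):

    // .par turns the local date list into a parallel collection, so the
    // driver submits each date's save job concurrently instead of waiting
    // for the previous one to finish.
    val dateList = Seq("2015-01-07", "2015-01-08", "2015-01-09")
    dateList.par.foreach { date =>
      sc.textFile(s"hdfs:///input/$date")
        .map(transform)
        .saveAsTextFile(s"hdfs:///output/$date")
    }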

On Fri, Jan 9, 2015 at 10:46 AM, Anders Arpteg  wrote:
> Hey,
>
> Let's say we have multiple independent jobs that each transform some data and
> store in distinct HDFS locations. Is there a nice way to run them in
> parallel? See the following pseudocode snippet:
>
> dateList.map(date =>
> sc.hdfsFile(date).map(transform).saveAsHadoopFile(date))
>
> It's unfortunate if they run in sequence, since the executors are not used
> efficiently. What's the best way to parallelize the execution of these jobs?
>
> Thanks,
> Anders

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org