Allowing parallelism in spark local mode

2016-02-12 Thread yael aharon
Hello,
I have an application that receives requests over HTTP and uses Spark in
local mode to process them. Each request runs in its own thread.
It seems that Spark is queueing the jobs and processing them one at a time:
when 2 requests arrive simultaneously, the processing time for each of them
is almost doubled.
I tried setting spark.default.parallelism, spark.executor.cores, and
spark.driver.cores, but none of them changed the processing time in a
meaningful way.
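For reference, here is a minimal sketch of the setup (class, method, and
variable names are illustrative, not from the actual application):

    import java.util.concurrent.Executors
    import org.apache.spark.{SparkConf, SparkContext}

    // one local-mode SparkContext shared by several request-handling threads,
    // each of which submits its own Spark job
    object LocalModeServer {
      val sc = new SparkContext(
        new SparkConf().setMaster("local[*]").setAppName("http-request-processor"))

      val pool = Executors.newFixedThreadPool(8)  // one thread per in-flight request

      def handleRequest(payload: Seq[Int]): Unit =
        pool.submit(new Runnable {
          override def run(): Unit = {
            // each request becomes a separate Spark job on the shared context
            val result = sc.parallelize(payload).map(_ * 2).sum()
            println(s"result = $result")
          }
        })
    }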

Am I missing something obvious?
thanks, Yael


Re: Allowing parallelism in spark local mode

2016-02-12 Thread Chris Fregly
sounds like the first job is occupying all resources.  you should limit the
resources that a single job can acquire.

fair scheduler is one way to do that.

a possibly simpler way is to configure spark.deploy.defaultCores or
spark.cores.max.

the defaults for these values - for the Spark default cluster resource
manager (aka Spark Standalone) - are infinite.  every job will try to
acquire every resource.

https://spark.apache.org/docs/latest/spark-standalone.html

here's an example config that i use for my reference data pipeline project:

https://github.com/fluxcapacitor/pipeline/blob/master/config/spark/spark-defaults.conf

i'm always playing with these values to simulate different conditions, but
the current snapshot might be a helpful starting point.

also, don't forget about executor memory...
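
for concreteness, a rough sketch of what those settings could look like in
spark-defaults.conf (the numbers are placeholders for illustration, not
recommendations):

    # cap the cores a single application can acquire (Spark Standalone)
    spark.deploy.defaultCores   4
    spark.cores.max             4

    # give each executor an explicit memory budget
    spark.executor.memory       2g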


On Fri, Feb 12, 2016 at 1:40 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:

> You’ll want to set up the FAIR scheduler as described here:
> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application


-- 

*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com


Re: Allowing parallelism in spark local mode

2016-02-12 Thread Silvio Fiorito
You’ll want to set up the FAIR scheduler as described here:
https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
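
A minimal sketch of that setup (the pool name below is just an example):

    import org.apache.spark.{SparkConf, SparkContext}

    // enable fair scheduling so concurrent jobs share the local cores
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("http-request-processor")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // inside each request-handling thread, optionally assign its jobs to a pool
    sc.setLocalProperty("spark.scheduler.pool", "requests")
    // ... run the Spark job for this request ...
    sc.setLocalProperty("spark.scheduler.pool", null)  // reset when done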
