On 09/04/2014 09:55 PM, Mozumder, Monir wrote:
I have this 2-node cluster setup, where each node has 4 cores:

    MASTER
    (Worker-on-master)  (Worker-on-node1)
    (slaves: master, node1)
    SPARK_WORKER_INSTANCES=1
I am trying to understand Spark's parallelize behavior. The SparkPi
example has this code:
val slices = 8
val n = 100000 * slices
val count = spark.parallelize(1 to n, slices).map { i =>
  // spark is the SparkContext; random is scala.math.random
  val x = random * 2 - 1
  val y = random * 2 - 1
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
As per the documentation: "Spark will run one task for each slice of the
cluster. Typically you want 2-4 slices for each CPU in your cluster." I
set slices to 8, which means the working set will be divided among 8
tasks on the cluster, so each worker node gets 4 tasks (one task per core).
Questions:
i) Where can I see task-level details? On the executors page I don't see
a task breakdown, so I can't see the effect of slices in the UI.
Under http://localhost:4040/stages/ you can drill into individual stages
to see task details.
ii) How can I programmatically find the working set size for the map
function above? I assume it is n/slices (100000 above).
It'll be roughly n/slices. You can call mapPartitions() and check each
partition's length.
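For example (a rough sketch; sc is your SparkContext, and n and slices
are the values from the snippet above):

val rdd = sc.parallelize(1 to n, slices)
// one size per partition; expect each to be roughly n / slices
val sizes = rdd.mapPartitions(iter => Iterator(iter.size)).collect()
sizes.zipWithIndex.foreach { case (size, idx) =>
  println(s"partition $idx: $size elements")
}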
iii) Are the multiple tasks run by an executor executed sequentially, or
in parallel on multiple threads?
In parallel. Have a look at
https://spark.apache.org/docs/latest/cluster-overview.html
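To see the threading in action you can run a toy job locally (a rough
sketch, not from the original thread; local[4] and the app name are just
placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// with local[4], the single executor runs up to 4 tasks at once,
// each on its own task-launch worker thread
val conf = new SparkConf().setAppName("thread-demo").setMaster("local[4]")
val sc = new SparkContext(conf)
sc.parallelize(1 to 8, 8).foreach { i =>
  println(s"element $i ran on thread ${Thread.currentThread().getName}")
}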
iv) What is the reasoning behind 2-4 slices per CPU?
Things like "2-4 slices per CPU" are general rules of thumb, because
tasks are often more I/O-bound than CPU-bound. Depending on your workload
this might change; the easiest way to tell is to measure, as in the
sketch below. It's probably one of the last things you'll want to
optimize, the first being the ordering of transformations in your DAG.
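A rough way to measure it (a sketch, assuming sc is your SparkContext):
run the same Pi estimate with a few different slice counts and compare
the wall-clock times.

for (slices <- Seq(4, 8, 16, 32)) {
  val n = 100000 * slices
  val start = System.nanoTime()
  val count = sc.parallelize(1 to n, slices).map { _ =>
    val x = math.random * 2 - 1
    val y = math.random * 2 - 1
    if (x * x + y * y < 1) 1 else 0
  }.reduce(_ + _)
  val secs = (System.nanoTime() - start) / 1e9
  println(f"slices=$slices%2d  pi=${4.0 * count / n}%.5f  took $secs%.2f s")
}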
v) I assume ideally we should tune SPARK_WORKER_INSTANCES to
correspond to number of
Bests,
-Monir
best,
matt