Hi all,

I have been experimenting with SPARK_LOCAL_DIRS in spark-env.sh in order 
to add additional shuffle storage.
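
For reference, the setting looks roughly like this in spark-env.sh on each 
node (the mount points below are placeholders for the separate physical 
drives, not my actual paths):

    # spark-env.sh -- comma-separated list of directories;
    # Spark spreads shuffle and spill files across all of them
    export SPARK_LOCAL_DIRS=/mnt/disk1/spark,/mnt/disk2/spark,/mnt/disk3/spark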

But since I made this change, I am getting "Too many open files" errors 
whenever the total number of executor cores is high. I am also seeing low 
parallelism: monitoring the running tasks on a big job, most tasks run on 
the driver host and very few on the other nodes, even with ANY locality.
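
In case it is relevant, this is roughly how I have been checking the file 
descriptor limits on the workers (a sketch; EXECUTOR_PID and the limits.conf 
values are placeholders, not recommendations):

    # Open-file limit in effect for the user running the executors
    ulimit -n

    # Count the descriptors an executor currently holds
    # (EXECUTOR_PID is a placeholder; find it with jps or ps)
    ls /proc/EXECUTOR_PID/fd | wc -l

    # If the limit is the problem, raise nofile for the Spark user in
    # /etc/security/limits.conf, e.g. (example values only):
    #   spark  soft  nofile  65536
    #   spark  hard  nofile  65536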

Generally speaking, could I be doing something wrong with this setting?

On each node, I am pointing the local directories at several different 
physical hard drives to store the shuffle data. If I revert this 
configuration to a single storage folder on each node, everything runs 
normally.

Thanks,
Saif
