Re: Spark job does not perform well when some RDD in memory and some on Disk

2016-02-04 Thread Prabhu Joseph
If spark.locality.wait is 0, then there are two performance issues: 1. The task scheduler won't wait to schedule tasks as DATA_LOCAL; it will launch them immediately on some node even if that node is less local. The probability of tasks running at a less local level will be higher and will affect the overall Job Perfor

Re: Spark job does not perform well when some RDD in memory and some on Disk

2016-02-04 Thread Alonso Isidoro Roman
"But learned that it is better not to reduce it to 0." Could you explain this sentence a bit more? Thanks. Alonso Isidoro Roman. My favorite quotes (of today): "If debugging is the process of removing software bugs, then programming must be the process of putting them in..." - Edsger Dij

Re: Spark job does not perform well when some RDD in memory and some on Disk

2016-02-04 Thread Prabhu Joseph
Okay, the reason for the task delay within the executor when some RDDs are in memory and some are in Hadoop, i.e., multiple locality levels NODE_LOCAL and ANY: in this case the scheduler waits for spark.locality.wait (3 seconds by default). During this period, the scheduler waits to launch a data-local task before giving up
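
To make the waiting behaviour described above concrete, here is a rough sketch in plain Python. It is a toy model and not Spark's scheduler code: the function name allowed_locality and the assumption that every level uses the same wait are mine, for illustration only. It shows which locality level a scheduler would accept after a given idle time, and why a wait of 0 means tasks go straight to ANY.

```python
# Toy model of locality-wait fallback; NOT Spark's actual implementation.
# Assumption: one shared wait per level (hypothetical simplification).

# The two levels mentioned in this thread, most local first.
LEVELS = ["NODE_LOCAL", "ANY"]


def allowed_locality(seconds_since_last_launch, locality_wait_s, levels=LEVELS):
    """Return the least-local level accepted after the given idle time."""
    if locality_wait_s <= 0:
        # No waiting at all: any node is acceptable immediately.
        return levels[-1]
    # Each full wait period that elapses relaxes locality by one level.
    steps = int(seconds_since_last_launch // locality_wait_s)
    return levels[min(steps, len(levels) - 1)]


if __name__ == "__main__":
    for idle in (0.0, 1.0, 3.0):
        print(idle, allowed_locality(idle, 3.0), allowed_locality(idle, 0.0))
```

With the 3 second default, a task that cannot run NODE_LOCAL is held back until 3 seconds have passed with no launch; with a wait of 0 it is launched at ANY right away, which is the trade-off discussed in this thread.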