If spark.locality.wait is 0, then there are two performance issues:
1. The task scheduler won't wait to schedule tasks as data-local; it will
launch them immediately on some node even if that node is less local. The
probability of tasks running at a less-local level will be higher,
and this will affect overall job performance.
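To make point 1 concrete, here is a toy model in Python of the idea behind the locality wait. It is a simplified sketch, not Spark's actual scheduler code. It assumes the wait is applied once per locality level and that the scheduler steps down one level each time the wait elapses without a more-local launch. The function name `allowed_level` and the four-level list are illustrative assumptions. With a wait of 0, the least-local level (`ANY`) is acceptable immediately.

```python
# Toy model of delay scheduling (NOT Spark's real implementation).
# Assumption: one wait period per level, stepping down from most to least local.
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]


def allowed_level(elapsed_s: float, locality_wait_s: float) -> str:
    """Return the least-local level the scheduler would accept after
    elapsed_s seconds without launching a more-local task."""
    if locality_wait_s <= 0:
        # No waiting at all: any level, even ANY, is acceptable right away.
        return LEVELS[-1]
    # Each full wait period that passes relaxes the requirement by one level.
    steps = int(elapsed_s // locality_wait_s)
    return LEVELS[min(steps, len(LEVELS) - 1)]


print(allowed_level(0.0, 0.0))   # ANY
print(allowed_level(0.0, 3.0))   # PROCESS_LOCAL
print(allowed_level(4.0, 3.0))   # NODE_LOCAL
print(allowed_level(10.0, 3.0))  # ANY
```

With a wait of 0 the first line already returns `ANY`, which is why more tasks end up running less local.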
"But learned that it is better not to reduce it to 0."
Could you explain this sentence a bit more?
Thanks,
Alonso Isidoro Roman.
Okay, the reason for the task delay within an executor, when some RDD
partitions are in memory and some are in Hadoop (i.e., multiple locality
levels, NODE_LOCAL and ANY), is that the scheduler waits for
spark.locality.wait (3 seconds by default). During this period, the scheduler
waits to launch a data-local task before giving up and launching it on a
less-local node.
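The trade-off behind "better not to reduce it to 0" can be sketched with a single-task timeline. This is a toy model under stated assumptions, not Spark code. There is one task whose preferred (data-local) node becomes free at `local_free_at_s`, and a non-local executor is free from time 0. The run times `LOCAL_RUN_S` and `REMOTE_RUN_S` are hypothetical numbers chosen only to show the shape of the trade-off, not measurements.

```python
from typing import NamedTuple

# Hypothetical run times for illustration only (not measured Spark numbers).
LOCAL_RUN_S = 2.0   # task reads its input on the same node
REMOTE_RUN_S = 6.0  # task must fetch its input over the network


class Launch(NamedTuple):
    start_s: float
    level: str
    finish_s: float


def schedule(local_free_at_s: float, locality_wait_s: float) -> Launch:
    """Toy model: launch locally if the data-local slot frees up within the
    locality wait; otherwise give up when the wait expires and run remotely."""
    if local_free_at_s <= locality_wait_s:
        # The data-local slot became free before the wait ran out.
        start = local_free_at_s
        return Launch(start, "NODE_LOCAL", start + LOCAL_RUN_S)
    # The wait expired first: launch on the less-local executor.
    start = locality_wait_s
    return Launch(start, "ANY", start + REMOTE_RUN_S)


for free_at, wait in [(1.0, 3.0), (1.0, 0.0), (5.0, 3.0)]:
    print(free_at, wait, schedule(free_at, wait))
```

In this model, with a wait of 3 s and the local slot free after 1 s, the task finishes at 3.0 s. With a wait of 0 it starts immediately but runs remotely and finishes at 6.0 s. When the local slot frees up only after 5 s, the 3 s wait is pure delay (finish at 9.0 s), which matches the delay described above. The real setting is passed as the `spark.locality.wait` configuration key. Spark also has per-level variants such as `spark.locality.wait.node` that default to it.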