Hi, any suggestions on the following issue? My HDFS has a replication factor of 3. I ran my experiments with 3 datanodes, then added a fourth node with no data on it. When I run the same job now, Spark launches non-local tasks on the new node, and the run takes longer than it did on the 3-node cluster.
I think delay scheduling is not helping here because of the spark.locality.wait.node parameter, which defaults to 3 seconds: once that wait expires, Spark launches "ANY"-level tasks on the newly added datanode. *How do I set the spark.locality.wait.node parameter in the environment for the interactive shell's sc?* Thanks!
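
For concreteness, this is roughly what I am trying to achieve from inside spark-shell. It is only a sketch under my own assumptions: that the property can be set on a SparkConf before a context is created, that the shell's existing sc has to be stopped first, and that the value "10s" (an arbitrary number I picked for illustration) is accepted; older releases may expect plain milliseconds such as "10000" instead. The name newSc is just a placeholder.

```scala
// Run inside spark-shell, where `sc` is the context the shell already created.
import org.apache.spark.{SparkConf, SparkContext}

// Stop the shell's context so that a new one can be started in the same JVM.
sc.stop()

// new SparkConf() picks up spark.* system properties (assumed to include the
// master URL the shell was launched with), then the locality wait is overridden.
val conf = new SparkConf()
  .setAppName("locality-wait-test")          // hypothetical application name
  .set("spark.locality.wait.node", "10s")    // illustrative value, not a recommendation

// Hypothetical replacement context that carries the new setting.
val newSc = new SparkContext(conf)

// Check that the setting is visible on the new context.
println(newSc.getConf.get("spark.locality.wait.node"))
```

If there is a cleaner way to pass this when launching the shell, rather than recreating the context, that would be even better.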