gatorsmile commented on issue #26633: [SPARK-29994][CORE] Add WILDCARD task location
URL: https://github.com/apache/spark/pull/26633#issuecomment-558888608

@tgravescs Changing the default value of `spark.locality.wait` is a very important topic. We need to collect more feedback from the community instead of making the decision among ourselves.

3 seconds is just a magic number. Does anybody know the history? Why did we choose 3 seconds instead of 1 second or 0.5 seconds? Also, the performance depends on the environment and the workload patterns [e.g., the cost of shuffling data, the current workload sizing, and the cluster's resource availability]. When running a short or streaming query in an idle local cluster, setting it to zero might not be a bad idea. When running in a cloud environment, I do not know which value is best. We need to do more performance testing with common workloads to find the next magic number.

Normally, we should be really careful when introducing any performance-related change. The decisions we make will impact a lot of end users. When I worked on a commercial database, any perf regression **larger than 5%** for a single query [from the perf benchmark] was not acceptable.
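
The 5% rule mentioned above can be sketched as a simple per-query check. This is only an illustration, not something from the Spark code base or the PR: the function name `regressed_queries`, the dictionary inputs, and the timings in the usage example are all hypothetical, and the threshold simply mirrors the 5% figure quoted in the comment.

```python
# Hypothetical helper (not part of Spark): flag queries whose runtime
# regressed by more than a given fraction between two benchmark runs.
def regressed_queries(baseline, candidate, threshold=0.05):
    """Return {query: relative_slowdown} for queries that got slower
    than the baseline by more than `threshold` (0.05 means 5%)."""
    flagged = {}
    for query, base_time in baseline.items():
        # Skip queries that have no comparable measurement.
        if query not in candidate or base_time <= 0:
            continue
        slowdown = (candidate[query] - base_time) / base_time
        if slowdown > threshold:
            flagged[query] = slowdown
    return flagged


# Made-up timings in seconds, purely for illustration.
baseline = {"q1": 10.0, "q2": 20.0, "q3": 8.0}
candidate = {"q1": 10.4, "q2": 22.0, "q3": 7.0}
print(regressed_queries(baseline, candidate))
```

With these made-up numbers only `q2` is flagged, since it is 10% slower, while `q1` stays within the 5% budget and `q3` got faster. A check like this would have to run per query over a common benchmark suite, in each environment of interest, before a new default for `spark.locality.wait` could be judged.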
