attilapiros opened a new pull request #31397: URL: https://github.com/apache/spark/pull/31397
### What changes were proposed in this pull request?

Fix the flaky `handle large number of containers and tasks (SPARK-18750)` test by avoiding `DNSToSwitchMapping`, since in some situations a DNS lookup can be extremely slow.

### Why are the changes needed?

After https://github.com/apache/spark/pull/31363 was merged, the flaky `handle large number of containers and tasks (SPARK-18750)` test failed again in some other PRs, but now we know exactly where the test is stuck. It is in the DNS lookup:

```
[info] - handle large number of containers and tasks (SPARK-18750) *** FAILED *** (30 seconds, 4 milliseconds)
[info]   Failed with an exception or a timeout at thread join:
[info]
[info]   java.lang.RuntimeException: Timeout at waiting for thread to stop (its stack trace is added to the exception)
[info]   at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
[info]   at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
[info]   at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
[info]   at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
[info]   at java.net.InetAddress.getAllByName(InetAddress.java:1193)
[info]   at java.net.InetAddress.getAllByName(InetAddress.java:1127)
[info]   at java.net.InetAddress.getByName(InetAddress.java:1077)
[info]   at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:568)
[info]   at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:585)
[info]   at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:109)
[info]   at org.apache.spark.deploy.yarn.SparkRackResolver.coreResolve(SparkRackResolver.scala:75)
[info]   at org.apache.spark.deploy.yarn.SparkRackResolver.resolve(SparkRackResolver.scala:66)
[info]   at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy.$anonfun$localityOfRequestedContainers$3(LocalityPreferredContainerPlacementStrategy.scala:142)
[info]   at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy$$Lambda$658/1080992036.apply$mcVI$sp(Unknown Source)
[info]   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
[info]   at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy.localityOfRequestedContainers(LocalityPreferredContainerPlacementStrategy.scala:138)
[info]   at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite.org$apache$spark$deploy$yarn$LocalityPlacementStrategySuite$$runTest(LocalityPlacementStrategySuite.scala:94)
[info]   at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite$$anon$1.run(LocalityPlacementStrategySuite.scala:40)
[info]   at java.lang.Thread.run(Thread.java:748) (LocalityPlacementStrategySuite.scala:61)
...
```

This could be because the DNS servers used by those build machines are not configured to handle IPv6 queries, so the client has to wait for the IPv6 query to time out before falling back to IPv4.

The change also makes the tests more consistent: looking up a single host via `resolve(hostName: String)` gave a different answer from calling `resolve(hostNames: Seq[String])` with a `Seq` containing only that host.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit tests.
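
The two points in the description (no DNS lookup on the test path, and the single-host `resolve` agreeing with the `Seq` variant) can be illustrated with a minimal sketch. This is not the code from the PR: `RackResolver`, `StaticRackResolver` and the `racks` map are hypothetical names invented for illustration, and the `/default-rack` fallback is an assumption chosen to resemble Hadoop's default rack name.

```scala
// Hypothetical sketch: these types are NOT part of Spark or Hadoop.
trait RackResolver {
  // Batch lookup: one rack per host, in the same order as the input.
  def resolve(hostNames: Seq[String]): Seq[String]

  // The single-host lookup delegates to the batch lookup, so both
  // entry points always return the same answer for the same host.
  def resolve(hostName: String): String = resolve(Seq(hostName)).head
}

// Resolves racks from an in-memory map. It never calls InetAddress,
// so no DNS query (and no IPv6 timeout) can slow it down.
final class StaticRackResolver(
    racks: Map[String, String],
    defaultRack: String = "/default-rack" // assumed fallback value
) extends RackResolver {
  override def resolve(hostNames: Seq[String]): Seq[String] =
    hostNames.map(host => racks.getOrElse(host, defaultRack))
}
```

With a resolver of this shape, `resolve("host1")` and `resolve(Seq("host1")).head` cannot diverge, because the first is defined in terms of the second, and unknown hosts fall back to the default rack without touching the network.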
