[GitHub] [spark] attilapiros opened a new pull request #31397: [SPARK-34154][YARN][FOLLOWUP] Fix flaky LocalityPlacementStrategySuite test

GitBox Fri, 29 Jan 2021 04:04:52 -0800


attilapiros opened a new pull request #31397:
URL: https://github.com/apache/spark/pull/31397



   ### What changes were proposed in this pull request?
   
   Fixes the flaky `handle large number of containers and tasks (SPARK-18750)` 
test by no longer using `DNSToSwitchMapping`, because in some situations a DNS 
lookup can be extremely slow. 
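
   As a minimal sketch of the idea (not the code in this PR), a rack resolver used by a test can answer from an in-memory table, so that no DNS query is ever issued. The names `RackResolver` and `StaticRackResolver` and the `/default-rack` fallback are hypothetical and chosen only for illustration:

```scala
// Hypothetical stand-in for a rack resolver interface; not Spark's actual API.
trait RackResolver {
  def resolve(hostName: String): String
  def resolve(hostNames: Seq[String]): Seq[String]
}

// Answers from an in-memory table, so resolving a host never touches DNS.
final class StaticRackResolver(
    racks: Map[String, String],
    defaultRack: String = "/default-rack") // assumed fallback value
  extends RackResolver {

  override def resolve(hostName: String): String =
    racks.getOrElse(hostName, defaultRack)

  // The batch variant delegates to the single-host variant,
  // so both always give the same answer for the same host.
  override def resolve(hostNames: Seq[String]): Seq[String] =
    hostNames.map(h => resolve(h))
}
```

   Because the lookup is a plain map access, the time a test spends in such a resolver does not depend on how the build machine's DNS is configured.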
   
   ### Why are the changes needed?
   
   After https://github.com/apache/spark/pull/31363 was merged, the flaky 
`handle large number of containers and tasks (SPARK-18750)` test failed again 
in some other PRs, but now the failure shows the exact place where the test is stuck. 
   
   It is in the DNS lookup: 
   
   ```
   [info] - handle large number of containers and tasks (SPARK-18750) *** FAILED *** (30 seconds, 4 milliseconds)
   [info]   Failed with an exception or a timeout at thread join:
   [info]   
   [info]   java.lang.RuntimeException: Timeout at waiting for thread to stop (its stack trace is added to the exception)
   [info]       at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
   [info]       at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
   [info]       at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
   [info]       at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
   [info]       at java.net.InetAddress.getAllByName(InetAddress.java:1193)
   [info]       at java.net.InetAddress.getAllByName(InetAddress.java:1127)
   [info]       at java.net.InetAddress.getByName(InetAddress.java:1077)
   [info]       at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:568)
   [info]       at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:585)
   [info]       at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:109)
   [info]       at org.apache.spark.deploy.yarn.SparkRackResolver.coreResolve(SparkRackResolver.scala:75)
   [info]       at org.apache.spark.deploy.yarn.SparkRackResolver.resolve(SparkRackResolver.scala:66)
   [info]       at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy.$anonfun$localityOfRequestedContainers$3(LocalityPreferredContainerPlacementStrategy.scala:142)
   [info]       at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy$$Lambda$658/1080992036.apply$mcVI$sp(Unknown Source)
   [info]       at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
   [info]       at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy.localityOfRequestedContainers(LocalityPreferredContainerPlacementStrategy.scala:138)
   [info]       at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite.org$apache$spark$deploy$yarn$LocalityPlacementStrategySuite$$runTest(LocalityPlacementStrategySuite.scala:94)
   [info]       at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite$$anon$1.run(LocalityPlacementStrategySuite.scala:40)
   [info]       at java.lang.Thread.run(Thread.java:748) (LocalityPlacementStrategySuite.scala:61)
   ...
   ```
   
   This could be because the DNS servers used by those build machines are 
not configured to handle IPv6 queries, so the client has to wait for the IPv6 
query to time out before falling back to IPv4.
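
   One way to check this hypothesis on a build machine is to time a single lookup. The sketch below is illustrative only: `DnsLookupTiming` and `timeLookup` are hypothetical names, and the only real API used is `java.net.InetAddress.getByName`, the call that appears in the stack trace above:

```scala
import java.net.InetAddress
import scala.util.Try

object DnsLookupTiming {
  // Returns the elapsed wall-clock time in milliseconds together with the
  // lookup result. A failed lookup (for example UnknownHostException) is
  // captured in the Try instead of being thrown.
  def timeLookup(host: String): (Long, Try[InetAddress]) = {
    val start = System.nanoTime()
    val result = Try(InetAddress.getByName(host))
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (elapsedMs, result)
  }
}
```

   If a lookup for a host name like the ones used in the test takes on the order of seconds, the resolver rather than the placement logic is the likely bottleneck.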
   
   This change even makes the tests more consistent: previously, looking up a 
single host via `resolve(hostName: String)` gave a different answer than calling 
`resolve(hostNames: Seq[String])` with a `Seq` containing that single host. 
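
   The consistency property described here can be expressed as a small check. This is a sketch under assumptions, not code from the PR: `ResolveConsistency` and `inconsistentHosts` are hypothetical names, and the two resolver variants are passed in as plain functions:

```scala
object ResolveConsistency {
  // Returns the hosts for which the single-host lookup and the batch lookup
  // (called with a Seq containing only that host) disagree.
  def inconsistentHosts(
      hosts: Seq[String],
      single: String => String,
      batch: Seq[String] => Seq[String]): Seq[String] =
    hosts.filter(h => batch(Seq(h)) != Seq(single(h)))
}
```

   A resolver whose batch variant simply maps the single-host variant over its input yields an empty result for any list of hosts.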
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit tests.


