daniel-goldstein commented on pull request #24559:
URL: https://github.com/apache/spark/pull/24559#issuecomment-1030687911


   Not sure if this is the best place for this, but we've encountered the 
binding failure multiple times in our own containerized environments and 
traced it to containers that ended up with entirely numeric hostnames. 
`getaddrinfo` (which I assume Java's `getLocalHost` uses) may sometimes 
misinterpret fully numeric hostnames [as IP 
addresses](https://bugzilla.redhat.com/show_bug.cgi?id=1059122). Based on 
[this 
code](https://github.com/apache/spark/blob/c7c51bcab5cb067d36bccf789e0e4ad7f37ffb7c/core/src/main/scala/org/apache/spark/util/Utils.scala#L1013),
 I'm assuming the `SPARK_LOCAL_IP` workaround circumvents this issue. Here's 
an easy (though a bit contrived) way to reproduce the error:
   
   ```
   docker run --hostname 2886795934 -e SPARK_MODE=master bitnami/spark:3.2.1
   ```
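   
   To make the `SPARK_LOCAL_IP` point concrete, here's a rough sketch of the 
resolution order I'm assuming from the linked `Utils.scala` (this is not 
Spark's actual implementation, just my reading of it): an explicit 
`SPARK_LOCAL_IP` is resolved directly, while the default path goes through 
`InetAddress.getLocalHost` and therefore through the hostname that can be 
misparsed.
   
   ```
   import java.net.InetAddress
   
   // Hedged sketch of the address-resolution order I'm assuming from the
   // linked Utils.scala; not Spark's actual code.
   object LocalAddressSketch {
     def localAddress(): InetAddress =
       sys.env.get("SPARK_LOCAL_IP") match {
         // Explicit override: resolve the configured address directly and
         // never consult the (possibly numeric) container hostname.
         case Some(ip) => InetAddress.getByName(ip)
         // Default path: getLocalHost resolves the machine's hostname via the
         // system resolver (getaddrinfo), where a fully numeric name like
         // "2886795934" can be parsed as the IPv4 address 172.17.2.158
         // instead of being looked up as a name.
         case None => InetAddress.getLocalHost
       }
   
     def main(args: Array[String]): Unit =
       println(localAddress())
   }
   ```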
   
   In our case, setting a numeric hostname was our own fault, and Docker 
[explicitly 
rejects](https://github.com/moby/moby/blob/12f1b3ce43fe4aea5a41750bcc20f2a7dd67dbfc/pkg/stringid/stringid.go#L47)
 fully numeric names, it seems for the same reason. I'm not very familiar with 
GA, and from a quick browse I'm unsure whether this could ever happen there, 
but I thought it might be worth keeping in mind if this continues to be a 
sporadic failure, and worth considering whether Spark should be aware of this 
failure mode.
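   
   For what it's worth, a check along the lines of the one moby does would be 
cheap on Spark's side. Here's a purely hypothetical sketch (none of this 
exists in Spark today) of warning when the local hostname is entirely numeric 
and suggesting the `SPARK_LOCAL_IP` override:
   
   ```
   import java.net.InetAddress
   
   // Hypothetical guard, not existing Spark code: flag fully numeric hostnames
   // early, the same condition the linked moby code rejects for container IDs.
   object NumericHostnameCheck {
     def warnIfNumericHostname(): Unit = {
       val hostname = InetAddress.getLocalHost.getHostName
       if (hostname.nonEmpty && hostname.forall(_.isDigit)) {
         System.err.println(
           s"Hostname '$hostname' is fully numeric and may be misinterpreted " +
             "as an IP address; consider setting SPARK_LOCAL_IP explicitly.")
       }
     }
   
     def main(args: Array[String]): Unit = warnIfNumericHostname()
   }
   ```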


