jlpedrosa commented on a change in pull request #24702: [SPARK-27989] [Kubernetes] [Core] Added retries on the connection to the driver for k8s
URL: https://github.com/apache/spark/pull/24702#discussion_r292395924
 
 

 ##########
 File path: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
 ##########
 @@ -51,6 +51,8 @@ ENV SPARK_HOME /opt/spark
 
 WORKDIR /opt/spark/work-dir
 RUN chmod g+w /opt/spark/work-dir
+#Disable negative dns resolution https://docs.oracle.com/javase/8/docs/technotes/guides/net/properties.html
+RUN sed -i -e 's/networkaddress.cache.negative.ttl=10/networkaddress.cache.negative.ttl=0/g' /usr/lib/jvm/java-1.8-openjdk/jre/lib/security/java.security
 
 Review comment:
   Hi @srowen 
   Let me answer properly. AFAIK, negative DNS lookups are NOT cached at the OS level, at least not on Linux. Positive results can be cached, and retries and timeouts apply, but that also depends on the distribution: not all of them have caching enabled, and there are different ways to enable it.
   
   What I meant is that, because the negative result is cached in the JVM, retries won't work: they happen in a tight loop (the for loop we were adding) and simply get the cached failure back, unless we start adding sleeps in the Scala code, which may or may not be enough depending on the resolution timeouts at the OS level. The end result of caching negative resolutions in the Java layer is that end users see very strange behaviour where the OS-level timeouts are only respected once.
   
   If we don't disable caching, the first call to resolve will wait for the OS timeout (I think 5 seconds is the default), and subsequent calls won't even try, because the Java layer answers from its negative cache instead of invoking the OS resolver. See [here](https://github.com/bpupadhyaya/openjdk-8/blob/45af329463a45955ea2759b89cb0ebfe40570c3f/jdk/src/share/classes/java/net/InetAddress.java#L1251) and [here](https://github.com/bpupadhyaya/openjdk-8/blob/45af329463a45955ea2759b89cb0ebfe40570c3f/jdk/src/share/classes/java/net/InetAddress.java#L885).
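   For illustration only, the same security property that the `sed` line edits in `java.security` can also be set from code. This is a sketch of an alternative, not what this PR does. The class and method names are hypothetical. As far as I know, the JVM reads the value once when it first initializes its address cache policy, so the call would have to run before any lookup.

```java
import java.security.Security;

// Hypothetical helper, not part of Spark or of this PR.
final class NegativeDnsCacheConfig {
    // Security property documented in the Java networking properties guide.
    static final String NEGATIVE_TTL_KEY = "networkaddress.cache.negative.ttl";

    /**
     * Sets the negative-lookup TTL to 0 (never cache failed lookups) and
     * returns the value that was configured before, which may be null.
     * Assumption: this runs before the JVM performs its first lookup.
     */
    static String disableNegativeDnsCache() {
        String previous = Security.getProperty(NEGATIVE_TTL_KEY);
        Security.setProperty(NEGATIVE_TTL_KEY, "0");
        return previous;
    }

    public static void main(String[] args) {
        String previous = disableNegativeDnsCache();
        System.out.println("previous=" + previous
                + ", now=" + Security.getProperty(NEGATIVE_TTL_KEY));
    }
}
```

   Editing `java.security` in the image has the advantage of applying no matter which code path triggers the first lookup.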
   
   So what this means is that, if we don't disable the negative DNS cache, retries won't be able to solve the issue in a reliable manner (only a complicated combination of retries, sleeps in Scala, and OS timers would be able to solve it).
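   To make the "retries plus sleeps" option concrete, here is a minimal sketch in Java (the PR code itself is Scala). `ResolveWithRetry`, the `Resolver` interface, and all parameters are hypothetical names introduced for this example. With the default negative TTL left in place, a sleep shorter than that TTL would presumably still get the cached failure back, which is why the sleep alone is not reliable.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch, not the code from this PR.
final class ResolveWithRetry {
    /** Hypothetical seam so the real lookup can be replaced in tests. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /**
     * Tries to resolve the host up to maxAttempts times, sleeping
     * sleepMillis between failed attempts. Rethrows the last failure.
     */
    static InetAddress resolve(Resolver resolver, String host,
                               int maxAttempts, long sleepMillis)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return resolver.resolve(host);
            } catch (UnknownHostException e) {
                last = e;
                // Without a pause the loop is "tight": every retry hits
                // the JVM negative cache instead of the OS resolver.
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // "localhost" is only a placeholder host name for this sketch.
        InetAddress address = resolve(InetAddress::getByName, "localhost", 3, 1000L);
        System.out.println(address);
    }
}
```

   With the negative TTL set to 0, each attempt in this loop would go back to the OS resolver, so the OS-level timeout would apply on every retry rather than only on the first.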
   
   
