Whenever export SPARK_LOCAL_IP="127.0.0.1" is added to spark-defaults, the ApplicationMaster will always be hosted on 127.0.0.1 (in both cluster and client mode), which is not the intended goal.
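
For concreteness, here is a rough sketch of what that implies for the submit command in each mode; the class name, jar, and hostname below are placeholders rather than real values:

    # Client mode: the driver runs on the submitting machine, so passing an
    # explicit bind/advertise address for that one host per submission is
    # safe (gateway.example.com is a placeholder for the submitting host).
    spark-submit --master yarn --deploy-mode client \
      --conf spark.driver.bindAddress=0.0.0.0 \
      --conf spark.driver.host=gateway.example.com \
      --class com.example.MyApp myapp.jar

    # Cluster mode: the driver runs inside the ApplicationMaster container on
    # an arbitrary worker, so no fixed spark.driver.bindAddress/host is passed
    # and Spark resolves the local address of whichever node it lands on.
    spark-submit --master yarn --deploy-mode cluster \
      --class com.example.MyApp myapp.jar

The idea being that nothing pinned in spark-defaults.conf (or exported globally) gets inherited by a driver that ends up on a worker node.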

On Sat, Nov 13, 2021 at 04:57, Prabhu Joseph <[email protected]> wrote:

> Have seen this exception. This I think was fixed by export
> SPARK_LOCAL_IP="127.0.0.1" before the spark submit command.
>
> Can you check below for more details
>
> scala - How to solve "Can't assign requested address: Service 'sparkDriver'
> failed after 16 retries" when running spark code? - Stack Overflow
> <https://stackoverflow.com/questions/52133731/how-to-solve-cant-assign-requested-address-service-sparkdriver-failed-after>
>
> Unable to find Spark Driver after 16 retries · Issue #435 · dotnet/spark
> (github.com) <https://github.com/dotnet/spark/issues/435>
>
> What is spark.local.ip, spark.driver.host, spark.driver.bindAddress and
> spark.driver.hostname? - Stack Overflow
> <https://stackoverflow.com/questions/43692453/what-is-spark-local-ip-spark-driver-host-spark-driver-bindaddress-and-spark-dri>
>
> On Fri, Nov 12, 2021 at 9:52 PM marc nicole <[email protected]> wrote:
>
>> Here's the exception whenever the ApplicationMaster is on one of the slave
>> nodes (cluster mode); also increasing the Spark or YARN retries didn't help:
>>
>> 2021-11-12 17:20:37,301 ERROR yarn.ApplicationMaster: Uncaught exception:
>> org.apache.spark.SparkException: Exception thrown in awaitResult:
>>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
>>   at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:504)
>>   at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
>>   at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:899)
>>   at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:898)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:422)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>>   at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
>>   at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>> Caused by: java.net.BindException: Cannot assign requested address: bind:
>> Service 'sparkDriver' failed after 16 retries (on a random free port)!
>> Consider explicitly setting the appropriate binding address for the service
>> 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the
>> correct binding address.
>>   at sun.nio.ch.Net.bind0(Native Method)
>>   at sun.nio.ch.Net.bind(Net.java:438)
>>   at sun.nio.ch.Net.bind(Net.java:430)
>>   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:225)
>>   at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:134)
>>   at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:550)
>>   at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
>>   at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
>>   at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
>>   at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
>>   at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:248)
>>   at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:356)
>>   at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>>   at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
>>   at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>>   at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>   at java.lang.Thread.run(Thread.java:748)
>> 2021-11-12 17:20:37,308 INFO util.ShutdownHookManager: Shutdown hook called
>>
>>
>> On Fri, Nov 12, 2021 at 16:43, Prabhu Joseph <[email protected]> wrote:
>>
>>> Can you share the exception seen from the spark application logs. Thanks.
>>>
>>> On Fri, Nov 12, 2021, 7:24 PM marc nicole <[email protected]> wrote:
>>>
>>>> Hi guys!
>>>>
>>>> If I specify bindAddress in spark-defaults.conf, then for YARN (client
>>>> mode) everything works fine and the ApplicationMaster finds the driver.
>>>> But if I submit in cluster mode, then the ApplicationMaster, if hosted on
>>>> a worker node, won't find the driver, and this results in a bind error.
>>>>
>>>> Any idea what the missing config is?
>>>>
>>>> Note that I create the driver through a SparkSession object (not a
>>>> SparkContext).
>>>>
>>>> Hint: I was thinking that propagating the driver config to the workers
>>>> would solve this, e.g. through spark.yarn.dist.files.
>>>>
>>>> Any suggestions here?
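
PS: as an alternative to one global SPARK_LOCAL_IP, a per-node conf/spark-env.sh is one possible sketch, so that whichever node ends up hosting the ApplicationMaster/driver binds to its own address; this assumes hostname -i resolves to the node's routable IP on your distribution:

    # conf/spark-env.sh, deployed on every node
    # Each node exports its own address instead of a cluster-wide 127.0.0.1.
    export SPARK_LOCAL_IP=$(hostname -i)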
