Re: Question regarding Spark's new Internal authentication mechanism

Hi Marcelo,

Thanks for looking into it. I have opened a JIRA for this:
https://issues.apache.org/jira/browse/SPARK-21494

And yes, it works fine with the internal shuffle service. But our system has
the external shuffle service and dynamic allocation configured by default, and
we wanted to try switching from the standard SASL/3DES to the new AES-based
authentication.

Thanks!
Re: Question regarding Spark's new Internal authentication mechanism

Also, things seem to work with all your settings if you disable use of the
shuffle service (which also means no dynamic allocation), if that helps you
make progress in what you wanted to do.

--
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Question regarding Spark's new Internal authentication mechanism

Hmm... I tried this with the new shuffle service (I generally have an old one
running) and also see failures. I also noticed some odd things in your logs
that I'm also seeing in mine, but it's better to track these in a bug instead
of e-mail.

Please file a bug and attach your logs there, I'll take a look at this.

--
Marcelo
Re: Question regarding Spark's new Internal authentication mechanism

Hmm... that's not enough info, and logs are intentionally kept silent to avoid
flooding, but if you enable DEBUG level logging for
org.apache.spark.network.crypto in both YARN and the Spark app, that might
provide more info.

--
Marcelo
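For reference, a minimal sketch of how that DEBUG logging could be turned on,
assuming the stock log4j 1.x setup that Spark 2.x ships with (the exact file
locations depend on the install):

```properties
# Enable DEBUG output for Spark's AES auth classes.
# Add this to conf/log4j.properties for the Spark application, and to the
# NodeManager's log4j.properties so the YARN-side shuffle service logs too.
log4j.logger.org.apache.spark.network.crypto=DEBUG
```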
Re: Question regarding Spark's new Internal authentication mechanism

So I added these settings in yarn-site.xml as well. Now I get a completely
different error, but at least it seems like it is using the crypto library:

ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
Reason: Unable to create executor due to Unable to register with external
shuffle server due to : java.lang.IllegalArgumentException: Authentication failed.
        at org.apache.spark.network.crypto.AuthRpcHandler.receive(AuthRpcHandler.java:125)
        at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:157)
        at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)

Any clue about this?
Re: Question regarding Spark's new Internal authentication mechanism

So I am using the standard documentation here for configuring the external
shuffle service:
https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service

I can see on the cluster that the following jar is present as well:
/usr/lib/spark/yarn/spark-2.2.0-yarn-shuffle.jar

Is there any additional configuration I need for external shuffle besides
setting the following:

spark.network.crypto.enabled true
spark.network.crypto.saslFallback false
spark.authenticate true
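For readers following along, the registration the linked page describes boils
down to something like this in the NodeManager's yarn-site.xml (the aux-service
name and class are the documented ones; the jar additionally has to be on the
NodeManager's classpath, as noted above):

```xml
<!-- Register Spark's external shuffle service as a YARN aux service,
     per the running-on-yarn docs referenced above. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```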
Re: Question regarding Spark's new Internal authentication mechanism

On Wed, Jul 19, 2017 at 1:10 PM, Udit Mehrotra wrote:
> Is there any additional configuration I need for external shuffle besides
> setting the following:
> spark.network.crypto.enabled true
> spark.network.crypto.saslFallback false
> spark.authenticate true

Have you set these options on the shuffle service configuration too (which is
the YARN xml config file, not spark-defaults.conf)?

If you have, there might be an issue, and you should probably file a bug and
include your NM's log file.

--
Marcelo
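A sketch of what this advice amounts to: the same auth settings mirrored as
properties in the NodeManager's yarn-site.xml so the shuffle service picks them
up. This is an illustration only - whether the 2.2.0 shuffle service honors the
spark.network.crypto.* keys here is exactly what this thread is probing:

```xml
<!-- Illustration: mirror the Spark-side auth settings in yarn-site.xml,
     which is where the YARN shuffle service reads its configuration. -->
<property>
  <name>spark.authenticate</name>
  <value>true</value>
</property>
<property>
  <name>spark.network.crypto.enabled</name>
  <value>true</value>
</property>
<property>
  <name>spark.network.crypto.saslFallback</name>
  <value>false</value>
</property>
```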
Re: Question regarding Spark's new Internal authentication mechanism

Well, how did you install the Spark shuffle service on YARN? It's not part of
YARN.

If you really have the Spark 2.2 shuffle service jar deployed in your YARN
service, then perhaps you didn't configure it correctly to use the new auth
mechanism.

--
Marcelo
Re: Question regarding Spark's new Internal authentication mechanism

Sorry about that. Will keep the list in my replies.

So, just to clarify, I am not using an older version of Spark's shuffle
service. This is a brand new cluster with just Spark 2.2.0 installed alongside
Hadoop 2.7.3. Could there be anything else I am missing, or anything I can try
differently?

Thanks!
Re: Question regarding Spark's new Internal authentication mechanism

Please include the list on your replies, so others can benefit from the
discussion too.

On Wed, Jul 19, 2017 at 11:43 AM, Udit Mehrotra wrote:
> Hi Marcelo,
>
> Thanks a lot for confirming that. Can you explain what you mean by upgrading
> the version of the shuffle service? Won't it automatically use the
> corresponding class from Spark 2.2.0 to start the external shuffle service?

That depends on how you deploy your shuffle service. Normally YARN will have
no idea that your application is using a new Spark - it will still have the
old version of the service jar in its classpath.

--
Marcelo
Question regarding Spark's new Internal authentication mechanism

Hi Spark Devs,

I am trying out Spark's new internal authentication mechanism based on AES
encryption (https://issues.apache.org/jira/browse/SPARK-19139), which has come
up in Spark 2.2.0. I set the following properties in my spark-defaults:

spark.network.crypto.enabled true
spark.network.crypto.saslFallback false
spark.authenticate true

This seems to work fine with Spark's internal shuffle service. However, when I
try it with YARN's external shuffle service, the executors are unable to
register with the shuffle service, as it still expects SASL authentication.
Here is the error I get:

ExecutorLostFailure (executor 42 exited caused by one of the running tasks)
Reason: Unable to create executor due to Unable to register with external
shuffle server due to : java.lang.IllegalStateException: Expected SaslMessage,
received something else (maybe your client does not have SASL enabled?)
        at org.apache.spark.network.sasl.SaslMessage.decode(SaslMessage.java:69)
        at org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:89)
        at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:157)
        at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)

Can someone confirm that this is expected behavior? Or provide some guidance
on how I can make it work with the external shuffle service?

Note: If I set 'spark.network.crypto.saslFallback' to true, the job runs fine
with the external shuffle service as well, since it falls back to SASL
authentication.

Thank you for your help.
Re: Question regarding Spark's new Internal authentication mechanism

On Wed, Jul 19, 2017 at 11:19 AM, Udit Mehrotra wrote:
> spark.network.crypto.saslFallback false
> spark.authenticate true
>
> This seems to work fine with Spark's internal shuffle service. However, when
> I try it with YARN's external shuffle service, the executors are unable to
> register with the shuffle service, as it still expects SASL authentication.
>
> Can someone confirm that this is expected behavior? Or provide some guidance
> on how I can make it work with the external shuffle service?

Yes, that's the expected behavior, since you disabled SASL fallback in your
configuration. If you set it back on, then you can talk to the old shuffle
service. Or you could upgrade the version of the shuffle service running on
your YARN cluster so that it also supports the new auth mechanism.

--
Marcelo
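The interim workaround described here - keep the new AES auth enabled but let
clients fall back to SASL when the peer (an older shuffle service) does not
speak the new protocol - is just the fallback flag flipped in
spark-defaults.conf, as the original question already confirmed works:

```properties
# spark-defaults.conf: AES-based auth for connections that support it,
# with SASL fallback left on so executors can still register with an
# older external shuffle service.
spark.authenticate                 true
spark.network.crypto.enabled       true
spark.network.crypto.saslFallback  true
```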