Hi Roland, I switched to the default security groups and ran my job again, but the same exception pops up :-( ... All traffic is now open on the security groups.
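For reference, Jeff's suggestion further down the thread (raising spark.yarn.am.waitTime) is passed at submit time. A sketch, assuming cluster deploy mode submitted from the EMR master node; the main class and jar name are placeholders for your own job:

```shell
# Raise the ApplicationMaster's wait for the SparkContext from the 100s default to 300s.
# --class and the jar path are placeholders; everything else is a standard YARN submit.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.am.waitTime=300s \
  --class com.example.MyJob \
  my-job.jar
```

Note that if the job genuinely hangs before creating the SparkContext (e.g. blocked on a network call), raising the timeout only delays the same failure.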
Jochen

On Fri, 4 Oct 2019 at 17:37, Roland Johann <roland.joh...@phenetic.io> wrote:

> These are dynamic port ranges and depend on the configuration of your cluster. There is a
> separate application master per job, so there can't be just one port.
> If I remember correctly, the default EMR setup creates worker security groups with
> unrestricted traffic within the group, e.g. between the worker nodes.
> Depending on your security requirements, I suggest that you start with a default-like setup
> and determine ports and port ranges from the docs afterwards to further restrict traffic
> between the nodes.
>
> Kind regards
>
> On Fri, 4 Oct 2019 at 17:16, Jochen Hebbrecht <jochenhebbre...@gmail.com> wrote:
>
>> Hi Roland,
>>
>> We have indeed custom security groups. Can you tell me where exactly I need to be able to
>> access what?
>> For example, is it from the master instance to the driver instance? And which port should
>> be open?
>>
>> Jochen
>>
>> On Fri, 4 Oct 2019 at 17:14, Roland Johann <roland.joh...@phenetic.io> wrote:
>>
>>> Hi Jochen,
>>>
>>> did you set up the EMR cluster with custom security groups? Can you confirm that the
>>> relevant EC2 instances can connect through the relevant ports?
>>>
>>> Best regards
>>>
>>> On Fri, 4 Oct 2019 at 17:09, Jochen Hebbrecht <jochenhebbre...@gmail.com> wrote:
>>>
>>>> Hi Jeff,
>>>>
>>>> Thanks! Just tried that, but the same timeout occurs :-( ...
>>>>
>>>> Jochen
>>>>
>>>> On Fri, 4 Oct 2019 at 16:37, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>>> You can try to increase the property spark.yarn.am.waitTime (by default it is 100s).
>>>>> Maybe you are doing some very time-consuming operation when initializing the
>>>>> SparkContext, which causes the timeout.
>>>>>
>>>>> See this property here:
>>>>> http://spark.apache.org/docs/latest/running-on-yarn.html
>>>>>
>>>>> On Fri, 4 Oct 2019 at 22:08, Jochen Hebbrecht <jochenhebbre...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm using Spark 2.4.2 on AWS EMR 5.24.0.
>>>>>> I'm trying to send a Spark job towards the cluster. The job gets accepted, but the
>>>>>> YARN application fails with:
>>>>>>
>>>>>> {code}
>>>>>> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
>>>>>> java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
>>>>>> 	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>>>>>> 	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>>>>>> 	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>>>>>> 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13,
>>>>>> (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out
>>>>>> after [100000 milliseconds]
>>>>>> 	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>>>>>> 	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>>>>>> 	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>>>>>> 	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>>>>>> {code}
>>>>>>
>>>>>> It actually goes wrong at this line:
>>>>>> https://github.com/apache/spark/blob/v2.4.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L468
>>>>>>
>>>>>> Now, I'm 100% sure Spark is OK and
>>>>>> there's no bug, but there must be something wrong with my setup. I don't understand
>>>>>> the code of the ApplicationMaster, so could somebody explain to me what it is trying
>>>>>> to reach?
>>>>>> Where exactly does the connection time out? Then at least I can debug it further,
>>>>>> because I don't have a clue what it is doing :-)
>>>>>>
>>>>>> Thanks for any help!
>>>>>> Jochen
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
>>>
>>> --
>>>
>>> *Roland Johann*
>>> Software Developer/Data Engineer
>>>
>>> *phenetic GmbH*
>>> Lütticher Straße 10, 50674 Köln, Germany
>>>
>>> Mobil: +49 172 365 26 46
>>> Mail: roland.joh...@phenetic.io
>>> Web: phenetic.io
>>>
>>> Handelsregister: Amtsgericht Köln (HRB 92595)
>>> Geschäftsführer: Roland Johann, Uwe Reimann
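On the question of what the ApplicationMaster is trying to reach at that line: in cluster mode the AM starts the user class in a separate "Driver" thread, then blocks on a promise that the driver thread completes once the SparkContext is initialized. The TimeoutException means that within spark.yarn.am.waitTime the context was never registered back, typically because user code hangs or fails silently before `new SparkContext(...)` completes. A rough sketch of that control flow, in Python rather than Spark's Scala; names like `run_driver` and `context_ready` are made up for illustration:

```python
import concurrent.futures
import threading
import time

def run_driver(user_main, wait_time_s):
    """Mimics ApplicationMaster.runDriver: launch user code in a 'Driver'
    thread, then block until it reports an initialized context."""
    context_ready = concurrent.futures.Future()  # plays the role of sparkContextPromise

    def driver_thread():
        try:
            sc = user_main()                  # user code; creates the SparkContext
            context_ready.set_result(sc)      # reported back to the AM on success
        except Exception as exc:
            context_ready.set_exception(exc)

    threading.Thread(target=driver_thread, name="Driver", daemon=True).start()
    # This wait is the call that fails with "Futures timed out after [100000 ms]":
    # the user class never got as far as creating its context in time.
    return context_ready.result(timeout=wait_time_s)

# A user class that hangs before creating its context reproduces the failure mode:
try:
    run_driver(lambda: time.sleep(10), wait_time_s=0.1)
except concurrent.futures.TimeoutError:
    print("Futures timed out")  # what the AM turns into exitCode 13
```

So it is not a network connection timing out at all; the thing to debug is whatever the user main class does before the SparkContext comes up.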