[jira] [Commented] (SPARK-19569) could not get APP ID and cause failed to connect to spark driver on yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017066#comment-16017066 ] Saisai Shao commented on SPARK-19569:
-
[~ouyangxc.zte] In your code above you directly call {{client.submitApplication()}} to launch the Spark application; I assume this client is {{org.apache.spark.deploy.yarn.Client}}. From my understanding, calling this class directly is not supported. Also, if you use the YARN client directly to launch a Spark-on-YARN application, I suspect you will have to redo a lot of the preparation work that SparkSubmit normally does for you.

> could not get APP ID and cause failed to connect to spark driver on
> yarn-client mode
> -
>
>                 Key: SPARK-19569
>                 URL: https://issues.apache.org/jira/browse/SPARK-19569
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.2
>         Environment: hadoop2.7.1
>                      spark2.0.2
>                      hive2.2
>            Reporter: KaiXu
>
> When I run Hive queries on Spark, I get the error below in the console. After checking
> the container's log, I found it failed to connect to the Spark driver. I have set
> hive.spark.job.monitor.timeout=3600s, so the log says 'Job hasn't been
> submitted after 3601s'. It is very unlikely that no resources were available for that
> entire period, and I did not see any network-related issues either, so the cause is
> not clear from the message "Possible reasons include network issues, errors in remote
> driver or the cluster has no available resources, etc.".
> From Hive's log, it failed to get the APP ID, so this might be why the
> driver did not start up.
> console log:
> Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
> Job hasn't been submitted after 3601s. Aborting it.
> Possible reasons include network issues, errors in remote driver or the
> cluster has no available resources, etc.
> Please check YARN or Spark driver's logs for further information.
> Status: SENT
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> container's log:
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 file: "/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip" } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: PRIVATE, __spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 file: "/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" } size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1486905599813_0046_02
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to:
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to:
> 17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:4365
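For programmatic submission, the supported route in Spark 2.x (rather than calling {{org.apache.spark.deploy.yarn.Client}} directly) is the {{org.apache.spark.launcher.SparkLauncher}} API, which performs the same environment preparation as spark-submit. A minimal sketch, assuming a hypothetical application jar path and reusing the class name from the reproduction in this thread:

    import org.apache.spark.launcher.SparkLauncher

    object LaunchOnYarn {
      def main(args: Array[String]): Unit = {
        // SparkLauncher does the classpath/config preparation that SparkSubmit
        // normally performs, which is skipped when yarn.Client is invoked directly.
        val handle = new SparkLauncher()
          .setAppName("SparkOnYarnClient")
          .setMaster("yarn")                  // "yarn-client" in 2.x is master=yarn plus deploy-mode=client
          .setDeployMode("client")
          .setAppResource("/path/to/app.jar") // hypothetical jar path
          .setMainClass("com.hello.SparkPI")  // class name from the reproduction above
          .startApplication()
        // startApplication() returns a SparkAppHandle, which exposes the YARN
        // application ID once the application has been submitted.
        while (handle.getAppId == null && !handle.getState.isFinal) {
          Thread.sleep(500)
        }
        println(s"application id: ${handle.getAppId}")
      }
    }

This is a sketch only; it needs a YARN cluster and the spark-launcher artifact on the classpath to actually run.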
[jira] [Commented] (SPARK-19569) could not get APP ID and cause failed to connect to spark driver on yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016996#comment-16016996 ] Xiaochen Ouyang commented on SPARK-19569:
-
It is really a problem; we should reopen this issue, because we can reproduce it programmatically, as follows:

    val conf = new SparkConf()
    conf.set("spark.app.name", "SparkOnYarnClient")
    conf.setMaster("yarn-client")
    conf.set("spark.driver.host", "192.168.10.128")
    val arg0 = new ArrayBuffer[String]()
    arg0 += "--jar"
    arg0 += args(0)
    arg0 += "--class"
    arg0 += "com.hello.SparkPI"
    val cArgs = new ClientArguments(arg0.toArray)
    val hadoopConf = new Configuration()
    val client = new Client(cArgs, hadoopConf, conf)
    client.submitApplication()

But it succeeds when we use the spark-submit shell to submit a job with yarn-client mode.
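For reference, the working spark-submit invocation that this comment contrasts with might look like the following; the jar path is a placeholder, and the class name and driver host are taken from the reproduction above:

    # Equivalent submission through spark-submit, which performs the setup
    # (classpath, config propagation, driver endpoint wiring) that a direct
    # yarn.Client call skips.
    spark-submit \
      --master yarn \
      --deploy-mode client \
      --class com.hello.SparkPI \
      --conf spark.driver.host=192.168.10.128 \
      /path/to/app.jar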