Hi all,

I was submitting a Spark program jar to a Spark-on-YARN cluster from a driver
machine in yarn-client mode. Here is the spark-submit command I used:

./spark-submit --master yarn-client \
  --class com.charlie.spark.grax.OldFollowersExample \
  --queue dt_spark \
  ~/script/spark-flume-test-0.1-SNAPSHOT-hadoop2.0.0-mr1-cdh4.2.1.jar

The queue `dt_spark` was free, the program was submitted successfully, and it
was running on the cluster. But the console repeatedly showed:
14/11/18 15:11:48 WARN YarnClientClusterScheduler: Initial job has not accepted
any resources; check your cluster UI to ensure that workers are registered and
have sufficient memory
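In case the warning really is about resources, here is a sketch of the same submit command with the executor resources spelled out explicitly. The flags are standard spark-submit options for YARN; the values below are only illustrative guesses, not tuned for this cluster:

```shell
# Same submit command, but pinning down executor count, memory, and cores
# explicitly instead of relying on defaults. Values are illustrative only.
./spark-submit --master yarn-client \
  --class com.charlie.spark.grax.OldFollowersExample \
  --queue dt_spark \
  --num-executors 1 \
  --executor-memory 1g \
  --executor-cores 1 \
  ~/script/spark-flume-test-0.1-SNAPSHOT-hadoop2.0.0-mr1-cdh4.2.1.jar
```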
I checked the logs from the cluster UI but found no obvious errors:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/disk5/yarn/usercache/linqili/filecache/6957209742046754908/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/hadoop/hadoop-2.0.0-cdh4.2.1/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/11/18 14:28:16 INFO SecurityManager: Changing view acls to: hadoop,linqili
14/11/18 14:28:16 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(hadoop, linqili)
14/11/18 14:28:17 INFO Slf4jLogger: Slf4jLogger started
14/11/18 14:28:17 INFO Remoting: Starting remoting
14/11/18 14:28:17 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkyar...@longzhou-hdp3.lz.dscc:37187]
14/11/18 14:28:17 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://sparkyar...@longzhou-hdp3.lz.dscc:37187]
14/11/18 14:28:17 INFO ExecutorLauncher: ApplicationAttemptId:
appattempt_1415961020140_0325_000001
14/11/18 14:28:17 INFO ExecutorLauncher: Connecting to ResourceManager at
longzhou-hdpnn.lz.dscc/192.168.19.107:12032
14/11/18 14:28:17 INFO ExecutorLauncher: Registering the ApplicationMaster
14/11/18 14:28:18 INFO ExecutorLauncher: Waiting for spark driver to be
reachable.
14/11/18 14:28:18 INFO ExecutorLauncher: Master now available:
192.168.59.90:36691
14/11/18 14:28:18 INFO ExecutorLauncher: Listen to driver:
akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler
14/11/18 14:28:18 INFO ExecutorLauncher: Allocating 1 executors.
14/11/18 14:28:18 INFO YarnAllocationHandler: Allocating 1 executor containers
with 1408 of memory each.
14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num
containers: 1, priority = 1 , capability : memory: 1408)
14/11/18 14:28:18 INFO YarnAllocationHandler: Allocating 1 executor containers
with 1408 of memory each.
14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num
containers: 1, priority = 1 , capability : memory: 1408)
14/11/18 14:28:18 INFO RackResolver: Resolved longzhou-hdp3.lz.dscc to /rack1
14/11/18 14:28:18 INFO YarnAllocationHandler: launching container on
container_1415961020140_0325_01_000002 host longzhou-hdp3.lz.dscc
14/11/18 14:28:18 INFO ExecutorRunnable: Starting Executor Container
14/11/18 14:28:18 INFO ExecutorRunnable: Connecting to ContainerManager at
longzhou-hdp3.lz.dscc:12040
14/11/18 14:28:18 INFO ExecutorRunnable: Setting up ContainerLaunchContext
14/11/18 14:28:18 INFO ExecutorRunnable: Preparing Local resources
14/11/18 14:28:18 INFO ExecutorLauncher: All executors have launched.
14/11/18 14:28:18 INFO ExecutorLauncher: Started progress reporter thread -
sleep time : 5000
14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num
containers: 0, priority = 1 , capability : memory: 1408)
14/11/18 14:28:18 INFO ExecutorRunnable: Prepared Local resources
Map(__spark__.jar -> resource {, scheme: "hdfs", host:
"longzhou-hdpnn.lz.dscc", port: 11000, file:
"/user/linqili/.sparkStaging/application_1415961020140_0325/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar",
}, size: 134859131, timestamp: 1416292093988, type: FILE, visibility: PRIVATE,
)
14/11/18 14:28:18 INFO ExecutorRunnable: Setting up executor with commands:
List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m
-Xmx1024m ,
-Djava.security.krb5.conf=/home/linqili/proc/spark_client/hadoop/kerberos5-client/etc/krb5.conf
-Djava.library.path=/home/linqili/proc/spark_client/hadoop/lib/native/Linux-amd64-64,
-Djava.io.tmpdir=$PWD/tmp,
-Dlog4j.configuration=log4j-spark-container.properties,
org.apache.spark.executor.CoarseGrainedExecutorBackend,
akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler, 1,
longzhou-hdp3.lz.dscc, 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
14/11/18 14:28:23 INFO YarnAllocationHandler: ResourceRequest (host : *, num
containers: 0, priority = 1 , capability : memory: 1408)
14/11/18 14:28:23 INFO YarnAllocationHandler: Completed container
container_1415961020140_0325_01_000002 (state: COMPLETE, exit status: 1)
14/11/18 14:28:23 INFO YarnAllocationHandler: Container marked as failed:
container_1415961020140_0325_01_000002
14/11/18 14:28:28 INFO ExecutorLauncher: Allocating 1 containers to make up for
(potentially ?) lost containers
14/11/18 14:28:28 INFO YarnAllocationHandler: Allocating 1 executor containers
with 1408 of memory each.
14/11/18 14:28:28 INFO YarnAllocationHandler: ResourceRequest (host : *, num
containers: 1, priority = 1 , capability : memory: 1408)
14/11/18 14:28:33 INFO ExecutorLauncher: Allocating 1 containers to make up for
(potentially ?) lost containers
14/11/18 14:28:33 INFO YarnAllocationHandler: Allocating 1 executor containers
with 1408 of memory each.
14/11/18 14:28:33 INFO YarnAllocationHandler: ResourceRequest (host : *, num
containers: 1, priority = 1 , capability : memory: 1408)
14/11/18 14:28:33 INFO RackResolver: Resolved longzhou-hdp2.lz.dscc to /rack1
14/11/18 14:28:33 INFO YarnAllocationHandler: launching container on
container_1415961020140_0325_01_000003 host longzhou-hdp2.lz.dscc
14/11/18 14:28:33 INFO ExecutorRunnable: Starting Executor Container
14/11/18 14:28:33 INFO ExecutorRunnable: Connecting to ContainerManager at
longzhou-hdp2.lz.dscc:12040
14/11/18 14:28:33 INFO ExecutorRunnable: Setting up ContainerLaunchContext
14/11/18 14:28:33 INFO ExecutorRunnable: Preparing Local resources
14/11/18 14:28:33 INFO ExecutorRunnable: Prepared Local resources
Map(__spark__.jar -> resource {, scheme: "hdfs", host:
"longzhou-hdpnn.lz.dscc", port: 11000, file:
"/user/linqili/.sparkStaging/application_1415961020140_0325/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar",
}, size: 134859131, timestamp: 1416292093988, type: FILE, visibility: PRIVATE,
)
14/11/18 14:28:33 INFO ExecutorRunnable: Setting up executor with commands:
List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m
-Xmx1024m ,
-Djava.security.krb5.conf=/home/linqili/proc/spark_client/hadoop/kerberos5-client/etc/krb5.conf
-Djava.library.path=/home/linqili/proc/spark_client/hadoop/lib/native/Linux-amd64-64,
-Djava.io.tmpdir=$PWD/tmp,
-Dlog4j.configuration=log4j-spark-container.properties,
org.apache.spark.executor.CoarseGrainedExecutorBackend,
akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler, 2,
longzhou-hdp2.lz.dscc, 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
14/11/18 14:28:38 INFO YarnAllocationHandler: ResourceRequest (host : *, num
containers: 0, priority = 1 , capability : memory: 1408)
14/11/18 14:28:38 INFO YarnAllocationHandler: Ignoring container
container_1415961020140_0325_01_000004 at host longzhou-hdp2.lz.dscc, since we
already have the required number of containers for it.
14/11/18 14:28:38 INFO YarnAllocationHandler: Completed container
container_1415961020140_0325_01_000003 (state: COMPLETE, exit status: 1)
14/11/18 14:28:38 INFO YarnAllocationHandler: Container marked as failed:
container_1415961020140_0325_01_000003
14/11/18 14:28:43 INFO ExecutorLauncher: Allocating 1 containers to make up for
(potentially ?) lost containers
14/11/18 14:28:43 INFO YarnAllocationHandler: Releasing 1 containers.
pendingReleaseContainers : {container_1415961020140_0325_01_000004=true}
14/11/18 14:28:43 INFO YarnAllocationHandler: Allocating 1 executor containers
with 1408 of memory each.
14/11/18 14:28:43 INFO YarnAllocationHandler: ResourceRequest (host : *, num
containers: 1, priority = 1 , capability : memory: 1408)
14/11/18 14:28:48 INFO ExecutorLauncher: Allocating 1 containers to make up for
(potentially ?) lost containers
14/11/18 14:28:48 INFO YarnAllocationHandler: Allocating 1 executor containers
with 1408 of memory each.
14/11/18 14:28:48 INFO YarnAllocationHandler: ResourceRequest (host : *, num
containers: 1, priority = 1 , capability : memory: 1408)
14/11/18 14:28:48 INFO YarnAllocationHandler: launching container on
container_1415961020140_0325_01_000005 host longzhou-hdp2.lz.dscc
14/11/18 14:28:48 INFO ExecutorRunnable: Starting Executor Container
14/11/18 14:28:48 INFO ExecutorRunnable: Connecting to ContainerManager at
longzhou-hdp2.lz.dscc:12040
14/11/18 14:28:48 INFO ExecutorRunnable: Setting up ContainerLaunchContext
14/11/18 14:28:48 INFO ExecutorRunnable: Preparing Local resources
14/11/18 14:28:48 INFO ExecutorRunnable: Prepared Local resources
Map(__spark__.jar -> resource {, scheme: "hdfs", host:
"longzhou-hdpnn.lz.dscc", port: 11000, file:
"/user/linqili/.sparkStaging/application_1415961020140_0325/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar",
}, size: 134859131, timestamp: 1416292093988, type: FILE, visibility: PRIVATE,
)
14/11/18 14:28:48 INFO ExecutorRunnable: Setting up executor with commands:
List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m
-Xmx1024m ,
-Djava.security.krb5.conf=/home/linqili/proc/spark_client/hadoop/kerberos5-client/etc/krb5.conf
-Djava.library.path=/home/linqili/proc/spark_client/hadoop/lib/native/Linux-amd64-64,
-Djava.io.tmpdir=$PWD/tmp,
-Dlog4j.configuration=log4j-spark-container.properties,
org.apache.spark.executor.CoarseGrainedExecutorBackend,
akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler, 3,
longzhou-hdp2.lz.dscc, 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
14/11/18 14:28:53 INFO YarnAllocationHandler: ResourceRequest (host : *, num
containers: 0, priority = 1 , capability : memory: 1408)
14/11/18 14:28:53 INFO YarnAllocationHandler: Ignoring container
container_1415961020140_0325_01_000006 at host longzhou-hdp2.lz.dscc, since we
already have the required number of containers for it.

Does anyone have a hint on what might be going wrong here?
Thanks.
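Since the containers keep completing with exit status 1, their own stderr may say more than the ApplicationMaster log above. If it helps, this is how I would expect to pull them, assuming YARN log aggregation is enabled on the cluster (the application id is the one from the log above):

```shell
# Fetch the aggregated logs for the whole application, which include each
# container's stdout/stderr, so the failed containers' stack traces show up.
# Requires yarn.log-aggregation-enable=true and the app to have finished,
# or the NodeManager's local container log directories otherwise.
yarn logs -applicationId application_1415961020140_0325
```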