Re: Standalone client failing with docker deployed cluster

2014-05-16 Thread Bharath Ravi Kumar
(Trying to bubble up the issue again...)

Any insights (based on the enclosed logs) into why the standalone client
invocation might fail while jobs issued through the spark console
succeed?

Thanks,
Bharath


Standalone client failing with docker deployed cluster

2014-05-16 Thread Bharath Ravi Kumar
Hi,

I'm running the Spark server with a single worker on a laptop using the
docker images. The spark shell examples run fine with this setup. However,
when a standalone Java client tries to run wordcount on a local file (1 MB
in size), the execution fails with the following error on the worker's
stdout:

14/05/15 10:31:21 INFO Slf4jLogger: Slf4jLogger started
14/05/15 10:31:21 INFO Remoting: Starting remoting
14/05/15 10:31:22 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkExecutor@worker1:55924]
14/05/15 10:31:22 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://sparkExecutor@worker1:55924]
14/05/15 10:31:22 INFO CoarseGrainedExecutorBackend: Connecting to driver:
akka.tcp://spark@R9FX97h.local:56720/user/CoarseGrainedScheduler
14/05/15 10:31:22 INFO WorkerWatcher: Connecting to worker
akka.tcp://sparkWorker@worker1:50040/user/Worker
14/05/15 10:31:22 WARN Remoting: Tried to associate with unreachable remote
address [akka.tcp://spark@R9FX97h.local:56720]. Address is now gated for
6 ms, all messages to this address will be delivered to dead letters.
14/05/15 10:31:22 ERROR CoarseGrainedExecutorBackend: Driver Disassociated
[akka.tcp://sparkExecutor@worker1:55924] ->
[akka.tcp://spark@R9FX97h.local:56720]
disassociated! Shutting down.
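
From the executor's side, the failure is that it cannot reach the driver at
akka.tcp://spark@R9FX97h.local:56720, i.e. my laptop's hostname. In case it's
relevant: I haven't set spark.driver.host anywhere, and my understanding is
that the driver advertises the local hostname by default, which the worker
container may not be able to resolve. A minimal way to pin it (the bridge
address below is only my assumption; substitute whatever address the
containers can actually route to) would be something like:

// Hypothetical sketch: pin the driver's advertised address before creating
// the JavaSparkContext, so executors inside the containers can connect back.
System.setProperty("spark.driver.host", "172.17.42.1"); // assumed docker0 bridge IP
System.setProperty("spark.driver.port", "56720");       // optional; random if unset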

I noticed the following messages on the worker console when I attached
through docker:

14/05/15 11:24:33 INFO Worker: Asked to launch executor
app-20140515112408-0005/7 for billingLogProcessor
14/05/15 11:24:33 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@worker1:50040] ->
[akka.tcp://sparkExecutor@worker1:42437]:
Error [Association failed with [akka.tcp://sparkExecutor@worker1:42437]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@worker1:42437]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: worker1/172.17.0.4:42437
]
14/05/15 11:24:33 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@worker1:50040] ->
[akka.tcp://sparkExecutor@worker1:42437]:
Error [Association failed with [akka.tcp://sparkExecutor@worker1:42437]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@worker1:42437]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: worker1/172.17.0.4:42437
]
14/05/15 11:24:33 INFO ExecutorRunner: Launch command:
"/usr/lib/jvm/java-7-openjdk-amd64/bin/java" "-cp"
":/opt/spark-0.9.0/conf:/opt/spark-0.9.0/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop1.0.4.jar"
"-Xms512M" "-Xmx512M"
"org.apache.spark.executor.CoarseGrainedExecutorBackend"
"akka.tcp://spark@R9FX97h.local:46986/user/CoarseGrainedScheduler" "7"
"worker1" "1" "akka.tcp://sparkWorker@worker1:50040/user/Worker"
"app-20140515112408-0005"
14/05/15 11:24:35 INFO Worker: Executor app-20140515112408-0005/7 finished
with state FAILED message Command exited with code 1 exitStatus 1
14/05/15 11:24:35 INFO LocalActorRef: Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from
Actor[akka://sparkWorker/deadLetters] to
Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40172.17.0.4%3A33648-135#310170905]
was not delivered. [34] dead letters encountered. This logging can be
turned off or adjusted with configuration settings 'akka.log-dead-letters'
and 'akka.log-dead-letters-during-shutdown'.
14/05/15 11:24:35 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@worker1:50040] ->
[akka.tcp://sparkExecutor@worker1:56594]:
Error [Association failed with [akka.tcp://sparkExecutor@worker1:56594]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@worker1:56594]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: worker1/172.17.0.4:56594
]
14/05/15 11:24:35 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@worker1:50040] ->
[akka.tcp://sparkExecutor@worker1:56594]:
Error [Association failed with [akka.tcp://sparkExecutor@worker1:56594]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@worker1:56594]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: worker1/172.17.0.4:56594
]
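
One thing I notice in the launch command above: the executor is handed the
driver's URL (akka.tcp://spark@R9FX97h.local:46986/user/CoarseGrainedScheduler)
on its command line, so that hostname presumably has to resolve from inside
the container. A trivial check I could run in the worker container
(hypothetical snippet, nothing Spark-specific):

import java.net.InetAddress;

// Does the driver's hostname (taken from the launch command above)
// resolve from inside the worker container?
public class ResolveCheck {
    public static void main(String[] args) throws Exception {
        System.out.println(InetAddress.getByName("R9FX97h.local"));
    }
}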

The relevant code from the standalone Java client is as follows:

JavaSparkContext ctx = new JavaSparkContext(masterAddr, "log_processor",
    sparkHome, jarFileLoc);
JavaRDD<String> rawLog = ctx.textFile("/tmp/some.log");
List<Tuple2<String, Integer>> topRecords =
    rawLog.map(fieldSplitter).map(fieldExtractor).top(5, tupleComparator);
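
For completeness, the client boils down to the self-contained skeleton below.
The tuple types and the bodies of fieldSplitter, fieldExtractor and
tupleComparator are simplified stand-ins for my actual parsing code, and the
master URL and jar path are placeholders:

import java.io.Serializable;
import java.util.Comparator;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

import scala.Tuple2;

public class LogProcessor {

    // Comparator used by top(); made Serializable so it can ship to executors.
    static class TupleComparator
            implements Comparator<Tuple2<String, Integer>>, Serializable {
        public int compare(Tuple2<String, Integer> a, Tuple2<String, Integer> b) {
            return a._2().compareTo(b._2());
        }
    }

    public static void main(String[] args) {
        String masterAddr = "spark://<master-ip>:7077"; // as shown on the master web UI
        String sparkHome = "/opt/spark-0.9.0";
        String jarFileLoc = "/path/to/log-processor.jar"; // jar shipped to the workers

        JavaSparkContext ctx = new JavaSparkContext(masterAddr, "log_processor",
                sparkHome, jarFileLoc);

        JavaRDD<String> rawLog = ctx.textFile("/tmp/some.log");

        // Split each log line into fields.
        Function<String, String[]> fieldSplitter = new Function<String, String[]>() {
            public String[] call(String line) {
                return line.split("\t");
            }
        };

        // Reduce each record to the (field, count) pair we rank on.
        Function<String[], Tuple2<String, Integer>> fieldExtractor =
                new Function<String[], Tuple2<String, Integer>>() {
                    public Tuple2<String, Integer> call(String[] fields) {
                        return new Tuple2<String, Integer>(fields[0], 1);
                    }
                };

        List<Tuple2<String, Integer>> topRecords =
                rawLog.map(fieldSplitter).map(fieldExtractor)
                      .top(5, new TupleComparator());

        System.out.println(topRecords);
        ctx.stop();
    }
}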


However, running the sample code provided on GitHub (on the amplab docker
page) through the spark shell went through fine, with the following messages
on stdout:

14/05/15 10:39:41 INFO Slf4jLogger: Slf4jLogger started
14/05/15 10:39:42 INFO Remoting: Starting remoting
14/05/15 1