Thanks Timothy,

Setting these four environment variables as you suggested got Spark
running:

LIBPROCESS_ADVERTISE_IP=<host ip> LIBPROCESS_ADVERTISE_PORT=40286
LIBPROCESS_IP=0.0.0.0 LIBPROCESS_PORT=40286
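
For reference, here is a minimal sketch of how these variables might be set inside the container before calling spark-submit. The default address 172.16.1.101 is the scheduler address that appears in the master log in this message. HOST_IP and SCHEDULER_PORT are illustrative names only, not Spark or Mesos settings. The sketch also assumes the port is published from the container to the host, which this thread does not show.

```shell
#!/bin/sh
# Sketch only: HOST_IP and SCHEDULER_PORT are illustrative variable names,
# not Spark or Mesos settings.
HOST_IP="${HOST_IP:-172.16.1.101}"        # address the Mesos master can reach
SCHEDULER_PORT="${SCHEDULER_PORT:-40286}"

# Address and port that libprocess advertises to the Mesos master.
export LIBPROCESS_ADVERTISE_IP="$HOST_IP"
export LIBPROCESS_ADVERTISE_PORT="$SCHEDULER_PORT"

# Address and port that libprocess binds to inside the container.
export LIBPROCESS_IP=0.0.0.0
export LIBPROCESS_PORT="$SCHEDULER_PORT"

echo "advertising scheduler at ${LIBPROCESS_ADVERTISE_IP}:${LIBPROCESS_ADVERTISE_PORT}"
```

Using the same value for the bind port and the advertised port keeps the container-to-host port mapping one-to-one.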

After that, however, Spark does not seem to accept any offers from Mesos.
If I run the same script outside the Docker container, Spark gets
resources and the job runs successfully to completion.

Here is the Mesos master log from running the Spark job inside the Docker
container:

I1230 14:29:55.710973  9557 master.cpp:2500] Subscribing framework eval.py
with checkpointing disabled and capabilities [ GPU_RESOURCES ]

I1230 14:29:55.712379  9567 hierarchical.cpp:271] Added framework
993198d1-7393-4656-9f75-4f22702609d0-0251

I1230 14:29:55.713717  9550 master.cpp:5709] Sending 1 offers to framework
993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286

I1230 14:29:55.829774  9549 master.cpp:3951] Processing DECLINE call for
offers: [ 993198d1-7393-4656-9f75-4f22702609d0-O1384 ] for framework
993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286

I1230 14:30:01.055359  9569 http.cpp:381] HTTP GET for /master/state from
172.16.8.140:49406 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X
10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95
Safari/537.36'

I1230 14:30:01.457598  9553 master.cpp:5709] Sending 1 offers to framework
993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286

I1230 14:30:01.463732  9542 master.cpp:3951] Processing DECLINE call for
offers: [ 993198d1-7393-4656-9f75-4f22702609d0-O1385 ] for framework
993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286

I1230 14:30:02.300915  9562 http.cpp:381] HTTP GET for /master/state from
172.16.1.58:62629 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X
10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95
Safari/537.36'

I1230 14:30:03.847647  9553 http.cpp:381] HTTP GET for /master/state from
172.16.8.140:49406 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X
10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95
Safari/537.36'

I1230 14:30:04.431270  9551 http.cpp:381] HTTP GET for /master/state from
172.16.1.58:62629 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X
10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95
Safari/537.36'

I1230 14:30:07.465801  9549 master.cpp:5709] Sending 1 offers to framework
993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286

I1230 14:30:07.470860  9542 master.cpp:3951] Processing DECLINE call for
offers: [ 993198d1-7393-4656-9f75-4f22702609d0-O1386 ] for framework
993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286

I1230 14:30:11.077518  9572 http.cpp:381] HTTP GET for /master/state from
172.16.8.140:59764 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X
10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95
Safari/537.36'

I1230 14:30:12.387562  9560 http.cpp:381] HTTP GET for /master/state from
172.16.1.58:62629 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X
10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95
Safari/537.36'

I1230 14:30:12.473937  9572 master.cpp:5709] Sending 1 offers to framework
993198d1-7393-4656-9f75-4f22702609d0-0251 (eval.py) at
scheduler-9300fd07-7cf5-4341-84c9-4f1930e8c145@172.16.1.101:40286
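
To make the pattern in the log above easier to see, here is a rough sketch that counts how many offer messages the master sent and how many DECLINE calls it processed. summarize_offers is a hypothetical helper, and the sample lines are shortened excerpts of the log quoted in this message. The sketch only summarizes the log and does not explain why the offers are declined.

```shell
#!/bin/sh
# Sketch only: summarize_offers is a hypothetical helper name.
# It counts matching log lines (one line per offer message or DECLINE call),
# not individual offers.
summarize_offers() {
    log_file="$1"
    sent=$(grep -c 'Sending [0-9]* offers to framework' "$log_file")
    declined=$(grep -c 'Processing DECLINE call' "$log_file")
    echo "offers_sent=$sent declined=$declined"
}

# Shortened excerpt of the master log from this message, used as sample input.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
I1230 14:29:55.713717  9550 master.cpp:5709] Sending 1 offers to framework
I1230 14:29:55.829774  9549 master.cpp:3951] Processing DECLINE call for
I1230 14:30:01.457598  9553 master.cpp:5709] Sending 1 offers to framework
I1230 14:30:01.463732  9542 master.cpp:3951] Processing DECLINE call for
EOF

summary=$(summarize_offers "$sample_log")
echo "$summary"
rm -f "$sample_log"
```

Against a real cluster, the path to the master log would be passed instead of the sample file. That path depends on the installation.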


On Fri, Dec 30, 2016 at 1:35 PM, Timothy Chen <tnac...@gmail.com> wrote:

> Hi Ji,
>
> One way to make it fixed is to set the LIBPROCESS_PORT environment variable
> on the executor when it is launched.
>
> Tim
>
>
> On Dec 30, 2016, at 1:23 PM, Ji Yan <ji...@drive.ai> wrote:
>
> Dear Spark Users,
>
> We are trying to launch Spark on Mesos from within a Docker container. We
> have found that, since the Spark executors need to talk back to the Spark
> driver, a lot of port mapping is needed to make that happen. We believe we
> have mapped the ports that we could find on the Spark configuration
> documentation page.
>
> spark-2.1.0-bin-spark-2.1/bin/spark-submit \
>>   --conf 'spark.driver.host'=<host server ip> \
>>   --conf 'spark.blockManager.port'='40285' \
>>   --conf 'spark.driver.bindAddress'='0.0.0.0' \
>>   --conf 'spark.driver.port'='40284' \
>>   --conf 'spark.mesos.executor.docker.volumes'='spark-2.1.0-bin-spark-2.1:/spark-2.1.0-bin-spark-2.1' \
>>   --conf 'spark.mesos.gpus.max'='2' \
>>   --conf 'spark.mesos.containerizer'='docker' \
>>   --conf 'spark.mesos.executor.docker.image'='docker.drive.ai/spark_gpu_experiment:latest' \
>>   --master 'mesos://mesos_master_dev:5050' \
>>   -v eval.py
>
>
> When we launched Spark this way, the Mesos master log suggests that the
> Mesos master tries to send the offer back to the framework at port 33978,
> which turns out to be a dynamic port. The job failed at this point,
> apparently because the offer cannot reach the container. To expose that
> port from the container, we first need to make it fixed. Does anyone know
> how to fix that port in the Spark configuration? Any other advice on
> launching Spark on Mesos from within a Docker container is greatly
> appreciated.
>
> I1230 12:53:54.758297  9571 master.cpp:2424] Received SUBSCRIBE call for 
> framework 'eval.py' at 
> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
> I1230 12:53:54.758608  9571 master.cpp:2500] Subscribing framework eval.py 
> with checkpointing disabled and capabilities [ GPU_RESOURCES ]
> I1230 12:53:54.760036  9569 hierarchical.cpp:271] Added framework
> 993198d1-7393-4656-9f75-4f22702609d0-0233
> I1230 12:53:54.761533  9549 master.cpp:5709] Sending 1 offers to framework
> 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at
> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@<some ip>:33978
> E1230 12:53:57.757814  9573 process.cpp:2105] Failed to shutdown socket with
> fd 22: Transport endpoint is not connected
> I1230 12:53:57.758314  9543 master.cpp:1284] Framework
> 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at
> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978 disconnected
> I1230 12:53:57.758378  9543 master.cpp:2725] Disconnecting framework 
> 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at 
> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
> I1230 12:53:57.758411  9543 master.cpp:2749] Deactivating framework 
> 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at 
> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
> I1230 12:53:57.758582  9548 hierarchical.cpp:382] Deactivated framework 
> 993198d1-7393-4656-9f75-4f22702609d0-0233
> W1230 12:53:57.758915  9543 master.hpp:2113] Master attempted to send message 
> to disconnected framework 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) 
> at scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
> I1230 12:53:57.759140  9543 master.cpp:1297] Giving framework 
> 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at 
> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978 0ns to 
> failover
> I1230 12:53:57.760573  9561 master.cpp:5561] Framework failover timeout, 
> removing framework 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at 
> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
> I1230 12:53:57.760648  9561 master.cpp:6296] Removing framework 
> 993198d1-7393-4656-9f75-4f22702609d0-0233 (eval.py) at 
> scheduler-8a94bc86-c2b3-4c7d-bee7-cfddc8e9a8da@172.17.0.12:33978
> I1230 12:53:57.761493  9571 hierarchical.cpp:333] Removed framework 
> 993198d1-7393-4656-9f75-4f22702609d0-0233
>
>
> The information in this email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful.
>
>

