Thanks! That's the first fork I've seen that has build instructions. Do I run all the services on the Mesos master, and it spawns executors for each? I guess I only need one storm-ui, for example, but a storm-log for every storm-supervisor. Does Mesos take care of that?
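For reference, this is roughly how I'm laying out the long-running pieces in my Docker setup at the moment. The launcher names and flags below are taken from my own images, so treat them as assumptions rather than a recommendation:

  # master container: one of each, run under supervisord
  mesos-master --zk=zk://zookeeper:2181/mesos --quorum=1 --work_dir=/var/lib/mesos
  bin/storm-mesos nimbus   # nimbus plus the Mesos scheduler side of the framework
  bin/storm ui             # a single UI instance

  # each slave container: only the Mesos slave
  mesos-slave --master=zk://zookeeper:2181/mesos
  # no storm supervisor started by hand; as far as I can tell the framework
  # launches one (plus the workers) as a Mesos executor on whichever slave
  # receives the offer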
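Related to that: the name-resolution failure further down the thread (the executor fetching conf/storm.yaml from an auto-generated container hostname via wget) is why I've started passing an explicit hostname to every container, per Michael's suggestion below. A minimal sketch of what I mean, with placeholder image names and hostnames, not my actual images:

  docker run -d --name master --hostname master -p 5050:5050 example/mesos-master
  docker run -d --name slave1 --hostname slave1 --link master:master example/mesos-slave
  # 'master' must be resolvable from inside every slave container (via --link,
  # DNS, or an /etc/hosts entry); otherwise the executor's wget of
  # http://master:<port>/conf/storm.yaml fails with "Name or service not known",
  # as in the logs below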
(Y)

On Aug 20, 2014, at 11:18 AM, Tomas Barton <[email protected]> wrote:

> The latter is definitely a better choice. Yet another fork of storm-mesos is here:
>
> https://github.com/deric/storm-mesos
>
> On 19 August 2014 20:22, Yaron Rosenbaum <[email protected]> wrote:
> I'm not getting it from git, but rather downloading it from:
> http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz
>
> And it looks a bit dated.
> Looking at git, there are two forks that seem more or less 'official':
> https://github.com/mesos/storm
> https://github.com/mesosphere/storm-mesos
>
> The first hasn't been updated in a while.
>
> (Y)
>
> On Aug 19, 2014, at 5:54 PM, Brenden Matthews <[email protected]> wrote:
>
>> What version of the storm on mesos code are you running? i.e., what is the git sha?
>>
>> On Mon, Aug 18, 2014 at 11:53 PM, Yaron Rosenbaum <[email protected]> wrote:
>> Ok, thanks for the tip!
>> Made some progress. Now this is what I get:
>> stderr on the slave:
>>
>> WARNING: Logging before InitGoogleLogging() is written to STDERR
>> I0818 19:06:55.033699 22 fetcher.cpp:73] Fetching URI 'http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz'
>> I0818 19:06:55.033994 22 fetcher.cpp:123] Downloading 'http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz' to '/tmp/mesos/slaves/20140818-190538-2466255276-5050-11-0/frameworks/20140818-190538-2466255276-5050-11-0002/executors/wordcount-1-1408388814/runs/69496890-fc18-43f3-be87-198bceba7226/storm-mesos-0.9.tgz'
>> I0818 19:07:11.567514 22 fetcher.cpp:61] Extracted resource '/tmp/mesos/slaves/20140818-190538-2466255276-5050-11-0/frameworks/20140818-190538-2466255276-5050-11-0002/executors/wordcount-1-1408388814/runs/69496890-fc18-43f3-be87-198bceba7226/storm-mesos-0.9.tgz' into '/tmp/mesos/slaves/20140818-190538-2466255276-5050-11-0/frameworks/20140818-190538-2466255276-5050-11-0002/executors/wordcount-1-1408388814/runs/69496890-fc18-43f3-be87-198bceba7226'
>> --2014-08-18 19:07:12-- http://master:35468/conf/storm.yaml
>> Resolving master (master)... 172.17.0.147
>> Connecting to master (master)|172.17.0.147|:35468... connected.
>> HTTP request sent, awaiting response... 404 Not Found
>> 2014-08-18 19:07:12 ERROR 404: Not Found.
>>
>> root@master:/# cat /var/log/supervisor/mesos-master-stderr.log
>> ...
>> I0818 19:11:10.456274 19 master.cpp:2704] Executor wordcount-1-1408388814 of framework 20140818-190538-2466255276-5050-11-0002 on slave 20140818-190538-2466255276-5050-11-0 at slave(1)@172.17.0.149:5051 (slave) has exited with status 8
>> I0818 19:11:10.457824 19 master.cpp:2628] Status update TASK_LOST (UUID: ddd2a5c6-39d6-4450-824b-2ddc5b39869b) for task slave-31000 of framework 20140818-190538-2466255276-5050-11-0002 from slave 20140818-190538-2466255276-5050-11-0 at slave(1)@172.17.0.149:5051 (slave)
>> I0818 19:11:10.457898 19 master.hpp:673] Removing task slave-31000 with resources cpus(*):1; mem(*):1000; ports(*):[31000-31000] on slave 20140818-190538-2466255276-5050
>>
>> root@master:/# cat /var/log/supervisor/nimbus-stderr.log
>> I0818 19:06:23.683955 190 sched.cpp:126] Version: 0.19.1
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@716: Client environment:host.name=master
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@724: Client environment:os.arch=3.15.3-tinycore64
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Fri Aug 15 09:11:44 UTC 2014
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@733: Client environment:user.name=(null)
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@741: Client environment:user.home=/root
>> 2014-08-18 19:06:23,685:26(0x7f3575014700):ZOO_INFO@log_env@753: Client environment:user.dir=/
>> 2014-08-18 19:06:23,685:26(0x7f3575014700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=zookeeper:2181 sessionTimeout=10000 watcher=0x7f3576f9cf80 sessionId=0 sessionPasswd=<null> context=0x7f3554000e00 flags=0
>> 2014-08-18 19:06:23,712:26(0x7f3573010700):ZOO_INFO@check_events@1703: initiated connection to server [172.17.0.145:2181]
>> 2014-08-18 19:06:23,724:26(0x7f3573010700):ZOO_INFO@check_events@1750: session establishment complete on server [172.17.0.145:2181], sessionId=0x147ea82a658000c, negotiated timeout=10000
>> I0818 19:06:23.729141 242 group.cpp:310] Group process ((3)@172.17.0.147:49673) connected to ZooKeeper
>> I0818 19:06:23.729308 242 group.cpp:784] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
>> I0818 19:06:23.729367 242 group.cpp:382] Trying to create path '/mesos' in ZooKeeper
>> I0818 19:06:23.745023 242 detector.cpp:135] Detected a new leader: (id='1')
>> I0818 19:06:23.745312 242 group.cpp:655] Trying to get '/mesos/info_0000000001' in ZooKeeper
>> I0818 19:06:23.752063 242 detector.cpp:377] A new leading master ([email protected]:5050) is detected
>> I0818 19:06:23.752250 242 sched.cpp:222] New master detected at [email protected]:5050
>> I0818 19:06:23.752893 242 sched.cpp:230] No credentials provided. Attempting to register without authentication
>> I0818 19:06:23.755734 242 sched.cpp:397] Framework registered with 20140818-190538-2466255276-5050-11-0002
>> W0818 19:06:54.991662 245 sched.cpp:901] Attempting to launch task slave-31001 with an unknown offer 20140818-190538-2466255276-5050-11-18
>> 2014-08-18 19:09:10,656:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 28ms
>> W0818 19:10:58.976002 248 sched.cpp:901] Attempting to launch task slave-31001 with an unknown offer 20140818-190538-2466255276-5050-11-57
>> 2014-08-18 19:11:40,927:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 107ms
>> 2014-08-18 19:12:07,700:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 72ms
>> 2014-08-18 19:15:54,659:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 20ms
>> W0818 19:16:41.581099 241 sched.cpp:901] Attempting to launch task slave-31001 with an unknown offer 20140818-190538-2466255276-5050-11-259
>> W0818 19:19:52.968051 242 sched.cpp:901] Attempting to launch task slave-31001 with an unknown offer 20140818-190538-2466255276-5050-11-367
>> 2014-08-18 19:20:14,970:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 24ms
>> 2014-08-18 19:20:31,676:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 13ms
>> 2014-08-18 19:20:48,375:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 12ms
>> W0818 19:22:33.935534 244 sched.cpp:901] Attempting to launch task slave-31001 with an unknown offer 20140818-190538-2466255276-5050-11-395
>>
>> (Y)
>>
>> On Aug 18, 2014, at 7:46 PM, Michael Babineau <[email protected]> wrote:
>>
>>> Including --hostname=<host> in your docker run command should help with the resolution problem (so long as <host> is resolvable)
>>>
>>> On Mon, Aug 18, 2014 at 9:42 AM, Brenden Matthews <[email protected]> wrote:
>>> Is the hostname set correctly on the machine running nimbus? It looks like that may not be correct.
>>>
>>> On Mon, Aug 18, 2014 at 9:39 AM, Yaron Rosenbaum <[email protected]> wrote:
>>> @vinodkone
>>>
>>> Finally found some relevant logs..
>>> Let's start with the slave:
>>>
>>> slave_1 | I0818 16:18:51.700827 9 slave.cpp:1043] Launching task 82071a7b5f41-31000 for framework 20140818-161802-2214597036-5050-10-0002
>>> slave_1 | I0818 16:18:51.703234 9 slave.cpp:1153] Queuing task '82071a7b5f41-31000' for executor wordcount-1-1408378726 of framework '20140818-161802-2214597036-5050-10-0002
>>> slave_1 | I0818 16:18:51.703335 8 mesos_containerizer.cpp:537] Starting container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' for executor 'wordcount-1-1408378726' of framework '20140818-161802-2214597036-5050-10-0002'
>>> slave_1 | I0818 16:18:51.703366 9 slave.cpp:1043] Launching task 82071a7b5f41-31001 for framework 20140818-161802-2214597036-5050-10-0002
>>> slave_1 | I0818 16:18:51.706400 9 slave.cpp:1153] Queuing task '82071a7b5f41-31001' for executor wordcount-1-1408378726 of framework '20140818-161802-2214597036-5050-10-0002
>>> slave_1 | I0818 16:18:51.708044 13 launcher.cpp:117] Forked child with pid '18' for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
>>> slave_1 | I0818 16:18:51.717427 11 mesos_containerizer.cpp:647] Fetching URIs for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' using command '/usr/local/libexec/mesos/mesos-fetcher'
>>> slave_1 | I0818 16:19:01.109644 14 slave.cpp:2873] Current usage 37.40%. Max allowed age: 3.681899907883981days
>>> slave_1 | I0818 16:19:09.766845 12 slave.cpp:2355] Monitoring executor 'wordcount-1-1408378726' of framework '20140818-161802-2214597036-5050-10-0002' in container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
>>> slave_1 | I0818 16:19:10.765058 14 mesos_containerizer.cpp:1112] Executor for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' has exited
>>> slave_1 | I0818 16:19:10.765388 14 mesos_containerizer.cpp:996] Destroying container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
>>>
>>> So the executor gets started, and then exits.
>>> Found the stderr of the framework/run:
>>>
>>> I0818 16:23:53.427016 50 fetcher.cpp:61] Extracted resource '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0/storm-mesos-0.9.tgz' into '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0'
>>> --2014-08-18 16:23:54-- http://7df8d3d507a1:41765/conf/storm.yaml
>>> Resolving 7df8d3d507a1 (7df8d3d507a1)... failed: Name or service not known.
>>> wget: unable to resolve host address '7df8d3d507a1'
>>>
>>> So the problem is with host resolution. It's trying to resolve 7df8d3d507a1 and fails.
>>> Obviously this node is not in /etc/hosts. Why would it be able to resolve it?
>>>
>>> (Y)
>>>
>>> On Aug 18, 2014, at 7:06 PM, Yaron Rosenbaum <[email protected]> wrote:
>>>
>>>> Hi @vinodkone
>>>>
>>>> nimbus log:
>>>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] not alive
>>>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] not alive
>>>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] not alive
>>>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] not alive
>>>>
>>>> for all the executors.
>>>> On the Mesos slave, there are no Storm-related logs.
>>>> Which leads me to believe that there's no supervisor to be found, even though there's obviously an executor assigned to the job.
>>>>
>>>> My understanding is that Mesos is responsible for spawning the supervisors (although that's not explicitly stated anywhere). The documentation is not very clear. But if I run the supervisors myself, then Mesos can't do the resource allocation it's supposed to.
>>>>
>>>> (Y)
>>>>
>>>> On Aug 18, 2014, at 6:13 PM, Vinod Kone <[email protected]> wrote:
>>>>
>>>>> Can you paste the slave/executor log related to the executor failure?
>>>>>
>>>>> @vinodkone
>>>>>
>>>>> On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum <[email protected]> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I have created a Docker-based Mesos setup, including Chronos, Marathon, and Storm.
>>>>>> Following advice I saw previously on this mailing list, I run all frameworks directly on the Mesos master (is this correct? is it guaranteed that there's only one master at any given time?).
>>>>>>
>>>>>> Chronos and Marathon work perfectly, but Storm doesn't. The UI works, but it seems like the supervisors are not able to communicate with nimbus. I can deploy topologies, but the executors fail.
>>>>>>
>>>>>> Here's the project on GitHub:
>>>>>> https://github.com/yaronr/docker-mesos
>>>>>>
>>>>>> I've spent over a week on this and I'm hitting a wall.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> (Y)

