@vinodkone

Finally found some relevant logs.
Let's start with the slave:

slave_1     | I0818 16:18:51.700827     9 slave.cpp:1043] Launching task 
82071a7b5f41-31000 for framework 20140818-161802-2214597036-5050-10-0002
slave_1     | I0818 16:18:51.703234     9 slave.cpp:1153] Queuing task 
'82071a7b5f41-31000' for executor wordcount-1-1408378726 of framework 
'20140818-161802-2214597036-5050-10-0002'
slave_1     | I0818 16:18:51.703335     8 mesos_containerizer.cpp:537] Starting 
container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' for executor 
'wordcount-1-1408378726' of framework '20140818-161802-2214597036-5050-10-0002'
slave_1     | I0818 16:18:51.703366     9 slave.cpp:1043] Launching task 
82071a7b5f41-31001 for framework 20140818-161802-2214597036-5050-10-0002
slave_1     | I0818 16:18:51.706400     9 slave.cpp:1153] Queuing task 
'82071a7b5f41-31001' for executor wordcount-1-1408378726 of framework 
'20140818-161802-2214597036-5050-10-0002'
slave_1     | I0818 16:18:51.708044    13 launcher.cpp:117] Forked child with 
pid '18' for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
slave_1     | I0818 16:18:51.717427    11 mesos_containerizer.cpp:647] Fetching 
URIs for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' using command 
'/usr/local/libexec/mesos/mesos-fetcher'
slave_1     | I0818 16:19:01.109644    14 slave.cpp:2873] Current usage 37.40%. 
Max allowed age: 3.681899907883981days
slave_1     | I0818 16:19:09.766845    12 slave.cpp:2355] Monitoring executor 
'wordcount-1-1408378726' of framework '20140818-161802-2214597036-5050-10-0002' 
in container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
slave_1     | I0818 16:19:10.765058    14 mesos_containerizer.cpp:1112] 
Executor for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' has exited
slave_1     | I0818 16:19:10.765388    14 mesos_containerizer.cpp:996] 
Destroying container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'

So the executor gets started, and then exits.
I found the stderr of the framework/run:
I0818 16:23:53.427016    50 fetcher.cpp:61] Extracted resource 
'/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0/storm-mesos-0.9.tgz'
 into 
'/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0'
--2014-08-18 16:23:54--  http://7df8d3d507a1:41765/conf/storm.yaml
Resolving 7df8d3d507a1 (7df8d3d507a1)... failed: Name or service not known.
wget: unable to resolve host address '7df8d3d507a1'

So the problem is with host resolution: wget is trying to resolve 7df8d3d507a1 
and fails.
Obviously this name is not in /etc/hosts on the slave. Why would it be able to 
resolve it?
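For the record, this can be confirmed (and worked around) from inside the slave container. The hostname 7df8d3d507a1 is taken from the wget error above; the IP in the workaround comment is a placeholder, not something from the logs:

```shell
# Check whether this container can resolve the host that wget failed on
# (7df8d3d507a1 is the hostname from the stderr above).
output=$(getent hosts 7df8d3d507a1 \
  || echo "7df8d3d507a1 does not resolve from this container")
echo "$output"

# One possible workaround: inject a static mapping when starting the
# slave container (172.17.0.2 is a placeholder IP you'd replace with
# the actual address of the container serving storm.yaml):
#   docker run --add-host=7df8d3d507a1:172.17.0.2 ...
```

Alternatively, starting the serving container with an explicit, resolvable hostname (`docker run -h some-known-name ...`) avoids advertising the auto-generated container ID in the first place.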

(Y)

On Aug 18, 2014, at 7:06 PM, Yaron Rosenbaum <[email protected]> wrote:

> Hi @vinodkone
> 
> nimbus log:
> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] 
> not alive
> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] 
> not alive
> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] 
> not alive
> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] 
> not alive
> 
> for all the executors.
> On the mesos slave, there are no storm-related logs.
> This leads me to believe that there's no supervisor to be found, even though 
> there's obviously an executor assigned to the job.
> 
> My understanding is that Mesos is responsible for spawning the supervisors 
> (although that's not explicitly stated anywhere; the documentation is not 
> very clear). But if I run the supervisors myself, then Mesos can't do the 
> resource allocation the way it's supposed to.
> 
> (Y)
> 
> On Aug 18, 2014, at 6:13 PM, Vinod Kone <[email protected]> wrote:
> 
>> Can you paste the slave/executor log related to the executor failure?
>> 
>> @vinodkone
>> 
>> On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum <[email protected]> 
>> wrote:
>> 
>>> Hi
>>> 
>>> I have created a Docker based Mesos setup, including chronos, marathon, and 
>>> storm.
>>> Following advice I saw previously on this mailing list, I run all the 
>>> frameworks directly on the Mesos master (is this correct? Is it guaranteed 
>>> that there is only one master at any given time?)
>>> 
>>> Chronos and marathon work perfectly, but storm doesn't. UI works, but it 
>>> seems like supervisors are not able to communicate with nimbus. I can 
>>> deploy topologies, but the executors fail.
>>> 
>>> Here's the project on github:
>>> https://github.com/yaronr/docker-mesos
>>> 
>>> I've spent over a week on this and I'm hitting a wall.
>>> 
>>> 
>>> Thanks!
>>> 
>>> (Y)
>>> 
> 
