It looks like the executor exited while it still held a RUNNING task, which
is why the slave generated the TASK_LOST. Do you have the executor logs
handy? You can find them in the Mesos web UI or in this executor's sandbox
on the slave.
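
If it helps, here is a minimal sketch (Python, with the IDs copied straight
from the slave log in this thread; the /tmp/mesos work_dir is whatever your
slave was started with) of how to build that sandbox path and tail the
executor's stdout/stderr:

import os

# Sandbox layout, as visible in the gc.cpp lines of the slave log:
#   <work_dir>/slaves/<slave_id>/frameworks/<framework_id>/
#     executors/<executor_id>/runs/<container_id>
# All IDs below are copied from this thread; substitute your own.
WORK_DIR = "/tmp/mesos"
SLAVE_ID = "20140915-185627-326871232-5050-8074-2"
FRAMEWORK_ID = "20140915-230424-326871232-5050-13574-0000"
EXECUTOR_ID = "production-topology-1-1410913050"
CONTAINER_ID = "dfbf47af-c8b7-481f-b8fc-6a33a8f115d9"

sandbox = os.path.join(
    WORK_DIR, "slaves", SLAVE_ID,
    "frameworks", FRAMEWORK_ID,
    "executors", EXECUTOR_ID,
    "runs", CONTAINER_ID,
)

# The executor's stdout/stderr files live directly in the run directory.
for name in ("stdout", "stderr"):
    path = os.path.join(sandbox, name)
    if os.path.exists(path):
        with open(path) as f:
            # The tail is where an exit reason usually shows up.
            print(path, "".join(f.readlines()[-50:]), sep="\n")

Per the gc.cpp lines at the end of your slave log, that run directory is
scheduled for garbage collection ~7 days out, so the files should still be
on disk.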

On Wed, Sep 17, 2014 at 2:32 PM, Luyi Wang <[email protected]> wrote:

> Here is the slave log.
>
> I0917 00:16:45.759867 10209 slave.cpp:3057] Current usage 43.77%. Max
> allowed age: 3.235877672809236days
> I0917 00:17:31.145267 10212 slave.cpp:1011] Got assigned task
> dev10-cdh5-03.int.dev10.smcl.pure-breeze.com-31000 for framework
> 20140915-230424-326871232-5050-13574-0000
> I0917 00:17:31.145606 10212 gc.cpp:84] Unscheduling
> '/tmp/mesos/slaves/20140915-185627-326871232-5050-8074-2/frameworks/20140915-230424-326871232-5050-13574-0000'
> from gc
> I0917 00:17:31.145678 10212 slave.cpp:1121] Launching task
> dev10-cdh5-03.int.dev10.smcl.pure-breeze.com-31000 for framework
> 20140915-230424-326871232-5050-13574-0000
> I0917 00:17:31.146811 10212 slave.cpp:1231] Queuing task
> 'dev10-cdh5-03.int.dev10.smcl.pure-breeze.com-31000' for executor
> production-topology-1-1410913050 of framework
> '20140915-230424-326871232-5050-13574-0000
> I0917 00:17:31.146898 10212 containerizer.cpp:394] Starting container
> 'dfbf47af-c8b7-481f-b8fc-6a33a8f115d9' for executor
> 'production-topology-1-1410913050' of framework
> '20140915-230424-326871232-5050-13574-0000'
> I0917 00:17:31.147886 10212 launcher.cpp:137] Forked child with pid
> '20604' for container 'dfbf47af-c8b7-481f-b8fc-6a33a8f115d9'
> I0917 00:17:31.148232 10212 containerizer.cpp:510] Fetching URIs for
> container 'dfbf47af-c8b7-481f-b8fc-6a33a8f115d9' using command
> '/home/ubuntu/mesos/build/src/mesos-fetcher'
> I0917 00:17:45.761070 10210 slave.cpp:3057] Current usage 43.77%. Max
> allowed age: 3.235862073509317days
> I0917 00:17:48.932791 10214 slave.cpp:2542] Monitoring executor
> 'production-topology-1-1410913050' of framework
> '20140915-230424-326871232-5050-13574-0000' in container
> 'dfbf47af-c8b7-481f-b8fc-6a33a8f115d9'
> I0917 00:17:58.360540 10215 slave.cpp:1741] Got registration for executor
> 'production-topology-1-1410913050' of framework
> 20140915-230424-326871232-5050-13574-0000
> I0917 00:17:58.360841 10215 slave.cpp:1859] Flushing queued task
> dev10-cdh5-03.int.dev10.smcl.pure-breeze.com-31000 for executor
> 'production-topology-1-1410913050' of framework
> 20140915-230424-326871232-5050-13574-0000
> I0917 00:17:58.461596 10215 slave.cpp:2093] Handling status update
> TASK_RUNNING (UUID: d05dbbc9-a4a6-4f83-b5a5-4d60079a26c0) for task
> dev10-cdh5-03.int.dev10.smcl.pure-breeze.com-31000 of framework
> 20140915-230424-326871232-5050-13574-0000 from executor(1)@
> 192.168.123.29:41892
> I0917 00:17:58.461767 10215 status_update_manager.cpp:320] Received status
> update TASK_RUNNING (UUID: d05dbbc9-a4a6-4f83-b5a5-4d60079a26c0) for task
> dev10-cdh5-03.int.dev10.smcl.pure-breeze.com-31000 of framework
> 20140915-230424-326871232-5050-13574-0000
> I0917 00:17:58.461879 10215 status_update_manager.cpp:373] Forwarding
> status update TASK_RUNNING (UUID: d05dbbc9-a4a6-4f83-b5a5-4d60079a26c0) for
> task dev10-cdh5-03.int.dev10.smcl.pure-breeze.com-31000 of framework
> 20140915-230424-326871232-5050-13574-0000 to [email protected]:5050
> I0917 00:17:58.462002 10215 slave.cpp:2256] Sending acknowledgement for
> status update TASK_RUNNING (UUID: d05dbbc9-a4a6-4f83-b5a5-4d60079a26c0) for
> task dev10-cdh5-03.int.dev10.smcl.pure-breeze.com-31000 of framework
> 20140915-230424-326871232-5050-13574-0000 to executor(1)@
> 192.168.123.29:41892
> I0917 00:17:58.470211 10213 status_update_manager.cpp:398] Received status
> update acknowledgement (UUID: d05dbbc9-a4a6-4f83-b5a5-4d60079a26c0) for
> task dev10-cdh5-03.int.dev10.smcl.pure-breeze.com-31000 of framework
> 20140915-230424-326871232-5050-13574-0000
> I0917 00:18:11.776726 10209 http.cpp:330] HTTP request for
> '/slave(1)/state.json'
> I0917 00:18:45.616175 10211 http.cpp:330] HTTP request for
> '/slave(1)/state.json'
> I0917 00:18:45.762120 10213 slave.cpp:3057] Current usage 43.92%. Max
> allowed age: 3.225739484327836days
> I0917 00:19:43.761559 10210 http.cpp:330] HTTP request for
> '/slave(1)/state.json'
> I0917 00:19:45.762634 10208 slave.cpp:3057] Current usage 43.92%. Max
> allowed age: 3.225736771406111days
> I0917 00:19:59.021102 10208 containerizer.cpp:997] Executor for container
> 'dfbf47af-c8b7-481f-b8fc-6a33a8f115d9' has exited
> I0917 00:19:59.021178 10208 containerizer.cpp:882] Destroying container
> 'dfbf47af-c8b7-481f-b8fc-6a33a8f115d9'
> I0917 00:19:59.026793 10208 slave.cpp:2600] Executor
> 'production-topology-1-1410913050' of framework
> 20140915-230424-326871232-5050-13574-0000 exited with status 0
> I0917 00:19:59.027725 10208 slave.cpp:2093] Handling status update
> TASK_LOST (UUID: 8e3543de-161d-4a5a-bcd7-b7be6f053b24) for task
> dev10-cdh5-03.int.dev10.smcl.pure-breeze.com-31000 of framework
> 20140915-230424-326871232-5050-13574-0000 from @0.0.0.0:0
> W0917 00:19:59.027933 10213 containerizer.cpp:788] Ignoring update for
> unknown container: dfbf47af-c8b7-481f-b8fc-6a33a8f115d9
> I0917 00:19:59.028105 10213 status_update_manager.cpp:320] Received status
> update TASK_LOST (UUID: 8e3543de-161d-4a5a-bcd7-b7be6f053b24) for task
> dev10-cdh5-03.int.dev10.smcl.pure-breeze.com-31000 of framework
> 20140915-230424-326871232-5050-13574-0000
> I0917 00:19:59.028143 10213 status_update_manager.cpp:373] Forwarding
> status update TASK_LOST (UUID: 8e3543de-161d-4a5a-bcd7-b7be6f053b24) for
> task dev10-cdh5-03.int.dev10.smcl.pure-breeze.com-31000 of framework
> 20140915-230424-326871232-5050-13574-0000 to [email protected]:5050
> I0917 00:19:59.033900 10213 slave.cpp:2736] Cleaning up executor
> 'production-topology-1-1410913050' of framework
> 20140915-230424-326871232-5050-13574-0000
> I0917 00:19:59.034037 10213 slave.cpp:2811] Cleaning up framework
> 20140915-230424-326871232-5050-13574-0000
> I0917 00:19:59.034162 10211 status_update_manager.cpp:282] Closing status
> update streams for framework 20140915-230424-326871232-5050-13574-0000
> I0917 00:19:59.034098 10213 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/20140915-185627-326871232-5050-8074-2/frameworks/20140915-230424-326871232-5050-13574-0000/executors/production-topology-1-1410913050/runs/dfbf47af-c8b7-481f-b8fc-6a33a8f115d9'
> for gc 6.99999960653926days in the future
> I0917 00:19:59.034209 10213 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/20140915-185627-326871232-5050-8074-2/frameworks/20140915-230424-326871232-5050-13574-0000/executors/production-topology-1-1410913050'
> for gc 6.99999960616889days in the future
> I0917 00:19:59.034255 10213 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/20140915-185627-326871232-5050-8074-2/frameworks/20140915-230424-326871232-5050-13574-0000'
> for gc 6.99999960553185days in the future
>
> -Luyi.
>
> On Wed, Sep 17, 2014 at 12:56 PM, Benjamin Mahler <
> [email protected]> wrote:
>
>> Can you show us the slave log and more of the master log?
>>
>> There should be a TASK_LOST somewhere within them.
>>
>> On Wed, Sep 17, 2014 at 10:43 AM, Luyi Wang <[email protected]>
>> wrote:
>>
>>> Has anyone experienced a TASK_LOST status for Storm tasks on Mesos?
>>>
>>>
>>> I checked the stderr. Everything seems normal.
>>> WARNING: Logging before InitGoogleLogging() is written to STDERR
>>> I0917 00:21:36.164840  4831 fetcher.cpp:76] Fetching URI 'hdfs://
>>> 192.168.123.27/storm-mesos-0.9.2-incubating.tgz'
>>> I0917 00:21:36.165225  4831 fetcher.cpp:105] Downloading resource from
>>> 'hdfs://192.168.123.27/storm-mesos-0.9.2-incubating.tgz' to
>>> '/tmp/mesos/slaves/20140915-185627-326871232-5050-8074-3/frameworks/20140915-230424-326871232-5050-13574-0000/executors/production-topology-1-1410913050/runs/53d06991-cb84-49d3-a530-f83efcf339e9/storm-mesos-0.9.2-incubating.tgz'
>>> I0917 00:21:52.202791  4831 fetcher.cpp:64] Extracted resource
>>> '/tmp/mesos/slaves/20140915-185627-326871232-5050-8074-3/frameworks/20140915-230424-326871232-5050-13574-0000/executors/production-topology-1-1410913050/runs/53d06991-cb84-49d3-a530-f83efcf339e9/storm-mesos-0.9.2-incubating.tgz'
>>> into
>>> '/tmp/mesos/slaves/20140915-185627-326871232-5050-8074-3/frameworks/20140915-230424-326871232-5050-13574-0000/executors/production-topology-1-1410913050/runs/53d06991-cb84-49d3-a530-f83efcf339e9'
>>> I0917 00:21:52.206581  4831 fetcher.cpp:76] Fetching URI '
>>> http://mesos:45579/conf/storm.yaml'
>>> I0917 00:21:52.206626  4831 fetcher.cpp:126] Downloading '
>>> http://mesos:45579/conf/storm.yaml' to
>>> '/tmp/mesos/slaves/20140915-185627-326871232-5050-8074-3/frameworks/20140915-230424-326871232-5050-13574-0000/executors/production-topology-1-/runs/53d06991-cb84-49d3-a530-f83efcf339e9/storm.yaml'
>>> I0917 00:22:01.829298  4984 exec.cpp:132] Version: 0.21.0
>>> I0917 00:22:01.832185  5006 exec.cpp:206] Executor registered on slave
>>> 20140915-185627-326871232-5050-8074-3
>>>
>>> I also checked the Mesos master's INFO log. Here is what it logged.
>>>
>>> I0917 00:19:59.020755  5145 master.cpp:3261] Executor
>>> production-topology-1-1410913050 of framework
>>> 20140915-230424-326871232-5050-13574-0000 on slave
>>> 20140915-185627-326871232-5050-8074-2 at slave(1)@192.168.123.29:5051 (
>>> dev10-cdh5-03.int.dev10.smcl.pure-breeze.com) exited with status 0
>>>
>>> Any idea on this?
>>>
>>> -Luyi.
>>>
>>
>
