It looks like we could have a better error message here. @Jay, mind filing a JIRA ticket with a description, the status update, and your fix attached? Thanks!
On Fri, Aug 21, 2015 at 7:36 PM, Jay Taylor <j...@jaytaylor.com> wrote:

> Eventually I was able to isolate what was going on; in this case
> FrameworkInfo.User was set to an invalid value, and setting it to "root"
> did the trick.
>
> My scheduler is now working [in a basic form]!!!
>
> Cheers,
> Jay
>
> On Thu, Aug 20, 2015 at 4:15 PM, Jay Taylor <j...@jaytaylor.com> wrote:
>
>> Hey Tim,
>>
>> Thank you for the quick response!
>>
>> I just checked the sandbox logs, and they are all empty (stdout and
>> stderr are both 0 bytes).
>>
>> I have discovered a bit more information from the StatusUpdate event
>> posted back to my scheduler:
>>
>> &TaskStatus{
>>   TaskId: &TaskID{
>>     Value: *fluxCapacitor-test-1,
>>     XXX_unrecognized: [],
>>   },
>>   State: *TASK_FAILED,
>>   Message: *Abnormal executor termination,
>>   Source: *SOURCE_SLAVE,
>>   Reason: *REASON_COMMAND_EXECUTOR_FAILED,
>>   Data: nil,
>>   SlaveId: &SlaveID{
>>     Value: *20150804-211459-1407297728-5050-5855-S1,
>>     XXX_unrecognized: [],
>>   },
>>   ExecutorId: nil,
>>   Timestamp: *1.440112075509318e+09,
>>   Uuid: *[102 75 82 85 38 139 68 94 153 189 210 87 218 235 147 166],
>>   Healthy: nil,
>>   XXX_unrecognized: [],
>> }
>>
>> How can I find out why the command executor is failing?
>>
>> On Thu, Aug 20, 2015 at 4:08 PM, Tim Chen <t...@mesosphere.io> wrote:
>>
>>> It received a TASK_FAILED from the executor, so you'll need to look at
>>> your task's stdout and stderr files in the sandbox to see what went
>>> wrong.
>>>
>>> These files should be reachable from the Mesos UI.
>>>
>>> Tim
>>>
>>> On Thu, Aug 20, 2015 at 4:01 PM, Jay Taylor <outtat...@gmail.com> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I am writing a scheduler for Mesos, and one of my first goals is to
>>>> get a simple Docker container to run.
>>>>
>>>> The tasks get marked as failed, with the failure messages originating
>>>> from the slave logs, and I'm not sure how to determine exactly what is
>>>> causing the failure.
>>>>
>>>> The most informative log messages I've found were in the slave log:
>>>>
>>>> ==> /var/log/mesos/mesos-slave.INFO <==
>>>> W0820 20:44:25.242230 29639 docker.cpp:994] Ignoring updating unknown
>>>> container: e190037a-b011-4681-9e10-dcbacf6cb819
>>>> I0820 20:44:25.242270 29639 status_update_manager.cpp:322] Received
>>>> status update TASK_FAILED (UUID: 17a21cf7-17d1-42dd-92eb-b281396ebf60)
>>>> for task jay-test-29 of framework
>>>> 20150804-211741-1608624320-5050-18273-0060
>>>> I0820 20:44:25.242377 29639 slave.cpp:2961] Forwarding the update
>>>> TASK_FAILED (UUID: 17a21cf7-17d1-42dd-92eb-b281396ebf60) for task
>>>> jay-test-29 of framework 20150804-211741-1608624320-5050-18273-0060 to
>>>> master@63.198.215.105:5050
>>>> I0820 20:44:25.247926 29636 status_update_manager.cpp:394] Received
>>>> status update acknowledgement (UUID:
>>>> 17a21cf7-17d1-42dd-92eb-b281396ebf60) for task jay-test-29 of framework
>>>> 20150804-211741-1608624320-5050-18273-0060
>>>> I0820 20:44:25.248108 29636 slave.cpp:3502] Cleaning up executor
>>>> 'jay-test-29' of framework 20150804-211741-1608624320-5050-18273-0060
>>>> I0820 20:44:25.248342 29636 slave.cpp:3591] Cleaning up framework
>>>> 20150804-211741-1608624320-5050-18273-0060
>>>>
>>>> And this doesn't really tell me much about *why* it failed.
>>>>
>>>> Is there somewhere else I should be looking, or is there an option
>>>> that needs to be turned on to show more information?
>>>>
>>>> Your assistance is greatly appreciated!
>>>>
>>>> Jay
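Two follow-up notes for anyone who finds this thread later. First, the Reason,
Source, and Message fields in the TaskStatus dump above are delivered straight
to the scheduler's StatusUpdate callback, and logging them is the quickest way
to surface failures like REASON_COMMAND_EXECUTOR_FAILED without digging through
slave logs. A minimal sketch against the mesos-go v0 bindings (the exact
Scheduler interface may vary slightly by version, and myScheduler is a
placeholder type, not Jay's actual code; the remaining Scheduler callbacks,
omitted here, need no-op implementations for this to compile):

    package main

    import (
        "log"

        mesos "github.com/mesos/mesos-go/mesosproto"
        sched "github.com/mesos/mesos-go/scheduler"
    )

    // myScheduler is a placeholder standing in for your scheduler type;
    // only the StatusUpdate callback is shown here.
    type myScheduler struct{}

    // StatusUpdate is invoked by the driver for every task status update
    // forwarded by the master.
    func (s *myScheduler) StatusUpdate(driver sched.SchedulerDriver, status *mesos.TaskStatus) {
        log.Printf("task %s: state=%s reason=%s source=%s message=%q",
            status.GetTaskId().GetValue(),
            status.GetState(),   // e.g. TASK_FAILED
            status.GetReason(),  // e.g. REASON_COMMAND_EXECUTOR_FAILED
            status.GetSource(),  // e.g. SOURCE_SLAVE
            status.GetMessage()) // e.g. "Abnormal executor termination"
    }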
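Second, the actual fix: FrameworkInfo.User must name an account that exists on
every slave, because the command executor is launched as that user. An unknown
user kills the executor before it writes anything, which would explain the
0-byte stdout/stderr files Jay saw. Roughly, under the same mesos-go v0
assumptions (proto.String comes from github.com/gogo/protobuf/proto, and the
framework name is hypothetical):

    func main() {
        framework := &mesos.FrameworkInfo{
            // Must be a user that exists on every slave; an invalid value
            // here reproduces the "Abnormal executor termination" failure
            // described above.
            User: proto.String("root"),
            Name: proto.String("fluxCapacitor"), // hypothetical framework name
        }

        driver, err := sched.NewMesosSchedulerDriver(sched.DriverConfig{
            Scheduler: &myScheduler{}, // the placeholder type sketched above
            Framework: framework,
            Master:    "63.198.215.105:5050", // master address from the logs
        })
        if err != nil {
            log.Fatalf("failed to create scheduler driver: %v", err)
        }
        if _, err := driver.Run(); err != nil {
            log.Fatalf("scheduler driver stopped: %v", err)
        }
    }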