On it. The cluster is in a weird PoC lab setup and I have to do some
gyrations to get logs off it, but they will be coming soon.

On Wed, Oct 21, 2015 at 2:46 PM, Vinod Kone <[email protected]> wrote:

> Logs please.
>
> On Wed, Oct 21, 2015 at 12:44 PM, John Omernik <[email protected]> wrote:
>
>> I am running 0.24.
>>
>> I am running some tasks in Marathon, and when they hit an OOM condition a
>> task is killed, which is expected. Then I get a bunch of errors related to
>> "Failed to read 'memory.limit_in_bytes'", 'memory.max_usage_in_bytes', and
>> 'memory.stat'.
>>
>> In addition the task tries to restart but keeps failing.
>>
>> A few notes: when the task fails, the sandbox becomes unavailable, making
>> troubleshooting difficult. When this has occurred before, it seemed the
>> only way to get things working was to stop the slave, clear out the tmp
>> directory, and start it again. I'd like to understand why my task won't get
>> moving again.
>>
>> There are also lots of errors related to "failed to clean up isolator"
>> and invalid cgroups. I can get specific logs if people think they're needed.
>> I am thinking it's related to checkpointing or something like that, i.e. an
>> executor hit the OOM, got killed, and is trying to start back up, but
>> something isn't right?
>>
>> I know this is a jumbled, unorganized question; I can get logs if needed.
>>
>>
>>
>
