On it. It's in a weird PoC lab setup and I have to do some gyrations to get the logs off, but it will be soon.
On Wed, Oct 21, 2015 at 2:46 PM, Vinod Kone <[email protected]> wrote:

> Logs please.
>
> On Wed, Oct 21, 2015 at 12:44 PM, John Omernik <[email protected]> wrote:
>
>> I am running 0.24.
>>
>> I am running some tasks in Marathon, and when they hit an OOM condition
>> the task is killed, which is expected. Then I get a bunch of errors about
>> failing to read 'memory.limit_in_bytes', 'memory.max_usage_in_bytes', and
>> 'memory.stat'.
>>
>> In addition, the task tries to restart but keeps failing.
>>
>> A few notes: when the task fails, the sandbox becomes unavailable, which
>> makes troubleshooting difficult. When this has happened before, the only
>> way to get things working seemed to be to stop the slave, clear out the
>> tmp directory, and start it again. I'd like to understand why my task
>> won't get moving again.
>>
>> There are also lots of errors about "failed to clean up isolator" and
>> invalid cgroups; I can get specific logs if people think they're needed.
>> I am thinking it's related to checkpointing or something like that? I.e.,
>> an executor hit the OOM, got killed, and is trying to start back up, but
>> something isn't right?
>>
>> I know this is a jumbled, unorganized question; I can get logs if needed.

