Logs please.

On Wed, Oct 21, 2015 at 12:44 PM, John Omernik <[email protected]> wrote:
> I am running 0.24.
>
> I am running some tasks in Marathon, and when they hit an OOM condition, a
> task is killed, which is expected. Then I get a bunch of errors related to
> "Failed to read 'memory.limit_in_bytes', 'memory.max_usage_in_bytes' and
> 'memory.stat'".
>
> In addition, the task tries to restart but keeps failing.
>
> A few notes: when the task fails, the sandbox becomes unavailable, making
> troubleshooting difficult. When this has occurred before, it seemed the
> only way to get things working was to stop the slave, clear out the tmp
> directory, and start it again. I'd like to understand why my task won't get
> moving again.
>
> There are also lots of errors related to "failed to clean up isolator" and
> invalid cgroups. I can get specific logs if people think they're needed. I am
> thinking it's related to checkpointing or something like that? I.e., an
> executor hit the OOM and got killed, and it is trying to start back up, but
> something isn't right?
>
> I know this is a jumbled, unorganized question; I can provide logs if needed.

