I am running 0.24. I am running some tasks in Marathon, and when they hit an OOM condition, the task is killed, which is expected. Then I get a bunch of errors about failing to read 'memory.limit_in_bytes', 'memory.max_usage_in_bytes', and 'memory.stat'.
In addition, the task tries to restart but keeps failing. A few notes: when the task fails, the sandbox becomes unavailable, which makes troubleshooting difficult. When this has happened before, the only way I found to get things working again was to stop the slave, clear out the tmp directory, and start it again. There are also lots of errors related to "failed to clean up isolator" and invalid cgroups.

I'd like to understand why my task won't start again. I suspect it's related to checkpointing or something similar: i.e., an executor hit the OOM, got killed, and is trying to start back up, but something isn't right? I know this is a jumbled, disorganized question; I can provide specific logs if people think they're needed.

