Jayesh Patel wrote:
Josh, the OOM tserver process was killed by the kernel; it didn't hang around. I tried restarting it manually, but it ran out of memory right away and was killed again, leaving the tablet offline. It must have a huge "recovery" log to go through. HDFS /accumulo/wal/instance-accumulo+9997/24e08581-a081-4b41-afc5-d75bdda6cf15 is about 42MB, and the machine has about 300MB free, which is apparently not enough for the tserver.
Ok, cool. If you're that constrained on resources, you can also try reducing the property tserver.sort.buffer.size in accumulo-site.xml. It defaults to 200M; you could try 25M or 50M instead.
This property sets the size of the buffer used for sorting log edits during the recovery process, so lowering it might help if the tserver never makes it through recovery.
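As a rough sketch, the entry in accumulo-site.xml would look something like the following. This assumes the usual Hadoop-style property layout inside the existing <configuration> element; 50M is just one of the values mentioned above, not a tuned recommendation, and the tserver will most likely need a restart to pick it up.

```xml
<!-- Sketch only: goes inside the existing <configuration> element of accumulo-site.xml -->
<property>
  <name>tserver.sort.buffer.size</name>
  <!-- Lowered from the 200M default; 25M is the other value worth trying -->
  <value>50M</value>
</property>
```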
300MB is a little low in general as far as headroom goes (especially when you're already not giving Accumulo enough RAM). Typically, you want to ensure that you give the operating system at least 1G of memory for itself.
