Since no one else has stepped in...

We have run clusters with ridiculously small nodes - I have a production
cluster in AWS with 4GB nodes each with 1 CPU and disk-based instance
storage. It works fine but you can see those little puppies struggle...

And I ran into problems much like the ones you describe...

Upgrading Java to the latest 1.7 and - most importantly - *reverting to the
default configuration, esp. for heap*, seemed to settle things down
completely. Also make sure that you are using the 'recommended production
settings' from the docs on your boxen.
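
For what it's worth, here is roughly what "defaults + production settings"
looks like on our boxes - a sketch from memory, so double-check the exact
values against the docs for your version:

    # cassandra-env.sh: leave the heap overrides commented out so C*
    # auto-sizes the heap (works out to roughly 1 GB on a 4 GB box)
    #MAX_HEAP_SIZE="4G"
    #HEAP_NEWSIZE="800M"

    # disable swap entirely
    sudo swapoff --all        # and drop any swap entries from /etc/fstab

    # /etc/security/limits.conf (values as I recall them from the 2.x docs)
    cassandra - memlock unlimited
    cassandra - nofile  100000
    cassandra - nproc   32768

    # /etc/sysctl.conf
    vm.max_map_count = 131072

Letting cassandra-env.sh pick the heap size is what finally calmed ours
down - my guess is that an explicit 2-3 GB heap on a 4 GB box leaves very
little room for off-heap memory and the OS.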

However we are running 2.0.x not 2.1.0 so YMMV.

And we are switching to 15GB nodes with 2 heftier CPUs each and SSD storage -
still a 'small' machine, but much more reasonable for C*.

However, I can't say I am an expert, since I deliberately keep things so
simple that we do not encounter problems - it just works, so I dig into
other stuff.

ml


On Sat, Oct 25, 2014 at 5:22 PM, Maxime <maxim...@gmail.com> wrote:

> Hello, I've been trying to add a new node to my cluster (4 nodes) for a
> few days now.
>
> I started by adding a node similar to my current configuration, 4 GB of
> RAM + 2 cores on DigitalOcean. However, every time, I would end up getting
> OOM errors after many log entries of the type:
>
> INFO  [SlabPoolCleaner] 2014-10-25 13:44:57,240 ColumnFamilyStore.java:856
> - Enqueuing flush of mycf: 5383 (0%) on-heap, 0 (0%) off-heap
>
> leading to:
>
> ka-120-Data.db (39291 bytes) for commitlog position
> ReplayPosition(segmentId=1414243978538, position=23699418)
> WARN  [SharedPool-Worker-13] 2014-10-25 13:48:18,032
> AbstractTracingAwareExecutorService.java:167 - Uncaught exception on thread
> Thread[SharedPool-Worker-13,5,main]: {}
> java.lang.OutOfMemoryError: Java heap space
>
> Thinking it had to do with either compaction or streaming, two activities
> I've had tremendous issues with in the past, I tried slowing streaming down
> with setstreamthroughput, to extremely low values, all the way to 5. I also
> tried setting setcompactionthroughput to 0 (unthrottled), and then, after
> reading that in some cases that can be too fast, down to 8. Nothing worked;
> it merely changed the mean time to OOM slightly, but not in a way indicating
> that either was anywhere near a solution.
>
> The nodes were configured with 2 GB of heap initially; I tried cranking it
> up to 3 GB, stressing the host memory to its limit.
>
> After doing some exploration (I am considering writing some Cassandra Ops
> documentation with lessons learned, since there seems to be little of it in
> organized form), I read that some people had strange issues on lower-end
> boxes like that, so I bit the bullet and upgraded my new node to an 8 GB +
> 4 core instance, which anecdotally was supposed to fare better.
>
> To my complete shock, the exact same issues are present, even after raising
> the heap to 6 GB. I figure it can't be a "normal" situation anymore, but
> must be a bug somehow.
>
> My cluster is 4 nodes, RF of 2, with about 160 GB of data across all nodes
> and about 10 CFs of varying sizes. Runtime writes are between 300 and 900
> per second. Cassandra 2.1.0, nothing too wild.
>
> Has anyone encountered these kinds of issues before? I would really enjoy
> hearing about the experiences of people trying to run small-sized clusters
> like mine. From everything I read, Cassandra operations go very well on
> large (16 GB + 8 core) machines, but I'm sad to report I've had nothing
> but trouble trying to run it on smaller machines; perhaps I can learn from
> others' experience?
>
> Full logs can be provided to anyone interested.
>
> Cheers
>
