From the jhat output, the top 10 entries under "Instance Count for All Classes (excluding platform)" are:
2088223 instances of class org.apache.cassandra.db.BufferCell
1983245 instances of class org.apache.cassandra.db.composites.CompoundSparseCellName
1885974 instances of class org.apache.cassandra.db.composites.CompoundDenseCellName
630000 instances of class org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
503687 instances of class org.apache.cassandra.db.BufferDeletedCell
378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier
101800 instances of class org.apache.cassandra.utils.concurrent.Ref
101800 instances of class org.apache.cassandra.utils.concurrent.Ref$State
90704 instances of class org.apache.cassandra.utils.concurrent.Ref$GlobalState
71123 instances of class org.apache.cassandra.db.BufferDecoratedKey

At the bottom of the page, it shows:
Total of 8739510 instances occupying 193607512 bytes.

JFYI.

Kunal

On 10 July 2015 at 23:49, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:

> Thanks for the quick reply.
>
> 1. I don't know what thresholds I should look for. So, to save this
> back-and-forth, I'm attaching the cfstats output for the keyspace.
>
> There is one table - daily_challenges - which shows compacted partition
> max bytes as ~460M, and another one - daily_guest_logins - which shows
> compacted partition max bytes as ~36M.
>
> Can that be a problem?
> Here is the CQL schema for the daily_challenges column family:
>
> CREATE TABLE app_10001.daily_challenges (
>     segment_type text,
>     date timestamp,
>     user_id int,
>     sess_id text,
>     data text,
>     deleted boolean,
>     PRIMARY KEY (segment_type, date, user_id, sess_id)
> ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class':
>         'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>         'max_threshold': '32'}
>     AND compression = {'sstable_compression':
>         'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
>
> CREATE INDEX idx_deleted ON app_10001.daily_challenges (deleted);
>
> 2. I don't know - how do I check? As I mentioned, I just installed the
> dsc21 package from DataStax's debian repo (ver 2.1.7).
>
> Really appreciate your help.
>
> Thanks,
> Kunal
>
> On 10 July 2015 at 23:33, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>
>> 1. You want to look at # of sstables in cfhistograms, or in cfstats look at:
>> Compacted partition maximum bytes
>> Maximum live cells per slice
>>
>> 2. No, here's the env.sh from 3.0 which should work with some tweaks:
>> https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh
>>
>> You'll at least have to modify the jamm version to what's in yours.
>> I think it's 2.5.
>>
>> All the best,
>>
>> [image: datastax_logo.png] <http://www.datastax.com/>
>>
>> Sebastián Estévez
>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>
>> [image: linkedin.png] <https://www.linkedin.com/company/datastax> [image:
>> facebook.png] <https://www.facebook.com/datastax> [image: twitter.png]
>> <https://twitter.com/datastax> [image: g+.png]
>> <https://plus.google.com/+Datastax/about>
>> <http://feeds.feedburner.com/datastax>
>> <http://cassandrasummit-datastax.com/>
>>
>> DataStax is the fastest, most scalable distributed database technology,
>> delivering Apache Cassandra to the world’s most innovative enterprises.
>> DataStax is built to be agile, always-on, and predictably scalable to any
>> size. With more than 500 customers in 45 countries, DataStax is the
>> database technology and transactional backbone of choice for the world’s
>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>
>> On Fri, Jul 10, 2015 at 1:42 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>
>>> Thanks, Sebastian.
>>>
>>> A couple of questions (I'm really new to Cassandra):
>>> 1. How do I interpret the output of 'nodetool cfstats' to figure out the
>>> issues? Any documentation pointer on that would be helpful.
>>>
>>> 2. I'm primarily a python/c developer - so, totally clueless about the JVM
>>> environment. So, please bear with me, as I will need a lot of hand-holding.
>>> Should I just copy+paste the settings you gave and try to restart the
>>> failing cassandra server?
>>>
>>> Thanks,
>>> Kunal
>>>
>>> On 10 July 2015 at 22:35, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>>>
>>>> #1 You need more information.
>>>>
>>>> a) Take a look at your .hprof file (the memory heap from the OOM) with an
>>>> introspection tool like jhat, visualvm, or Java Flight Recorder, and see
>>>> what is using up your RAM.
>>>>
>>>> b) How big are your large rows (use nodetool cfstats on each node)? If
>>>> your data model is bad, you are going to have to re-design it no matter
>>>> what.
>>>>
>>>> #2 As a possible workaround, try using the G1GC collector with the
>>>> settings from c* 3.0 instead of CMS. I've seen lots of success with it
>>>> lately (tl;dr G1GC is much simpler than CMS and almost as good as a finely
>>>> tuned CMS). *Note:* Use it with the latest Java 8 from Oracle. Do *not*
>>>> set the newgen size; G1 sets it dynamically:
>>>>
>>>>> # min and max heap sizes should be set to the same value to avoid
>>>>> # stop-the-world GC pauses during resize, and so that we can lock the
>>>>> # heap in memory on startup to prevent any of it from being swapped
>>>>> # out.
>>>>> JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
>>>>> JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
>>>>>
>>>>> # Per-thread stack size.
>>>>> JVM_OPTS="$JVM_OPTS -Xss256k"
>>>>>
>>>>> # Use the Hotspot garbage-first collector.
>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
>>>>>
>>>>> # Have the JVM do less remembered set work during STW, instead
>>>>> # preferring concurrent GC. Reduces p99.9 latency.
>>>>> JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
>>>>>
>>>>> # The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
>>>>> # Machines with > 10 cores may need additional threads.
>>>>> # Increase to <= full cores (do not count HT cores).
>>>>> #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
>>>>> #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"
>>>>>
>>>>> # Main G1GC tunable: lowering the pause target will lower throughput
>>>>> # and vice versa.
>>>>> # 200ms is the JVM default and lowest viable setting;
>>>>> # 1000ms increases throughput. Keep it smaller than the timeouts
>>>>> # in cassandra.yaml.
>>>>> JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
>>>>>
>>>>> # Do reference processing in parallel GC.
>>>>> JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"
>>>>>
>>>>> # This may help eliminate STW.
>>>>> # The default in Hotspot 8u40 is 40%.
>>>>> #JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
>>>>>
>>>>> # For workloads that do large allocations, increasing the region
>>>>> # size may make things more efficient. Otherwise, let the JVM
>>>>> # set this automatically.
>>>>> #JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"
>>>>>
>>>>> # Make sure all memory is faulted and zeroed on startup.
>>>>> # This helps prevent soft faults in containers and makes
>>>>> # transparent hugepage allocation more effective.
>>>>> JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
>>>>>
>>>>> # Biased locking does not benefit Cassandra.
>>>>> JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
>>>>>
>>>>> # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
>>>>> JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"
>>>>>
>>>>> # Enable thread-local allocation blocks and allow the JVM to
>>>>> # automatically resize them at runtime.
>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
>>>>>
>>>>> # http://www.evanjones.ca/jvm-mmap-pause.html
>>>>> JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
>>>>
>>>> All the best,
>>>>
>>>> Sebastián Estévez
>>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>>
>>>> On Fri, Jul 10, 2015 at 12:55 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>
>>>>> I upgraded my instance from 8GB to a 14GB one.
>>>>> Allocated 8GB to the JVM heap in cassandra-env.sh.
>>>>>
>>>>> And now, it crashes even faster with an OOM.
>>>>>
>>>>> Earlier, with a 4GB heap, I could get up to ~90% replication completion
>>>>> (as reported by nodetool netstats); now, with an 8GB heap, I cannot even get
>>>>> there. I've already restarted the cassandra service 4 times with the 8GB heap.
>>>>>
>>>>> No clue what's going on.. :(
>>>>>
>>>>> Kunal
>>>>>
>>>>> On 10 July 2015 at 17:45, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>>
>>>>>> You, and only you, are responsible for knowing your data and data model.
>>>>>>
>>>>>> If columns per row or rows per partition can be large, then an 8GB
>>>>>> system is probably too small. But the real issue is that you need to keep
>>>>>> your partition size from getting too large.
>>>>>>
>>>>>> Generally, an 8GB system is okay, but only for reasonably-sized
>>>>>> partitions, like under 10MB.
>>>>>>
>>>>>> -- Jack Krupansky
>>>>>>
>>>>>> On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>
>>>>>>> I'm new to Cassandra.
>>>>>>> How do I find those out? - mainly, the partition params that you
>>>>>>> asked for. The others, I think I can figure out.
>>>>>>>
>>>>>>> We don't have any large objects/blobs in the column values - it's
>>>>>>> all textual, date-time, numeric and uuid data.
>>>>>>>
>>>>>>> We use cassandra primarily to store segmentation data - with segment
>>>>>>> type as the partition key. That is again divided into two separate column
>>>>>>> families; but they have similar structure.
>>>>>>>
>>>>>>> Columns per row can be fairly large - each segment type as the row
>>>>>>> key, and associated user ids and timestamps as column values.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kunal
>>>>>>>
>>>>>>> On 10 July 2015 at 16:36, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>>>>
>>>>>>>> What does your data and data model look like - partition size, rows
>>>>>>>> per partition, number of columns per row, any large values/blobs in
>>>>>>>> column values?
>>>>>>>>
>>>>>>>> You could run fine on an 8GB system, but only if your rows and
>>>>>>>> partitions are reasonably small. Any large partitions could blow you away.
>>>>>>>>
>>>>>>>> -- Jack Krupansky
>>>>>>>>
>>>>>>>> On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Attaching the stack dump captured from the last OOM.
>>>>>>>>>
>>>>>>>>> Kunal
>>>>>>>>>
>>>>>>>>> On 10 July 2015 at 13:32, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Forgot to mention: the data size is not that big - it's barely
>>>>>>>>>> 10GB in all.
>>>>>>>>>>
>>>>>>>>>> Kunal
>>>>>>>>>>
>>>>>>>>>> On 10 July 2015 at 13:29, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have a 2-node setup on Azure (East US region) running Ubuntu
>>>>>>>>>>> Server 14.04 LTS.
>>>>>>>>>>> Both nodes have 8GB RAM.
>>>>>>>>>>>
>>>>>>>>>>> One of the nodes (the seed node) died with an OOM - so, I am trying
>>>>>>>>>>> to add a replacement node with the same configuration.
>>>>>>>>>>>
>>>>>>>>>>> The problem is this new node also keeps dying with an OOM - I've
>>>>>>>>>>> restarted the cassandra service like 8-10 times hoping that it would
>>>>>>>>>>> finish the replication. But it didn't help.
>>>>>>>>>>>
>>>>>>>>>>> The one node that is still up is happily chugging along.
>>>>>>>>>>> All nodes have similar configuration - with libjna installed.
>>>>>>>>>>>
>>>>>>>>>>> Cassandra is installed from DataStax's debian repo - pkg: dsc21,
>>>>>>>>>>> version 2.1.7.
>>>>>>>>>>> I started off with the default configuration - i.e. the default
>>>>>>>>>>> cassandra-env.sh - which calculates the heap size automatically
>>>>>>>>>>> (1/4 * RAM = 2GB).
>>>>>>>>>>>
>>>>>>>>>>> But that didn't help. So, I then tried to increase the heap to
>>>>>>>>>>> 4GB manually and restarted. It still keeps crashing.
>>>>>>>>>>>
>>>>>>>>>>> Any clue as to why it's happening?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Kunal
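A footnote on the "1/4 * RAM = 2GB" figure at the bottom of the thread: it falls out of the stock heap calculation in the 2.1-era cassandra-env.sh, which roughly takes the larger of half-of-RAM-capped-at-1GB and quarter-of-RAM-capped-at-8GB. A sketch of that formula (the 8192 MB value is the 8GB nodes from this thread; treat it as illustrative, not a substitute for reading your own cassandra-env.sh):

```shell
# Sketch of the default cassandra-env.sh heap sizing (2.1-era):
# MAX_HEAP_SIZE = max(min(1/2 * RAM, 1024MB), min(1/4 * RAM, 8192MB))
system_memory_mb=8192                      # the 8GB Azure nodes in this thread

half=$((system_memory_mb / 2))
if [ "$half" -gt 1024 ]; then half=1024; fi        # cap half-of-RAM at 1GB

quarter=$((system_memory_mb / 4))
if [ "$quarter" -gt 8192 ]; then quarter=8192; fi  # cap quarter-of-RAM at 8GB

max_heap_mb=$(( half > quarter ? half : quarter ))
echo "${max_heap_mb} MB"                   # 2048 MB, i.e. the 2GB Kunal saw
```

With 8GB of RAM the two candidates are 1024 MB and 2048 MB, so the default lands at a 2GB heap; Kunal's manual bumps to 4GB and 8GB override this calculation entirely.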