Re: Why and How is Cassandra using all my ram ?

2018-07-23 Thread Mark Rose
On 19 July 2018 at 10:43, Léo FERLIN SUTTON  wrote:
> Hello list !
>
> I have a question about cassandra memory usage.
>
> My cassandra nodes are slowly using up all my ram until they get OOM-Killed.
>
> When I check the memory usage with nodetool info the memory
> (off-heap+heap) doesn't match what the java process is really using.

Hi Léo,

It's possible that glibc is creating too many memory arenas. Are you
setting/exporting MALLOC_ARENA_MAX to something sane before calling
the JVM? You can check that in /proc/<pid>/environ.

I would also turn on -XX:NativeMemoryTracking=summary and use jcmd to
check out native memory usage from the JVM's perspective.
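
For example, something along these lines (a rough sketch; the pgrep pattern, the arena cap, and where you export it depend on your setup):

    PID=$(pgrep -f CassandraDaemon)                       # the Cassandra JVM
    tr '\0' '\n' < /proc/$PID/environ | grep MALLOC_ARENA_MAX

    # e.g. in cassandra-env.sh (or whatever launches the JVM), cap the arenas:
    export MALLOC_ARENA_MAX=4

    # with -XX:NativeMemoryTracking=summary added to the JVM flags:
    jcmd $PID VM.native_memory summary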

-Mark

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Migrating to LCS : Disk Size recommendation clashes

2017-04-20 Thread Mark Rose
Hi Amit,

The size recommendations are based on balancing CPU and the amount of data
stored on a node. LCS requires less disk space but generally requires much
more CPU to keep up with compaction for the same amount of data, which is
why the size recommendation is smaller. There is nothing wrong with
attaching a larger disk, of course. The sizes are recommendations to start
with when you have nothing else to go by. If your cluster is light on
writes, you may be able to store much larger amounts of data than the suggested
sizes and have no problem keeping up with LCS compaction. If your cluster
is heavy on writes, you may find you can only store a small fraction of the
data per node you were able to store with STCS. You will have to benchmark
for your use-case.

The 10 TB number is from a theoretical situation where LCS would result in
reading a maximum of 7 SSTables to return a read -- if LCS compaction can
keep up.
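
As a rough back-of-the-envelope (assuming the default 160 MB sstable target and 10x fanout per level -- check your table's compaction options before trusting these numbers):

    # approximate capacity of each LCS level: 160 MB * 10^n
    for n in 1 2 3 4 5; do echo "L$n ~ $((160 * 10**n)) MB"; done
    # => L1 ~ 1.6 GB, L2 ~ 16 GB, L3 ~ 160 GB, L4 ~ 1.6 TB, L5 ~ 16 TB,
    #    so ~10 TB fits within L0-L5, and a read touches at most one sstable
    #    per level (plus a handful in L0).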

Cheers,
Mark

On Thu, Apr 13, 2017 at 8:23 AM, Amit Singh F 
wrote:

> Hi All,
>
>
>
> We are in the process of migrating from STCS to LCS and were just doing some
> reading online. Below is an excerpt from the DataStax recommendation on data
> size:
>
>
>
> Doc link: https://docs.datastax.com/en/landing_page/doc/landing_page/planning/planningHardware.html
>
>
>
>
>
> Also there is one more recommendation which hints that disk size can
> be limited to 10 TB (worst case). Below is an excerpt as well:
>
>
>
> Doc link: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>
>
>
>
>
> So are there any restrictions/scenarios due to which 600 GB is the
> preferred size for LCS?
>
>
>
> Thanks & Regards
>
> Amit Singh
>
>
>


Re: Maximum memory usage reached in cassandra!

2017-04-03 Thread Mark Rose
You may have better luck switching to G1GC and using a much larger
heap (16 to 30GB). 4GB is likely too small for your amount of data,
especially if you have a lot of sstables. Then try increasing
file_cache_size_in_mb further.
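
If it helps, the knobs are roughly these (a sketch only -- file and option names move around between versions, e.g. cassandra-env.sh in 2.x vs jvm.options in 3.x, so double-check against your install):

    # cassandra-env.sh -- bigger heap, G1 instead of CMS
    MAX_HEAP_SIZE="16G"
    HEAP_NEWSIZE="1600M"     # only meaningful for CMS, but some versions
                             # insist both variables are set together
    JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
    JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"

    # then in cassandra.yaml, raise the off-heap buffer cache, e.g.:
    # file_cache_size_in_mb: 2048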

Cheers,
Mark

On Tue, Mar 28, 2017 at 3:01 AM, Mokkapati, Bhargav (Nokia -
IN/Chennai)  wrote:
> Hi Cassandra users,
>
>
>
> I am getting “Maximum memory usage reached (536870912 bytes), cannot
> allocate chunk of 1048576 bytes”. As a remedy I have raised the off-heap
> memory usage cap, i.e. the file_cache_size_in_mb parameter in cassandra.yaml,
> from 512 to 1024.
>
>
>
> But now the increased limit has again filled up and it is throwing the message
> “Maximum memory usage reached (1073741824 bytes), cannot allocate chunk of
> 1048576 bytes”
>
>
>
> This issue occurs when redistribution of index summaries is happening; due to
> this, the Cassandra nodes are still UP but read requests are failing from the
> application side.
>
>
>
> My configuration details are as below:
>
>
>
> 5 node cluster , each node with 68 disks, each disk is 3.7 TB
>
>
>
> Total CPU cores - 8
>
>
>
> Memory: total 377G, used 265G, free 58G, shared 378M, buff/cache 53G, available 104G
>
>
>
> MAX_HEAP_SIZE is 4GB
>
> file_cache_size_in_mb: 1024
>
>
>
> memtable heap space is commented in yaml file as below:
>
> # memtable_heap_space_in_mb: 2048
>
> # memtable_offheap_space_in_mb: 2048
>
>
>
> Can anyone please suggest a solution for this issue? Thanks in advance!
>
>
>
> Thanks,
>
> Bhargav M
>
>
>
>
>
>
>
>


Re: Adding disk capacity to a running node

2016-10-17 Thread Mark Rose
I've had luck using the st1 EBS type, too, for situations where reads
are rare (the commit log still needs to be on its own high IOPS
volume; I like using ephemeral storage for that).
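
The layout I mean is roughly this (device names and mount points are only examples):

    # ephemeral SSD for the commit log, EBS (gp2/st1) for data
    mkfs.xfs /dev/xvdb && mount /dev/xvdb /cassandra/commitlog
    mkfs.xfs /dev/xvdf && mount /dev/xvdf /cassandra/data

    # cassandra.yaml then points at the two mounts:
    # commitlog_directory: /cassandra/commitlog
    # data_file_directories:
    #     - /cassandra/data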

On Mon, Oct 17, 2016 at 3:03 PM, Branton Davis
 wrote:
> I doubt that's true anymore.  EBS volumes, while previously discouraged, are
> the most flexible way to go, and are very reliable.  You can attach, detach,
> and snapshot them too.  If you don't need provisioned IOPS, the GP2 SSDs are
> more cost-effective and allow you to balance IOPS with cost.
>
> On Mon, Oct 17, 2016 at 1:55 PM, Jonathan Haddad  wrote:
>>
>> Vladimir,
>>
>> *Most* people running Cassandra are doing so using ephemeral disks.
>> Instances are not arbitrarily moved to different hosts.  Yes, instances can
>> be shut down, but that's why you distribute across AZs.
>>
>> On Mon, Oct 17, 2016 at 11:48 AM Vladimir Yudovin 
>> wrote:
>>>
>>> It's extremely unreliable to use ephemeral (local) disks. Even if you
>>> don't stop the instance yourself, it can be restarted on a different server in
>>> case of a hardware failure or an AWS-initiated update. So all node data will
>>> be lost.
>>>
>>> Best regards, Vladimir Yudovin,
>>> Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
>>> Launch your cluster in minutes.
>>>
>>>
>>>  On Mon, 17 Oct 2016 14:45:00 -0400, Seth Edwards wrote:
>>>
>>> These are i2.2xlarge instances so the disks currently configured as
>>> ephemeral dedicated disks.
>>>
>>> On Mon, Oct 17, 2016 at 11:34 AM, Laing, Michael
>>>  wrote:
>>>
>>> You could just expand the size of your ebs volume and extend the file
>>> system. No data is lost - assuming you are running Linux.
>>>
>>>
>>> On Monday, October 17, 2016, Seth Edwards  wrote:
>>>
>>> We're running 2.0.16. We're migrating to a new data model but we've had
>>> an unexpected increase in write traffic that has caused us some capacity
>>> issues when we encounter compactions. Our old data model is on STCS. We'd
>>> like to add another ebs volume (we're on aws) to our JBOD config and
>>> hopefully avoid any situation where we run out of disk space during a large
>>> compaction. It appears that the behavior we are hoping to get is actually
>>> undesirable and removed in 3.2. It still might be an option for us until we
>>> can finish the migration.
>>>
>>> I'm not familiar with LVM so it may be a bit risky to try at this point.
>>>
>>> On Mon, Oct 17, 2016 at 9:42 AM, Yabin Meng  wrote:
>>>
>>> I assume you're talking about Cassandra JBOD (just a bunch of disk) setup
>>> because you do mention it as adding it to the list of data directories. If
>>> this is the case, you may run into issues, depending on your C* version.
>>> Check this out: http://www.datastax.com/dev/blog/improving-jbod.
>>>
>>> Or another approach is to use LVM to manage multiple devices into a
>>> single mount point. If you do so, all Cassandra sees is simply
>>> increased disk storage space and there should be no problem.
>>>
>>> Hope this helps,
>>>
>>> Yabin
>>>
>>> On Mon, Oct 17, 2016 at 11:54 AM, Vladimir Yudovin 
>>> wrote:
>>>
>>>
>>> Yes, Cassandra should keep the percentage of disk usage equal across all disks.
>>> The compaction process and SSTable flushes will use the new disk to distribute
>>> both new and existing data.
>>>
>>> Best regards, Vladimir Yudovin,
>>> Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
>>> Launch your cluster in minutes.
>>>
>>>
>>>  On Mon, 17 Oct 2016 11:43:27 -0400, Seth Edwards wrote:
>>>
>>> We have a few nodes that are running out of disk capacity at the moment
>>> and instead of adding more nodes to the cluster, we would like to add
>>> another disk to the server and add it to the list of data directories. My
>>> question is: will Cassandra use the new disk for compactions on sstables
>>> that already exist in the primary directory?
>>>
>>>
>>>
>>> Thanks!
>>>
>>>
>>>
>


Re: Is to ok restart DECOMMISION

2016-09-15 Thread Mark Rose
I've done that several times. Kill the process, restart it, let it
sync, decommission.
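
Roughly the sequence (use however you normally stop/start Cassandra; nodetool netstats is handy for watching the streams):

    sudo service cassandra stop       # or kill the JVM
    sudo service cassandra start
    nodetool status                   # wait until the node shows UN again
    nodetool decommission             # kick it off again
    nodetool netstats                 # watch streaming progress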

You'll need enough space on the receiving nodes for the full set of
data, on top of the other data that was already sent earlier, plus
room to cleanup/compact it.

Before you kill, check system.log to see if it died on anything. If
so, the decommission process will never finish. If not, let it
continue. Of particular note is that by default, transferring large
sstables will time out. You can fix that by adjusting
streaming_socket_timeout_in_ms to a sufficiently large value (I set it
to a day).
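
In cassandra.yaml that's just (86400000 ms = 24 hours; pick whatever headroom suits your sstable sizes):

    streaming_socket_timeout_in_ms: 86400000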

-Mark

On Thu, Sep 15, 2016 at 9:28 AM, laxmikanth sadula
 wrote:
> I started decommissioning a node in our cassandra cluster.
> But it's taking too long (more than 12 hrs), so I would like to
> restart (stop/kill the node & restart 'node decommission' again).
>
> Will killing the node/stopping the decommission and restarting the decommission
> cause any issues to the cluster?
>
> Using C* 2.0.17, 2 data centers, each DC with 3 groups, each group
> with 3 nodes, with RF=3
>
> --
> Thanks...!


Re: large number of pending compactions, sstables steadily increasing

2016-08-19 Thread Mark Rose
Hi Ezra,

Are you making frequent changes to your rows (including TTL'ed
values), or mostly inserting new ones? If you're only inserting new
data, it's probable that size-tiered compaction would work better for
you. If you are TTL'ing whole rows, consider date-tiered.

If leveled compaction is still the best strategy, one way to catch up
with compactions is to have less data per node -- in other words,
use more machines. Leveled compaction is CPU expensive. You are CPU
bottlenecked currently, or from the other perspective, you have too
much data per node for leveled compaction.

At this point, compaction is so far behind that you'll likely be
getting high latency if you're reading old rows (since dozens to
hundreds of uncompacted sstables will likely need to be checked for
matching rows). You may be better off with size tiered compaction,
even if it will mean always reading several sstables per read (higher
latency than when leveled can keep up).
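
If you do try the switch, it's a single ALTER (keyspace/table names below are placeholders; expect extra compaction work while the node reshuffles the existing sstables):

    cqlsh -e "ALTER TABLE mykeyspace.mytable WITH compaction = {'class': 'SizeTieredCompactionStrategy'};"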

How much data do you have per node? Do you update/insert to/delete
rows? Do you TTL?

Cheers,
Mark

On Wed, Aug 17, 2016 at 2:39 PM, Ezra Stuetzel  wrote:
> I have one node in my cluster 2.2.7 (just upgraded from 2.2.6 hoping to fix
> issue) which seems to be stuck in a weird state -- with a large number of
> pending compactions and sstables. The node is compacting about 500gb/day,
> number of pending compactions is going up at about 50/day. It is at about
> 2300 pending compactions now. I have tried increasing number of compaction
> threads and the compaction throughput, which doesn't seem to help eliminate
> the many pending compactions.
>
> I have tried running 'nodetool cleanup' and 'nodetool compact'. The latter
> has fixed the issue in the past, but most recently I was getting OOM errors,
> probably due to the large number of sstables. I upgraded to 2.2.7 and am no
> longer getting OOM errors, but also it does not resolve the issue. I do see
> this message in the logs:
>
>> INFO  [RMI TCP Connection(611)-10.9.2.218] 2016-08-17 01:50:01,985
>> CompactionManager.java:610 - Cannot perform a full major compaction as
>> repaired and unrepaired sstables cannot be compacted together. These two set
>> of sstables will be compacted separately.
>
> Below are the 'nodetool tablestats' comparing a normal and the problematic
> node. You can see the problematic node has many, many more sstables, and they are
> all in level 1. What is the best way to fix this? Can I just delete those
> sstables somehow then run a repair?
>>
>> Normal node
>>>
>>> keyspace: mykeyspace
>>>
>>> Read Count: 0
>>>
>>> Read Latency: NaN ms.
>>>
>>> Write Count: 31905656
>>>
>>> Write Latency: 0.051713177939359714 ms.
>>>
>>> Pending Flushes: 0
>>>
>>> Table: mytable
>>>
>>> SSTable count: 1908
>>>
>>> SSTables in each level: [11/4, 20/10, 213/100, 1356/1000, 306, 0,
>>> 0, 0, 0]
>>>
>>> Space used (live): 301894591442
>>>
>>> Space used (total): 301894591442
>>>
>>>
>>>
>>> Problematic node
>>>
>>> Keyspace: mykeyspace
>>>
>>> Read Count: 0
>>>
>>> Read Latency: NaN ms.
>>>
>>> Write Count: 30520190
>>>
>>> Write Latency: 0.05171286705620116 ms.
>>>
>>> Pending Flushes: 0
>>>
>>> Table: mytable
>>>
>>> SSTable count: 14105
>>>
>>> SSTables in each level: [13039/4, 21/10, 206/100, 831, 0, 0, 0,
>>> 0, 0]
>>>
>>> Space used (live): 561143255289
>>>
>>> Space used (total): 561143255289
>
> Thanks,
>
> Ezra


Re: My cluster shows high system load without any apparent reason

2016-07-25 Thread Mark Rose
Hi Garo,

I haven't had this issue on SSDs, but I have definitely seen it with
spinning drives. I would think that SSDs would have more than enough
bandwidth to keep up with requests, but you may be running into issues
with Cassandra calling fsync on the commitlog.

What are your settings for the following?

commitlog_sync
commitlog_sync_period_in_ms
commitlog_sync_batch_window_in_ms

If you're using periodic, you could try changing
commitlog_sync_period_in_ms to something smaller like 1000 ms and
seeing if the problem is reduced (the theory is that there would be
less pending data to sync). If you are using batch, switch to
periodic. You could try mounting a GP2 volume and putting the commit
log directory on it and see if the problem goes away (say 200 GB for
sufficient IOPS). I'm guessing you don't have much in the way of
unallocated blocks in your LVM vg.
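
To see what you're on now and what I'd try first (the path to cassandra.yaml varies by install; the values are a starting point, not gospel):

    grep -E '^commitlog_sync' /etc/cassandra/cassandra.yaml
    # then try:
    # commitlog_sync: periodic
    # commitlog_sync_period_in_ms: 1000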

Writing to the commit log is single threaded, and if the commit log is
tied up waiting for IO during an fsync, it will block writes to the
node. If the threads are blocked on writing, the node will also
stall for reads. The symptoms you are seeing are exactly the same as
I saw with spinning rust. I'm not sure why you didn't see this problem
with EBS.

-Mark

On Sat, Jul 23, 2016 at 7:21 AM, Juho Mäkinen <juho.maki...@gmail.com> wrote:
> Hi Mark.
>
> I have an LVM volume which stripes the four ephemeral SSDs in the system
> and we use that for both data and the commit log. I've used a similar setup in
> the past (but with EBS) and we didn't see this behavior. Each node gets just
> around 250 writes per second. It is possible that the commit log is the
> issue here, but could I somehow measure it from the JMX metrics without the
> need of restructuring my entire cluster?
>
> Here's a screenshot from the latencies from our application point of view,
> which uses the Cassandra cluster to do reads. I started a rolling restart at
> around 09:30 and you can clearly see how the system latency dropped.
> http://imgur.com/a/kaPG7
>
> On Sat, Jul 23, 2016 at 2:25 AM, Mark Rose <markr...@markrose.ca> wrote:
>>
>> Hi Garo,
>>
>> Did you put the commit log on its own drive? Spiking CPU during stalls
>> is a symptom of not doing that. The commitlog is very latency
>> sensitive, even under low load. Do be sure you're using the deadline
>> or noop scheduler for that reason, too.
>>
>> -Mark
>>
>> On Fri, Jul 22, 2016 at 4:44 PM, Juho Mäkinen <juho.maki...@gmail.com>
>> wrote:
>> >> Are you using XFS or Ext4 for data?
>> >
>> >
>> > We are using XFS. Many nodes have a couple large SSTables (in order of
>> > 20-50
>> > GiB), but I haven't cross-checked if the load spikes happen only on
>> > machines
>> > which have these tables.
>> >
>> >>
>> >> As an aside, for the amount of reads/writes you're doing, I've found
>> >> using c3/m3 instances with the commit log on the ephemeral storage and
>> >> data on st1 EBS volumes to be much more cost effective. It's something
>> >> to look into if you haven't already.
>> >
>> >
>> > Thanks for the idea! I previously used c4.4xlarge instances with two
>> > 1500 GB
>> > GP2 volumes, but I found out that we maxed out their bandwidth too
>> > easily,
>> > so that's why my newest cluster is based on i2.4xlarge instances.
>> >
>> > And to answer Ryan: No, we are not using counters.
>> >
>> > I was thinking that could the big amount (100+ GiB) of mmap'ed files
>> > somehow
>> > cause some inefficiencies on the kernel side. That's why I started to
>> > learn
>> > on kernel huge pages and came up with the idea of disabling the huge
>> > page
>> > defrag, but nothing what I've found indicates that this can be a real
>> > problem. After all, Linux fs cache is a really old feature, so I expect
>> > it
>> > to be pretty bug free.
>> >
>> > I guess that I have to next learn how the load value itself is
>> > calculated. I
>> > know about the basic idea that when load is below the number of CPUs
>> > then
>> > the system should still be fine, but there's at least the iowait which
>> > is
>> > also used to calculate the load. So because I am not seeing any
>> > extensive
>> > iowait, and my userland CPU usage is well below what my 16 cores should
>> > handle, then what else contributes to the system load? Can I somehow
>> > make
>> > any educated guess what the high load might tell me if it's not iowait
>> > and
>> > it's not purely userland process CPU usage?

Re: My cluster shows high system load without any apparent reason

2016-07-22 Thread Mark Rose
Hi Garo,

Did you put the commit log on its own drive? Spiking CPU during stalls
is a symptom of not doing that. The commitlog is very latency
sensitive, even under low load. Do be sure you're using the deadline
or noop scheduler for that reason, too.
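
Quick way to check and flip it (the device name is an example; make the change persistent via udev rules or kernel boot parameters if it helps):

    cat /sys/block/xvdb/queue/scheduler                      # e.g. "noop [deadline] cfq"
    echo deadline | sudo tee /sys/block/xvdb/queue/scheduler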

-Mark

On Fri, Jul 22, 2016 at 4:44 PM, Juho Mäkinen  wrote:
>> Are you using XFS or Ext4 for data?
>
>
> We are using XFS. Many nodes have a couple large SSTables (in order of 20-50
> GiB), but I haven't cross-checked if the load spikes happen only on machines
> which have these tables.
>
>>
>> As an aside, for the amount of reads/writes you're doing, I've found
>> using c3/m3 instances with the commit log on the ephemeral storage and
>> data on st1 EBS volumes to be much more cost effective. It's something
>> to look into if you haven't already.
>
>
> Thanks for the idea! I previously used c4.4xlarge instances with two 1500 GB
> GP2 volumes, but I found out that we maxed out their bandwidth too easily,
> so that's why my newest cluster is based on i2.4xlarge instances.
>
> And to answer Ryan: No, we are not using counters.
>
> I was thinking that could the big amount (100+ GiB) of mmap'ed files somehow
> cause some inefficiencies on the kernel side. That's why I started to learn
> on kernel huge pages and came up with the idea of disabling the huge page
> defrag, but nothing what I've found indicates that this can be a real
> problem. After all, Linux fs cache is a really old feature, so I expect it
> to be pretty bug free.
>
> I guess that I have to next learn how the load value itself is calculated. I
> know about the basic idea that when load is below the number of CPUs then
> the system should still be fine, but there's at least the iowait which is
> also used to calculate the load. So because I am not seeing any extensive
> iowait, and my userland CPU usage is well below what my 16 cores should
> handle, then what else contributes to the system load? Can I somehow make
> any educated guess what the high load might tell me if it's not iowait and
> it's not purely userland process CPU usage? This is starting to get really
> deep really fast :/
>
>  - Garo
>
>
>>
>>
>> -Mark
>>
>> On Fri, Jul 22, 2016 at 8:10 AM, Juho Mäkinen 
>> wrote:
>> > After a few days I've also tried disabling Linux kernel huge pages
>> > defragmentation (echo never > /sys/kernel/mm/transparent_hugepage/defrag)
>> > and
>> > turning coalescing off (otc_coalescing_strategy: DISABLED), but neither
>> > did any good. I'm using LCS, there are no big GC pauses, and I have set
>> > "concurrent_compactors: 5" (machines have 16 CPUs), but there are
>> > usually
>> > not any compactions running when the load spike comes. "nodetool
>> > tpstats"
>> > shows no running thread pools except on the Native-Transport-Requests
>> > (usually 0-4) and perhaps ReadStage (usually 0-1).
>> >
>> > The symptoms are the same: after about 12-24 hours an increasing number of
>> > nodes start to show short CPU load spikes and this affects the median
>> > read
>> > latencies. I ran a dstat when a load spike was already under way (see
>> > screenshot http://i.imgur.com/B0S5Zki.png), but any other column than
>> > the
>> > load itself doesn't show any major change except the system/kernel CPU
>> > usage.
>> >
>> > All further ideas how to debug this are greatly appreciated.
>> >
>> >
>> > On Wed, Jul 20, 2016 at 7:13 PM, Juho Mäkinen 
>> > wrote:
>> >>
>> >> I just recently upgraded our cluster to 2.2.7 and after turning the
>> >> cluster under production load the instances started to show high load
>> >> (as
>> >> shown by uptime) without any apparent reason and I'm not quite sure
>> >> what
>> >> could be causing it.
>> >>
>> >> We are running on i2.4xlarge, so we have 16 cores, 120GB of ram, four
>> >> 800GB SSDs (set as lvm stripe into one big lvol). Running
>> >> 3.13.0-87-generic
>> >> on HVM virtualisation. Cluster has 26 TiB of data stored in two tables.
>> >>
>> >> Symptoms:
>> >>  - High load, sometimes up to 30 for a short duration of few minutes,
>> >> then
>> >> the load drops back to the cluster average: 3-4
>> >>  - Instances might have one compaction running, but might not have any
>> >> compactions.
>> >>  - Each node is serving around 250-300 reads per second and around 200
>> >> writes per second.
>> >>  - Restarting node fixes the problem for around 18-24 hours.
>> >>  - No or very little IO-wait.
>> >>  - top shows that around 3-10 threads are running on high cpu, but that
>> >> alone should not cause a load of 20-30.
>> >>  - Doesn't seem to be GC load: A system starts to show symptoms so that
>> >> it
>> >> has run only one CMS sweep. Not like it would do constant
>> >> stop-the-world
>> >> gc's.
>> >>  - top shows that the C* processes use 100G of RSS memory. I assume
>> >> that
>> >> this is because cassandra opens all SSTables with mmap() so that they
>> >> will
>> >> pop up in the RSS count because of this.

Re: My cluster shows high system load without any apparent reason

2016-07-22 Thread Mark Rose
Hi Garo,

Are you using XFS or Ext4 for data? XFS is much better at deleting
large files, such as may happen after a compaction. If you have 26 TB
in just two tables, I bet you have some massive sstables which may
take a while for Ext4 to delete, which may be causing the stalls. The
underlying block layers will not show high IO-wait. See if the stall
times line up with large compactions in system.log.
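
Something as simple as this should show whether the big compactions line up with the stalls (log location varies by install):

    grep 'Compacted' /var/log/cassandra/system.log | tail -20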

If you must use Ext4, another way to avoid issues with massive
sstables is to run more, smaller instances.

As an aside, for the amount of reads/writes you're doing, I've found
using c3/m3 instances with the commit log on the ephemeral storage and
data on st1 EBS volumes to be much more cost effective. It's something
to look into if you haven't already.

-Mark

On Fri, Jul 22, 2016 at 8:10 AM, Juho Mäkinen  wrote:
> After a few days I've also tried disabling Linux kernel huge pages
> defragmentation (echo never > /sys/kernel/mm/transparent_hugepage/defrag) and
> turning coalescing off (otc_coalescing_strategy: DISABLED), but neither did
> any good. I'm using LCS, there are no big GC pauses, and I have set
> "concurrent_compactors: 5" (machines have 16 CPUs), but there are usually
> not any compactions running when the load spike comes. "nodetool tpstats"
> shows no running thread pools except on the Native-Transport-Requests
> (usually 0-4) and perhaps ReadStage (usually 0-1).
>
> The symptoms are the same: after about 12-24 hours an increasing number of
> nodes start to show short CPU load spikes and this affects the median read
> latencies. I ran a dstat when a load spike was already under way (see
> screenshot http://i.imgur.com/B0S5Zki.png), but any other column than the
> load itself doesn't show any major change except the system/kernel CPU
> usage.
>
> All further ideas how to debug this are greatly appreciated.
>
>
> On Wed, Jul 20, 2016 at 7:13 PM, Juho Mäkinen 
> wrote:
>>
>> I just recently upgraded our cluster to 2.2.7, and after putting the
>> cluster under production load the instances started to show high load (as
>> shown by uptime) without any apparent reason and I'm not quite sure what
>> could be causing it.
>>
>> We are running on i2.4xlarge, so we have 16 cores, 120GB of ram, four
>> 800GB SSDs (set as lvm stripe into one big lvol). Running 3.13.0-87-generic
>> on HVM virtualisation. Cluster has 26 TiB of data stored in two tables.
>>
>> Symptoms:
>>  - High load, sometimes up to 30 for a short duration of few minutes, then
>> the load drops back to the cluster average: 3-4
>>  - Instances might have one compaction running, but might not have any
>> compactions.
>>  - Each node is serving around 250-300 reads per second and around 200
>> writes per second.
>>  - Restarting node fixes the problem for around 18-24 hours.
>>  - No or very little IO-wait.
>>  - top shows that around 3-10 threads are running on high cpu, but that
>> alone should not cause a load of 20-30.
>>  - Doesn't seem to be GC load: A system starts to show symptoms so that it
>> has run only one CMS sweep. Not like it would do constant stop-the-world
>> gc's.
>>  - top shows that the C* processes use 100G of RSS memory. I assume that
>> this is because cassandra opens all SSTables with mmap() so that they will
>> pop up in the RSS count because of this.
>>
>> What I've done so far:
>>  - Rolling restart. Helped for about one day.
>>  - Tried doing manual GC to the cluster.
>>  - Increased heap from 8 GiB with CMS to 16 GiB with G1GC.
>>  - sjk-plus shows bunch of SharedPool workers. Not sure what to make of
>> this.
>>  - Browsed over
>> https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html but didn't
>> find any apparent
>>
>> I know that the general symptom of "system shows high load" is not very
>> good and informative, but I don't know how to better describe what's going
>> on. I appreciate all ideas what to try and how to debug this further.
>>
>>  - Garo
>>
>