Re: "Maximum memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB"

2019-12-02 Thread Rajsekhar Mallick
Hello Rahul,

I would request Hossein to correct me if I am wrong. Below is how it works.

How does an application/database read something from disk?
A read request comes in -> the application code internally invokes system
calls -> these kernel-level system calls schedule a job with the I/O
scheduler -> the data is then read and returned by the device drivers ->
the data fetched from disk is accumulated in a memory location (a file
buffer) until the entire read operation is complete -> then, as I
understand it, the data is uncompressed and processed inside the JVM as
Java objects -> finally it is handed over to the application logic to be
transmitted over the network interface.

This is my understanding of file_cache_size_in_mb: it caps the buffer cache
Cassandra uses to keep sstable chunks read from disk in memory.
The alert you are getting is an INFO-level log message.
I would recommend first understanding why this cache is filling up so fast.
Increasing the cache size is one solution, but as I recall there is some
impact when it is increased. I faced a similar issue and increased the
cache size; eventually the increased size started falling short as well.
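
For what it is worth, this is the knob and the log line in question; the
default below is what recent versions ship with, and the log path is an
assumption for a package install:

# cassandra.yaml: size of the buffer cache Cassandra keeps for sstable chunks
#   file_cache_size_in_mb: 512
# how often the cache is topping out on this node:
grep -c 'Maximum memory usage reached' /var/log/cassandra/system.log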

You are asking the right question about how the cache is recycled. If you
find an answer, do post it here. But that is something Cassandra does not
expose control over (that is my understanding).
Investigating your reads, to check whether a lot of data is being read to
satisfy a few queries, might be another way to start troubleshooting.

Thanks,
Rajsekhar

On Mon, 2 Dec, 2019, 8:18 PM Rahul Reddy,  wrote:

> Thanks Hossein,
>
> How are chunks moved out of memory (LRU?) when it wants to make room
> for new requests to get chunks? If it has a mechanism to clear chunks from
> the cache, what causes the "cannot allocate chunk" message? Can you point me to any
> documentation?
>
> On Sun, Dec 1, 2019, 12:03 PM Hossein Ghiyasi Mehr 
> wrote:
>
>> Chunks are part of sstables. When there is enough space in memory to
>> cache them, read performance will increase if the application requests them again.
>>
>> The real answer is application dependent. For example, write-heavy
>> applications are different from read-heavy or read-write-heavy ones, and real-time
>> applications are different from time-series environments, and so on.
>>
>>
>>
>> On Sun, Dec 1, 2019 at 7:09 PM Rahul Reddy 
>> wrote:
>>
>>> Hello,
>>>
>>> We are seeing "maximum memory usage reached (512MiB), cannot allocate chunk of 1MiB". I
>>> see this because file_cache_size_in_mb is set to 512MB by default.
>>>
>>> The DataStax documentation recommends increasing the file_cache_size.
>>>
>>> We have 32G of memory overall, with 16G allocated to Cassandra. What is the
>>> recommended value in my case? Also, when this memory gets filled up
>>> frequently, does nodetool flush help in avoiding these INFO messages?
>>>
>>


Re: Dropped mutations

2019-07-25 Thread Rajsekhar Mallick
Hello Jeff,

Could you please help me interpret the terms below? A sketch of where these
counters surface follows the list.
1. Internal mutations
2. Cross-node mutations
3. Mean internal dropped latency
4. Cross-node dropped latency
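
For reference, a rough way to watch these counters while the drops are
happening; the JMX bean and attribute names are my assumption from the
DroppedMessage metrics group, so verify them against your version:

# per-message-type dropped counts since startup
nodetool tpstats
# mean internal / cross-node dropped latency for MUTATION over JMX, e.g. with jmxterm:
#   get -b org.apache.cassandra.metrics:type=DroppedMessage,scope=MUTATION,name=InternalDroppedLatency Mean
#   get -b org.apache.cassandra.metrics:type=DroppedMessage,scope=MUTATION,name=CrossNodeDroppedLatency Mean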

Thanks,
Rajsekhar

On Thu, 25 Jul, 2019, 9:21 PM Jeff Jirsa,  wrote:

> This means your database is seeing commands that have already timed out by
> the time it goes to execute them, so it ignores them and gives up instead
> of working on work items that have already expired.
>
> The first log line shows 5 second latencies, the second line 6s and 8s
> latencies, which sounds like either really bad disks or really bad JVM GC
> pauses.
>
>
> On Thu, Jul 25, 2019 at 8:45 AM Ayub M  wrote:
>
>> Hello, how do I read dropped-mutation error messages - what is internal
>> and what is cross node? For mutations it drops on cross-node, and for read_repair/read
>> it drops on internal. What does that mean?
>>
>> INFO  [ScheduledTasks:1] 2019-07-21 11:44:46,150
>> MessagingService.java:1281 - MUTATION messages were dropped in last 5000
>> ms: 0 internal and 65 cross node. Mean internal dropped latency: 0 ms and
>> Mean cross-node dropped latency: 4966 ms
>> INFO  [ScheduledTasks:1] 2019-07-19 05:01:10,620
>> MessagingService.java:1281 - READ_REPAIR messages were dropped in last 5000
>> ms: 9 internal and 8 cross node. Mean internal dropped latency: 6013 ms and
>> Mean cross-node dropped latency: 8164 ms
>>
>> --
>>
>> Regards,
>> Ayub
>>
>


Re: high write latency on a single table

2019-07-22 Thread Rajsekhar Mallick
  Read Count: 208257486
>>> Read Latency: 7.655137315414438 ms
>>> Write Count: 2218218966
>>> Write Latency: 1.7825896304427324 ms
>>> Pending Flushes: 0
>>> Table: MESSAGE_HISTORY_STATE
>>> SSTable count: 5
>>> Space used (live): 6403033568
>>> Space used (total): 6403033568
>>> Space used by snapshots (total): 19086872706
>>> Off heap memory used (total): 6727565
>>> SSTable Compression Ratio: 0.271857664111622
>>> Number of partitions (estimate): 1396462
>>> Memtable cell count: 77450
>>> Memtable data size: 620776
>>> Memtable off heap memory used: 1338914
>>> Memtable switch count: 1616
>>> Local read count: 988278
>>> Local read latency: 0.518 ms
>>> Local write count: 109292691
>>> Local write latency: 11.353 ms
>>> Pending flushes: 0
>>> Percent repaired: 0.0
>>> Bloom filter false positives: 0
>>> Bloom filter false ratio: 0.0
>>> Bloom filter space used: 1876208
>>> Bloom filter off heap memory used: 1876168
>>> Index summary off heap memory used: 410747
>>> Compression metadata off heap memory used: 3101736
>>> Compacted partition minimum bytes: 36
>>> Compacted partition maximum bytes: 129557750
>>> Compacted partition mean bytes: 17937
>>> Average live cells per slice (last five minutes): 4.692893401015229
>>> Maximum live cells per slice (last five minutes): 258
>>> Average tombstones per slice (last five minutes): 1.0
>>> Maximum tombstones per slice (last five minutes): 1
>>> Dropped Mutations: 1344158
>>>
>>> 2)
>>>
>>> [cassadm@bipcas00 conf]$ nodetool tablehistograms tims MESSAGE_HISTORY
>>> tims/MESSAGE_HISTORY histograms
>>> Percentile  SSTables   Write Latency   Read Latency   Partition Size   Cell Count
>>>                             (micros)       (micros)          (bytes)
>>> 50%             3.00           20.50         454.83            14237           17
>>> 75%            17.00           24.60        2346.80            88148          103
>>> 95%            17.00           35.43       14530.76           454826          924
>>> 98%            17.00           42.51       20924.30          1131752         2299
>>> 99%            17.00           42.51       30130.99          1955666         4768
>>> Min             0.00            3.97          73.46               36            0
>>> Max            20.00          263.21       74975.55        386857368       943127
>>>
>>> [cassadm@bipcas00 conf]$ nodetool tablehistograms tims MESSAGE_HISTORY_STATE
>>> tims/MESSAGE_HISTORY_STATE histograms
>>> Percentile  SSTables   Write Latency   Read Latency   Partition Size   Cell Count
>>>                             (micros)       (micros)          (bytes)
>>> 50%             5.00           20.50         315.85              924            1
>>> 75%             6.00           35.43         379.02             5722            7
>>> 95%            10.00         4055.27         785.94            61214          310
>>> 98%            10.00        74975.55        3379.39           182785          924
>>> 99%            10.00       107964.79       10090.81           315852         1916
>>> Min             0.00            3.31          42.51               36            0
>>> Max            10.00       322381.14       25109.16        129557750      1629722
>>>
>>> 3) RF=3
>>>
>>> 4)CL  QUORUM
>>>
>>> 5) Single insert prepared statement. no LOGGED/UNLOGGED batch or LWT
>>>
>>>
>>> On Thu, 18 Jul 2019 at 20:51, Rajsekhar Mallick 
>>> wro

Re: Rebooting one Cassandra node caused all the application nodes go down

2019-07-19 Thread Rajsekhar Mallick
Hello Rahul,
As per your description, the Cassandra process is up and running, as you
verified from the logs, but nodetool and Grafana aren't fetching data.
This points to JMX port 7199 being the suspect.

Do run and check 'netstat -anp | egrep "7199|9042|7070"' on the impacted
host and on the other hosts in the cluster.
There has to be some difference. Observe the IP address to which JMX
port 7199 is binding. Is it the same as it was prior to the reboot?
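
If the bind address did change, the JMX settings to compare live in
cassandra-env.sh; a rough check (path assumed for a package install):

# run on the impacted host and on a healthy one, then diff the output
grep -nE 'JMX_PORT|LOCAL_JMX|jmxremote|rmi.server.hostname' /etc/cassandra/cassandra-env.sh
hostname -i   # did the host's resolved address change across the reboot?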

Thanks


On Fri, 19 Jul, 2019, 10:28 PM Rahul Reddy, 
wrote:

> Raj,
>
> No, that was not the case. In system.log I see it started listening for client
> calls at 16:42, but somehow it was still unreachable until 16:50; the Grafana
> dashboard below shows it. Once everything is up in the logs, why would it still show as down
> in nodetool status and Grafana?
>
> Zaidi,
>
> In the latest AWS Linux AMI they took care of this bug. Also, changing the
> AMI needs a rebuild of all the nodes, so we didn't take that route.
>
> On Fri, Jul 19, 2019, 12:32 PM ZAIDI, ASAD A  wrote:
>
>> “aws asked to set nvme_timeout to higher number in etc/grub.conf.”
>>
>>
>>
>> Did you ask AWS if setting a higher value is a real solution to the bug - is
>> there not any patch available to address it?   - just curious to know
>>
>>
>>
>> *From:* Rahul Reddy [mailto:rahulreddy1...@gmail.com]
>> *Sent:* Friday, July 19, 2019 10:49 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Rebooting one Cassandra node caused all the application nodes
>> go down
>>
>>
>>
>> Here ,
>>
>>
>>
>> We have 6 nodes in each of 2 data centers, us-east-1 and us-west-2. We have
>> RF 3 and CL set to LOCAL_QUORUM, and the gossiping snitch. All our instances are
>> c5.2xlarge, and data files and commit logs are stored on gp2 EBS. The C5
>> instance type had a bug for which AWS asked us to set nvme_timeout to a higher
>> number in /etc/grub.conf. After setting the parameter we ran nodetool
>> drain and rebooted the node in east.
>>
>> The instance came up but Cassandra didn't come up normally; we had to start
>> Cassandra manually. Cassandra came up, but it showed the other instances as down. Even though
>> we didn't reboot the other nodes, the same was observed on one other node. How
>> could that happen? And we don't see any errors in system.log, which is set to INFO.
>> Without any intervention, gossip settled in 10 minutes and the entire cluster became
>> normal.
>>
>> We tried the same thing in west and it happened again.
>>
>> I'm concerned about how to check what caused it, and if a reboot happens again,
>> how to avoid this.
>>
>> If I just stop Cassandra instead of rebooting, I don't see this issue.
>>
>>
>>
>


Re: Rebooting one Cassandra node caused all the application nodes go down

2019-07-19 Thread Rajsekhar Mallick
Hello Rahul,

So basically the issue is: running nodetool status on the rebooted host
shows itself as UN and all other nodes in the cluster as DN,
while running nodetool status on any other node in the cluster shows the
rebooted node as DN.

Correct me if I am wrong - is this the issue?
Also, please attach a screenshot of the observation you are talking about (you may
choose to redact the IP addresses of the hosts).

Thanks

On Fri, 19 Jul, 2019, 9:36 PM Rahul Reddy,  wrote:

> Thanks for the quick response, Rajsekhar.
>
> Correct, same cassandra.yaml and same Java.
>
> On Fri, Jul 19, 2019, 11:56 AM Rajsekhar Mallick 
> wrote:
>
>> Hello Rahul,
>>
>> Could you please confirm the below things:
>>
>> 1. The cassandra.yaml file on the node that was started after the machine
>> reboot is the same as on the rest of the nodes in the cluster.
>> 2. The Java version is consistent across all nodes in the cluster.
>>
>> Do check and revert
>>
>> Thanks
>>
>> On Fri, 19 Jul, 2019, 9:19 PM Rahul Reddy, 
>> wrote:
>>
>>> Here ,
>>>
>>> We have 6 nodes in each of 2 data centers, us-east-1 and us-west-2. We
>>> have RF 3 and CL set to LOCAL_QUORUM, and the gossiping snitch. All our instances
>>> are c5.2xlarge, and data files and commit logs are stored on gp2 EBS. The C5
>>> instance type had a bug for which AWS asked us to set nvme_timeout to a higher
>>> number in /etc/grub.conf. After setting the parameter we ran nodetool
>>> drain and rebooted the node in east.
>>>
>>> The instance came up but Cassandra didn't come up normally; we had to start
>>> Cassandra manually. Cassandra came up, but it showed the other instances as down. Even though
>>> we didn't reboot the other nodes, the same was observed on one other node. How
>>> could that happen? And we don't see any errors in system.log, which is set to INFO.
>>> Without any intervention, gossip settled in 10 minutes and the entire cluster became
>>> normal.
>>>
>>> We tried the same thing in west and it happened again.
>>>
>>> I'm concerned about how to check what caused it, and if a reboot happens again,
>>> how to avoid this.
>>> If I just stop Cassandra instead of rebooting, I don't see this issue.
>>>
>>>


Re: Rebooting one Cassandra node caused all the application nodes go down

2019-07-19 Thread Rajsekhar Mallick
Hello Rahul,

Could you please confirm the below things:

1. The cassandra.yaml file on the node that was started after the machine
reboot is the same as on the rest of the nodes in the cluster.
2. The Java version is consistent across all nodes in the cluster.

Do check and revert
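
A rough way to compare both across the cluster (hostnames and paths here are
placeholders):

for h in node1 node2 node3; do
  echo "== $h =="
  ssh "$h" 'md5sum /etc/cassandra/cassandra.yaml; java -version 2>&1 | head -1'
done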

Thanks

On Fri, 19 Jul, 2019, 9:19 PM Rahul Reddy,  wrote:

> Here ,
>
> We have 6 nodes in each of 2 data centers, us-east-1 and us-west-2. We have
> RF 3 and CL set to LOCAL_QUORUM, and the gossiping snitch. All our instances are
> c5.2xlarge, and data files and commit logs are stored on gp2 EBS. The C5
> instance type had a bug for which AWS asked us to set nvme_timeout to a higher
> number in /etc/grub.conf. After setting the parameter we ran nodetool
> drain and rebooted the node in east.
>
> The instance came up but Cassandra didn't come up normally; we had to start
> Cassandra manually. Cassandra came up, but it showed the other instances as down. Even though
> we didn't reboot the other nodes, the same was observed on one other node. How
> could that happen? And we don't see any errors in system.log, which is set to INFO.
> Without any intervention, gossip settled in 10 minutes and the entire cluster became
> normal.
>
> We tried the same thing in west and it happened again.
>
> I'm concerned about how to check what caused it, and if a reboot happens again,
> how to avoid this.
> If I just stop Cassandra instead of rebooting, I don't see this issue.
>
>


Re: high write latency on a single table

2019-07-18 Thread Rajsekhar Mallick
Hello,

Kindly post the below details:

1. nodetool cfstats for both tables.
2. nodetool cfhistograms for both tables.
3. Replication factor of the tables.
4. Consistency level with which write requests are sent.
5. The type of write queries used for the table, if handy (lightweight
transactions, batch writes, or plain prepared statements).
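
For convenience, the commands for items 1 and 2, using the keyspace/table
names from your mail (adjust if needed):

nodetool cfstats tims.MESSAGE_HISTORY tims.MESSAGE_HISTORY_STATE
nodetool cfhistograms tims MESSAGE_HISTORY
nodetool cfhistograms tims MESSAGE_HISTORY_STATE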

Thanks

On 2019/07/18 15:48:09, CPC  wrote: 
> Hi all,
>
> Our cassandra cluster consists of two DCs and each DC has 10 nodes. We
> are using DSE 5.1.12 (cassandra 3.11). We have high local write latency on
> a single table. All other tables in our keyspace have normal latencies,
> like 0.02 msec, even tables that have more write tps and more data. Below
> you can find the two table definitions and their latencies.
> message_history_state has high local write latency. This is not node
> specific; every node has this high local write latency for
> message_history_state. Have you ever seen such behavior, or any clue why
> this could happen?
>
> CREATE TABLE tims."MESSAGE_HISTORY" (
>     username text,
>     date_partition text,
>     jid text,
>     sent_time timestamp,
>     message_id text,
>     stanza text,
>     PRIMARY KEY ((username, date_partition), jid, sent_time, message_id)
> ) WITH CLUSTERING ORDER BY (jid ASC, sent_time DESC, message_id ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'bucket_high': '1.5', 'bucket_low': '0.5', 'class':
>         'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>         'enabled': 'true', 'max_threshold': '32', 'min_sstable_size': '50',
>         'min_threshold': '4', 'tombstone_compaction_interval': '86400',
>         'tombstone_threshold': '0.2', 'unchecked_tombstone_compaction': 'false'}
>     AND compression = {'chunk_length_in_kb': '64', 'class':
>         'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.0
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 86400
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
>
> CREATE TABLE tims."MESSAGE_HISTORY_STATE" (
>     username text,
>     date_partition text,
>     message_id text,
>     jid text,
>     state text,
>     sent_time timestamp,
>     PRIMARY KEY ((username, date_partition), message_id, jid, state)
> ) WITH CLUSTERING ORDER BY (message_id ASC, jid ASC, state ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class':
>         'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>         'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class':
>         'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
>
> message_history Local write latency: 0.021 ms
> message_history_state Local write latency: 11.353 ms
>
> Thanks in advance.



Re: ReadRepairStage

2019-07-18 Thread Rajsekhar Mallick
Hello Bobbie,

Do revert with the below details:

1. Replication factor of the keyspace.
2. Consistency level used for read requests.
3. nodetool netstats output.
4. Output of: grep "DigestMismatch" /log/directory/path/debug.log
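
Also, note that both read-repair chances are table-level schema properties,
so if the intent is to disable background read repair it has to be altered
per table; a minimal sketch (keyspace/table names are placeholders):

cqlsh -e "ALTER TABLE my_ks.my_table WITH dclocal_read_repair_chance = 0.0 AND read_repair_chance = 0.0;"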

Thanks

On 2019/07/18 17:19:10, Bobbie Haynes  wrote: 
> I have updated dclocal_read_repair_chance to 0.0 across all the nodes in the cluster;
> still I see a lot of these messages.
>
> On Thu, Jul 18, 2019 at 10:13 AM Bobbie Haynes wrote:
>
> > Hi,
> > We are using Apache Cassandra 3.11.4. Recently we are seeing a lot of
> > read-repair ERROR messages in the entire cluster, and because of that we
> > are getting timeouts. I'm not able to find the root cause for this.
> > Appreciate any inputs on this issue.
> >
> > ERROR [ReadRepairStage:2537] 2019-07-18 17:08:15,119 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:2537,5,main]
> > org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.
> > at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:202) ~[apache-cassandra-3.11.3.jar:3.11.3]
> > at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[apache-cassandra-3.11.3.jar:3.11.3]
> > at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[apache-cassandra-3.11.3.jar:3.11.3]
> > at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:79) ~[apache-cassandra-3.11.3.jar:3.11.3]
> > at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.3.jar:3.11.3]
> > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.3.jar:3.11.3]
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_212]
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_212]
> > at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.3.jar:3.11.3]
> > at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_212]
> 



Commit sync issues and maximum memory usage reached messages seen in system log

2019-05-09 Thread Rajsekhar Mallick
Hello team,

I am observing below warn and info message in system.log

1. Info log: maximum memory usage reached (1.000GiB), cannot allocate chunk
of 1 MiB.

I tried increasing file_cache_size_in_mb in cassandra.yaml from 512
to 1024, but this message still shows up in the logs.

2. Warn log: Out of 25 commit log syncs over the past 223 seconds with
average duration of 36.28 ms, 2 have exceeded the configured commit
interval by an average of 200.44 ms.

commitlog_sync is periodic
commitlog_sync_period_in_ms is set to 10000 (10 seconds)
commitlog_segment_size_in_mb is set to 32
commitlog_total_space_in_mb is set to 1024

Kindly comment on what I may conclude from the above logs.
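
For reference, a quick way to confirm the effective values and to watch the
commit-log disk while the warning recurs (config path assumed for a package
install):

grep -E '^(file_cache_size_in_mb|commitlog_sync|commitlog_sync_period_in_ms|commitlog_segment_size_in_mb|commitlog_total_space_in_mb):' /etc/cassandra/cassandra.yaml
# look for high write await/util on the commit log device while the WARN is logged
iostat -x 5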


Unable to track compaction completion

2019-02-15 Thread Rajsekhar Mallick
Hello team,

I have been trying to figure out how to track the completion of a
compaction on a node.
nodetool compactionstats shows only instantaneous results.

I found that system.compactions_in_progress gives me the same details as
compactionstats, and it also gives me an id for the running compaction.
I was of the view that checking for the same id in
system.compaction_history would fetch me the compaction details after the
running compaction ends,
but I see that no such relationship exists.
Please do confirm the above.
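
For reference, the kind of queries involved (a sketch only; these are 2.x
system tables and the column sets vary by version):

cqlsh -e "SELECT * FROM system.compactions_in_progress;"
cqlsh -e "SELECT * FROM system.compaction_history LIMIT 20;"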

Thanks,
Rajsekhar Mallick


Tracking live and complete compaction

2019-02-15 Thread Rajsekhar Mallick
Hello team,

I have been trying to figure out how to track the completion of a
compaction on a node.
nodetool compactionstats shows only instantaneous results.

I found that system.compactions_in_progress gives me the same details as
compactionstats, and it also gives me an id for the running compaction.
I was of the view that checking for the same id in
system.compaction_history would fetch me the compaction details after the
running compaction ends,
but I see that no such relationship exists.
Please do confirm the above.

Thanks,
Rajsekhar Mallick


Local jmx changes get reverted after restart of a neighbouring node in Cassandra cluster

2019-02-11 Thread Rajsekhar Mallick
Hello Team,

I have been trying to use the sjk / jmxterm jar utilities to change the
compaction strategy of a table locally from STCS to LCS, without changing
the schema.
I am trying this in a lower environment first, before implementing
the same in the production environment.
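
For context, a sketch of the kind of jmxterm call involved; the jar path and
the bean/attribute names are assumptions and vary by Cassandra version
(older versions expose a setCompactionStrategyClass operation instead of the
CompactionParametersJson attribute):

java -jar jmxterm.jar -l localhost:7199 -n <<'EOF'
set -b org.apache.cassandra.db:type=ColumnFamilies,keyspace=my_ks,columnfamily=my_table CompactionParametersJson {"class":"LeveledCompactionStrategy"}
EOF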
The change did work on one of the nodes: autocompaction was triggered after
a flush of the table.
After making the change on one node, I made the same change on another node
in the cluster, and it again went through. Then, to verify whether local
changes revert after a restart, I restarted one of the 2 nodes where the
change was made.
The change on that node got reverted, but it also rolled back on the other
node (which wasn't restarted).
I did check the DataStax blogs, but didn't find any explanation for this.
Kindly help me understand why a restart of one node would revert JMX-local
changes made on another node.
Does a node restart in the cluster trigger a schema update for the cluster?

Thanks,
Rajsekhar Mallick


High GC pauses leading to client seeing impact

2019-02-10 Thread Rajsekhar Mallick
Hello Team,

I have a cluster of 17 nodes in production.(8 and 9 nodes in 2 DC).
Cassandra version: 2.0.11
Client connecting using thrift over port 9160
Jdk version : 1.8.066
GC used : G1GC (16GB heap)
Other GC settings:
MaxGCPauseMillis=200
ParallelGCThreads=32
ConcGCThreads=10
InitiatingHeapOccupancyPercent=50
Number of cpu cores for each system : 40
Memory size: 185 GB
Read/sec : 300 /sec on each node
Writes/sec : 300/sec on each node
Compaction strategy used : Size tiered compaction strategy

Identified issues in the cluster:
1. Disk space usage across all nodes in the cluster is 80%. We are currently 
working on adding more storage on each node
2. There are 2 tables for which we keep seeing a large number of tombstones.
One of the tables has read requests seeing 120 tombstone cells in the last 5 minutes as
compared to 4 live cells. Tombstone warnings and error messages about queries getting
aborted are also seen.

Current issues seen:
1. We keep seeing GC pauses of a few minutes randomly across nodes in the
cluster. GC pauses of 120 seconds, and even 770 seconds, have been observed.
2. This leads to nodes getting stalled and the client seeing a direct impact.
3. The GC pauses we see are not during any of the G1GC phases. The GC log message
prints "Time to stop threads took 770 seconds". So it is not the garbage
collector doing any work; rather, stopping the threads at a safepoint is taking that
much time.
4. This issue has surfaced recently, after we changed from 8GB (CMS) to 16GB (G1GC)
across all nodes in the cluster.
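
If it helps, these are the JDK 8 diagnostic options I would add via
cassandra-env.sh to break such pauses down further; treat this as a sketch
and verify the flags against your JVM:

# log time-to-safepoint details alongside the GC log
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"
# long "stopping threads" times are often swap or a slow disk under the GC log,
# so it is also worth confirming swap is disabled (the usual Cassandra recommendation):
swapoff -a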

Kindly do help on the above issue. I am not able to understand exactly whether the
GC is wrongly tuned, or whether this is something else.

Thanks,
Rajsekhar Mallick






Re: ReadStage filling up and leading to Read Timeouts

2019-02-05 Thread Rajsekhar Mallick
Thank you Jeff for the link.
Please do comment on whether the G1GC settings are OK for the cluster.
Also please comment on reducing the concurrent reads to 32 on all nodes in the
cluster, as that has earlier led to reads getting dropped.
Will adding nodes to the cluster be helpful?

Thanks,
Rajsekhar Mallick



On Wed, 6 Feb, 2019, 1:12 PM Jeff Jirsa wrote:

> https://docs.datastax.com/en/developer/java-driver/3.2/manual/paging/
>
>
> --
> Jeff Jirsa
>
>
> On Feb 5, 2019, at 11:33 PM, Rajsekhar Mallick 
> wrote:
>
> Hello Jeff,
>
> Thanks for the reply.
> We do have GC logs enabled.
> We do observe GC pauses of up to 2 seconds, but quite often we see this issue
> even when the GC log reads good and clear.
>
> JVM Flags related to G1GC:
>
> -Xms48G
> -Xmx48G
> -XX:MaxGCPauseMillis=200
> -XX:ParallelGCThreads=32
> -XX:ConcGCThreads=10
> -XX:InitiatingHeapOccupancyPercent=50
>
> You talked about dropping the application page size. Please do elaborate on
> how to change that.
> Reducing the concurrent reads to 32 does help, as we have tried the
> same: the CPU load average remains under the threshold, but read timeouts
> keep on happening.
>
> We will definitely try increasing the key cache sizes after verifying the
> current max heap usage in the cluster.
>
> Thanks,
> Rajsekhar Mallick
>
> On Wed, 6 Feb, 2019, 11:17 AM Jeff Jirsa wrote:
>> What you're potentially seeing is the GC impact of reading a large
>> partition - do you have GC logs or StatusLogger output indicating you're
>> pausing? What are you actual JVM flags you're using?
>>
>> Given your heap size, the easiest mitigation may be significantly
>> increasing your key cache size (up to a gigabyte or two, if needed).
>>
>> Yes, when you read data, it's materialized in memory (iterators from each
>> sstable are merged and sent to the client), so reading lots of rows from a
>> wide partition can cause GC pressure just from materializing the responses.
>> Dropping your application's paging size could help if this is the problem.
>>
>> You may be able to drop concurrent reads from 64 to something lower
>> (potentially 48 or 32, given your core count) to mitigate GC impact from
>> lots of objects when you have a lot of concurrent reads, or consider
>> upgrading to 3.11.4 (when it's out) to take advantage of CASSANDRA-11206
>> (which made reading wide partitions less expensive). STCS especially wont
>> help here - a large partition may be larger than you think, if it's
>> spanning a lot of sstables.
>>
>>
>>
>>
>> On Tue, Feb 5, 2019 at 9:30 PM Rajsekhar Mallick 
>> wrote:
>>
>>> Hello Team,
>>>
>>> Cluster Details:
>>> 1. Number of Nodes in cluster : 7
>>> 2. Number of CPU cores: 48
>>> 3. Swap is enabled on all nodes
>>> 4. Memory available on all nodes : 120GB
>>> 5. Disk space available : 745GB
>>> 6. Cassandra version: 2.1
>>> 7. Active tables are using size-tiered compaction strategy
>>> 8. Read Throughput: 6000 reads/s on each node (42000 reads/s cluster
>>> wide)
>>> 9. Read latency 99%: 300 ms
>>> 10. Write Throughput : 1800 writes/s
>>> 11. Write Latency 99%: 50 ms
>>> 12. Known issues in the cluster ( Large Partitions(upto 560MB, observed
>>> when they get compacted), tombstones)
>>> 13. To reduce the impact of tombstones, gc_grace_seconds set to 0 for
>>> the active tables
>>> 14. Heap size: 48 GB G1GC
>>> 15. Read timeout : 5000ms , Write timeouts: 2000ms
>>> 16. Number of concurrent reads: 64
>>> 17. Number of connections from clients on port 9042 stays almost
>>> constant (close to 1800)
>>> 18. Cassandra thread count also stays almost constant (close to 2000)
>>>
>>> Problem Statement:
>>> 1. ReadStage often gets full (reaches max size 64) on 2 to 3 nodes and
>>> pending reads go upto 4000.
>>> 2. When the above happens Native-Transport-Stage gets full on
>>> neighbouring nodes(1024 max) and pending threads are also observed.
>>> 3. During this time, CPU load average rises, user % for Cassandra
>>> process reaches 90%
>>> 4. We see Read getting dropped, org.apache.cassandra.transport package
>>> errors of reads getting timeout is seen.
>>> 5. Read latency 99% reached 5seconds, client starts seeing impact.
>>> 6. No IOwait observed on any of the virtual cores, sjk ttop command
>>> shows max us% being used by “Worker Threads”
>>>
>>> I have trying hard to zero upon what is the exact issue.
>>> What I make out of these above observatio

Re: ReadStage filling up and leading to Read Timeouts

2019-02-05 Thread Rajsekhar Mallick
Hello Jeff,

Thanks for the reply.
We do have GC logs enabled.
We do observe GC pauses of up to 2 seconds, but quite often we see this issue
even when the GC log reads good and clear.

JVM Flags related to G1GC:

-Xms48G
-Xmx48G
-XX:MaxGCPauseMillis=200
-XX:ParallelGCThreads=32
-XX:ConcGCThreads=10
-XX:InitiatingHeapOccupancyPercent=50

You talked about dropping the application page size. Please do elaborate on how
to change that.
Reducing the concurrent reads to 32 does help, as we have tried the
same: the CPU load average remains under the threshold, but read timeouts
keep on happening.

We will definitely try increasing the key cache sizes after verifying the
current max heap usage in the cluster.

Thanks,
Rajsekhar Mallick

On Wed, 6 Feb, 2019, 11:17 AM Jeff Jirsa wrote:

> What you're potentially seeing is the GC impact of reading a large
> partition - do you have GC logs or StatusLogger output indicating you're
> pausing? What are you actual JVM flags you're using?
>
> Given your heap size, the easiest mitigation may be significantly
> increasing your key cache size (up to a gigabyte or two, if needed).
>
> Yes, when you read data, it's materialized in memory (iterators from each
> sstable are merged and sent to the client), so reading lots of rows from a
> wide partition can cause GC pressure just from materializing the responses.
> Dropping your application's paging size could help if this is the problem.
>
> You may be able to drop concurrent reads from 64 to something lower
> (potentially 48 or 32, given your core count) to mitigate GC impact from
> lots of objects when you have a lot of concurrent reads, or consider
> upgrading to 3.11.4 (when it's out) to take advantage of CASSANDRA-11206
> (which made reading wide partitions less expensive). STCS especially wont
> help here - a large partition may be larger than you think, if it's
> spanning a lot of sstables.
>
>
>
>
> On Tue, Feb 5, 2019 at 9:30 PM Rajsekhar Mallick 
> wrote:
>
>> Hello Team,
>>
>> Cluster Details:
>> 1. Number of Nodes in cluster : 7
>> 2. Number of CPU cores: 48
>> 3. Swap is enabled on all nodes
>> 4. Memory available on all nodes : 120GB
>> 5. Disk space available : 745GB
>> 6. Cassandra version: 2.1
>> 7. Active tables are using size-tiered compaction strategy
>> 8. Read Throughput: 6000 reads/s on each node (42000 reads/s cluster wide)
>> 9. Read latency 99%: 300 ms
>> 10. Write Throughput : 1800 writes/s
>> 11. Write Latency 99%: 50 ms
>> 12. Known issues in the cluster ( Large Partitions(upto 560MB, observed
>> when they get compacted), tombstones)
>> 13. To reduce the impact of tombstones, gc_grace_seconds set to 0 for the
>> active tables
>> 14. Heap size: 48 GB G1GC
>> 15. Read timeout : 5000ms , Write timeouts: 2000ms
>> 16. Number of concurrent reads: 64
>> 17. Number of connections from clients on port 9042 stays almost constant
>> (close to 1800)
>> 18. Cassandra thread count also stays almost constant (close to 2000)
>>
>> Problem Statement:
>> 1. ReadStage often gets full (reaches max size 64) on 2 to 3 nodes and
>> pending reads go upto 4000.
>> 2. When the above happens Native-Transport-Stage gets full on
>> neighbouring nodes(1024 max) and pending threads are also observed.
>> 3. During this time, CPU load average rises, user % for Cassandra process
>> reaches 90%
>> 4. We see Read getting dropped, org.apache.cassandra.transport package
>> errors of reads getting timeout is seen.
>> 5. Read latency 99% reached 5seconds, client starts seeing impact.
>> 6. No IOwait observed on any of the virtual cores, sjk ttop command shows
>> max us% being used by “Worker Threads”
>>
>> I have trying hard to zero upon what is the exact issue.
>> What I make out of these above observations is…there might be some slow
>> queries, which get stuck on few nodes.
>> Then there is a cascading effect wherein other queries get lined up.
>> Unable to figure out any such slow queries up till now.
>> As I mentioned, there are large partitions. We using size-tiered
>> compaction strategy, hence a large partition might be spread across
>> multiple stables.
>> Can this fact lead to slow queries. I also tried to understand, that data
>> in stables is stored in serialized format and when read into memory, it is
>> unseralized. This would lead to a large object in memory which then needs
>> to be transferred across the wire to the client.
>>
>> Not sure what might be the reason. Kindly help on helping me understand
>> what might be the impact on read performance when we have large partitions.
>> Kindly Suggest ways to catch these slow queries.
>> Also do add i

ReadStage filling up and leading to Read Timeouts

2019-02-05 Thread Rajsekhar Mallick
Hello Team,

Cluster Details:
1. Number of Nodes in cluster : 7
2. Number of CPU cores: 48
3. Swap is enabled on all nodes
4. Memory available on all nodes : 120GB 
5. Disk space available : 745GB
6. Cassandra version: 2.1
7. Active tables are using size-tiered compaction strategy
8. Read Throughput: 6000 reads/s on each node (42000 reads/s cluster wide)
9. Read latency 99%: 300 ms
10. Write Throughput : 1800 writes/s
11. Write Latency 99%: 50 ms
12. Known issues in the cluster: large partitions (up to 560MB, observed when
they get compacted) and tombstones
13. To reduce the impact of tombstones, gc_grace_seconds is set to 0 for the
active tables
14. Heap size: 48 GB G1GC
15. Read timeout : 5000ms , Write timeouts: 2000ms
16. Number of concurrent reads: 64
17. Number of connections from clients on port 9042 stays almost constant 
(close to 1800)
18. Cassandra thread count also stays almost constant (close to 2000)

Problem Statement:
1. ReadStage often gets full (reaches max size 64) on 2 to 3 nodes and pending 
reads go upto 4000.
2. When the above happens Native-Transport-Stage gets full on neighbouring 
nodes(1024 max) and pending threads are also observed.
3. During this time, CPU load average rises, user % for Cassandra process 
reaches 90%
4. We see Read getting dropped, org.apache.cassandra.transport package errors 
of reads getting timeout is seen.
5. Read latency 99% reached 5seconds, client starts seeing impact.
6. No IOwait observed on any of the virtual cores, sjk ttop command shows max 
us% being used by “Worker Threads”

I have been trying hard to zero in on what the exact issue is.
What I make of the above observations is that there might be some slow
queries which get stuck on a few nodes.
Then there is a cascading effect wherein other queries get lined up behind them.
I have been unable to identify any such slow queries up till now.
As I mentioned, there are large partitions. We are using the size-tiered compaction
strategy, hence a large partition might be spread across multiple sstables.
Can this fact lead to slow queries? I also understand that data in
sstables is stored in serialized format and is deserialized when read into memory.
This would lead to a large object in memory which then needs to be
transferred across the wire to the client.

I am not sure what the reason might be. Kindly help me understand what
the impact on read performance might be when we have large partitions.
Kindly suggest ways to catch these slow queries.
Also do add anything if you see any other issues in the above details.
We are now considering expanding our cluster. Is the cluster under-sized? Will
the addition of nodes help resolve the issue?
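
For what it is worth, one way to sample queries on 2.1 is probabilistic
tracing; a sketch (keep the probability very low in production):

nodetool settraceprobability 0.001
# after some traffic, inspect the sampled sessions for long durations (microseconds)
cqlsh -e "SELECT session_id, duration, started_at, parameters FROM system_traces.sessions LIMIT 100;"
nodetool settraceprobability 0   # switch sampling back off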

Thanks,
Rajsekhar Mallick




