https://docs.datastax.com/en/developer/java-driver/3.2/manual/paging/
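
For anyone following along, a minimal sketch of lowering the page size with the
Java driver 3.x (the contact point, keyspace, table, and column names below are
illustrative, not taken from this thread):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class PagingExample {
        public static void main(String[] args) {
            // Driver-wide default page size (the driver default is 5000 rows per page).
            QueryOptions options = new QueryOptions().setFetchSize(500);
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")   // illustrative contact point
                    .withQueryOptions(options)
                    .build();
            Session session = cluster.connect();

            // Or override the page size only on statements that hit wide partitions.
            Statement stmt = new SimpleStatement(
                    "SELECT col FROM my_ks.my_table WHERE pk = ?", "some-key")
                    .setFetchSize(100);

            ResultSet rs = session.execute(stmt);
            for (Row row : rs) {
                // The driver fetches subsequent pages transparently as iteration advances.
                System.out.println(row.getString("col"));
            }
            cluster.close();
        }
    }

A smaller page size bounds how many rows are materialized per response, which is
what helps when reading wide partitions.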


-- 
Jeff Jirsa


> On Feb 5, 2019, at 11:33 PM, Rajsekhar Mallick <raj.mallic...@gmail.com> 
> wrote:
> 
> Hello Jeff,
> 
> Thanks for the reply.
> We do have GC logs enabled.
> We do observe GC pauses of up to 2 seconds, but quite often we see this issue 
> even when the GC logs look clean.
> 
> JVM Flags related to G1GC:
> 
> -Xms48G
> -Xmx48G
> -XX:MaxGCPauseMillis=200
> -XX:ParallelGCThreads=32
> -XX:ConcGCThreads=10
> -XX:InitiatingHeapOccupancyPercent=50
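> 
> Roughly how these are set in our cassandra-env.sh (reproduced from memory, so 
> the exact lines may differ slightly; only the values above are definitive):
> 
>   JVM_OPTS="$JVM_OPTS -Xms48G -Xmx48G"
>   JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
>   JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=200"
>   JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=32 -XX:ConcGCThreads=10"
>   JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=50"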
> 
> You talked about dropping the application page size. Please elaborate on how 
> to change it.
> We have tried reducing concurrent reads to 32 and it does help: the CPU load 
> average stays under the threshold, but read timeouts keep happening.
> 
> We will definitely try increasing the key cache sizes after verifying the 
> current max heap usage in the cluster.
> 
> Thanks,
> Rajsekhar Mallick
> 
>> On Wed, 6 Feb, 2019, 11:17 AM Jeff Jirsa <jji...@gmail.com> wrote:
>> What you're potentially seeing is the GC impact of reading a large partition 
>> - do you have GC logs or StatusLogger output indicating you're pausing? What 
>> are the actual JVM flags you're using? 
>> 
>> Given your heap size, the easiest mitigation may be significantly increasing 
>> your key cache size (up to a gigabyte or two, if needed).
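>> 
>> A sketch of the knob in question (value illustrative; check the docs for your 
>> version):
>> 
>>   # cassandra.yaml - a ~2 GB key cache; takes effect on restart
>>   key_cache_size_in_mb: 2048
>>   # (nodetool setcachecapacity can change it at runtime, but isn't persisted)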
>> 
>> Yes, when you read data, it's materialized in memory (iterators from each 
>> sstable are merged and sent to the client), so reading lots of rows from a 
>> wide partition can cause GC pressure just from materializing the responses. 
>> Dropping your application's paging size could help if this is the problem. 
>> 
>> You may be able to drop concurrent reads from 64 to something lower 
>> (potentially 48 or 32, given your core count) to mitigate GC impact from 
>> lots of objects when you have a lot of concurrent reads, or consider 
>> upgrading to 3.11.4 (when it's out) to take advantage of CASSANDRA-11206 
>> (which made reading wide partitions less expensive). STCS especially won't 
>> help here - a large partition may be larger than you think, if it's spanning 
>> a lot of sstables. 
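>> 
>> For example, in cassandra.yaml (illustrative; needs a rolling restart):
>> 
>>   # read threadpool size, lowered from 64
>>   concurrent_reads: 32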
>> 
>> 
>> 
>> 
>>> On Tue, Feb 5, 2019 at 9:30 PM Rajsekhar Mallick <raj.mallic...@gmail.com> 
>>> wrote:
>>> Hello Team,
>>> 
>>> Cluster Details:
>>> 1. Number of Nodes in cluster : 7
>>> 2. Number of CPU cores: 48
>>> 3. Swap is enabled on all nodes
>>> 4. Memory available on all nodes : 120GB 
>>> 5. Disk space available : 745GB
>>> 6. Cassandra version: 2.1
>>> 7. Active tables are using size-tiered compaction strategy
>>> 8. Read Throughput: 6000 reads/s on each node (42000 reads/s cluster wide)
>>> 9. Read latency 99%: 300 ms
>>> 10. Write Throughput : 1800 writes/s
>>> 11. Write Latency 99%: 50 ms
>>> 12. Known issues in the cluster: large partitions (up to 560 MB, observed 
>>> when they get compacted) and tombstones
>>> 13. To reduce the impact of tombstones, gc_grace_seconds is set to 0 for the 
>>> active tables
>>> 14. Heap size: 48 GB G1GC
>>> 15. Read timeout : 5000ms , Write timeouts: 2000ms
>>> 16. Number of concurrent reads: 64
>>> 17. Number of connections from clients on port 9042 stays almost constant 
>>> (close to 1800)
>>> 18. Cassandra thread count also stays almost constant (close to 2000)
>>> 
>>> Problem Statement:
>>> 1. ReadStage often gets full (reaches its max size of 64) on 2 to 3 nodes and 
>>> pending reads go up to 4000.
>>> 2. When that happens, Native-Transport-Stage gets full on neighbouring nodes 
>>> (1024 max) and pending tasks are observed there as well.
>>> 3. During this time the CPU load average rises and user% for the Cassandra 
>>> process reaches 90%.
>>> 4. We see reads getting dropped, and errors from the 
>>> org.apache.cassandra.transport package about reads timing out.
>>> 5. Read latency 99% reaches 5 seconds and clients start seeing the impact.
>>> 6. No iowait is observed on any of the virtual cores; the sjk ttop command 
>>> shows the highest us% being used by “Worker Threads”.
>>> 
>>> I have been trying hard to zero in on the exact issue.
>>> What I make of the above observations is that there might be some slow 
>>> queries which get stuck on a few nodes.
>>> Then there is a cascading effect wherein other queries get queued up behind 
>>> them.
>>> I have been unable to identify any such slow queries so far.
>>> As I mentioned, there are large partitions. We are using the size-tiered 
>>> compaction strategy, so a large partition might be spread across multiple 
>>> SSTables.
>>> Can this lead to slow queries? I also understand that data in SSTables is 
>>> stored in serialized form and is deserialized when read into memory. This 
>>> would produce a large object in memory which then needs to be transferred 
>>> across the wire to the client.
>>> 
>>> I am not sure what the reason might be. Kindly help me understand what the 
>>> impact on read performance can be when we have large partitions.
>>> Kindly suggest ways to catch these slow queries.
>>> Also, do point out any other issues you see in the above details.
>>> We are now considering expanding our cluster. Is the cluster under-sized, 
>>> and will adding nodes help resolve the issue?
>>> 
>>> Thanks,
>>> Rajsekhar Mallick
>>> 
>>> 
>>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>> 
