Re: Write performance degradation

2018-06-18 Thread onmstester onmstester
I think that may have pinpointed the problem. I have a table whose partition key 
is derived from a timestamp, so for a given hour all of its data is inserted into 
a single node. This table creates very big partitions (300MB-600MB), and whichever 
node currently holds that table's active partition reports too many dropped 
mutations (sometimes 6M in 5 minutes); when the load increases, it slows down that 
single node in my cluster.

So I think I should change my data model and add a sharding component to the 
partition key of the problematic table.
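
Just to make the idea concrete, here is a rough sketch of the kind of sharding I 
have in mind, using the DataStax Python driver; the table name, columns, shard 
count and contact point are placeholders, not my real schema:

    # Sketch: spread each hour of data over N sub-partitions so that no single
    # node has to absorb the whole hour. All names here are illustrative.
    from cassandra.cluster import Cluster

    NUM_SHARDS = 16  # tune so partitions stay far below the current 300MB-600MB

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")

    session.execute("""
        CREATE TABLE IF NOT EXISTS events_sharded (
            hour_bucket timestamp,
            shard       int,
            event_time  timestamp,
            payload     text,
            PRIMARY KEY ((hour_bucket, shard), event_time)
        )
    """)

    insert = session.prepare(
        "INSERT INTO events_sharded (hour_bucket, shard, event_time, payload) "
        "VALUES (?, ?, ?, ?)")

    def write_event(hour_bucket, event_time, payload, source_id):
        # Derive the shard from a stable attribute so a reader only has to fan
        # out over NUM_SHARDS partitions to reassemble one hour.
        shard = source_id % NUM_SHARDS
        session.execute(insert, (hour_bucket, shard, event_time, payload))

Reads for a given hour would then query each shard (or use an IN on the shard 
column) and merge the results client-side.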


Sent using Zoho Mail

On Mon, 18 Jun 2018 16:24:48 +0430, DuyHai Doan doanduy...@gmail.com wrote:

> Maybe the disk I/O cannot keep up with the high mutation rate?
>
> Check the number of pending compactions




Re: Write performance degradation

2018-06-18 Thread DuyHai Doan
Maybe the disk I/O cannot keep up with the high mutation rate?

Check the number of pending compactions
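
Something like the sketch below is enough to spot-check that across the cluster; 
it assumes nodetool is on the PATH with JMX reachable from this machine, and the 
host list is a placeholder:

    # Sketch: poll pending compactions on every node via nodetool.
    # Replace the placeholder addresses with the real cluster nodes.
    import subprocess

    NODES = ["10.0.0.1", "10.0.0.2"]

    for host in NODES:
        out = subprocess.run(
            ["nodetool", "-h", host, "compactionstats"],
            capture_output=True, text=True, check=True,
        ).stdout
        # nodetool compactionstats prints a line like "pending tasks: 42"
        pending = next(
            (line.strip() for line in out.splitlines() if "pending tasks" in line),
            "pending tasks: ?",
        )
        print(f"{host}: {pending}")

A number that keeps growing there usually means compaction (disk I/O) is falling 
behind the write rate.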



Write performance degradation

2018-06-17 Thread onmstester onmstester
Hi,

I was doing 500K inserts + 100K counter updates per second on my cluster of 12 
nodes (20 cores / 128GB RAM / 4 x 600GB 10K HDDs), using batch statements, with 
no problem.

I then saw a lot of warnings that most of the batches did not concern a single 
node, so they should not have been batched; at the same time the input load of 
my application increased by 50%. So I switched to non-batch async inserts and 
increased the number of client threads, and the load went up by 50%.
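
For reference, the async insert pattern I switched to looks roughly like this 
(a sketch with the DataStax Python driver; the keyspace, table, columns and the 
in-flight cap are placeholders, not my real code):

    # Sketch of the throttled async-insert loop.
    # Bounding in-flight requests keeps the client from flooding the
    # coordinators when it gets ahead of the cluster.
    from threading import Semaphore
    from cassandra.cluster import Cluster

    MAX_IN_FLIGHT = 1024  # illustrative; tuned against write timeouts

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")
    insert = session.prepare(
        "INSERT INTO events (partition_key, event_time, payload) VALUES (?, ?, ?)")

    permits = Semaphore(MAX_IN_FLIGHT)

    def on_success(_rows):
        permits.release()

    def on_error(exc):
        permits.release()
        print("insert failed:", exc)  # real code would retry or count these

    def insert_async(partition_key, event_time, payload):
        permits.acquire()  # blocks once MAX_IN_FLIGHT requests are outstanding
        future = session.execute_async(insert, (partition_key, event_time, payload))
        future.add_callbacks(on_success, on_error)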

The system worked for 2 days with no problem under a load of 750K inserts + 150K 
counter updates per second, but then suddenly a lot of insert timeouts started 
appearing in the log files.

Decreasing the input load back to the previous level, or even lower, did not help.

When I restart my client (after it has been logging timeouts and errors for some 
hours), it works with no problem for 20 minutes, then starts logging timeout 
errors again.

CPU load on the cluster nodes is less than 25%.

How can I solve this problem? I'm saving all of Cassandra's JMX metrics in a 
monitoring system; what should I check?
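
For context, I pull those metrics roughly as in the sketch below; it assumes a 
Jolokia agent on each node (that is just my setup), and the node address and the 
two MBeans shown are only examples of what I am graphing:

    # Sketch: read a couple of Cassandra metrics over Jolokia's HTTP bridge.
    # 8778 is Jolokia's default agent port; the node address is a placeholder.
    import requests

    NODE = "10.0.0.1"
    MBEANS = [
        # mutations dropped after waiting longer than the write request timeout
        "org.apache.cassandra.metrics:type=DroppedMessage,scope=MUTATION,name=Dropped",
        # compactions queued up; growth here points at disk I/O falling behind
        "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks",
    ]

    for mbean in MBEANS:
        resp = requests.get(f"http://{NODE}:8778/jolokia/read/{mbean}", timeout=5)
        print(mbean, "->", resp.json().get("value"))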



Sent using Zoho Mail