Re: Cassandra: Inconsistent data on reads (LOCAL_QUORUM)

2018-10-20 Thread Mick Semb Wever
> Thanks James. Yeah, we're using the DataStax Java driver, but we're on
> version 2.1.10.2. And we are not using client-side timestamps.


Just to check, Ninad: if you are using Cassandra 2.1 (native protocol
v3) and Java driver version 3.0 or above, then you would be using
client-side timestamps by default.
https://github.com/datastax/java-driver/tree/3.x/manual/query_timestamps
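
A quick way to check or override this from the application side, as a
minimal sketch against driver 3.x (the contact point is a placeholder):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ServerSideTimestampGenerator;

    public class TimestampCheck {
        public static void main(String[] args) {
            // Driver 3.x defaults to AtomicMonotonicTimestampGenerator,
            // i.e. write timestamps are assigned client-side. This forces
            // timestamp assignment back onto the coordinator nodes:
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")   // placeholder
                    .withTimestampGenerator(ServerSideTimestampGenerator.INSTANCE)
                    .build();
            // Print which generator is actually in effect:
            System.out.println(cluster.getConfiguration().getPolicies()
                    .getTimestampGenerator().getClass().getName());
            cluster.close();
        }
    }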

With client-side timestamps, all client servers and all C* nodes must
be kept tightly in sync, as Elliot said. Monitoring and alerting on
any clock skew on any of these machines is important.
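
If you want a crude skew check from the client side, here's a sketch
using Apache Commons Net; the NTP server, the 50ms threshold, and the
commons-net dependency are all just assumptions for illustration, and
proper monitoring would alert off ntpd/chrony itself:

    import java.net.InetAddress;
    import org.apache.commons.net.ntp.NTPUDPClient;
    import org.apache.commons.net.ntp.TimeInfo;

    public class ClockSkewCheck {
        public static void main(String[] args) throws Exception {
            NTPUDPClient ntp = new NTPUDPClient();
            ntp.setDefaultTimeout(3000);
            // Query an NTP server and compute this machine's clock offset
            TimeInfo info = ntp.getTime(InetAddress.getByName("pool.ntp.org")); // placeholder
            info.computeDetails();
            long offsetMs = info.getOffset();
            System.out.println("local clock offset: " + offsetMs + " ms");
            if (Math.abs(offsetMs) > 50) {   // arbitrary example threshold
                System.err.println("WARN: clock skew above threshold");
            }
            ntp.close();
        }
    }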

Also worth checking that LOCAL_QUORUM requests are not accidentally
going to the wrong datacenter.
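
One way to make the intended datacenter explicit rather than letting
the driver infer it from the contact points; again only a sketch
against driver 3.x, with the DC name and contact point as placeholders:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class LocalDcSetup {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")            // placeholder
                    // Pin the local DC so it is not inferred from whichever
                    // contact point the driver happens to connect to first.
                    .withLoadBalancingPolicy(new TokenAwarePolicy(
                            DCAwareRoundRobinPolicy.builder()
                                    .withLocalDc("DC1")      // placeholder DC name
                                    .build()))
                    // Default all requests to LOCAL_QUORUM in that DC.
                    .withQueryOptions(new QueryOptions()
                            .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                    .build();
            cluster.close();
        }
    }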

regards,
Mick




Re: High CPU usage on some of the nodes due to message coalesce

2018-10-20 Thread Chris Lohfink
1s young GCs are horrible and likely the cause of some of your bad metrics.
How large are your mutations/query results, and what GC/heap settings are
you using?

You can use https://github.com/aragozin/jvm-tools to see the threads
generating allocation pressure and using the CPU (ttop), and what garbage
is being created (hh --dead-young).
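
For example, something like this against the PID of one of the busy nodes
(sjk.jar is the fat jar built from that repo; check --help for the exact
options in your build):

    # top-like view of threads, sorted by CPU, with per-thread allocation rate
    java -jar sjk.jar ttop -p <pid> -o CPU -n 30

    # histogram of objects dying young, i.e. what is feeding the young gen
    java -jar sjk.jar hh -p <pid> --dead-young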

Just a shot in the dark, but I would guess you have rather large mutations
putting pressure on the commitlog and heap. G1 with a larger heap might help
in that scenario: it reduces fragmentation and adjusts its eden and survivor
regions to the allocation rate (but give it a bigger reserve space). There
are limits to what settings can do, though, if you can't change your
workload. Without more info on the schema etc. it's hard to tell, but maybe
that gives you some ideas on places to look. It could just as likely be
repair coordination, wide partition reads, or compactions, so you need to
look more at what within the app is causing the pressure to know whether it
can be improved with settings or whether the load your application is
producing exceeds what your cluster can handle (i.e. it needs more nodes).
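
For reference, in jvm.options terms that change looks roughly like the
below; the heap size and pause target are purely illustrative assumptions,
not tuned recommendations for your hardware:

    ## illustrative only: drop the default CMS/ParNew flags and use G1
    -Xms16G
    -Xmx16G
    -XX:+UseG1GC
    -XX:MaxGCPauseMillis=500
    -XX:G1ReservePercent=20     # the "bigger reserve space" mentioned above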

Chris

> On Oct 20, 2018, at 5:18 AM, onmstester onmstester wrote:
> 
> 3 nodes in my cluster have 100% CPU usage, and most of it is used by
> org.apache.cassandra.util.coalesceInternal and SepWorker.run.
> The most active threads are the messaging-service-incoming ones.
> The other nodes are normal. The cluster has 30 nodes, using a rack-aware
> strategy with 10 racks, each having 3 nodes; the problematic nodes are
> all configured for one rack. Under normal write load, system.log reports
> too many hint messages dropped (cross node). There are also a lot of
> ParNew GCs of about 700-1000ms, and the isolated commit log disk is
> utilized at about 80-90%. On startup of these 3 nodes there are a lot of
> "updating topology" logs (1000s of them pending).
> Using iperf, I'm sure that the network is OK.
> Checking NTP and mutations on each node, load is balanced among the nodes.
> Using Apache Cassandra 3.11.2.
> I cannot figure out the root cause of the problem, although there are
> some obvious symptoms.
> 
> Best Regards



High CPU usage on some of the nodes due to message coalesce

2018-10-20 Thread onmstester onmstester
3 nodes in my cluster have 100% CPU usage, and most of it is used by
org.apache.cassandra.util.coalesceInternal and SepWorker.run. The most
active threads are the messaging-service-incoming ones. The other nodes
are normal.

The cluster has 30 nodes, using a rack-aware strategy with 10 racks, each
having 3 nodes; the problematic nodes are all configured for one rack.
Under normal write load, system.log reports too many hint messages dropped
(cross node). There are also a lot of ParNew GCs of about 700-1000ms, and
the isolated commit log disk is utilized at about 80-90%. On startup of
these 3 nodes there are a lot of "updating topology" logs (1000s of them
pending).

Using iperf, I'm sure that the network is OK. Checking NTP and mutations
on each node, load is balanced among the nodes. Using Apache Cassandra
3.11.2.

I cannot figure out the root cause of the problem, although there are some
obvious symptoms.

Best Regards