Re: Safe read request timeout value

2018-06-15 Thread anil patimidi
Hi,

It depends on how much latency your app can tolerate.  You always want to make
sure that client-side timeouts are set higher than the server-side timeout
value, so that the client isn't timing out while the server is still serving
the requests.

You might want to start with the defaults and try bumping them up.  Something
like 2000 ms (2 s).
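
To make the client-above-server rule concrete, here is a minimal Python sketch. The function name, the 500 ms margin and the example values are illustrative assumptions, not Cassandra defaults or driver APIs; the server-side value would correspond to read_request_timeout_in_ms in cassandra.yaml.

```python
def client_timeout_ms(server_timeout_ms: int, margin_ms: int = 500) -> int:
    """Return a client-side timeout that stays above the server-side one.

    margin_ms is a hypothetical allowance for network round trips; tune it
    for your own environment.
    """
    if server_timeout_ms <= 0 or margin_ms <= 0:
        raise ValueError("timeouts and margin must be positive")
    return server_timeout_ms + margin_ms


# Example: a server-side read timeout of 2000 ms (2 s).
server_ms = 2000
client_ms = client_timeout_ms(server_ms)
print(f"server={server_ms} ms, client={client_ms} ms")
```

Whatever margin you pick, the point is only that the client gives up after the server does, never before.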

-Anil


On Fri, Jun 15, 2018 at 4:31 AM, Vsevolod Filaretov wrote:

> Good time of day everyone,
>
> I've got a question on timeout-setting practice.
>
> I've got a 4-node cluster with only a handful of users, constant data
> inserts and very large partitions (up to 450+ MB, which is 4 times larger
> than general Cassandra manuals recommend). Data is held on HDD.
>
> What are the general practices for read request timeout settings?
>
> What is the theoretical maximum server-side user read timeout I can set
> without compromising cluster availability, given that insert stability is
> the priority?
>


Re: Options to replace hardware of the cluster

2018-06-15 Thread anil patimidi
Hi Christian,

You can replace the hosts one at a time, starting each new node with the
replace args pointing at the old node it takes over from.

https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceLiveNode.html

- Anil

On Fri, Jun 15, 2018 at 12:43 AM, Christian Lorenz <
christian.lor...@webtrekk.com> wrote:

> Hi Rahul,
>
>
>
> Thanks for your suggestions. The node size is ~600 GB, cluster size ~4.5
> TB. There is no strict time limitation as long as the data move can be done
> while the application is still online.
>
> Do you have any gut feeling for how long a DC sync of a cluster this size
> would take?
>
>
>
> Regards,
>
> Christian
>
>
>
> *From: *Rahul Singh 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Thursday, 14 June 2018 at 14:21
> *To: *"user@cassandra.apache.org" , "
> user@cassandra.apache.org" 
> *Subject: *Re: Options to replace hardware of the cluster
>
>
>
> How much data do you have and what is the timeline? If you can manage with
> a maintenance window, the snapshot / move and restore method may be the
> fastest. Streaming data can take a long time to sync two DCs if there is a
> lot of data.
>
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Jun 14, 2018, 4:11 AM -0400, Christian Lorenz <
> christian.lor...@webtrekk.com>, wrote:
>
> Hi,
>
>
>
> We need to move our existing Cassandra cluster to new hardware nodes.
> Currently the cluster has 8 members, which need to be moved to 8 new
> machines. The Cassandra version in use is 3.11.1.  Unfortunately we use
> materialized views in production. I know that they have been marked
> retroactively as experimental.
>
> What is a good way to move to the new machines? One by one, or set up a new
> cluster as a separate DC? The move should be done without downtime of the
> application.
>
>
>
> Do you have some advice for this kind of maintenance task?
>
>
>
> Kind regards,
>
> Christian
>
>


Re: G1GC CPU Spike

2018-06-15 Thread Chris Lohfink
There are no bad GCs in the gclog (the worst is around 100 ms). Everything
looks great actually from what I see. CPU utilization isn't inherently a bad
thing, for what it's worth.

Chris

> On Jun 14, 2018, at 1:18 PM, rajpal reddy  wrote:
> 
> Hey Chris,
> 
> Sorry to bother you. Did you get a chance to look at the gclog file I sent 
> last night?
> 
> On Wed, Jun 13, 2018, 8:44 PM rajpal reddy wrote:
> Chris,
> 
> Sorry, I attached the wrong log file. I'm attaching the GC collection seconds
> and CPU graphs; they were going high at the same time. I've also attached the
> gc.log. The Grafana dashboard and gc.log timestamps are 4 hours apart; the GC
> activity can be seen on 06/12 around 22:50.
> 
> rate(jvm_gc_collection_seconds_sum{"}[5m])
> 
> > On Jun 13, 2018, at 5:26 PM, Chris Lohfink wrote:
> > 
> > There is not even a 100 ms GC pause in that. Are you certain there's a 
> > problem?
> > 
> >> On Jun 13, 2018, at 3:00 PM, rajpal reddy wrote:
> >> 
> >> Thanks Chris, I did attach the GC logs already. Reattaching them 
> >> now.
> >> 
> >> It started yesterday around 11:54 PM.
> >>> On Jun 13, 2018, at 3:56 PM, Chris Lohfink wrote:
> >>> 
> >>>> What is the criteria for picking up the value for G1ReservePercent?
> >>> 
> >>> 
> >>> It depends on the object allocation rate versus the size of the heap. 
> >>> Cassandra would ideally stay below 500-600 MB/s of allocations, but it
> >>> can spike pretty high with something like reading a wide partition or
> >>> repair streaming, which might exceed what the G1 young GCs' tenuring and
> >>> timing are prepared for based on the previous steady rate. Giving it a
> >>> bigger buffer is a nice safety net for allocation spikes.
> >>> 
> >>>> Is HEAP_NEWSIZE required only for CMS?
> >>> 
> >>> 
> >>> It should only set Xmn from that if you are using CMS; with G1 it should
> >>> be ignored, otherwise yes, it would be bad to set Xmn. Sharing the GC
> >>> logs will show the results of all the bash scripts along with details of
> >>> what's happening, so that is your best option if you want help.
> >>> 
> >>> Chris
> >>> 
> >>>> On Jun 13, 2018, at 12:17 PM, Subroto Barua wrote:
> >>>> 
> >>>> Chris,
> >>>> What is the criteria for picking up the value for G1ReservePercent?
> >>>> 
> >>>> Subroto 
> >>>> 
> >>>>> On Jun 13, 2018, at 6:52 AM, Chris Lohfink wrote:
> >>>>> 
> >>>>> G1ReservePercent
> >>>> 
> >>>> -
> >>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
> >>>> For additional commands, e-mail: user-h...@cassandra.apache.org 



Re: High load, low IO wait, moderate CPU usage

2018-06-15 Thread Elliott Sims
Do you have an actual performance issue anywhere at the application level?
If not, I wouldn't spend too much time on it - load avg is a sort of odd
indirect metric that may or may not mean anything depending on the
situation.
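
One way to put a load average in context is to divide it by the number of cores. The sketch below is only an illustration: the helper name is hypothetical, and inside a container the reported load average and core count may reflect the whole host rather than the pod's CPU limit, so treat the result with care.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Normalize a load average by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"1-min load per core: {load_per_core(one_min, cores):.2f}")
```

A value well above 1.0 per core means more tasks are runnable or waiting than there are cores, but as noted above that only matters if the application is actually slow.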

On Fri, Jun 15, 2018 at 6:49 AM, Igor Leão  wrote:

> Hi there,
>
> I have a Cassandra cluster running on Kubernetes. This cluster has 8
> running instances with 8 GB of memory and 5 CPU cores. I can see a high
> load avg in multiple instances, but no IO wait and moderate CPU usage.
>
> Do you know how I can solve this issue?
>
> Best,
> Igor
>


High load, low IO wait, moderate CPU usage

2018-06-15 Thread Igor Leão
Hi there,

I have a Cassandra cluster running on Kubernetes. This cluster has 8
running instances with 8 GB of memory and 5 CPU cores. I can see a high load
avg in multiple instances, but no IO wait and moderate CPU usage.

Do you know how I can solve this issue?

Best,
Igor


Safe read request timeout value

2018-06-15 Thread Vsevolod Filaretov
Good time of day everyone,

I've got a question on timeout-setting practice.

I've got a 4-node cluster with only a handful of users, constant data
inserts and very large partitions (up to 450+ MB, which is 4 times larger
than general Cassandra manuals recommend). Data is held on HDD.

What are the general practices for read request timeout settings?

What is the theoretical maximum server-side user read timeout I can set
without compromising cluster availability, given that insert stability is
the priority?


[no subject]

2018-06-15 Thread Vsevolod Filaretov
Good time of day everyone,

I've got three questions on Cassandra paging mechanics and cluster usage
regulation.

1) Am I correct to assume that the larger the page size a user session has
set, the larger the portion of cluster/coordinator node resources that will
be hogged by that session?

2) Do I understand correctly that page size (imagine we have no timeout
settings) is limited by the RAM and IOPS that I am willing to hand over to a
single user session?

3) Am I correct to assume that the page size/read request timeout allowance
I set directly represents the chance of locking a node to a single user's
requests?
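
As a rough way to reason about question 1, the sketch below estimates the data volume per page as rows per page times average row size. The function name, the 1 KiB row size and the page sizes are hypothetical example values, and this is back-of-the-envelope arithmetic, not a model of how Cassandra actually allocates memory for a page.

```python
def page_bytes(rows_per_page: int, avg_row_bytes: int) -> int:
    """Approximate payload size of one result page, in bytes."""
    if rows_per_page < 0 or avg_row_bytes < 0:
        raise ValueError("inputs must be non-negative")
    return rows_per_page * avg_row_bytes


# Compare a few page sizes for an assumed average row of 1 KiB.
for rows in (100, 5000, 50000):
    mib = page_bytes(rows, 1024) / (1024 * 1024)
    print(f"{rows} rows/page -> ~{mib:.1f} MiB per page")
```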


Best regards,

Vsevolod.


Re: Options to replace hardware of the cluster

2018-06-15 Thread Christian Lorenz
Hi Rahul,

Thanks for your suggestions. The node size is ~600 GB, cluster size ~4.5 TB. 
There is no strict time limitation as long as the data move can be done while 
the application is still online.
Do you have any gut feeling for how long a DC sync of a cluster this size
would take?
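
A rough lower bound for the streaming time can be sketched as data size divided by throughput. The 200 Mbit/s figure below is an assumed per-node streaming rate used purely for illustration (compare it with your own stream_throughput_outbound_megabits_per_sec setting and network capacity), and the helper name is hypothetical. Real syncs also pay for compaction, view building and retries, so treat the result as optimistic.

```python
def transfer_hours(data_gb: float, throughput_mbit_s: float) -> float:
    """Hours needed to move data_gb (decimal GB) at throughput_mbit_s."""
    if data_gb < 0 or throughput_mbit_s <= 0:
        raise ValueError("data must be >= 0 and throughput must be > 0")
    megabits = data_gb * 1000 * 8  # GB -> MB -> megabits
    return megabits / throughput_mbit_s / 3600


# One ~600 GB node at an assumed 200 Mbit/s.
print(f"~{transfer_hours(600, 200):.1f} h per 600 GB node at 200 Mbit/s")
```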

Regards,
Christian

From: Rahul Singh 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, 14 June 2018 at 14:21
To: "user@cassandra.apache.org" , 
"user@cassandra.apache.org" 
Subject: Re: Options to replace hardware of the cluster

How much data do you have and what is the timeline? If you can manage with a 
maintenance window, the snapshot / move and restore method may be the fastest. 
Streaming data can take a long time to sync two DCs if there is a lot of data.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation
On Jun 14, 2018, 4:11 AM -0400, Christian Lorenz wrote:

Hi,

We need to move our existing Cassandra cluster to new hardware nodes. Currently 
the cluster has 8 members, which need to be moved to 8 new machines. 
The Cassandra version in use is 3.11.1.  Unfortunately we use materialized
views in production. I know that they have been marked retroactively as
experimental.
What is a good way to move to the new machines? One by one, or set up a new 
cluster as a separate DC? The move should be done without downtime of the 
application.

Do you have some advice for this kind of maintenance task?

Kind regards,
Christian