Re: Overload because of hint pressure + MVs

2020-02-11 Thread Surbhi Gupta
We are using G1 ...

On Tue, 11 Feb 2020 at 08:51, Reid Pinchback wrote:



Re: Overload because of hint pressure + MVs

2020-02-11 Thread Reid Pinchback
A caveat to the 31GB recommendation for G1GC. If you have tight latency
SLAs instead of throughput SLAs, then this doesn't necessarily pan out to
be beneficial.

Yes, the GCs are less frequent, but they can hurt more when they do happen.
The win is if your usage pattern is such that the extra time between
collections lets objects die before they would have been copied into old
gen, where a smaller heap/more frequent GC cycle would have forced
promotions. C* tends to have a lot of medium-lifetime objects on the heap,
so it really comes down to the specifics of what your clients are typically
doing.

Also, reallocating RAM from the O/S buffer cache to the Java heap changes
the dynamics of dirty page flushes from your writes, which again surfaces
directly in C* read latency numbers during I/O stalls caused by background
write spikes. So bumping up the heap can be a double whammy for the
latency-sensitive. Those who only care about throughput won't mind, and
it's probably an unconditional win to go to 31GB.
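One way to see whether the less frequent but longer pauses actually hurt is
to compare GC logs before and after the heap change. A minimal sketch for
conf/jvm.options on the Java 8 builds that C* 3.11 runs on (these are
standard HotSpot flags; the log path is just an example):

  -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps
  -XX:+PrintGCApplicationStoppedTime
  -Xloggc:/var/log/cassandra/gc.log

Then compare pause duration and frequency between the current and the
larger heap.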

R



Re: Overload because of hint pressure + MVs

2020-02-10 Thread Erick Ramirez
>
> Currently phi_convict_threshold is not set, so it defaults to 8.
> Can this also cause hints to build up even when we can see that all nodes
> are UP?


You can bump it up to 12 to reduce the sensitivity, but it's likely GC
pauses causing it. Phi convict is the side effect, not the cause.

Just to add, we are using a 24GB heap size.


Are you using CMS? If using G1, I'd recommend bumping it up to 31GB if the
servers have 40+ GB of RAM. Cheers!
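For reference, a minimal sketch of where the two settings discussed here
live, assuming stock file locations and that the heap is set in
conf/jvm.options rather than cassandra-env.sh (the values are just the ones
from this thread, not universal recommendations):

  # conf/cassandra.yaml
  phi_convict_threshold: 12

  # conf/jvm.options (G1, only if the servers have 40+ GB of RAM)
  -XX:+UseG1GC
  -Xms31G
  -Xmx31G

Both files are read at startup, so a rolling restart is needed to apply the
changes.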


Re: Overload because of hint pressure + MVs

2020-02-10 Thread Surbhi Gupta
Just to add, we are using a 24GB heap size.

On Mon, 10 Feb 2020 at 09:08, Surbhi Gupta wrote:





Re: Overload because of hint pressure + MVs

2020-02-10 Thread Surbhi Gupta
Hi Jon,

We are on a multi-datacenter (on-prem) setup.
We also noticed a lot of messages like the ones below:

DEBUG [GossipStage:1] 2020-02-10 09:38:52,953 FailureDetector.java:457 -
Ignoring interval time of 3258125997 for /10.x.x.x

DEBUG [GossipStage:1] 2020-02-10 09:38:52,954 FailureDetector.java:457 -
Ignoring interval time of 2045630029 for /10.y.y.y

DEBUG [GossipStage:1] 2020-02-10 09:38:52,954 FailureDetector.java:457 -
Ignoring interval time of 2045416737 for /10.z.z.z



Currently phi_convict_threshold is not set, so it defaults to 8.
Can this also cause hints to build up even when we can see that all nodes
are UP?
The recommended value of phi_convict_threshold is 12 in an AWS
multi-datacenter environment.
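For what it's worth, those interval values appear to be in nanoseconds, so
the ones above are roughly 2-3 seconds, which lines up better with GC
pauses or brief stalls than with a real network problem. A quick, hedged
way to count how often each peer triggers it, assuming the default
debug.log location:

  grep 'Ignoring interval time' /var/log/cassandra/debug.log | awk '{print $NF}' | sort | uniq -c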

Thanks
Surbhi

On Sun, 9 Feb 2020 at 21:42, Surbhi Gupta wrote:

>>>


Re: Overload because of hint pressure + MVs

2020-02-09 Thread Surbhi Gupta
Thanks a lot, Jon.
Will try the recommendations and let you know the results.

On Fri, Feb 7, 2020 at 10:52 AM Jon Haddad wrote:

>>


Re: Overload because of hint pressure + MVs

2020-02-07 Thread Jon Haddad
There are a few things you can do here that might help.

First off, if you're using the default heap settings, that's a serious
problem. If you've got the headroom, my recommendation is to use a 16GB
heap with 12GB new gen and pin your memtable heap space to 2GB. Set your
max tenuring threshold to 6 and your survivor ratio to 6. You don't need a
lot of old gen space with Cassandra; almost everything that will show up
there is memtable related, and we allocate a *lot* whenever we read data
off disk.
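A minimal sketch of where those knobs live, assuming the heap is set in
conf/jvm.options (standard HotSpot flags plus a stock cassandra.yaml
setting; the values are just the ones suggested above, and pinning the new
gen with -Xmn like this assumes CMS, the 3.11 default, rather than G1):

  # conf/jvm.options
  -Xms16G
  -Xmx16G
  -Xmn12G
  -XX:MaxTenuringThreshold=6
  -XX:SurvivorRatio=6

  # conf/cassandra.yaml
  memtable_heap_space_in_mb: 2048

All of these are picked up on restart.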

Most folks use the default disk read-ahead setting of 128KB. You can check
this setting using blockdev --report, under the RA column. You'll see 256
there; that's in 512-byte sectors. MVs rely on a read before a write, so
for every read off disk you do, you'll pull an additional 128KB into your
page cache. This is usually a waste and puts WAY too much pressure on your
disk. On SSD, I always change this to 4KB.
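A hedged sketch of checking and changing it (the device name is an example;
4KB is 8 of those 512-byte sectors, and the setting does not survive a
reboot unless you also put it in a udev rule or startup script):

  sudo blockdev --report /dev/nvme0n1     # RA column, in 512-byte sectors
  sudo blockdev --setra 8 /dev/nvme0n1    # 8 * 512 bytes = 4KB read-ahead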

Next, be sure you're setting your compression chunk length accordingly. I
wrote a long post on the topic here:
https://thelastpickle.com/blog/2018/08/08/compression_performance.html.
Our default compression is very unfriendly for read-heavy workloads if
you're reading small rows. If your records are small, a 4KB compression
chunk length is your friend.
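A minimal sketch of the change (keyspace and table names are placeholders;
already-written SSTables keep their old chunk size until they are
rewritten, e.g. by compaction or nodetool upgradesstables -a):

  ALTER TABLE my_keyspace.my_table
    WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};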

I have some slides showing pretty good performance improvements from the
above two changes. Specifically, I went from 16K reads/second at 180ms p99
latency up to 63K reads/second at 21ms p99. Disk usage dropped by a factor
of 10. Throw in those JVM changes I recommended and things should improve
even further.

Generally speaking, I recommend avoiding MVs, as they can be a giant land
mine if you aren't careful. They're not doing any magic behind the scenes
that makes scaling easier, and in a lot of cases they're a hindrance. You
still need to understand the underlying data and how it's laid out to use
them properly, which is 99% of the work.

Jon

On Fri, Feb 7, 2020 at 10:32 AM Michael Shuler wrote:

>


Re: Overload because of hint pressure + MVs

2020-02-07 Thread Michael Shuler
That JIRA still says Open, so no, it has not been fixed (unless there's 
a fixed duplicate in JIRA somewhere).


For clarification, you could update that ticket with a comment including 
your environmental details, usage of MV, etc. I'll bump the priority up 
and include some possible branchX fixvers.


Michael

On 2/7/20 10:53 AM, Surbhi Gupta wrote:

Hi,

We are getting hit by the bug below.
Other than lowering hinted_handoff_throttle_in_kb to 100, is there any
other workaround?

https://issues.apache.org/jira/browse/CASSANDRA-13810

Any idea if it got fixed in a later version?
We are on open-source Cassandra 3.11.1.

Thanks
Surbhi
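For reference, the workaround mentioned above is the cassandra.yaml hint
throttle; a minimal sketch with the value from the question (and, if your
nodetool version has it, sethintedhandoffthrottlekb changes it at runtime
without a restart):

  # conf/cassandra.yaml
  hinted_handoff_throttle_in_kb: 100

  # or at runtime
  nodetool sethintedhandoffthrottlekb 100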



