Re: Read Latency Doubles After Shrinking Cluster and Never Recovers

2018-06-11 Thread Nicolas Guyomar
Really wild guess: do you monitor I/O performance, and are you positive it has
stayed the same over the past year (network becoming a little busier, hard
drives a bit slower, and so on)?
Wild guess 2: was any new 'monitoring' software (a log-shipping agent, for
instance) added to the boxes in the meantime?
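
If it helps, here is a minimal, untested sketch (assuming Linux and the
standard /proc/diskstats layout) of how one might spot-check average read
service time per device; the 10-second sample window is arbitrary:

#!/usr/bin/env python
# Rough per-device read-latency spot check from /proc/diskstats (Linux only).
# Samples the counters twice and prints the average ms per completed read
# over the interval. Not a substitute for real I/O monitoring.
import time

def read_diskstats():
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if len(fields) < 7:
                continue
            # layout: major minor name reads_completed reads_merged
            #         sectors_read ms_spent_reading writes_completed ...
            name = fields[2]
            stats[name] = (int(fields[3]), int(fields[6]))
    return stats

INTERVAL_S = 10  # arbitrary sample window

before = read_diskstats()
time.sleep(INTERVAL_S)
after = read_diskstats()

for dev in sorted(after):
    reads_b, ms_b = before.get(dev, after[dev])
    reads_a, ms_a = after[dev]
    delta_reads = reads_a - reads_b
    delta_ms = ms_a - ms_b
    if delta_reads > 0:
        print("%-12s %8d reads  avg %.2f ms/read"
              % (dev, delta_reads, float(delta_ms) / delta_reads))

iostat -x gives a similar view if sysstat is installed.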

On 11 June 2018 at 16:56, Jeff Jirsa  wrote:

> No
>
> --
> Jeff Jirsa
>
>
> On Jun 11, 2018, at 7:49 AM, Fd Habash  wrote:
>
> I will check for both.
>
>
>
> On a different subject, I have read some user reports that running
> ‘nodetool cleanup’ requires a C* process restart, at least around 2.2.8. Is
> this true?
>
>
>
>
>
> 
> Thank you
>
>
>
> *From: *Nitan Kainth 
> *Sent: *Monday, June 11, 2018 10:40 AM
> *To: *user@cassandra.apache.org
> *Subject: *Re: Read Latency Doubles After Shrinking Cluster and Never
> Recovers
>
>
>
> I think it would, because Cassandra will process more SSTables to build the
> response to read queries.
>
>
>
> Now, after cleanup, if the data volume is the same and compaction has been
> running, I can’t think of any more diagnostic steps. Let’s wait for other
> experts to comment.
>
>
>
> Can you also check the SSTable count for each table, just to be sure the counts
> are not extraordinarily high?
>
> Sent from my iPhone
>
>
> On Jun 11, 2018, at 10:21 AM, Fd Habash  wrote:
>
> Yes, we did, after adding the three nodes back, and we ran a full cluster
> repair as well.
>
>
>
> But even if we hadn’t run cleanup, would the fact that some nodes still hold
> SSTables they no longer need have impacted read latency?
>
>
>
> Thanks
>
>
>
> ----
> Thank you
>
>
>
> *From: *Nitan Kainth 
> *Sent: *Monday, June 11, 2018 10:18 AM
> *To: *user@cassandra.apache.org
> *Subject: *Re: Read Latency Doubles After Shrinking Cluster and Never
> Recovers
>
>
>
> Did you run cleanup too?
>
>
>
> On Mon, Jun 11, 2018 at 10:16 AM, Fred Habash  wrote:
>
> I have hit dead ends everywhere I have turned on this issue.
>
>
>
> We had a 15-node cluster that had been doing 35 ms read latency all along for
> years. At some point, we decided to shrink it to 13 nodes. Read latency rose
> to near 70 ms. Shortly after, we decided this was not acceptable, so we added
> the three nodes back in. Read latency dropped to near 50 ms and has been
> hovering around this value for over 6 months now.
>
>
>
> Repairs run regularly, load across the cluster nodes is even, and the
> application activity profile has not changed.
>
>
>
> Why are we unable to get back the same read latency now that the cluster
> is 15 nodes again, the same size it was before?
>
>
>
> --
>
>
>
> 
> Thank you
>
>
>
>
>
>
>
>
>


Re: Read Latency Doubles After Shrinking Cluster and Never Recovers

2018-06-11 Thread Jeff Jirsa
No

-- 
Jeff Jirsa


> On Jun 11, 2018, at 7:49 AM, Fd Habash  wrote:
> 
> I will check for both.
>  
> On a different subject, I have read some user reports that running 
> ‘nodetool cleanup’ requires a C* process restart, at least around 2.2.8. Is 
> this true?
>  
>  
> 
> Thank you
>  
> From: Nitan Kainth
> Sent: Monday, June 11, 2018 10:40 AM
> To: user@cassandra.apache.org
> Subject: Re: Read Latency Doubles After Shrinking Cluster and Never Recovers
>  
> I think it would, because Cassandra will process more SSTables to build the 
> response to read queries.
>  
> Now, after cleanup, if the data volume is the same and compaction has been 
> running, I can’t think of any more diagnostic steps. Let’s wait for other 
> experts to comment.
>  
> Can you also check the SSTable count for each table, just to be sure the 
> counts are not extraordinarily high?
> 
> Sent from my iPhone
> 
> On Jun 11, 2018, at 10:21 AM, Fd Habash  wrote:
> 
> Yes, we did, after adding the three nodes back, and we ran a full cluster 
> repair as well.
>  
> But even if we hadn’t run cleanup, would the fact that some nodes still hold 
> SSTables they no longer need have impacted read latency?
>  
> Thanks
>  
> 
> Thank you
>  
> From: Nitan Kainth
> Sent: Monday, June 11, 2018 10:18 AM
> To: user@cassandra.apache.org
> Subject: Re: Read Latency Doubles After Shrinking Cluster and Never Recovers
>  
> Did you run cleanup too? 
>  
> On Mon, Jun 11, 2018 at 10:16 AM, Fred Habash  wrote:
> I have hit dead ends everywhere I have turned on this issue.
>  
> We had a 15-node cluster that had been doing 35 ms read latency all along for 
> years. At some point, we decided to shrink it to 13 nodes. Read latency rose 
> to near 70 ms. Shortly after, we decided this was not acceptable, so we added 
> the three nodes back in. Read latency dropped to near 50 ms and has been 
> hovering around this value for over 6 months now.
>  
> Repairs run regularly, load across the cluster nodes is even, and the 
> application activity profile has not changed.
>  
> Why are we unable to get back the same read latency now that the cluster is 
> 15 nodes again, the same size it was before?
>  
> --
>  
> 
> Thank you
> 
> 
> 
>  
>  
>  


RE: Read Latency Doubles After Shrinking Cluster and Never Recovers

2018-06-11 Thread Fd Habash
I will check for both.

On a different subject, I have read some user reports that running 
‘nodetool cleanup’ requires a C* process restart, at least around 2.2.8. Is this 
true?



Thank you

From: Nitan Kainth
Sent: Monday, June 11, 2018 10:40 AM
To: user@cassandra.apache.org
Subject: Re: Read Latency Doubles After Shrinking Cluster and Never Recovers

I think it would, because Cassandra will process more SSTables to build the 
response to read queries.

Now, after cleanup, if the data volume is the same and compaction has been 
running, I can’t think of any more diagnostic steps. Let’s wait for other 
experts to comment.

Can you also check the SSTable count for each table, just to be sure the counts 
are not extraordinarily high?
Sent from my iPhone

On Jun 11, 2018, at 10:21 AM, Fd Habash  wrote:
Yes, we did, after adding the three nodes back, and we ran a full cluster 
repair as well.
 
But even if we hadn’t run cleanup, would the fact that some nodes still hold 
SSTables they no longer need have impacted read latency?
 
Thanks 
 

Thank you
 
From: Nitan Kainth
Sent: Monday, June 11, 2018 10:18 AM
To: user@cassandra.apache.org
Subject: Re: Read Latency Doubles After Shrinking Cluster and Never Recovers
 
Did you run cleanup too? 
 
On Mon, Jun 11, 2018 at 10:16 AM, Fred Habash  wrote:
I have hit dead ends everywhere I have turned on this issue.
 
We had a 15-node cluster that had been doing 35 ms read latency all along for 
years. At some point, we decided to shrink it to 13 nodes. Read latency rose to 
near 70 ms. Shortly after, we decided this was not acceptable, so we added the 
three nodes back in. Read latency dropped to near 50 ms and has been hovering 
around this value for over 6 months now.
 
Repairs run regularly, load across the cluster nodes is even, and the 
application activity profile has not changed.
 
Why are we unable to get back the same read latency now that the cluster is 15 
nodes again, the same size it was before?
 
-- 
 

Thank you


 
 



Re: Read Latency Doubles After Shrinking Cluster and Never Recovers

2018-06-11 Thread Nitan Kainth
I think it would, because Cassandra will process more SSTables to build the 
response to read queries.

Now, after cleanup, if the data volume is the same and compaction has been 
running, I can’t think of any more diagnostic steps. Let’s wait for other 
experts to comment.

Can you also check the SSTable count for each table, just to be sure the counts 
are not extraordinarily high?
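
For example, something like this rough, untested sketch could list the counts.
It assumes nodetool is on the PATH and the 2.x-style cfstats output with
"Keyspace:", "Table:" and "SSTable count:" lines, so adjust the markers for
your version:

#!/usr/bin/env python
# Quick listing of SSTable count per table, parsed from `nodetool cfstats`.
import subprocess

out = subprocess.check_output(["nodetool", "cfstats"]).decode()

keyspace = table = None
for raw in out.splitlines():
    line = raw.strip()
    if line.startswith("Keyspace"):
        keyspace = line.split(":", 1)[1].strip()
    elif line.startswith("Table:") or line.startswith("Column Family:"):
        table = line.split(":", 1)[1].strip()
    elif line.startswith("SSTable count:"):
        count = int(line.split(":", 1)[1])
        print("%s.%s  %d sstables" % (keyspace, table, count))

Anything well above the handful of SSTables you would expect from the table's
compaction strategy is worth a closer look.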

Sent from my iPhone

> On Jun 11, 2018, at 10:21 AM, Fd Habash  wrote:
> 
> Yes, we did, after adding the three nodes back, and we ran a full cluster 
> repair as well.
>  
> But even if we hadn’t run cleanup, would the fact that some nodes still hold 
> SSTables they no longer need have impacted read latency?
>  
> Thanks
>  
> 
> Thank you
>  
> From: Nitan Kainth
> Sent: Monday, June 11, 2018 10:18 AM
> To: user@cassandra.apache.org
> Subject: Re: Read Latency Doubles After Shrinking Cluster and Never Recovers
>  
> Did you run cleanup too? 
>  
> On Mon, Jun 11, 2018 at 10:16 AM, Fred Habash  wrote:
> I have hit dead ends everywhere I have turned on this issue.
>  
> We had a 15-node cluster that had been doing 35 ms read latency all along for 
> years. At some point, we decided to shrink it to 13 nodes. Read latency rose 
> to near 70 ms. Shortly after, we decided this was not acceptable, so we added 
> the three nodes back in. Read latency dropped to near 50 ms and has been 
> hovering around this value for over 6 months now.
>  
> Repairs run regularly, load across the cluster nodes is even, and the 
> application activity profile has not changed.
>  
> Why are we unable to get back the same read latency now that the cluster is 
> 15 nodes again, the same size it was before?
>  
> --
>  
> 
> Thank you
> 
> 
>  
>  


RE: Read Latency Doubles After Shrinking Cluster and Never Recovers

2018-06-11 Thread Fd Habash
Yes, we did, after adding the three nodes back, and we ran a full cluster 
repair as well.

But even if we hadn’t run cleanup, would the fact that some nodes still hold 
SSTables they no longer need have impacted read latency?

Thanks 


Thank you

From: Nitan Kainth
Sent: Monday, June 11, 2018 10:18 AM
To: user@cassandra.apache.org
Subject: Re: Read Latency Doubles After Shrinking Cluster and Never Recovers

Did you run cleanup too? 

On Mon, Jun 11, 2018 at 10:16 AM, Fred Habash  wrote:
I have hit dead ends everywhere I have turned on this issue.

We had a 15-node cluster that had been doing 35 ms read latency all along for 
years. At some point, we decided to shrink it to 13 nodes. Read latency rose to 
near 70 ms. Shortly after, we decided this was not acceptable, so we added the 
three nodes back in. Read latency dropped to near 50 ms and has been hovering 
around this value for over 6 months now.

Repairs run regularly, load across the cluster nodes is even, and the 
application activity profile has not changed.

Why are we unable to get back the same read latency now that the cluster is 15 
nodes again, the same size it was before?

-- 


Thank you





Re: Read Latency Doubles After Shrinking Cluster and Never Recovers

2018-06-11 Thread Nitan Kainth
Did you run cleanup too?
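
(If you still need to, here is a rough, untested sketch of running it keyspace
by keyspace on a node so the extra rewrite work is spread out; the keyspace
names are placeholders:)

#!/usr/bin/env python
# Run `nodetool cleanup` one keyspace at a time on the local node.
# Cleanup rewrites SSTables locally to drop data the node no longer owns,
# so run it after topology changes and watch disk headroom while it runs.
import subprocess

KEYSPACES = ["my_keyspace1", "my_keyspace2"]  # placeholders: your app keyspaces

for ks in KEYSPACES:
    print("nodetool cleanup %s ..." % ks)
    subprocess.check_call(["nodetool", "cleanup", ks])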

On Mon, Jun 11, 2018 at 10:16 AM, Fred Habash  wrote:

> I have hit dead ends everywhere I have turned on this issue.
>
> We had a 15-node cluster that had been doing 35 ms read latency all along for
> years. At some point, we decided to shrink it to 13 nodes. Read latency rose
> to near 70 ms. Shortly after, we decided this was not acceptable, so we added
> the three nodes back in. Read latency dropped to near 50 ms and has been
> hovering around this value for over 6 months now.
>
> Repairs run regularly, load across the cluster nodes is even, and the
> application activity profile has not changed.
>
> Why are we unable to get back the same read latency now that the cluster
> is 15 nodes again, the same size it was before?
>
> --
>
> 
> Thank you
>
>
>