Re: Read Latency Doubles After Shrinking Cluster and Never Recovers
Really wild guess: do you monitor I/O performance, and are you positive it is the same as over the past year (network becoming a little busier, hard drives a bit slower, and so on)? Wild guess 2: was any new 'monitoring' software (a log-shipping agent, for instance) added on the box in the meantime?

On 11 June 2018 at 16:56, Jeff Jirsa wrote:
> No
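That first wild guess is easy to script as a recurring check. A minimal sketch: on a real box you would feed this the output of `iostat -dx`, but the sample data (device name, average wait in ms) and the 10 ms threshold below are hypothetical, just to show the shape of the filter:

```shell
#!/bin/sh
# Sketch: flag block devices whose average I/O wait time exceeds a threshold.
# In practice, capture real extended device stats with:  iostat -dx 5 3
THRESHOLD_MS=10

# Hypothetical sample: column 1 = device, column 2 = await (ms).
cat <<'EOF' > /tmp/io_sample.txt
sda 4.2
sdb 23.7
sdc 8.9
EOF

# Force numeric comparison with +0 (awk would otherwise compare strings).
awk -v t="$THRESHOLD_MS" \
  '$2 + 0 > t + 0 { print $1 " await=" $2 "ms exceeds " t "ms" }' \
  /tmp/io_sample.txt
```

Run periodically (cron, or your monitoring agent of choice), this would catch a disk that has quietly gotten slower over the year.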
Re: Read Latency Doubles After Shrinking Cluster and Never Recovers
No

--
Jeff Jirsa

> On Jun 11, 2018, at 7:49 AM, Fd Habash wrote:
>
> On a different subject, I have read some user testimonies that running ‘nodetool cleanup’ requires a C* process reboot, at least around 2.2.8. Is this true?
RE: Read Latency Doubles After Shrinking Cluster and Never Recovers
I will check for both.

On a different subject, I have read some user testimonies that running ‘nodetool cleanup’ requires a C* process reboot, at least on 2.2.8. Is this true?

Thank you
Re: Read Latency Doubles After Shrinking Cluster and Never Recovers
I think it would, because Cassandra will process more SSTables to build the response to read queries.

Now, after cleanup, if the data volume is the same and compaction has been running, I can’t think of any more diagnostic steps. Let’s wait for other experts to comment.

Can you also check the SSTable count for each table, just to be sure they are not extraordinarily high?

Sent from my iPhone
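The per-table SSTable count suggested above comes from `nodetool cfstats` (the 2.2-era name; later versions rename it `tablestats`). A minimal sketch that flags high counts; the trimmed sample output below and the threshold of 20 are hypothetical:

```shell
#!/bin/sh
# Sketch: extract "SSTable count" per table from `nodetool cfstats` output
# and flag anything above a threshold. On a live node you would pipe in
# real output, e.g.:  nodetool cfstats mykeyspace
THRESHOLD=20

# Hypothetical, trimmed sample of cfstats output.
cat <<'EOF' > /tmp/cfstats_sample.txt
Table: events
SSTable count: 47
Table: users
SSTable count: 9
EOF

awk -v t="$THRESHOLD" '
  /Table:/         { tbl = $2 }                                  # remember current table name
  /SSTable count:/ { if ($3 + 0 > t + 0) print tbl ": " $3 " sstables" }
' /tmp/cfstats_sample.txt
```

A count far above what the compaction strategy should settle at is a common cause of read amplification, so this is a cheap first check.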
RE: Read Latency Doubles After Shrinking Cluster and Never Recovers
Yes, we did, after adding the three nodes back, and a full cluster repair as well.

But even if we didn’t run cleanup, would the fact that some nodes still have SSTables they no longer need have impacted read latency?

Thanks
Re: Read Latency Doubles After Shrinking Cluster and Never Recovers
Did you run cleanup too?

On Mon, Jun 11, 2018 at 10:16 AM, Fred Habash wrote:
> I have hit dead-ends everywhere I turned on this issue.
>
> We had a 15-node cluster that was doing 35 ms all along for years. At some point, we made a decision to shrink it to 13. Read latency rose to near 70 ms. Shortly after, we decided this was not acceptable, so we added the three nodes back in. Read latency dropped to near 50 ms, and it has been hovering around this value for over 6 months now.
>
> Repairs run regularly, load on the cluster nodes is even, and the application activity profile has not changed.
>
> Why are we unable to get back the same read latency now that the cluster is 15 nodes again, the same size it was before?
>
> --
> Thank you
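For reference, running `nodetool cleanup` after a topology change is usually done one node at a time, since cleanup rewrites SSTables and competes with normal compaction. A rough sketch; the host names and ssh access are assumptions for illustration, and the dry-run flag only prints the plan:

```shell
#!/bin/sh
# Sketch: run `nodetool cleanup` across the ring, one node at a time.
# Host names below are hypothetical placeholders.
DRY_RUN=1

# Build the command plan first so it can be reviewed before anything runs.
PLAN=""
for host in cass-node-01 cass-node-02 cass-node-03; do
  PLAN="${PLAN}would run: ssh $host nodetool cleanup
"
done
printf '%s' "$PLAN"

# With DRY_RUN=0, actually execute the plan, serially.
if [ "$DRY_RUN" -ne 1 ]; then
  for host in cass-node-01 cass-node-02 cass-node-03; do
    ssh "$host" nodetool cleanup
  done
fi
```

Serializing the loop keeps the extra compaction load off all replicas of any given token range at once.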