[jira] [Commented] (CASSANDRA-12876) Negative mean write latency
[ https://issues.apache.org/jira/browse/CASSANDRA-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862660#comment-15862660 ]

Chris Lohfink commented on CASSANDRA-12876:
---

+1 from me, could reproduce the negative means before and couldn't afterwards.

> Negative mean write latency
> ---
>
> Key: CASSANDRA-12876
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12876
> Project: Cassandra
> Issue Type: Bug
> Components: Observability
> Reporter: Kévin LOVATO
> Assignee: Per Otterström
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
> Attachments: 12876-2.2.txt, 12876-2.2-v2.txt, negative_mean_details.PNG, negative_mean_periodicity.PNG, negative_mean.png, negative_mean_read_latency.png
>
> The mean write latency returned by JMX turns negative every 30 minutes. As the attached screenshots show, the value turns negative every 30 minutes after the startup of the node.
> We did not experience this behavior in 2.1.16.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-12876) Negative mean write latency
[ https://issues.apache.org/jira/browse/CASSANDRA-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862627#comment-15862627 ]

Chris Lohfink commented on CASSANDRA-12876:
---

I actually missed that I was reviewer, I'm testing now.
[jira] [Commented] (CASSANDRA-12876) Negative mean write latency
[ https://issues.apache.org/jira/browse/CASSANDRA-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862229#comment-15862229 ]

Jeff Jirsa commented on CASSANDRA-12876:

[~cnlwsu] still on your radar?
[jira] [Commented] (CASSANDRA-12876) Negative mean write latency
[ https://issues.apache.org/jira/browse/CASSANDRA-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15769926#comment-15769926 ]

Aleksandr Ivanov commented on CASSANDRA-12876:
--

Same problem with mean Read Latency on v3.0.9

> Negative mean write latency
> ---
>
> Key: CASSANDRA-12876
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12876
> Project: Cassandra
> Issue Type: Bug
> Components: Observability
> Reporter: Kévin LOVATO
> Assignee: Per Otterström
> Fix For: 2.2.9
> Attachments: 12876-2.2.txt, negative_mean.png, negative_mean_details.PNG, negative_mean_periodicity.PNG
>
> The mean write latency returned by JMX turns negative every 30 minutes. As the attached screenshots show, the value turns negative every 30 minutes after the startup of the node.
> We did not experience this behavior in 2.1.16.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12876) Negative mean write latency
[ https://issues.apache.org/jira/browse/CASSANDRA-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663718#comment-15663718 ]

Per Otterström commented on CASSANDRA-12876:

Overflow doesn't happen at the start of the 30-minute cycle because 1) it takes a while to build up high enough values in the individual buckets and 2) values collected at the end of the 30-minute cycle have a much higher forward-decay weight, which causes bucket values to build up much faster. In your case you are seeing this during the last 3 minutes of the 30-minute cycle, because only then is the forward-decay weight high enough.
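To make the timing argument above concrete, here is a minimal sketch of how an exponential forward-decay weight grows over a 30-minute cycle. It is not the Cassandra implementation: the class name, the 60-second time constant, the sample count, and the bucket offset are all assumptions picked for illustration. It only shows why a weighted value can stay within the range of a signed 64-bit long for most of the cycle and exceed it near the end.

```java
public final class ForwardDecayWeightSketch {
    // Assumed decay time constant in seconds (hypothetical, for illustration only).
    static final double MEAN_LIFETIME_SECONDS = 60.0;

    /** Forward-decay weight of a sample recorded this many seconds after the landmark. */
    static double weight(double secondsSinceLandmark) {
        return Math.exp(secondsSinceLandmark / MEAN_LIFETIME_SECONDS);
    }

    /** True if samples * weight * bucketOffset no longer fits in a signed 64-bit long. */
    static boolean exceedsLong(long samples, long bucketOffset, double secondsSinceLandmark) {
        double weighted = samples * weight(secondsSinceLandmark) * bucketOffset;
        return weighted > (double) Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        long samples = 1_000L;          // hypothetical number of writes landing in one bucket
        long bucketOffset = 1_000_000L; // hypothetical bucket offset (latency value)
        for (int minute : new int[] {0, 10, 20, 27, 30}) {
            double seconds = minute * 60.0;
            System.out.printf("minute %2d: weight=%.3e exceedsLong=%b%n",
                    minute, weight(seconds), exceedsLong(samples, bucketOffset, seconds));
        }
    }
}
```

With these assumed inputs the weighted product is still in range at minute 20 and out of range by minute 27, which is consistent with negative values showing up only in the last few minutes before the rescale.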
[jira] [Commented] (CASSANDRA-12876) Negative mean write latency
[ https://issues.apache.org/jira/browse/CASSANDRA-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658190#comment-15658190 ]

Kévin LOVATO commented on CASSANDRA-12876:
--

I updated a node of a test cluster with your patch and it works, no more negative values, thanks :)

The code in the patch was clear, but I'm a bit confused by your explanation. You mention that the overflow happens because all the buckets are added together during the mean computation, and that it is only fixed by the rescaling (which happens every 30 minutes). But then why do I only observe negative values every 30 minutes? Shouldn't I observe negative values all the time, and positive ones just after the rescale?
[jira] [Commented] (CASSANDRA-12876) Negative mean write latency
[ https://issues.apache.org/jira/browse/CASSANDRA-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655141#comment-15655141 ]

Per Otterström commented on CASSANDRA-12876:

Thanks for the input @alprema!

So, it appears that the overflow doesn't happen during the rescale, but when the mean value is calculated. It is triggered when all bucket values are added together. The negative values stopped suddenly when the rescale was triggered.

The attached patch performs a rescale of the snapshot values when the snapshot is created, thereby avoiding overflow. As a side effect, this makes the max value behave better. Before, the max value would continue to rise and then drop suddenly every 30 minutes when the rescale happened. Now, the max value will appear to decay on a per-minute basis, whether the value peaks just before or after the rescale.

The patch includes a missing part from CASSANDRA-11752 that was never merged into the 2.2 branch. It seems to apply cleanly on 3.0.x as well.

> Negative mean write latency
> ---
>
> Key: CASSANDRA-12876
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12876
> Project: Cassandra
> Issue Type: Bug
> Components: Observability
> Reporter: Kévin LOVATO
> Assignee: Per Otterström
> Attachments: 12876-2.2.txt, negative_mean.png, negative_mean_details.PNG, negative_mean_periodicity.PNG
>
> The mean write latency returned by JMX turns negative every 30 minutes. As the attached screenshots show, the value turns negative every 30 minutes after the startup of the node.
> We did not experience this behavior in 2.1.16.
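The overflow described above, and the idea of rescaling values when the snapshot is created, can be sketched roughly as follows. This is not the code from 12876-2.2.txt: the class and method names, the bucket contents, and the weight of 1e12 are hypothetical. It assumes the mean is computed by summing count * offset into a long, so each product fits on its own but the total wraps around to a negative number.

```java
public final class SnapshotRescaleSketch {
    /** Mean computed with long accumulators; the sum can silently wrap around. */
    static double naiveMean(long[] decayedCounts, long[] bucketOffsets) {
        long elements = 0;
        long sum = 0;
        for (int i = 0; i < decayedCounts.length; i++) {
            elements += decayedCounts[i];
            sum += decayedCounts[i] * bucketOffsets[i]; // may overflow when counts are large
        }
        return (double) sum / elements;
    }

    /** Divides every decayed count by the current weight so the values are small again. */
    static long[] rescale(long[] decayedCounts, double weight) {
        long[] rescaled = new long[decayedCounts.length];
        for (int i = 0; i < decayedCounts.length; i++) {
            rescaled[i] = Math.round(decayedCounts[i] / weight);
        }
        return rescaled;
    }

    public static void main(String[] args) {
        long[] bucketOffsets = {1_000L, 2_000L};            // hypothetical latency buckets
        long[] decayedCounts = {4_000_000_000_000_000L,     // hypothetical decayed counts
                                3_000_000_000_000_000L};    // late in the 30-minute cycle
        double weight = 1e12;                               // hypothetical forward-decay weight

        System.out.println("naive mean:    " + naiveMean(decayedCounts, bucketOffsets));
        System.out.println("rescaled mean: " + naiveMean(rescale(decayedCounts, weight), bucketOffsets));
    }
}
```

Because every count is divided by the same weight, the ratio between buckets is preserved and the mean is unchanged, while the intermediate sum stays far below the range of a long.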
[jira] [Commented] (CASSANDRA-12876) Negative mean write latency
[ https://issues.apache.org/jira/browse/CASSANDRA-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651382#comment-15651382 ]

Kévin LOVATO commented on CASSANDRA-12876:
--

I looked at the code as well and can't see an obvious way to have negative values in decayingBuckets, even after a rescale().

Also, I noticed that the metric stays negative for about 3 minutes and dips below zero multiple times during this period (cf. [^negative_mean_details.PNG]), whereas I would expect to see only one dip below zero if it were caused by the rescale.

Hope those details will point you in the right direction.

> Negative mean write latency
> ---
>
> Key: CASSANDRA-12876
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12876
> Project: Cassandra
> Issue Type: Bug
> Components: Observability
> Reporter: Kévin LOVATO
> Assignee: Per Otterström
> Attachments: negative_mean.png, negative_mean_details.PNG, negative_mean_periodicity.PNG
>
> The mean write latency returned by JMX turns negative every 30 minutes. As the attached screenshots show, the value turns negative every 30 minutes after the startup of the node.
> We did not experience this behavior in 2.1.16.
[jira] [Commented] (CASSANDRA-12876) Negative mean write latency
[ https://issues.apache.org/jira/browse/CASSANDRA-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646791#comment-15646791 ]

Per Otterström commented on CASSANDRA-12876:

Yes, must be related to the 30-minute rescaling. I'm able to reproduce in a test cluster. I'm seeing a similar pattern for the standard deviation as well.

Can't make out why this happens just by reviewing code, but I'll run some more tests.

> Negative mean write latency
> ---
>
> Key: CASSANDRA-12876
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12876
> Project: Cassandra
> Issue Type: Bug
> Components: Observability
> Reporter: Kévin LOVATO
> Attachments: negative_mean.png, negative_mean_periodicity.PNG
>
> The mean write latency returned by JMX turns negative every 30 minutes. As the attached screenshots show, the value turns negative every 30 minutes after the startup of the node.
> We did not experience this behavior in 2.1.16.
[jira] [Commented] (CASSANDRA-12876) Negative mean write latency
[ https://issues.apache.org/jira/browse/CASSANDRA-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633356#comment-15633356 ]

Chris Lohfink commented on CASSANDRA-12876:
---

[~eperott] any ideas?
[jira] [Commented] (CASSANDRA-12876) Negative mean write latency
[ https://issues.apache.org/jira/browse/CASSANDRA-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633349#comment-15633349 ]

Chris Lohfink commented on CASSANDRA-12876:
---

I am certain this is related to CASSANDRA-11752 and its 30-minute rescaling. If you access the values() of the histogram directly you won't see this, for what it's worth.