[jira] [Updated] (CASSANDRA-7402) Add metrics to track memory used by client requests
[ https://issues.apache.org/jira/browse/CASSANDRA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

C. Scott Andreas updated CASSANDRA-7402:
----------------------------------------
    Component/s: Metrics

> Add metrics to track memory used by client requests
> ---------------------------------------------------
>
>                 Key: CASSANDRA-7402
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7402
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Metrics
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>            Priority: Minor
>              Labels: ops, performance, stability
>             Fix For: 4.x
>
>         Attachments: 7402.txt
>
>
> When running a production cluster, one common operational issue is quantifying
> GC pauses caused by ongoing requests.
> Since different queries return varying amounts of data, you can easily get
> yourself into a situation where a couple of bad actors in the system cause a
> stop-the-world pause. Or, more likely, the aggregate garbage generated on a
> single node across all in-flight requests causes a GC.
> It would be very useful for operators to see how much garbage the system is
> generating to handle in-flight mutations and queries.
> It would also be nice to have a log of the queries that generate the most
> garbage, so operators can track this, and also a histogram.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-7402) Add metrics to track memory used by client requests
[ https://issues.apache.org/jira/browse/CASSANDRA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-7402:
-----------------------------------------
    Reviewer:   (was: Aleksey Yeschenko)

> Add metrics to track memory used by client requests
> ---------------------------------------------------
>
>                 Key: CASSANDRA-7402
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7402
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>            Priority: Minor
>              Labels: ops, performance, stability
>             Fix For: 3.x
>
>         Attachments: 7402.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7402) Add metrics to track memory used by client requests
[ https://issues.apache.org/jira/browse/CASSANDRA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-7402:
--------------------------------------
    Reviewer: Aleksey Yeschenko
    Priority: Minor  (was: Major)

> Add metrics to track memory used by client requests
> ---------------------------------------------------
>
>                 Key: CASSANDRA-7402
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7402
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>            Priority: Minor
>              Labels: ops, performance, stability
>             Fix For: 2.1.4
>
>         Attachments: 7402.txt
[jira] [Updated] (CASSANDRA-7402) Add metrics to track memory used by client requests
[ https://issues.apache.org/jira/browse/CASSANDRA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-7402:
-----------------------------------------
    Reviewer:   (was: Aleksey Yeschenko)

> Add metrics to track memory used by client requests
> ---------------------------------------------------
>
>                 Key: CASSANDRA-7402
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7402
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>              Labels: ops, performance, stability
>             Fix For: 2.1.3
>
>         Attachments: 7402.txt
[jira] [Updated] (CASSANDRA-7402) Add metrics to track memory used by client requests
[ https://issues.apache.org/jira/browse/CASSANDRA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-7402:
--------------------------------------
    Fix Version/s:     (was: 2.1.2)
                   2.1.3

> Add metrics to track memory used by client requests
> ---------------------------------------------------
>
>                 Key: CASSANDRA-7402
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7402
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>              Labels: ops, performance, stability
>             Fix For: 2.1.3
>
>         Attachments: 7402.txt
[jira] [Updated] (CASSANDRA-7402) Add metrics to track memory used by client requests
[ https://issues.apache.org/jira/browse/CASSANDRA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-7402:
-----------------------------------------
    Fix Version/s:     (was: 3.0)
                   2.1.1

> Add metrics to track memory used by client requests
> ---------------------------------------------------
>
>                 Key: CASSANDRA-7402
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7402
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>              Labels: ops, performance, stability
>             Fix For: 2.1.1
>
>         Attachments: 7402.txt
[jira] [Updated] (CASSANDRA-7402) Add metrics to track memory used by client requests
[ https://issues.apache.org/jira/browse/CASSANDRA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-7402:
--------------------------------------
    Reviewer: Aleksey Yeschenko

> Add metrics to track memory used by client requests
> ---------------------------------------------------
>
>                 Key: CASSANDRA-7402
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7402
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>              Labels: ops, performance, stability
>             Fix For: 3.0
>
>         Attachments: 7402.txt
[jira] [Updated] (CASSANDRA-7402) Add metrics to track memory used by client requests
[ https://issues.apache.org/jira/browse/CASSANDRA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-7402:
--------------------------------------
    Attachment: 7402.txt

Patch to add a histogram and a meter for reads and writes. These metrics exist per column family and are rolled up to the keyspace level.

For reads, the histogram tracks the heap size of query responses, both per partition and across partitions (for range queries).
For writes, the histogram tracks the heap size of single mutations (we already track and warn users on large batches).

The meters track the aggregate heap usage of reads and writes per node. This is valuable to track because it shows when you are generating too many aggregate operations at once.

I changed nodetool cfstats to expose these per column family. Most operators would want to track this stat in their system and pick values to alert on.

{code}
Average read response bytes per query (last five minutes): 620
Maximum read response bytes per query (last five minutes): 620
Total read response rate bytes/sec (past minute): 7836749
Total read response rate bytes/sec (past five minutes): 2027754

Average write bytes per partition (last five minutes): 620
Maximum write bytes per partition (last five minutes): 620
Total write rate bytes/sec (past minute): 2391983
Total write rate bytes/sec (past five minutes): 2940078
{code}

> Add metrics to track memory used by client requests
> ---------------------------------------------------
>
>                 Key: CASSANDRA-7402
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7402
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>              Labels: ops, performance, stability
>             Fix For: 3.0
>
>         Attachments: 7402.txt
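As a rough illustration of the histogram-plus-meter idea in the patch comment above, here is a minimal Python sketch. It is not taken from 7402.txt: the class names (`SizeHistogram`, `ByteMeter`, `RequestSizeMetrics`) and the `record` method are hypothetical, and it uses plain running totals instead of the time-windowed values ("last five minutes", "past minute") shown in the cfstats output.

```python
import time
from collections import defaultdict


class SizeHistogram:
    """Tracks count, mean and max of observed sizes in bytes (no decay, for brevity)."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.max = 0

    def update(self, size_bytes):
        self.count += 1
        self.total += size_bytes
        self.max = max(self.max, size_bytes)

    def mean(self):
        return self.total / self.count if self.count else 0.0


class ByteMeter:
    """Tracks total bytes marked and the mean bytes/sec since creation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.total = 0

    def mark(self, size_bytes):
        self.total += size_bytes

    def mean_rate(self):
        elapsed = self._clock() - self._start
        return self.total / elapsed if elapsed > 0 else 0.0


class RequestSizeMetrics:
    """Hypothetical holder: per-table histograms rolled up to keyspace level,
    plus node-wide meters for reads and writes."""

    def __init__(self, clock=time.monotonic):
        self.table_hist = defaultdict(SizeHistogram)     # key: (keyspace, table, op)
        self.keyspace_hist = defaultdict(SizeHistogram)  # key: (keyspace, op)
        self.node_meter = {"read": ByteMeter(clock), "write": ByteMeter(clock)}

    def record(self, keyspace, table, op, size_bytes):
        """Record one read response or one mutation of the given heap size."""
        self.table_hist[(keyspace, table, op)].update(size_bytes)
        self.keyspace_hist[(keyspace, op)].update(size_bytes)  # keyspace rollup
        self.node_meter[op].mark(size_bytes)                   # aggregate per node


metrics = RequestSizeMetrics()
metrics.record("ks1", "users", "read", 620)
metrics.record("ks1", "events", "read", 1240)
```

An operator-facing tool could then report `mean()` and `max` per table and alert on the node-wide rate, which is roughly the shape of the cfstats lines quoted in the comment.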
[jira] [Updated] (CASSANDRA-7402) Add metrics to track memory used by client requests
[ https://issues.apache.org/jira/browse/CASSANDRA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-7402:
--------------------------------------
    Description:
When running a production cluster, one common operational issue is quantifying GC pauses caused by ongoing requests.

Since different queries return varying amounts of data, you can easily get yourself into a situation where a couple of bad actors in the system cause a stop-the-world pause. Or, more likely, the aggregate garbage generated on a single node across all in-flight requests causes a GC.

It would be very useful for operators to see how much garbage the system is generating to handle in-flight mutations and queries.

It would also be nice to have a log of the queries that generate the most garbage, so operators can track this, and also a histogram.

  was:
When running a production cluster, one common operational issue is quantifying GC pauses caused by ongoing requests.

Since different queries return varying amounts of data, you can easily get yourself into a situation where a couple of bad actors in the system cause a stop-the-world pause. Or, more likely, the aggregate garbage generated on a single node across all in-flight requests causes a GC.

We should be able to set a limit on the max heap we can allocate to all outstanding requests and track the garbage per request to stop this from happening. It should increase a single node's availability substantially.

In the yaml this would be

{code}
total_request_memory_space_mb: 400
{code}

It would also be nice to have a log of the queries that generate the most garbage, so operators can track this, and also a histogram.

> Add metrics to track memory used by client requests
> ---------------------------------------------------
>
>                 Key: CASSANDRA-7402
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7402
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>              Labels: ops, performance, stability
>             Fix For: 3.0
[jira] [Updated] (CASSANDRA-7402) Add metrics to track memory used by client requests
[ https://issues.apache.org/jira/browse/CASSANDRA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-7402:
--------------------------------------
    Summary: Add metrics to track memory used by client requests  (was: limit the on heap memory available to requests)

> Add metrics to track memory used by client requests
> ---------------------------------------------------
>
>                 Key: CASSANDRA-7402
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7402
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>              Labels: ops, performance, stability
>             Fix For: 3.0
>
>
> When running a production cluster one common operational issue is quantifying
> GC pauses caused by ongoing requests.
> Since different queries return varying amount of data you can easily get your
> self into a situation where you Stop the world from a couple of bad actors in
> the system. Or more likely the aggregate garbage generated on a single node
> across all in flight requests causes a GC.
> We should be able to set a limit on the max heap we can allocate to all
> outstanding requests and track the garbage per requests to stop this from
> happening. It should increase a single nodes availability substantially.
> In the yaml this would be
> {code}
> total_request_memory_space_mb: 400
> {code}
> It would also be nice to have either a log of queries which generate the most
> garbage so operators can track this. Also a histogram.
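For context on the original summary ("limit the on heap memory available to requests"), the cap implied by `total_request_memory_space_mb` might look something like the Python sketch below. This is purely illustrative: the ticket was rescoped to metrics only, and the `RequestMemoryBudget` class with its `try_reserve` and `release` methods is a hypothetical helper, not an existing Cassandra API.

```python
import threading


class RequestMemoryBudget:
    """Caps the bytes reserved by all in-flight requests (hypothetical helper)."""

    def __init__(self, limit_mb):
        self._limit = limit_mb * 1024 * 1024
        self._used = 0
        self._lock = threading.Lock()

    def try_reserve(self, size_bytes):
        """Reserve bytes for a request; return False if it would exceed the cap."""
        with self._lock:
            if self._used + size_bytes > self._limit:
                return False
            self._used += size_bytes
            return True

    def release(self, size_bytes):
        """Return bytes to the budget once the request has completed."""
        with self._lock:
            self._used = max(0, self._used - size_bytes)

    @property
    def used(self):
        with self._lock:
            return self._used


# Mirrors the yaml value from the original description.
budget = RequestMemoryBudget(limit_mb=400)
if budget.try_reserve(620):
    try:
        pass  # handle the request here
    finally:
        budget.release(620)
```

A design like this has to decide what happens when `try_reserve` fails (reject, queue, or shed load), which the original description leaves open.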