[jira] [Updated] (CASSANDRA-3670) provide "red flags" JMX instrumentation

2018-11-18 Thread C. Scott Andreas (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

C. Scott Andreas updated CASSANDRA-3670:

Component/s: Observability

> provide "red flags" JMX instrumentation
> ---
>
> Key: CASSANDRA-3670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Peter Schuller
>Priority: Minor
>
> As discussed in CASSANDRA-3641, it would be nice to expose through JMX 
> certain information which is almost without exception indicative of something 
> being wrong with the node or cluster.
> In the CASSANDRA-3641 case, it was the detection of corrupt counter shards. 
> Other examples include:
> * Number of times the selection of files to compact was adjusted due to disk 
> space heuristics
> * Number of times compaction has failed
> * Any I/O error reading from or writing to disk (the work here is collecting, 
> not exposing, so maybe not in an initial version)
> * Any data skipped due to checksum mismatches (when checksumming is being 
> used); e.g., "number of skips".
> * Any arbitrary exception at least in certain code paths (compaction, scrub, 
> cleanup for starters)
> Probably other things.
> The motivation is that if we have clear and obvious indications that 
> something truly is wrong, it seems suboptimal to just leave that information 
> in the log somewhere, for someone to discover later when something else broke 
> as a result and a human investigates. You might argue that one should use 
> non-trivial log analysis to detect these things, but I highly doubt a lot of 
> people do this and it seems very wasteful to require that in comparison to 
> just providing the MBean.
> It is important to note that the *lack* of a certain problem being advertised 
> in this MBean is not supposed to be indicative of a *lack* of a problem. 
> Rather, the point is that to the extent we can easily do so, it is nice to 
> have a clear method of communicating to monitoring systems where there *is* a 
> clear indication of something being wrong.
> The main part of this ticket is not to cover everything under the sun, but 
> rather to reach agreement on adding an MBean where these types of indicators 
> can be collected. Individual counters can then be added over time as one 
> thinks of them.
> I propose:
> * Create an org.apache.cassandra.db.RedFlags MBean
> * Populate with a few things to begin with.
> I'll submit the patch if there is agreement.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-3670) provide red flags JMX instrumentation

2014-02-27 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3670:
--

Reviewer:   (was: Brandon Williams)
Assignee: (was: Tyler Hobbs)

 provide red flags JMX instrumentation
 ---

 Key: CASSANDRA-3670
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Schuller
Priority: Minor

 As discussed in CASSANDRA-3641, it would be nice to expose through JMX 
 certain information which is almost without exception indicative of something 
 being wrong with the node or cluster.
 In the CASSANDRA-3641 case, it was the detection of corrupt counter shards. 
 Other examples include:
 * Number of times the selection of files to compact was adjusted due to disk 
 space heuristics
 * Number of times compaction has failed
 * Any I/O error reading from or writing to disk (the work here is collecting, 
 not exposing, so maybe not in an initial version)
 * Any data skipped due to checksum mismatches (when checksumming is being 
 used); e.g., number of skips.
 * Any arbitrary exception at least in certain code paths (compaction, scrub, 
 cleanup for starters)
 Probably other things.
 The motivation is that if we have clear and obvious indications that 
 something truly is wrong, it seems suboptimal to just leave that information 
 in the log somewhere, for someone to discover later when something else broke 
 as a result and a human investigates. You might argue that one should use 
 non-trivial log analysis to detect these things, but I highly doubt a lot of 
 people do this and it seems very wasteful to require that in comparison to 
 just providing the MBean.
 It is important to note that the *lack* of a certain problem being advertised 
 in this MBean is not supposed to be indicative of a *lack* of a problem. 
 Rather, the point is that to the extent we can easily do so, it is nice to 
 have a clear method of communicating to monitoring systems where there *is* a 
 clear indication of something being wrong.
 The main part of this ticket is not to cover everything under the sun, but 
 rather to reach agreement on adding an MBean where these types of indicators 
 can be collected. Individual counters can then be added over time as one 
 thinks of them.
 I propose:
 * Create an org.apache.cassandra.db.RedFlags MBean
 * Populate with a few things to begin with.
 I'll submit the patch if there is agreement.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-3670) provide red flags JMX instrumentation

2013-07-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3670:
--

Reviewer: brandon.williams  (was: slebresne)
Assignee: Tyler Hobbs  (was: Peter Schuller)

WDYT [~thobbs]?

 provide red flags JMX instrumentation
 ---

 Key: CASSANDRA-3670
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Schuller
Assignee: Tyler Hobbs
Priority: Minor

 As discussed in CASSANDRA-3641, it would be nice to expose through JMX 
 certain information which is almost without exception indicative of something 
 being wrong with the node or cluster.
 In the CASSANDRA-3641 case, it was the detection of corrupt counter shards. 
 Other examples include:
 * Number of times the selection of files to compact was adjusted due to disk 
 space heuristics
 * Number of times compaction has failed
 * Any I/O error reading from or writing to disk (the work here is collecting, 
 not exposing, so maybe not in an initial version)
 * Any data skipped due to checksum mismatches (when checksumming is being 
 used); e.g., number of skips.
 * Any arbitrary exception at least in certain code paths (compaction, scrub, 
 cleanup for starters)
 Probably other things.
 The motivation is that if we have clear and obvious indications that 
 something truly is wrong, it seems suboptimal to just leave that information 
 in the log somewhere, for someone to discover later when something else broke 
 as a result and a human investigates. You might argue that one should use 
 non-trivial log analysis to detect these things, but I highly doubt a lot of 
 people do this and it seems very wasteful to require that in comparison to 
 just providing the MBean.
 It is important to note that the *lack* of a certain problem being advertised 
 in this MBean is not supposed to be indicative of a *lack* of a problem. 
 Rather, the point is that to the extent we can easily do so, it is nice to 
 have a clear method of communicating to monitoring systems where there *is* a 
 clear indication of something being wrong.
 The main part of this ticket is not to cover everything under the sun, but 
 rather to reach agreement on adding an MBean where these types of indicators 
 can be collected. Individual counters can then be added over time as one 
 thinks of them.
 I propose:
 * Create an org.apache.cassandra.db.RedFlags MBean
 * Populate with a few things to begin with.
 I'll submit the patch if there is agreement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-3670) provide red flags JMX instrumentation

2011-12-23 Thread Peter Schuller (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schuller updated CASSANDRA-3670:
--

Reviewer: slebresne

 provide red flags JMX instrumentation
 ---

 Key: CASSANDRA-3670
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Schuller
Assignee: Peter Schuller
Priority: Minor

 As discussed in CASSANDRA-3641, it would be nice to expose through JMX 
 certain information which is almost without exception indicative of something 
 being wrong with the node or cluster.
 In the CASSANDRA-3641 case, it was the detection of corrupt counter shards. 
 Other examples include:
 * Number of times the selection of files to compact was adjusted due to disk 
 space heuristics
 * Number of times compaction has failed
 * Any I/O error reading from or writing to disk (the work here is collecting, 
 not exposing, so maybe not in an initial version)
 * Any data skipped due to checksum mismatches (when checksumming is being 
 used); e.g., number of skips.
 * Any arbitrary exception at least in certain code paths (compaction, scrub, 
 cleanup for starters)
 Probably other things.
 The motivation is that if we have clear and obvious indications that 
 something truly is wrong, it seems suboptimal to just leave that information 
 in the log somewhere, for someone to discover later when something else broke 
 as a result and a human investigates. You might argue that one should use 
 non-trivial log analysis to detect these things, but I highly doubt a lot of 
 people do this and it seems very wasteful to require that in comparison to 
 just providing the MBean.
 It is important to note that the *lack* of a certain problem being advertised 
 in this MBean is not supposed to be indicative of a *lack* of a problem. 
 Rather, the point is that to the extent we can easily do so, it is nice to 
 have a clear method of communicating to monitoring systems where there *is* a 
 clear indication of something being wrong.
 The main part of this ticket is not to cover everything under the sun, but 
 rather to reach agreement on adding an MBean where these types of indicators 
 can be collected. Individual counters can then be added over time as one 
 thinks of them.
 I propose:
 * Create an org.apache.cassandra.db.RedFlags MBean
 * Populate with a few things to begin with.
 I'll submit the patch if there is agreement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira