[jira] [Commented] (CASSANDRA-14435) Diag. Events: JMX events
[ https://issues.apache.org/jira/browse/CASSANDRA-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571508#comment-16571508 ] mck commented on CASSANDRA-14435: - reviewed. +1 from me. > Diag. Events: JMX events > > > Key: CASSANDRA-14435 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14435 > Project: Cassandra > Issue Type: New Feature > Components: Observability >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski >Priority: Major > Fix For: 4.x > > > Nodes currently use JMX events for progress reporting on bootstrap and > repairs. This might also be an option to expose diagnostic events to external > subscribers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14435) Diag. Events: JMX events
[ https://issues.apache.org/jira/browse/CASSANDRA-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569129#comment-16569129 ] Stefan Podkowinski commented on CASSANDRA-14435:
The latest version of the code has been squashed and tested. It now basically follows the design as described in my previous post.
* [github|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-14435]
* [circleci|https://circleci.com/gh/spodkowinski/cassandra/382]
* [dtests|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/602/]
[jira] [Commented] (CASSANDRA-14435) Diag. Events: JMX events
[ https://issues.apache.org/jira/browse/CASSANDRA-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490609#comment-16490609 ] Stefan Podkowinski commented on CASSANDRA-14435:
As already pointed out in this discussion, we need to be careful to avoid contention around the JMX notification (ring) buffer. I've now changed the approach implemented in this ticket to stop broadcasting events directly as part of JMX notifications. Instead, notifications will only be used to announce the last (greatest) ID for each event type. Clients will be able to detect whether new events are available by keeping a local list of IDs and subscribing to notifications with ID updates. To make this work, IDs must be monotonically increasing and comparable (e.g. Long or TimeUUID). As notifications on updated IDs will be broadcast periodically, missing a notification isn't an issue: the full list of IDs will be received on the next broadcast interval.
The actual events will be available through a standard MBean method call, which accepts the ID of the client's last retrieved event and returns a limited number of events newer than the provided ID. This call can be polled remotely until the latest event has been retrieved.
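The announce-then-poll design described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the linked branch: the class {{PollableEventStore}} and the methods {{publish}}, {{lastId}}, {{readSince}} and {{catchUp}} are hypothetical names, events are plain strings, and IDs are simple Long counters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical in-memory event store; a sketch, not Cassandra's implementation. */
class PollableEventStore {
    private final TreeMap<Long, String> events = new TreeMap<>();
    private long nextId = 1;

    /** Stores an event under the next monotonically increasing ID. */
    synchronized long publish(String payload) {
        long id = nextId++;
        events.put(id, payload);
        return id;
    }

    /** The greatest ID so far; this is what a periodic notification would announce. */
    synchronized long lastId() {
        return events.isEmpty() ? 0L : events.lastKey();
    }

    /** Returns at most 'limit' events whose ID is strictly greater than lastSeenId. */
    synchronized SortedMap<Long, String> readSince(long lastSeenId, int limit) {
        SortedMap<Long, String> page = new TreeMap<>();
        for (Map.Entry<Long, String> e : events.tailMap(lastSeenId, false).entrySet()) {
            if (page.size() >= limit) break;
            page.put(e.getKey(), e.getValue());
        }
        return page;
    }

    /** Client side: poll page by page until the announced ID has been reached. */
    static List<String> catchUp(PollableEventStore store, long lastSeenId,
                                long announcedId, int pageSize) {
        List<String> received = new ArrayList<>();
        long cursor = lastSeenId;
        while (cursor < announcedId) {
            SortedMap<Long, String> page = store.readSince(cursor, pageSize);
            if (page.isEmpty()) break; // nothing newer is retained
            received.addAll(page.values());
            cursor = page.lastKey();
        }
        return received;
    }

    public static void main(String[] args) {
        PollableEventStore store = new PollableEventStore();
        for (int i = 1; i <= 5; i++) store.publish("event-" + i);
        // A client that last saw ID 2 learns via notification that the latest ID is now 5.
        System.out.println(catchUp(store, 2L, store.lastId(), 2));
    }
}
```

Because the notification carries only an ID, a lost notification costs nothing: the next announcement carries an ID at least as large, and the client catches up from its own cursor.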
[jira] [Commented] (CASSANDRA-14435) Diag. Events: JMX events
[ https://issues.apache.org/jira/browse/CASSANDRA-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471772#comment-16471772 ] mck commented on CASSANDRA-14435:
{quote}That or do like in CASSANDRA-13480 where there is an operation to check recent events or something when notifications are lost.{quote}
Yes, I think this is the idea [~cnlwsu]. With CASSANDRA-13460 it'll be possible to query the list via a JMX endpoint (and a virtual table?).
[jira] [Commented] (CASSANDRA-14435) Diag. Events: JMX events
[ https://issues.apache.org/jira/browse/CASSANDRA-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462629#comment-16462629 ] Chris Lohfink commented on CASSANDRA-14435:
We should at least make it clear that it's just best effort and that there's a high likelihood of missing events, to make sure people don't rely on the events for alerting or anything. That, or do like in CASSANDRA-13480 where there is an operation to check recent events or something when notifications are lost. Can test with {{-Djmx.remote.x.notification.buffer.size=1}}
[jira] [Commented] (CASSANDRA-14435) Diag. Events: JMX events
[ https://issues.apache.org/jira/browse/CASSANDRA-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462475#comment-16462475 ] Stefan Podkowinski commented on CASSANDRA-14435:
{quote}While I think it's a good idea here, we should probably at least have a note in yaml about enabling it may impact operational tooling if using broadcaster. With the shared event buffer (1000), the more we use it (even if no one is listening to that mbean's events) the more lost notifications will occur. On an active node we already end up losing a lot of events if the client is anywhere with relevant latency from the node. Increasing the buffer isn't really a good option as it puts massive pressure on the heap as the composite data objects (particularly streaming ones) are huge.{quote}
JMX surely isn't the most scalable and robust eventing solution. But any diag. event consumers would also fall into the "operational tooling" category, and tool creators and users should be aware of latency and contention based limitations. It's not ideal, but selectively sending infrequent events shouldn't hurt that much either. We can always improve from here and work on a more scalable long term solution. Maybe something based on chronicle queue with a CQL streaming extension and/or virtual tables on top. But that's not strictly related to diagnostic events and shouldn't prevent us from continuing to use JMX until we have another solution.
{quote}One of the issues I can see is that events are sent on the current thread, ref NotificationBroadcasterSupport.defaultExecutor.{quote}
I've pushed a commit [here|https://github.com/spodkowinski/cassandra/commit/c7df1333e84f5b91ebe61161ab4d669fe8da9b32] to share the same executor introduced in CASSANDRA-12146.
[jira] [Commented] (CASSANDRA-14435) Diag. Events: JMX events
[ https://issues.apache.org/jira/browse/CASSANDRA-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462003#comment-16462003 ] mck commented on CASSANDRA-14435:
[~cnlwsu], thanks for highlighting past issues with jmx in CASSANDRA-13480
[jira] [Commented] (CASSANDRA-14435) Diag. Events: JMX events
[ https://issues.apache.org/jira/browse/CASSANDRA-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461865#comment-16461865 ] mck commented on CASSANDRA-14435:
Thanks [~cnlwsu], am reading up on it. Do you have any suggestions for improving JMX in C*, ref https://issues.apache.org/jira/browse/CASSANDRA-14346?focusedCommentId=16459583&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16459583
[jira] [Commented] (CASSANDRA-14435) Diag. Events: JMX events
[ https://issues.apache.org/jira/browse/CASSANDRA-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461849#comment-16461849 ] Chris Lohfink commented on CASSANDRA-14435:
JMX notifications are stateless with respect to clients. The server keeps a cyclic buffer of events with IDs. When a polling client asks for an update, it sends the last ID it has seen. If that ID is no longer in the buffer, the values between the last one read and the lowest one still buffered are lost. Between JVM events and the existing events going into that buffer, we frequently lose events as is. Not that we can't use it, but it's a global limited resource that can be sensitive to higher latencies between JMX client and server.
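The loss behaviour described above can be illustrated with a small model. This is a deliberately simplified sketch, not the JDK's actual notification buffer: {{BoundedNotificationBuffer}}, {{FetchResult}}, {{add}} and {{fetchAfter}} are hypothetical names, and only sequence numbers are stored rather than notification objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Simplified model of a shared, fixed-size notification buffer (hypothetical). */
class BoundedNotificationBuffer {
    /** What a polling client gets back: the retained sequence numbers and a loss count. */
    static final class FetchResult {
        final List<Long> sequenceNumbers;
        final long lost;

        FetchResult(List<Long> sequenceNumbers, long lost) {
            this.sequenceNumbers = sequenceNumbers;
            this.lost = lost;
        }
    }

    private final int capacity;
    private final ArrayDeque<Long> buffer = new ArrayDeque<>();
    private long nextSeq = 1;

    BoundedNotificationBuffer(int capacity) {
        this.capacity = capacity;
    }

    /** Adds one notification, evicting the oldest entry once the buffer is full. */
    synchronized long add() {
        if (buffer.size() == capacity) buffer.removeFirst();
        long seq = nextSeq++;
        buffer.addLast(seq);
        return seq;
    }

    /** Returns everything newer than lastSeen and how many entries were evicted unseen. */
    synchronized FetchResult fetchAfter(long lastSeen) {
        long earliest = buffer.isEmpty() ? nextSeq : buffer.peekFirst();
        // Everything between lastSeen + 1 and earliest - 1 has already been overwritten.
        long lost = Math.max(0L, earliest - (lastSeen + 1));
        List<Long> newer = new ArrayList<>();
        for (long seq : buffer) {
            if (seq > lastSeen) newer.add(seq);
        }
        return new FetchResult(newer, lost);
    }

    public static void main(String[] args) {
        BoundedNotificationBuffer shared = new BoundedNotificationBuffer(3);
        for (int i = 0; i < 10; i++) shared.add();
        FetchResult result = shared.fetchAfter(4); // a slow client that last saw sequence 4
        System.out.println("received=" + result.sequenceNumbers + " lost=" + result.lost);
    }
}
```

In this model the gap grows with both the publish rate and the time between polls, which is why a buffer shared by every MBean becomes more fragile as more event sources write into it.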
[jira] [Commented] (CASSANDRA-14435) Diag. Events: JMX events
[ https://issues.apache.org/jira/browse/CASSANDRA-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461804#comment-16461804 ] mck commented on CASSANDRA-14435:
{quote}With the shared event buffer (1000){quote}
I'm lost, where's this buffer you mention [~cnlwsu]? One of the issues I can see is that events are sent on the current thread, ref {{NotificationBroadcasterSupport.defaultExecutor}}.
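The threading concern can be reproduced with the standard {{javax.management}} classes alone. The following is a minimal sketch: {{BroadcastThreadDemo}}, {{listenerThread}} and {{namedExecutor}} are hypothetical helper names, and the single-threaded executor merely stands in for whatever shared executor a real implementation would use.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;

/** Shows which thread runs listeners, with and without a custom executor. */
class BroadcastThreadDemo {
    /** Sends one notification and returns the name of the thread the listener ran on. */
    static String listenerThread(NotificationBroadcasterSupport broadcaster) {
        AtomicReference<String> threadName = new AtomicReference<>();
        CountDownLatch delivered = new CountDownLatch(1);
        NotificationListener listener = (notification, handback) -> {
            threadName.set(Thread.currentThread().getName());
            delivered.countDown();
        };
        broadcaster.addNotificationListener(listener, null, null);
        broadcaster.sendNotification(new Notification("demo.event", "demo-source", 1L));
        try {
            if (!delivered.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("notification was not delivered");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return threadName.get();
    }

    /** Single daemon thread with a recognisable name (an assumption for this demo). */
    static ExecutorService namedExecutor(String name) {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        });
    }

    public static void main(String[] args) {
        // No-arg constructor: listeners are invoked on the thread calling sendNotification.
        System.out.println("default:  " + listenerThread(new NotificationBroadcasterSupport()));

        // Executor constructor: delivery is handed off, so the sending thread is not blocked.
        ExecutorService executor = namedExecutor("jmx-notifier");
        System.out.println("executor: " + listenerThread(new NotificationBroadcasterSupport(executor)));
        executor.shutdown();
    }
}
```

With the no-argument constructor a slow listener stalls whichever thread emitted the event, so handing delivery to an executor keeps that cost off the emitting code path.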
[jira] [Commented] (CASSANDRA-14435) Diag. Events: JMX events
[ https://issues.apache.org/jira/browse/CASSANDRA-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461508#comment-16461508 ] Chris Lohfink commented on CASSANDRA-14435:
Would be nice to be able to enable it for other mechanisms like native transport but not JMX.
While I think it's a good idea here, we should probably at least have a note in yaml that enabling it may impact operational tooling if using the broadcaster. With the shared event buffer (1000), the more we use it (even if no one is listening to that mbean's events) the more lost notifications will occur. On an active node we already end up losing a lot of events if the client is anywhere with relevant latency from the node. Increasing the buffer isn't really a good option, as it puts massive pressure on the heap because the composite data objects (particularly streaming ones) are huge.
[jira] [Commented] (CASSANDRA-14435) Diag. Events: JMX events
[ https://issues.apache.org/jira/browse/CASSANDRA-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460990#comment-16460990 ] Stefan Podkowinski commented on CASSANDRA-14435:
Quick way to give this a local test:
* compile and start
* jconsole localhost:7199
* Enable events if disabled in yaml: {{o.a.c.diag DiagnosticEventService}} -> {{resumePublishing()}}
* Start emitting dummy events: {{o.a.c.diag DummyEventEmitter}} -> {{dummyEventEmitIntervalMillis(1000)}}
* Enable listening to dummy events: {{o.a.c.diag DiagnosticEvents}} -> {{enableEvents(org.apache.cassandra.diag.DummyEvent)}}
* Go to {{o.a.c.diag DiagnosticEvents}} Notifications and subscribe