[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547819#comment-15547819 ] Alex Petrov commented on CASSANDRA-12461: - Thank you [~Stefania] & [~acoz]! > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Improvement > Components: Lifecycle >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.10 > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547340#comment-15547340 ] Stefania commented on CASSANDRA-12461: -- Thanks for rebasing. Committed to 3.X as 9fd0d0747bd46a09ec3567f4ec94fa3d63eca9aa and merged into trunk. There were CI failures but they should all be accounted for, if we include the dtest pr for CASSANDRA-12509, except perhaps for {{SnitchConfigurationUpdateTest.test_rf_collapse_property_file_snitch}}, but it passes locally and I really don't see how this patch would break it, since without hooks this patch is a no-op. For trunk, testall is reported as a failed build, but the only failed test is a failure that happens also on unpatched trunk and I verified locally that there are no eclipse warnings or other reasons for the build to be marked failed. Also, according to the console [output|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-testall/3/console], it failed because of the timeout in long-test, which caused the single known failure. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Improvement >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545146#comment-15545146 ] Alex Petrov commented on CASSANDRA-12461: - Rebased on top of latest {{3.X}} and {{trunk}} (it'd cleanly merge from {{3.X}}, trunk is mostly for CI purposes): |[12461-trunk|https://github.com/ifesdjeen/cassandra/tree/12461-trunk]|[dtest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-dtest/]|[utest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-testall/]| |[12461-3.X|https://github.com/ifesdjeen/cassandra/tree/12461-3.X]|[dtest|http://cassci.datastax.com/job/ifesdjeen-12461-3.X-dtest/]|[utest|http://cassci.datastax.com/job/ifesdjeen-12461-3.X-testall/]| > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Improvement >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15544086#comment-15544086 ] Stefania commented on CASSANDRA-12461: -- CASSANDRA-12509 has been committed, we can rebase this patch on cassandra-3.X. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532870#comment-15532870 ] Alex Petrov commented on CASSANDRA-12461: - Right, [CASSANDRA-12011] is contained. I've already created patch in [CASSANDRA-12509], but anything would work for me) > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532856#comment-15532856 ] Jeremiah Jordan commented on CASSANDRA-12461: - If we put the hooks in 3.0 we can use them to fix CASSANDRA-12011. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532753#comment-15532753 ] Alex Petrov commented on CASSANDRA-12461: - Moved the bugfix to [CASSANDRA-12509], will have to rebase after it's committed. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532397#comment-15532397 ] Aleksey Yeschenko commented on CASSANDRA-12461: --- I'd only leave the bug fix for 3.0. To make it less confusing, pull them all into a separate ticket, this one only adding the hooks. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531998#comment-15531998 ] Stefania commented on CASSANDRA-12461: -- +1 for committing the full patch to 3.0 since it fixes a number issues, including making problems like CASSANDRA-12397 much harder to reproduce. I'll do another quick round tomorrow, and if CI results are clean, we can commit. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531974#comment-15531974 ] Alex Petrov commented on CASSANDRA-12461: - +1 on changes. The initial patch was marked as a "bug", I've re-labeled it as an "improvement". But since we have fixed several issues with drain process ([CASSANDRA-12509], for example), might be good to have it in 3.0? Or should I remove hooks, only leave bugfix for 3.0 patch (will be a trivial change, too), just don't want to create confusion.. Backport to 3.0 was trivial, but we can just skip committing if anything. CI triggered for both branches (although your latest branch only squashed, without any changes on top). |[12461-trunk|https://github.com/ifesdjeen/cassandra/tree/12461-trunk]|[dtest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-dtest/]|[utest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-testall/]| |[12461-trunk|https://github.com/ifesdjeen/cassandra/tree/12461-trunk]|[3.0|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-3.0/]|[utest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-testall/]||[12461-3.0|https://github.com/ifesdjeen/cassandra/tree/12461-3.0]|[dtest|http://cassci.datastax.com/job/ifesdjeen-12461-3.0-dtest/]|[utest|http://cassci.datastax.com/job/ifesdjeen-12461-3.0-testall/]| > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531686#comment-15531686 ] Stefania commented on CASSANDRA-12461: -- It LGTM, with these two changes: * I ill advised you yesterday regarding replacing the boolean with the operating mode, because we are not updating it atomically and we are not guarding against unwanted transitions, due to some bug somewhere, e.g. DRAINED -> DECOMMISSIONED or DRAINED -> NORMAL. I did not want to handle all this and check all the callers of {{setMode}} and so I've reintroduced the boolean, I merely renamed it to {{isShutdown}}. * I think we could have races if we check that a node is drained before starting a service with two JMX calls. I move the check in the SS methods and made them synchronized. |[patch|https://github.com/stef1927/cassandra/commits/12461]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12461-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12461-dtest/]| Let me know if you are OK with these two changes, if so, should we also back-port to 3.0? > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529017#comment-15529017 ] Alex Petrov commented on CASSANDRA-12461: - Thank you for the detailed review. I've made the corresponding fixes and rebased. I didn't squash commits, mostly to make it easier to look over. bq. shall we rename inShutdownHook to something like draining or drained? Sure, renamed. bq. there is still one unprotected call to logger.warn in drain(), line 4462, and a call in runShutdownHooks() The one in {{4462}} (I assume that's a line number after rebase, [this one in original file|https://github.com/ifesdjeen/cassandra/blob/c49dd44710f5ba816eb0cd414ef31efa914f7776/src/java/org/apache/cassandra/service/StorageService.java#L4449]). I've only "turned off" the logging that is related strictly to draining. I might be missing something however. bq. let's update the documentation for drain() True, it was incorrect. Fixed now. Stating differences with a normal shutdown hook now is too technical (both logging and windows timers are unrelated to core of what Cassandra does), so I left it out. bq. shall we catch any exceptions in drain() to ensure the post shutdown hooks are run also if there is an exception? You're right, it's a good idea to do that. I've also changed catching {{Exception}} to catching {{Throwable}} (with {{Throwables}} as you described) when running hooks in order to avoid unintended errors breaking drain. bq. the documentation says that the post shutdown hooks should only be called on final shutdown, not on drain, is this still the case? Since there's just one process: either shutdown or drain (you can't re-run the drain code after termination), it's not true anymore. Actually, that was the reason to provide the second part of patch. bq. here is a proposal for some extra work (feel free to turn it down): refactor setMode() to accept an optional log level instead of a boolean, when the optional is empty it should not log, so we could call this method also from the shutdown hook and possibly replace inShutdownHook with operationMode, provided this becomes volatile. I would also add an override since most of the times it is called with the boolean set to true, so the override would have the logging level set to INFO. I like the idea a lot, and I've implemented at least a part of it (using {{DRAINED}} instead of an additional boolean). I think that transitioning to {{DRAINING}} / {{DRAINED}} state on shutdown is also correct. As regards log level, I could not find a good intuitive way to pass log level, as slf4j API uses method calls instead of levels, both native, log4j and apache logging levels do not translate fully to slf4j method calls. Hope it's still ok.. bq. given that drain() is synchronized, can we not just look at inShutdownHook (or operationMode) at the beginning of drain(), to solve CASSANDRA-12509? Is there anything else I am missing about it? Actually my intention was that this issue fully contains #12509 already. I will add the corresponding link. bq. the StorageService import is unused in the Enabled*.java files Cleaned up. bq. we could replace runShutdownHooks(); with Throwables.perform(null, postShutdownHooks.stream().map(h -> h::run));, logging any non-null returned exceptions, if not in final shutdown. This would not only avoid one method implementation, but also make the log call visible in drain(). I've made it log exceptions even if it's a final shutdown. It might be useful to have them logged in both cases. I've triggered a CI, will report as soon as I have passed tests. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528282#comment-15528282 ] Stefania commented on CASSANDRA-12461: -- I'm definitely very much in favor of what you've done by unifying the drain code, behaviorally we are changing shutdown according to the table above, notably we will also flush tables without durable writes, recycle commitlog segments, and stop compactions. I think this should be the correct behavior, it may make shutdown a bit longer but operators normally run drain before shutdown anyway and so it would be a no-op during shutdown. I'm not sure if we need this in 3.0 as well, the shutdown hooks are a new feature, but running a full drain on shutdown may be very useful when upgrading, in case operators forget to call drain. Here is the review in detail (some points overlap or cancel each other): * there are 3 dtest failures which are not on trunk, {{cql_tests.LWTTester}} but they seem unrelated and they pass locally, it is probably a glitch in dtests since the code was changed 2 days ago, but let's repeat dtests at least two more times. * shall we rename {{inShutdownHook}} to something like {{draining}} or {{drained}}? * there is still one unprotected call to {{logger.warn}} in {{drain()}}, line 4462, and a call in {{runShutdownHooks()}} * let's update the documentation for {{drain()}} * shall we catch any exceptions in {{drain()}} to ensure the post shutdown hooks are run also if there is an exception? * the documentation says that the post shutdown hooks should only be called on final shutdown, not on drain, is this still the case? * here is a proposal for some extra work (feel free to turn it down): refactor {{setMode()}} to accept an optional log level instead of a boolean, when the optional is empty it should not log, so we could call this method also from the shutdown hook and possibly replace {{inShutdownHook}} with {{operationMode}}, provided this becomes volatile. I would also add an override since most of the times it is called with the boolean set to true, so the override would have the logging level set to INFO. * given that {{drain()}} is synchronized, can we not just look at {{inShutdownHook}} (or {{operationMode}}) at the beginning of {{drain()}}, to solve CASSANDRA-12509? Is there anything else I am missing about it? * the {{StorageService}} import is unused in the Enabled*.java files * we could replace {{runShutdownHooks();}} with {{Throwables.perform(null, postShutdownHooks.stream().map(h -> h::run));}}, logging any non-null returned exceptions, if not in final shutdown. This would not only avoid one method implementation, but also make the log call visible in {{drain()}}. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513021#comment-15513021 ] Alex Petrov commented on CASSANDRA-12461: - I've discovered several more problems while working on this patch: * node drain code was duplicated (with minor differences, which I indicate below) * if the node was drained, under Windows the timer resolution (added in [CASSANDRA-9634]) was not reset, since the node was considered "already drained". * same was happening with post-shutdown hooks, since [here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L575-L576] we'll return from runnable, since those services were shut down during {{drain}} [here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L586-L589]. This is one of the reasons I was advocating for a single consistent drain process. I also suggest disallowing re-enabling auto-compaction, binary, gossip, handoff and thrift to ensure that we do not need to re-stop them in the final shutdown hook. Operator can not bring the node into "working" state after drain without restart anyways (one of the reasons is the fact that commit log is shut down by that time), and it was most likely never intended to do so. However, it might be useful for operator to run compactions on the drained node, so we only shut down compaction manager later. I've made a comparison table to make it easier to see what {{drain()}} method was doing compared to {{drainOnShutdown}} runnable: || nodetool drain || shutdown drain hook | | disables autocompaction | | | shuts down compaction manager | | | recycles commitlog segment recycling | | | shuts down batchlog and hints earlier | | | | flushes only tables with durable_writes | | | clears set timer resolution for windows | I've combined the two processes, made clearer distinctions to allow running things in {{drainOnShutdown}}. Since we can run all the items from the {{nodetool drain}} part of the list during the normal node shutdown, the code got a bit simpler, too (the only difference is now logging). Preliminary version of the update (also, CI pending): |[12461-trunk-v2|https://github.com/ifesdjeen/cassandra/tree/12461-trunk-v2]|[dtest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-v2-dtest/]|[utest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-v2-testall/]| (I've discussed the change "in theory" with [~slebresne], although it's still worth for someone to take a deeper look at it, I'll ask around) > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15511331#comment-15511331 ] Alex Petrov commented on CASSANDRA-12461: - [~JoshuaMcKenzie] I've created [CASSANDRA-12509] to solve the problem that we've faced while trying to resolve that one (shutdown hook running twice), which needs thorough investigation. Had to work on multiple other issues meanwhile, I'll put it on top of my list. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15511255#comment-15511255 ] Joshua McKenzie commented on CASSANDRA-12461: - ping [~ifesdjeen] > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487544#comment-15487544 ] Anthony Cozzie commented on CASSANDRA-12461: Pushed a new patch with the renames to my github. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449879#comment-15449879 ] Anthony Cozzie commented on CASSANDRA-12461: Nod, that did occur to me as well. Easy enough to fix the names if things are all basically working. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449097#comment-15449097 ] Sergio Bossa commented on CASSANDRA-12461: -- FWIW, I find the naming a bit confusing, i.e. I think {{runBeforeShutdown()}} and {{removePreShutdownHook()}} should be made consistent, possibly by renaming the former as {{addPreShutdownHook()}} (and the same applies to the post hooks). > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15446337#comment-15446337 ] Anthony Cozzie commented on CASSANDRA-12461: Ah, good catch. I think this is a simple fix (logback's context aware doesn't happen in the constructor for some reason). Also, the problem with removing the shutdown hook is that then nothing that we register here will run. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438651#comment-15438651 ] Alex Petrov commented on CASSANDRA-12461: - bq. I was unable to reproduce the 'LOGBACK: No context' error (I ran it as DSE in our CCM) with either 'ccm stop' and a 'kill XXX'. What were you doing there to trigger that bug? You can only see it when attached to terminal (i.e. running {{cassandra -f}}) and force-quitting it. I suppose since logging is already shut down it can not write to the file, so can only output through {{STDOUT}}... bq. it runs pre shutdown hooks it clears the list so they do not run twice That definitely fixes the issue, although I still think that we should solve it by (maybe) removing a shutdown hook on successful drain ([CASSANDRA-12509]) > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433003#comment-15433003 ] Anthony Cozzie commented on CASSANDRA-12461: I pushed a new version of the patch. Specifically: drain does not run post shutdown hooks, and when it runs pre shutdown hooks it clears the list so they do not run twice. I smoke tested this again and I was unable to reproduce the 'LOGBACK: No context' error (I ran it as DSE in our CCM) with either 'ccm stop' and a 'kill XXX'. What were you doing there to trigger that bug? > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428541#comment-15428541 ] Alex Petrov commented on CASSANDRA-12461: - bq. It is a great way to introduce stupid bugs and the performance benefits here would be nonexistent. I was actually advocating for correctness. With a volatile variable we won't be able to guarantee exactly-once (we could with locking but it's usually much harder). If you're still against it even in that case, I'll take a look at locking version you implemented again as I think we should guard the boolean variable, as it'd already protect the access to lists. Either way works for me, we should just make sure that list accessors (both add and remove) disallow modifying lists after shutdown hook is ran. bq. I only smoke tested things on a shutdown, I'm wondering if that error has to do with the shutdown hook for logback being called twice. I'll test that case today. As far as I remember, this error showed up during "normal" shutdown (interrupt). And during drain (I've tested by adding some println hooks), I just seen all messages duplicated (which is kind of clear from the logic), but I'd argue that running whole process twice is not required. And if we do run hooks, we most likely should run them once (?), which would be during "normal" shutdown. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428470#comment-15428470 ] Anthony Cozzie commented on CASSANDRA-12461: Thanks for your review Alex! First, the nits: I am -1 on any sort of synchronization optimization. It is a great way to introduce stupid bugs and the performance benefits here would be nonexistent. For example, I'm not sure your patch works there: suppose we have the following interleaving: * addHook reads the AtomicBoolean, it is false * shutdownHook sets the AtomicBoolean to true * shutdownHook runs the hooks * addHook adds the hook I don't even know if this analysis is correct; I just don't want to think about it. List add doesn't really seem to return anything useful according to the javadoc "Lists that support this operation may place limitations on what elements may be added to this list. In particular, some lists will refuse to add null elements, and others will impose restrictions on the type of elements that may be added. List classes should clearly specify in their documentation any restrictions on what elements may be added.". So I am OK either way. OK, the more serious stuff: I did not understand what drain() was doing. We should definitely not run post shutdown hooks and turn off logging and such until the very end. Probably the simplest thing would be to simply clear the preshutdownhook list after calling them on drain() so they are not called twice. I only smoke tested things on a shutdown, I'm wondering if that error has to do with the shutdown hook for logback being called twice. I'll test that case today. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428040#comment-15428040 ] Alex Petrov commented on CASSANDRA-12461: - Hi [~acoz]. I have several minor suggestions: * run post-shutdown hooks after setting the mode * disallow removing hooks if already in shutdown hook * return boolean yielded by `add` of the list * currently shutdown hooks will run twice on drained node: once during the node drain and second time during the actual shutdown, is that intended / known? Should we even run shutdown process for the second time after the node is already drained? Or can we use an atomic boolean and ensure exactly one run? * it might be possible to avoid syncrhonisation by using atomic boolean instead of volatile + syncronised block I also suggest working further on fixing [CASSANDRA-12011], since currently it results into {code} LOGBACK: No context given for ch.qos.logback.core.hook.DelayingShutdownHook@5bdd8803 {code} which in my opinion requires some investigation. I've started reading up on it and it seems that, for instance, Spring Boot is doing it through the context shutdown. Since logging isn't going to be available for post-shutdown hooks, it'd be hard to use them, as you can not see side-effects. I've pushed a dirty copy of my suggestions [here|https://github.com/ifesdjeen/cassandra/commits/12461-trunk]. It might be simpler to look at the code :) > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424480#comment-15424480 ] Alex Petrov commented on CASSANDRA-12461: - Hi Anthony, thank you for the patch! I'll submit for changes to CI and will review them within next couple of days. > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423525#comment-15423525 ] Anthony Cozzie commented on CASSANDRA-12461: OK, back ported the patch to 2.2 and removed the shutdown hook, replacing it with a direct call in StorageService(). Since there are no lambdas in 2.2, I reworked that part and the first note no longer applies. In addition, I tried to do this via Github rather than attached patches. Hopefully I pulled that off. Please let me know what you think, Alex! * 2.2: https://github.com/acoz/cassandra/tree/12461 * 3.0: https://github.com/acoz/cassandra/tree/12461-3.0 * 3.10: https://github.com/acoz/cassandra/tree/12461-trunk > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421633#comment-15421633 ] Anthony Cozzie commented on CASSANDRA-12461: I'm going to go ahead and fix Cassandra-12011 while I'm at it since it will be easy. At that point we can hopefully patch this into 2.2 and up (I'll have to take out the lambdas I guess). > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421170#comment-15421170 ] Anthony Cozzie commented on CASSANDRA-12461: Notes: * Wrapping the hooks on insertion leads to some nice code in the shutdown hook, but it means we can't easily remove the hooks once we add them. If this is important, it can be changed pretty easily. * The implementation uses synchronized blocks rather than CopyOnWriteArrayList. This lets callers know if Cassandra shut down before they could add their hook. I don't know if this is important enough to worry about; we could just punt here and simplify the code. * I didn't write any unit tests. Hopefully the code is simply enough to just reason about, and I don't know of a simple way to unit test this (it would be difficult to add hooks in a dtest, verifying the order in a non-brittle way seems difficult, and Cassandra doesn't seem to have a good way to boot up full nodes in unit tests, as this is mostly a bad idea). > Add hooks to StorageService shutdown > > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: > 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to > synchronization problems between Cassandra, services that depend on it, and > services it depends on. This patch adds some simple support for shutdown > hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)