[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up
[ https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166731#comment-17166731 ]

Brandon Williams commented on CASSANDRA-15191:
----------------------------------------------

3.0 looks good, +1.

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up
> ----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Config
> Reporter: Vincent White
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: log.txt
>
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> There is a bug when disk_failure_policy is set to stop_paranoid and
> CorruptSSTableException is thrown after the server is up: the setting is
> ignored. Normally, it should stop gossip and transport, but the node just
> continues to serve requests and the exception is merely logged.
>
> This patch unifies the exception handling in JVMStabilityInspector. The code
> is reworked so that this inspector acts as the central place where such
> exceptions are inspected.
>
> The core reason the exception is ignored is that the exception thrown in
> AbstractLocalAwareExecutorService is not a CorruptSSTableException but a
> RuntimeException that has the CorruptSSTableException as its cause. Hence it
> is better to handle this in JVMStabilityInspector, which can recursively
> examine the cause chain and act accordingly.
>
> Behaviour before:
> stop_paranoid of disk_failure_policy is ignored when CorruptSSTableException
> is thrown, e.g. on a regular select statement.
>
> Behaviour after:
> Gossip and transport (CQL) are turned off; the JVM stays up for further
> investigation, e.g. via JMX.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
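The root cause described in the issue text above (a RuntimeException that carries the CorruptSSTableException only as its cause) can be illustrated with a minimal Java sketch. This is not the actual JVMStabilityInspector code: `CorruptDataException`, `CauseChainSketch` and `findCause` are hypothetical names invented here, and the depth limit is an assumption added only to guard against cyclic cause chains.

```java
import java.util.Optional;

// Hypothetical stand-in for CorruptSSTableException; not the real Cassandra class.
class CorruptDataException extends RuntimeException {
    CorruptDataException(String message) { super(message); }
}

final class CauseChainSketch {
    // Walks the cause chain and returns the first throwable of the wanted type, if any.
    // The depth bound is an assumption of this sketch to avoid looping on cyclic chains.
    static <T extends Throwable> Optional<T> findCause(Throwable t, Class<T> wanted) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < 64; cur = cur.getCause(), depth++) {
            if (wanted.isInstance(cur)) {
                return Optional.of(wanted.cast(cur));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // An executor wrapping the original failure, as described in the issue.
        Throwable wrapped = new RuntimeException(new CorruptDataException("bad sstable"));

        // A direct type check on the outer exception misses the corruption.
        System.out.println(wrapped instanceof CorruptDataException); // prints false

        // Examining the cause chain finds it.
        System.out.println(findCause(wrapped, CorruptDataException.class).isPresent()); // prints true
    }
}
```

A handler that only checks the type of the outermost exception never sees the corruption, which matches the "Behaviour before" in the description; a handler that inspects causes can apply the configured policy.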
[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up
[ https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165117#comment-17165117 ]

David Capwell commented on CASSANDRA-15191:
-------------------------------------------

I took a stab at backporting to 3.0: https://github.com/dcapwell/cassandra/commit/fb3162efa1308bc00fd8bd479e91c563160dea0e

I also made a few small changes from your original patch:
1) Calls to the FSError handler now go through JVMStabilityInspector.
2) Instance adds the default FS error handler. This was working on trunk since we fixed it there but didn't backport it, so I am adding it here so that the test doesn't need to.
3) The commit log (CL) didn't need a lot of changes. It looks like the issue you faced was a change between 3.0 and 3.11, so I only had to update the CL method in JVMStabilityInspector.
[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up
[ https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164051#comment-17164051 ]

David Capwell commented on CASSANDRA-15191:
-------------------------------------------

CI 3.11 - https://app.circleci.com/pipelines/github/dcapwell/cassandra/309/workflows/b4cbed8d-868f-4640-a697-471fa03fd4bf
CI trunk - https://app.circleci.com/pipelines/github/dcapwell/cassandra/310/workflows/62969c9b-9c65-4558-9ec0-3fcc3f17d79e

It looks like this patch doesn't play nicely with the commit log. It breaks the following tests:

commitlog_test.py
- test_ignore_failure_policy
- test_stop_commit_failure_policy

Here is the log from the ignore policy test: https://1573-209217594-gh.circle-artifacts.com/62/dtest_j8_without_vnodes_logs/1595547611103_test_ignore_failure_policy/node1.log

A sample that stands out:

{code}
ERROR [COMMIT-LOG-ALLOCATOR] 2020-07-23 23:40:08,735 CommitLog.java:499 - Failed managing commit log segments
org.apache.cassandra.io.FSWriteError: java.nio.file.AccessDeniedException: /tmp/dtest-zt17lw0m/test/node1/commitlogs/CommitLog-7-1595547598804.log
    at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:180)
    at org.apache.cassandra.db.commitlog.MemoryMappedSegment.<init>(MemoryMappedSegment.java:45)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.createSegment(CommitLogSegment.java:137)
    at org.apache.cassandra.db.commitlog.CommitLogSegmentManagerStandard.createSegment(CommitLogSegmentManagerStandard.java:66)
    at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager$1.runMayThrow(AbstractCommitLogSegmentManager.java:114)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.AccessDeniedException: /tmp/dtest-zt17lw0m/test/node1/commitlogs/CommitLog-7-1595547598804.log
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
    at java.nio.channels.FileChannel.open(FileChannel.java:287)
    at java.nio.channels.FileChannel.open(FileChannel.java:335)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:175)
    ... 7 common frames omitted
ERROR [COMMIT-LOG-ALLOCATOR] 2020-07-23 23:40:09,736 DefaultFSErrorHandler.java:66 - Stopping transports as disk_failure_policy is stop
{code}

It looks like the commit failure policy isn't respected and we instead fall back to the normal disk failure policy.

[~stefan.miklosovic] can you look into this?
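The regression reported in the comment above (a commit log failure being handled under the disk failure policy) comes down to which policy an error is routed to. The following is a rough sketch of that routing only, under stated assumptions: the enums, `CommitLogFailure` and `policyFor` are hypothetical names made up for illustration, the enum values are a subset chosen for the example, and none of this reflects how Cassandra actually classifies commit log errors.

```java
import java.util.Locale;

// Hypothetical policy enums for this sketch; the values listed are illustrative only.
enum DiskFailurePolicy { STOP, STOP_PARANOID, IGNORE }
enum CommitFailurePolicy { STOP, STOP_COMMIT, IGNORE }

// Hypothetical marker for failures raised while managing commit log segments.
class CommitLogFailure extends RuntimeException {
    CommitLogFailure(Throwable cause) { super(cause); }
}

final class PolicyRoutingSketch {
    // Chooses which configured policy should govern the given error.
    // Commit log failures are checked first so they never fall through to the disk policy.
    static String policyFor(Throwable t, DiskFailurePolicy disk, CommitFailurePolicy commit) {
        if (t instanceof CommitLogFailure) {
            return "commit_failure_policy=" + commit.name().toLowerCase(Locale.ROOT);
        }
        return "disk_failure_policy=" + disk.name().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Throwable commitError = new CommitLogFailure(new java.io.IOException("access denied"));
        Throwable diskError = new RuntimeException("corrupt data file");

        System.out.println(policyFor(commitError, DiskFailurePolicy.STOP, CommitFailurePolicy.IGNORE));
        System.out.println(policyFor(diskError, DiskFailurePolicy.STOP, CommitFailurePolicy.IGNORE));
    }
}
```

With the ignore commit policy from the failing test, the first call selects the commit policy rather than the stop disk policy seen in the log.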
[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up
[ https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162283#comment-17162283 ]

David Capwell commented on CASSANDRA-15191:
-------------------------------------------

FYI, the conversation has been happening in Slack: https://the-asf.slack.com/archives/CK23JSY2K/p1595280621333400

Updates:
* The tests are flaky. It looks like there is a race condition in the test where the flag isn't updated yet. A workaround was added that queries multiple times with a 5 second sleep, in the hope of making the tests stable.
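The workaround mentioned in the update above (query several times with a sleep in between until the flag is observed) follows a common polling pattern. Here is a minimal sketch of that pattern, not the code from the patch: `RetrySketch`, `awaitCondition` and its parameters are hypothetical, and the real test presumably uses its own query and a 5 second pause.

```java
import java.util.function.BooleanSupplier;

final class RetrySketch {
    // Polls the check up to maxAttempts times, sleeping between attempts.
    // Returns true as soon as the check passes, false if it never does
    // or if the waiting thread is interrupted.
    static boolean awaitCondition(BooleanSupplier check, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for callers
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "the flag has been updated": becomes true after about 50 ms.
        boolean seen = awaitCondition(() -> System.currentTimeMillis() - start >= 50, 10, 20L);
        System.out.println("flag observed: " + seen);
    }
}
```

Polling with a bounded number of attempts keeps a racy assertion from failing on the first read while still failing the test if the state never changes.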
[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up
[ https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161586#comment-17161586 ]

David Capwell commented on CASSANDRA-15191:
-------------------------------------------

The python dtests failed; it looks like exceptions that were not logged before are all logged now. A simple example is the auth tests found in https://app.circleci.com/pipelines/github/dcapwell/cassandra/297/workflows/aefdb912-4395-498a-a1b6-b16770d46a45/jobs/1438

test_udf_permissions_validation - auth_test.TestAuthRoles

{code}
Unexpected error found in node logs (see stdout for full details). Errors: [ERROR [Native-Transport-Requests-11] 2020-07-20 21:51:31,717 JVMStabilityInspector.java:81 - Uncaught exception in thread Thread[Native-Transport-Requests-11,10,main]
org.apache.cassandra.exceptions.UnauthorizedException: User mike has no ALTER permission on or any of its parents
    at org.apache.cassandra.service.ClientState.ensurePermissionOnResourceChain(ClientState.java:430)
    at org.apache.cassandra.service.ClientState.ensurePermission(ClientState.java:404)
    at org.apache.cassandra.cql3.statements.schema.CreateFunctionStatement.authorize(CreateFunctionStatement.java:177)
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:203)
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:253)
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:240)
    at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:108)
    at org.apache.cassandra.transport.Message$Request.execute(Message.java:253)
    at org.apache.cassandra.transport.Message$Dispatcher.processRequest(Message.java:725)
    at org.apache.cassandra.transport.Message$Dispatcher.lambda$channelRead0$0(Message.java:630)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
{code}

This exception wasn't logged before; it was only sent back to the user. With this patch we now log all these hidden exceptions.
[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up
[ https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161457#comment-17161457 ]

David Capwell commented on CASSANDRA-15191:
-------------------------------------------

Overall the patch LGTM (I have only reviewed trunk so far). My main comments were on the tests; I hope the example given helps.

[~stefan.miklosovic] do you have any CI runs for this patch?
[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up
[ https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161451#comment-17161451 ]

David Capwell commented on CASSANDRA-15191:
-------------------------------------------

Thanks for the changes. To help show what I was trying (and failing) to say in the PR, I posted different tests that hit the read stage and show this is a problem.
[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up
[ https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160407#comment-17160407 ]

Stefan Miklosovic commented on CASSANDRA-15191:
-----------------------------------------------

[~dcapwell] please review again. I have added a test (hopefully it is what you expect; otherwise I am out of ideas here), and I have moved the logging from ALAES (AbstractLocalAwareExecutorService) to the inspector.
[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up
[ https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160294#comment-17160294 ]

David Capwell commented on CASSANDRA-15191:
-------------------------------------------

Took a stab at review and left a few comments in the PR.
[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up
[ https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159763#comment-17159763 ]

Stefan Miklosovic commented on CASSANDRA-15191:
-----------------------------------------------

PR for trunk aka 4.0 [https://github.com/apache/cassandra/pull/684]
[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up
[ https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159292#comment-17159292 ]

Stefan Miklosovic commented on CASSANDRA-15191:
-----------------------------------------------

Hi [~jeromatron] and [~Bereng], could you review this, please? I'll create a patch for trunk if the proposed solution is fine here.
[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up
[ https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159287#comment-17159287 ]

Stefan Miklosovic commented on CASSANDRA-15191:
-----------------------------------------------

PR for 3.11 [https://github.com/apache/cassandra/pull/681]