[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

2020-07-28 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166731#comment-17166731
 ] 

Brandon Williams commented on CASSANDRA-15191:
--

3.0 looks good, +1.

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> 
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Vincent White
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: log.txt
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> There is a bug when disk_failure_policy is set to stop_paranoid and a 
> CorruptSSTableException is thrown after the server is up: the setting is 
> ignored. The node should stop gossip and transport, but it keeps serving 
> requests and the exception is only logged.
>  
> This patch unifies the exception handling in JVMStabilityInspector and 
> reworks the code so that the inspector acts as the central place where such 
> exceptions are inspected. 
>  
> The root cause is that the exception thrown in 
> AbstractLocalAwareExecutorService is not a CorruptSSTableException but a 
> RuntimeException that has the CorruptSSTableException as its cause. It is 
> therefore better to handle this in JVMStabilityInspector, which can examine 
> the cause chain recursively and act accordingly.
> Behaviour before:
> stop_paranoid of disk_failure_policy is ignored when a CorruptSSTableException 
> is thrown, e.g. on a regular SELECT statement.
> Behaviour after:
> Gossip and transport (CQL) are turned off, while the JVM stays up for further 
> investigation, e.g. via JMX.
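The root-cause paragraph above can be illustrated with a small Java sketch. All types here (CorruptSSTableException, DiskFailurePolicy, StabilityInspectorSketch) are simplified, hypothetical stand-ins rather than Cassandra's real classes. The sketch shows one point only: a check such as `t instanceof CorruptSSTableException` fails on the outer RuntimeException, while walking `getCause()` finds the wrapped exception.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical stand-in for Cassandra's CorruptSSTableException.
class CorruptSSTableException extends RuntimeException {
    CorruptSSTableException(String message) {
        super(message);
    }
}

// Values mirror the disk_failure_policy options in cassandra.yaml.
enum DiskFailurePolicy { DIE, STOP_PARANOID, STOP, BEST_EFFORT, IGNORE }

// Hypothetical, simplified model; not Cassandra's JVMStabilityInspector.
class StabilityInspectorSketch {
    private final DiskFailurePolicy policy;
    // Stands in for shutting down gossip and the native (CQL) transport.
    boolean transportsStopped = false;

    StabilityInspectorSketch(DiskFailurePolicy policy) {
        this.policy = policy;
    }

    /** Walks the cause chain so that a wrapped CorruptSSTableException is still found. */
    void inspect(Throwable t) {
        // Identity-based set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (cur instanceof CorruptSSTableException) {
                onCorruptSSTable();
                return;
            }
        }
    }

    private void onCorruptSSTable() {
        if (policy == DiskFailurePolicy.STOP_PARANOID) {
            // Stop serving requests but keep the JVM up, e.g. for inspection over JMX.
            transportsStopped = true;
        }
    }
}
```

For example, `inspect(new RuntimeException(new CorruptSSTableException("bad")))` under STOP_PARANOID sets `transportsStopped`, even though the outer exception is a plain RuntimeException.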



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

2020-07-25 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165117#comment-17165117
 ] 

David Capwell commented on CASSANDRA-15191:
---

I took a stab at backporting to 3.0: 
https://github.com/dcapwell/cassandra/commit/fb3162efa1308bc00fd8bd479e91c563160dea0e

I also made a few small changes from your original patch:

1) Calls to the FSError handler now go through JVMStabilityInspector.
2) Instance adds a default FS error handler. This was already working on trunk 
since we fixed it there, but that fix was never backported; adding it here means 
the test doesn't need to install one.
3) The commit log (CL) didn't need many changes. The issue you faced looks like 
a change between 3.0 and 3.11, so I only had to update the CL method in 
JVMStabilityInspector.
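Points 1) and 2) can be sketched under a simplified model. Every FS error goes through one inspector entry point. A default handler is installed up front, so a test that never registers one still has a non-null handler. FSErrorSketch, FSErrorHandlerSketch and FSErrorRouting are hypothetical names, not the classes in the 3.0 backport.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Cassandra's FSError.
class FSErrorSketch extends Error {
    FSErrorSketch(String path) {
        super("filesystem error on " + path);
    }
}

interface FSErrorHandlerSketch {
    void handleFSError(FSErrorSketch e);
}

// Hypothetical, simplified routing; not the actual 3.0 code.
class FSErrorRouting {
    // Errors seen by the default handler, so a test can assert on them.
    static final List<FSErrorSketch> recorded = new ArrayList<>();
    // Installed up front, so callers that never register a handler still have one.
    private static volatile FSErrorHandlerSketch handler = recorded::add;

    static void setHandler(FSErrorHandlerSketch h) {
        handler = h;
    }

    /** Single entry point: FS errors reach the handler only through here. */
    static void inspectThrowable(Throwable t) {
        if (t instanceof FSErrorSketch) {
            handler.handleFSError((FSErrorSketch) t);
        }
    }
}
```

Routing every call through `inspectThrowable` means that the policy decision, and any later change to it, lives in one place instead of at each call site.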

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> 
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Vincent White
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: log.txt
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

2020-07-23 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164051#comment-17164051
 ] 

David Capwell commented on CASSANDRA-15191:
---

CI 3.11 - 
https://app.circleci.com/pipelines/github/dcapwell/cassandra/309/workflows/b4cbed8d-868f-4640-a697-471fa03fd4bf
CI trunk - 
https://app.circleci.com/pipelines/github/dcapwell/cassandra/310/workflows/62969c9b-9c65-4558-9ec0-3fcc3f17d79e

It looks like this patch doesn't play nicely with the commit log; it breaks the 
following tests:

commitlog_test.py
 - test_ignore_failure_policy
 - test_stop_commit_failure_policy

Here is the log from the ignore policy test: 
https://1573-209217594-gh.circle-artifacts.com/62/dtest_j8_without_vnodes_logs/1595547611103_test_ignore_failure_policy/node1.log

A sample that stands out:

{code}
ERROR [COMMIT-LOG-ALLOCATOR] 2020-07-23 23:40:08,735 CommitLog.java:499 - Failed managing commit log segments
org.apache.cassandra.io.FSWriteError: java.nio.file.AccessDeniedException: /tmp/dtest-zt17lw0m/test/node1/commitlogs/CommitLog-7-1595547598804.log
	at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:180)
	at org.apache.cassandra.db.commitlog.MemoryMappedSegment.<init>(MemoryMappedSegment.java:45)
	at org.apache.cassandra.db.commitlog.CommitLogSegment.createSegment(CommitLogSegment.java:137)
	at org.apache.cassandra.db.commitlog.CommitLogSegmentManagerStandard.createSegment(CommitLogSegmentManagerStandard.java:66)
	at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager$1.runMayThrow(AbstractCommitLogSegmentManager.java:114)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.AccessDeniedException: /tmp/dtest-zt17lw0m/test/node1/commitlogs/CommitLog-7-1595547598804.log
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
	at java.nio.channels.FileChannel.open(FileChannel.java:287)
	at java.nio.channels.FileChannel.open(FileChannel.java:335)
	at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:175)
	... 7 common frames omitted
ERROR [COMMIT-LOG-ALLOCATOR] 2020-07-23 23:40:09,736 DefaultFSErrorHandler.java:66 - Stopping transports as disk_failure_policy is stop
{code}

It looks like commit_failure_policy isn't respected; instead we fall back to 
disk_failure_policy, as the last log line shows.

[~stefan.miklosovic] can you look into this?
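The behaviour these dtests expect can be made concrete with a hypothetical Java sketch. An FS error raised while managing the commit log should consult commit_failure_policy, and only other disk errors should fall through to disk_failure_policy. The enum values mirror the cassandra.yaml options. FailurePolicyRouter, its Action enum and the `fromCommitLog` flag are illustrative assumptions, not Cassandra's API.

```java
// Hypothetical sketch: which failure policy applies to a filesystem error.
class FailurePolicyRouter {
    // Values mirror commit_failure_policy and disk_failure_policy in cassandra.yaml.
    enum CommitPolicy { DIE, STOP, STOP_COMMIT, IGNORE }
    enum DiskPolicy { DIE, STOP_PARANOID, STOP, BEST_EFFORT, IGNORE }
    enum Action { DIE, STOP_TRANSPORTS, STOP_COMMIT, BEST_EFFORT, IGNORE }

    private final CommitPolicy commitPolicy;
    private final DiskPolicy diskPolicy;

    FailurePolicyRouter(CommitPolicy commitPolicy, DiskPolicy diskPolicy) {
        this.commitPolicy = commitPolicy;
        this.diskPolicy = diskPolicy;
    }

    /** fromCommitLog is an assumed flag: true if the error came from commit log management. */
    Action onFSError(boolean fromCommitLog) {
        if (fromCommitLog) {
            // Commit log errors should honour commit_failure_policy, not disk_failure_policy.
            switch (commitPolicy) {
                case IGNORE: return Action.IGNORE;
                case STOP_COMMIT: return Action.STOP_COMMIT;
                case STOP: return Action.STOP_TRANSPORTS;
                default: return Action.DIE;
            }
        }
        switch (diskPolicy) {
            case IGNORE: return Action.IGNORE;
            case BEST_EFFORT: return Action.BEST_EFFORT;
            case STOP:
            case STOP_PARANOID: return Action.STOP_TRANSPORTS;
            default: return Action.DIE;
        }
    }
}
```

Take commit_failure_policy `ignore` with disk_failure_policy `stop`. The log above implies this configuration, although the commit policy is only inferred from the test name. Under it, `onFSError(true)` yields IGNORE, whereas the log shows the node taking the disk-policy path and stopping transports.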

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> 
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Vincent White
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
> Attachments: log.txt
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

2020-07-21 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162283#comment-17162283
 ] 

David Capwell commented on CASSANDRA-15191:
---

FYI, the conversation has been happening in Slack: 
https://the-asf.slack.com/archives/CK23JSY2K/p1595280621333400

Updates:

* The tests are flaky. There appears to be a race condition in which the flag 
hasn't been updated yet when the test checks it. As a workaround, the test now 
queries multiple times with a 5-second sleep, in the hope of making it stable.
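The retry workaround can be sketched as a generic polling helper. `Eventually.await` is a hypothetical name. The 5-second sleep described above would correspond to `sleepMillis = 5_000`. Polling only tolerates the race rather than removing it, so `maxAttempts` bounds how long a test may wait.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper: retry a check to tolerate a flag that is updated asynchronously.
class Eventually {
    /**
     * Evaluates condition up to maxAttempts times, sleeping sleepMillis between attempts.
     * Returns true as soon as the condition holds, false if it never does (or if interrupted).
     */
    static boolean await(BooleanSupplier condition, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    return false;
                }
            }
        }
        return false;
    }
}
```

A test would then assert `Eventually.await(() -> transportStopped(node), 10, 5_000)`, where `transportStopped` is a hypothetical check.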

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> 
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Vincent White
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
> Attachments: log.txt
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

2020-07-20 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161586#comment-17161586
 ] 

David Capwell commented on CASSANDRA-15191:
---

The Python dtests failed: it looks like these exceptions were not logged before 
and are all logged now. A simple example is in the auth tests found in 
https://app.circleci.com/pipelines/github/dcapwell/cassandra/297/workflows/aefdb912-4395-498a-a1b6-b16770d46a45/jobs/1438
 

test_udf_permissions_validation - auth_test.TestAuthRoles

{code}
Unexpected error found in node logs (see stdout for full details). Errors: [ERROR [Native-Transport-Requests-11] 2020-07-20 21:51:31,717 JVMStabilityInspector.java:81 - Uncaught exception in thread Thread[Native-Transport-Requests-11,10,main]
org.apache.cassandra.exceptions.UnauthorizedException: User mike has no ALTER permission on  or any of its parents
	at org.apache.cassandra.service.ClientState.ensurePermissionOnResourceChain(ClientState.java:430)
	at org.apache.cassandra.service.ClientState.ensurePermission(ClientState.java:404)
	at org.apache.cassandra.cql3.statements.schema.CreateFunctionStatement.authorize(CreateFunctionStatement.java:177)
	at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:203)
	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:253)
	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:240)
	at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:108)
	at org.apache.cassandra.transport.Message$Request.execute(Message.java:253)
	at org.apache.cassandra.transport.Message$Dispatcher.processRequest(Message.java:725)
	at org.apache.cassandra.transport.Message$Dispatcher.lambda$channelRead0$0(Message.java:630)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
{code}

This exception wasn't logged before; it was only sent back to the user. With 
this patch we now log all of these previously hidden exceptions.
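One way to keep the new central logging from flagging expected client errors is to skip throwables that are already reported back to the client. This is a hypothetical sketch, not what the patch does. ClientFacingException is an illustrative marker, and the UnauthorizedException above would be one such case. The list stands in for a logger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker for errors that are returned to the client on the normal request path.
class ClientFacingException extends RuntimeException {
    ClientFacingException(String message) {
        super(message);
    }
}

// Hypothetical logging policy for uncaught exceptions reaching a central inspector.
class UncaughtLoggingPolicy {
    // Stands in for logger.error(...) so the behaviour can be checked.
    final List<Throwable> loggedAsError = new ArrayList<>();

    /** Logs only throwables that the client has not already been told about. */
    void onUncaught(Throwable t) {
        if (t instanceof ClientFacingException) {
            return; // already surfaced to the client; avoid noisy ERROR lines
        }
        loggedAsError.add(t);
    }
}
```

The trade-off is that a marker type has to be kept accurate. If a genuinely unexpected failure is ever wrapped in a client-facing type, it would no longer show up in the node logs.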

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> 
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Vincent White
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
> Attachments: log.txt
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

2020-07-20 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161457#comment-17161457
 ] 

David Capwell commented on CASSANDRA-15191:
---

Overall the patch LGTM (I've only reviewed trunk so far); my main comments were 
on the tests. I hope the example given helps.

[~stefan.miklosovic] do you have any CI runs for this patch?

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> 
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Vincent White
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.11.7, 4.0-beta1
>
> Attachments: log.txt
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

2020-07-20 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161451#comment-17161451
 ] 

David Capwell commented on CASSANDRA-15191:
---

Thanks for the changes.  To help show what I was trying (and failing) to say in 
the PR, I posted different tests that hit the read stage and show this is a 
problem.  

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> 
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Vincent White
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.11.7, 4.0-beta1
>
> Attachments: log.txt
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

2020-07-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160407#comment-17160407
 ] 

Stefan Miklosovic commented on CASSANDRA-15191:
---

[~dcapwell] please review again. I have added a test (hopefully it is what you 
expected; otherwise I am out of ideas here) and moved the logging from ALAES 
(AbstractLocalAwareExecutorService) to the inspector.
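Moving the logging from ALAES into the inspector means, roughly, that the executor task hands any throwable to a single callback instead of logging it itself. The simplified sketch below uses a hypothetical InspectingTask wrapper and a `Consumer<Throwable>` in place of JVMStabilityInspector. Cassandra's real FutureTask also completes its future with the failure; here that is reduced to storing it in a field.

```java
import java.util.function.Consumer;

// Hypothetical wrapper: the task does no logging itself and forwards failures to one inspector.
class InspectingTask implements Runnable {
    private final Runnable delegate;
    private final Consumer<Throwable> inspector;
    // A real future would be completed exceptionally; here the failure is only stored.
    volatile Throwable failure;

    InspectingTask(Runnable delegate, Consumer<Throwable> inspector) {
        this.delegate = delegate;
        this.inspector = inspector;
    }

    @Override
    public void run() {
        try {
            delegate.run();
        } catch (Throwable t) {
            failure = t;
            // The inspector decides whether to log and which failure policy applies.
            inspector.accept(t);
        }
    }
}
```

With this shape, the log and policy decisions sit in one place. That is also why every exception, including client-facing ones, started appearing in the logs once logging moved there.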

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> 
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Vincent White
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.11.7, 4.0-beta1
>
> Attachments: log.txt
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

2020-07-17 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160294#comment-17160294
 ] 

David Capwell commented on CASSANDRA-15191:
---

Took a stab at review and left a few comments in the PR.

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> 
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Vincent White
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.11.7, 4.0-beta1
>
> Attachments: log.txt
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

2020-07-17 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159763#comment-17159763
 ] 

Stefan Miklosovic commented on CASSANDRA-15191:
---

PR for trunk aka 4.0 [https://github.com/apache/cassandra/pull/684]

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> 
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Vincent White
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 3.11.7, 4.0-beta1
>
> Attachments: log.txt
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

2020-07-16 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159292#comment-17159292
 ] 

Stefan Miklosovic commented on CASSANDRA-15191:
---

Hi [~jeromatron] and [~Bereng], could you review this, please? I'll create a 
patch for trunk if the proposed solution is fine here.

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> 
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Vincent White
>Assignee: Stefan Miklosovic
>Priority: Normal
> Attachments: log.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-15191) stop_paranoid disk failure policy is ignored on CorruptSSTableException after node is up

2020-07-16 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159287#comment-17159287
 ] 

Stefan Miklosovic commented on CASSANDRA-15191:
---

PR for 3.11 [https://github.com/apache/cassandra/pull/681]

> stop_paranoid disk failure policy is ignored on CorruptSSTableException after 
> node is up
> 
>
> Key: CASSANDRA-15191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15191
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Vincent White
>Assignee: Stefan Miklosovic
>Priority: Normal
> Attachments: log.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>