[jira] [Commented] (CASSANDRA-14375) Digest mismatch Exception when sending raw hints in cluster
    [ https://issues.apache.org/jira/browse/CASSANDRA-14375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484284#comment-16484284 ]

Vineet Ghatge commented on CASSANDRA-14375:
-------------------------------------------

Has anyone been able to reproduce the issue? Using the following option on cluster nodes with a very unreliable network avoids this problem:

{{JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"}}

> Digest mismatch Exception when sending raw hints in cluster
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-14375
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14375
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hints
>         Environment: CentOS 7.3
>            Reporter: Vineet Ghatge
>            Priority: Major
>
> We have a 14-node cluster where we have seen hints files getting corrupted,
> resulting in the following error:
> {noformat}
> ERROR [HintsDispatcher:1] 2018-04-06 16:26:44,423 CassandraDaemon.java:228 - Exception in thread Thread[HintsDispatcher:1,1,main]
> org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:298) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:263) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:169) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:128) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:113) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:94) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:278) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:260) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:238) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:217) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_141]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_141]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_141]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_141]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_141]
> Caused by: java.io.IOException: Digest mismatch exception
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:315) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:289) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     ... 16 common frames omitted
> {noformat}
> Notes on the cluster and the investigation done so far:
> 1. The Cassandra used here is built locally from the 3.11.1 branch, along with
> the following patch from CASSANDRA-14080:
> [https://github.com/apache/cassandra/commit/68079e4b2ed4e58dbede70af45414b3d4214e195]
> 2. The 14 nodes are bootstrapped as follows:
> - Out of the 14 nodes, only 3 are picked as seed nodes.
> - Only 1 of the 3 seed nodes is started, and the schema is created if it was
> not created previously.
> - After this, the rest of the nodes are bootstrapped.
> - In the failure scenario, only 5 of the 14 nodes successfully formed the
> Cassandra cluster. The failed nodes include two seed nodes.
> 3. We confirmed that the patch from CASSANDRA-13696 has been applied. Jay
> Zhuang confirmed that this is a different issue from what was previously fixed:
> "this should be a different issue, as HintsDispatcher.java:128 sends hints
> with {{buffer}}s, this patch is only to fix the digest mismatch for
> HintsDispatcher.java:129, which sends hints one by one."
> 4. Application uses jav
[jira] [Commented] (CASSANDRA-14375) Digest mismatch Exception when sending raw hints in cluster
    [ https://issues.apache.org/jira/browse/CASSANDRA-14375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436714#comment-16436714 ]

Jay Zhuang commented on CASSANDRA-14375:
----------------------------------------

We saw the same issue in {{3.0.14}} twice in the last week:
{noformat}
ERROR [HintsDispatcher:1] 2018-04-10 23:43:47,930 HintsDispatchExecutor.java:234 - Failed to dispatch hints file d921cf74-c064-465d-82b4-aa964cb3b8f6-1523401451406-1.hints: file is corrupted ({})
org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
    at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:296) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:261) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:157) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:138) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:123) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:95) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:268) [apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:251) [apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:229) [apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:208) [apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
Caused by: java.io.IOException: Digest mismatch exception
    at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:313) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:287) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    ... 16 common frames omitted
ERROR [HintsDispatcher:1] 2018-04-10 23:43:47,931 CassandraDaemon.java:207 - Exception in thread Thread[HintsDispatcher:1,1,main]
org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
    at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:296) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:261) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:157) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:138) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:123) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:95) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:268) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:251) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:229) ~[apache-cassandra-3.0.14.x.jar:3.0.14.x]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(H
[jira] [Commented] (CASSANDRA-14375) Digest mismatch Exception when sending raw hints in cluster
    [ https://issues.apache.org/jira/browse/CASSANDRA-14375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433114#comment-16433114 ]

Vineet Ghatge commented on CASSANDRA-14375:
-------------------------------------------

I am trying to reproduce the issue using ccm. I will update this once I have something working.

> Digest mismatch Exception when sending raw hints in cluster
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-14375
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14375
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hints
>         Environment: CentOS 7.3
>            Reporter: Vineet Ghatge
>            Priority: Major
>
> We have a 14-node cluster where we have seen hints files getting corrupted,
> resulting in the following error:
> [04/06/18 12:21 PM] Kotkar, Shantanu: ERROR [HintsDispatcher:1] 2018-04-06 16:26:44,423 CassandraDaemon.java:228 - Exception in thread Thread[HintsDispatcher:1,1,main]
> org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:298) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:263) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:169) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:128) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:113) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:94) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:278) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:260) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:238) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:217) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_141]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_141]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_141]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_141]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_141]
> Caused by: java.io.IOException: Digest mismatch exception
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:315) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:289) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
>     ... 16 common frames omitted
> Notes on the cluster and the investigation done so far:
> 1. The Cassandra used here is built locally from the 3.11.1 branch, along with
> the following patch from CASSANDRA-14080:
> https://github.com/apache/cassandra/commit/68079e4b2ed4e58dbede70af45414b3d4214e195
> 2. The 14 nodes are bootstrapped as follows:
> - Out of the 14 nodes, only 3 are picked as seed nodes.
> - Only 1 of the 3 seed nodes is started, and the schema is created if it was
> not created previously.
> - After this, the rest of the nodes are bootstrapped.
> - In the failure scenario, only 5 of the 14 nodes successfully formed the
> Cassandra cluster. The failed nodes include two seed nodes.
> 3. We confirmed that the patch from CASSANDRA-13696 has been applied. Jay
> Zhuang confirmed that this is a different issue from what was previously fixed:
> "this should be a different issue, as HintsDispatcher.java:128 sends hints
> with {{buffer}}s, this patch is only to fix the digest mismatch for
> HintsDispatcher.java:129, which sends hints one by one."
> 4. The application uses the Java driver with a quorum setting for Cassandra
> 5. We saw this issue on a 7-node cluster
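
For readers unfamiliar with this kind of failure, here is a minimal sketch of how a checksum over a stored buffer can surface as a "Digest mismatch" on read. It is an illustration under assumptions, not Cassandra's code: the class name {{HintDigestSketch}}, the methods {{frame}} and {{read}}, and the record layout (length, payload, CRC32) are all hypothetical and are not the on-disk hints format used by {{HintsReader}}. Only the exception message text is taken from the log above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical illustration only; not Cassandra's hints file format. */
final class HintDigestSketch {
    private HintDigestSketch() {}

    /** Frames a payload as [int length][payload bytes][int CRC32 of payload]. */
    static byte[] frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer out = ByteBuffer.allocate(4 + payload.length + 4);
        out.putInt(payload.length);
        out.put(payload);
        out.putInt((int) crc.getValue());
        return out.array();
    }

    /** Reads one framed record; throws if the stored and recomputed digests differ. */
    static byte[] read(byte[] framed) throws IOException {
        ByteBuffer in = ByteBuffer.wrap(framed);
        if (in.remaining() < 8) {
            throw new IOException("Truncated record");
        }
        int length = in.getInt();
        // After the length field, the payload plus a 4-byte digest must still fit.
        if (length < 0 || length > in.remaining() - 4) {
            throw new IOException("Truncated record");
        }
        byte[] payload = new byte[length];
        in.get(payload);
        int stored = in.getInt();

        // Recompute the digest over the bytes actually read and compare.
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if ((int) crc.getValue() != stored) {
            throw new IOException("Digest mismatch exception");
        }
        return payload;
    }
}
```

In this sketch, any byte of the payload that changes between {{frame}} and {{read}} (a partial write, a torn buffer, disk corruption) makes {{read}} throw. This shows only the general mechanism; it says nothing about the root cause of the issue reported here.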