[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458820#comment-16458820 ]

Jaydeepkumar Chovatia commented on CASSANDRA-13740:
---

Thank you!!! [~iamaleksey]

> Orphan hint file gets created while node is being removed from cluster
> --
>
> Key: CASSANDRA-13740
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13740
> Project: Cassandra
> Issue Type: Bug
> Components: Core, Hints
> Reporter: Jaydeepkumar Chovatia
> Assignee: Jaydeepkumar Chovatia
> Priority: Minor
> Fix For: 4.0, 3.0.17, 3.11.3
>
> Attachments: 13740-3.0.15.txt, gossip_hang_test.py
>
> I found this new issue during my testing: whenever a node is being removed,
> a hint file for that node gets written and stays inside the hints directory
> forever. I debugged the code and found that it is due to a race condition
> between [HintsWriteExecutor.java::flush | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195]
> and [HintsWriteExecutor.java::closeWriter | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L106].
>
> *Time t1*: The node is down, so hints are being written by
> [HintsWriteExecutor.java::flush | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195].
> *Time t2*: The node is removed from the cluster, which calls
> [HintsService.java::exciseStore | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L327]
> and removes the hint files for the node being removed.
> *Time t3*: The mutation stage keeps pumping hints through
> [HintsService.java::write | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L145],
> which again calls [HintsWriteExecutor.java::flush | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215],
> and a new orphan hint file gets created.
>
> I was writing a new dtest for CASSANDRA-13562 and CASSANDRA-13308, and that
> helped me reproduce this new bug. I will submit a patch for this new dtest later.
>
> I also tried the following to check how this orphan hint file responds:
> 1. I tried {{nodetool truncatehints }}, but it fails because the node is no
> longer part of the ring.
> 2. I then tried {{nodetool truncatehints}}, but that still doesn't remove the hint
> file because it is not yet included in the [dispatchDequeue | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsStore.java#L53].
>
> Reproducible steps:
> Please find the attached dtest Python file {{gossip_hang_test.py}}, which
> reproduces this bug.
>
> Solution:
> This is due to the race condition mentioned above. Since
> {{HintsWriteExecutor.java}} creates a thread pool with only 1 worker, the
> solution becomes a little simpler.
> Whenever we [HintsService.java::excise | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L303]
> a host, just store it in memory, and check for already-excised hosts inside
> [HintsWriteExecutor.java::flush | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215].
> If an already-excised host is found, ignore its hints.
> Jaydeep

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
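A minimal Java sketch of the t1/t2/t3 race and of the proposed in-memory check, under stated assumptions. {{ToyHintsWriter}} and {{OrphanHintDemo}} are hypothetical classes, not Cassandra code. A {{Set<UUID>}} stands in for the hints directory. Excise is routed through the same single-thread executor as the writes, which is what the "only 1 worker" argument relies on to make the check simple.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical toy model of the hints write path described above; not Cassandra's classes.
final class ToyHintsWriter
{
    // One worker thread, mirroring the single-threaded HintsWriteExecutor described in the issue
    private final ExecutorService writeExecutor = Executors.newSingleThreadExecutor();
    // Stands in for the hints directory: one entry per host that currently has a hint file
    private final Set<UUID> hintFiles = ConcurrentHashMap.newKeySet();
    // The proposed in-memory record of excised hosts
    private final Set<UUID> excisedHosts = ConcurrentHashMap.newKeySet();
    private final boolean skipExcisedHosts;

    ToyHintsWriter(boolean skipExcisedHosts)
    {
        this.skipExcisedHosts = skipExcisedHosts;
    }

    // Roughly HintsService.write -> HintsWriteExecutor.flush: (re)creates the host's hint file
    void write(UUID hostId)
    {
        writeExecutor.execute(() -> {
            if (skipExcisedHosts && excisedHosts.contains(hostId))
                return; // proposed fix: ignore hints for an already-excised host
            hintFiles.add(hostId);
        });
    }

    // Roughly HintsService.excise: remember the host, then delete its hint file.
    // Assumption: it runs on the same single worker, so it is ordered with every flush.
    void excise(UUID hostId)
    {
        writeExecutor.execute(() -> {
            excisedHosts.add(hostId);
            hintFiles.remove(hostId);
        });
    }

    // Drains all queued tasks, then reports which hosts still have a hint file
    Set<UUID> drainAndListFiles()
    {
        writeExecutor.shutdown();
        try
        {
            writeExecutor.awaitTermination(10, TimeUnit.SECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
        return hintFiles;
    }
}

class OrphanHintDemo
{
    public static void main(String[] args)
    {
        UUID removedHost = UUID.randomUUID();
        for (boolean fix : new boolean[]{ false, true })
        {
            ToyHintsWriter writer = new ToyHintsWriter(fix);
            writer.write(removedHost);  // t1: node is down, hints are written
            writer.excise(removedHost); // t2: node is removed, its hint file is deleted
            writer.write(removedHost);  // t3: a late mutation produces another hint
            boolean orphan = writer.drainAndListFiles().contains(removedHost);
            System.out.println("skipExcisedHosts=" + fix + ", orphan file present=" + orphan);
        }
    }
}
```

Running {{OrphanHintDemo}} reports an orphan file only when {{skipExcisedHosts}} is false. The in-memory set in this sketch only ever grows. The committed change described in this thread took a different route: a delayed excise.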
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458817#comment-16458817 ]

Aleksey Yeschenko commented on CASSANDRA-13740:
---

Made some changes and committed to 3.0 as [b2f6ce961f38a3e4cd744e102026bf7a471056c9|https://github.com/apache/cassandra/commit/b2f6ce961f38a3e4cd744e102026bf7a471056c9] and merged upwards.

Changes made:
- fixed {{excise()}} to properly handle non-existing stores instead of re-initializing them
- changed the delay to be min rpc timeout + write rpc timeout, which is roughly the window in which you may still expect a new hint to be written after a node is decommissioned

Thank you for your patience and for the added tests.
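The committed approach can be sketched with a {{ScheduledExecutorService}}: a single excise deferred by min rpc timeout + write rpc timeout. This is a hedged illustration only. {{DelayedHintsExcise}} and {{DelayedExciseDemo}} are hypothetical. The {{Consumer<UUID>}} hook stands in for {{HintsService.excise}}. The timeouts are plain parameters rather than values read from Cassandra's configuration. See the linked commit for the actual change.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch of "excise hints once, after a delay" when a node leaves the ring.
// Names and the Consumer hook are illustrative, not Cassandra's API.
final class DelayedHintsExcise
{
    private final ScheduledExecutorService scheduler;
    private final Consumer<UUID> exciseHints; // stands in for HintsService.excise

    DelayedHintsExcise(ScheduledExecutorService scheduler, Consumer<UUID> exciseHints)
    {
        this.scheduler = scheduler;
        this.exciseHints = exciseHints;
    }

    // Delay described in the commit comment: min rpc timeout + write rpc timeout,
    // roughly the window in which a late hint for the departed node can still be written.
    static long exciseDelayMillis(long minRpcTimeoutMillis, long writeRpcTimeoutMillis)
    {
        return minRpcTimeoutMillis + writeRpcTimeoutMillis;
    }

    // Called once the node has left; schedules a single, delayed excise (no immediate one)
    ScheduledFuture<?> onNodeRemoved(UUID hostId, long minRpcTimeoutMillis, long writeRpcTimeoutMillis)
    {
        long delay = exciseDelayMillis(minRpcTimeoutMillis, writeRpcTimeoutMillis);
        return scheduler.schedule(() -> exciseHints.accept(hostId), delay, TimeUnit.MILLISECONDS);
    }
}

class DelayedExciseDemo
{
    public static void main(String[] args) throws InterruptedException
    {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Set<UUID> excised = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(1);
        DelayedHintsExcise delayed = new DelayedHintsExcise(scheduler, id -> { excised.add(id); done.countDown(); });

        UUID leftHost = UUID.randomUUID();
        // Illustrative timeouts in milliseconds, not Cassandra defaults
        delayed.onNodeRemoved(leftHost, 50, 100);
        done.await(5, TimeUnit.SECONDS);
        System.out.println("excised after delay: " + excised.contains(leftHost));
        scheduler.shutdown();
    }
}
```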
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405567#comment-16405567 ]

Jaydeepkumar Chovatia commented on CASSANDRA-13740:
---

Hi [~iamaleksey]
Please find the updated patch here:
||trunk||3.0||
|[patch|https://github.com/apache/cassandra/compare/trunk...jaydeepkumar1984:13740-trunk?expand=1]|[patch|https://github.com/apache/cassandra/compare/cassandra-3.0...jaydeepkumar1984:CASSANDRA-13740_1?expand=1]|
|[utest|https://circleci.com/gh/jaydeepkumar1984/cassandra/46]|[utest|https://circleci.com/gh/jaydeepkumar1984/cassandra/44]|
Jaydeep
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402241#comment-16402241 ]

Jaydeepkumar Chovatia commented on CASSANDRA-13740:
---

Thanks [~iamaleksey] for the review.
The reasoning behind {{RING_DELAY}} is as follows. With this fix, one thing is clear: we have to delay {{StorageProxy.excise()}}, which means we have to wait for some time. There are two options for the delay:
1. Hardcode some arbitrary value, for example delay {{StorageProxy.excise()}} by 10 seconds, OR
2. Other nodes in the ring will no longer accept writes once they learn that the given node is no longer part of the ring. Hence I have used {{RING_DELAY}}, which is a general delay used in [many places|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/gms/Gossiper.java#L553], after which we can assume the ring has stabilized.
So my theory is that once the ring has stabilized, everyone in the ring will have learned about the node that just left, and at that point it is safe to run {{StorageProxy.excise()}}. Please let me know if my understanding is not correct; I can change it to a hardcoded value, say 20 seconds.
I will incorporate the other code review comments and send you an updated patch soon.
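For concreteness, the two options compare roughly as in the hypothetical helper below. It assumes, as in the Cassandra versions discussed here, that {{RING_DELAY}} defaults to 30 seconds and can be overridden with the {{cassandra.ring_delay_ms}} system property. Verify this against {{StorageService}} in your branch. {{RingDelay}} itself is an illustrative name. Option 2 ties the delay to gossip settling rather than to how long an in-flight write can still produce a hint.

```java
// Hypothetical helper: assumes RING_DELAY defaults to 30 s and can be overridden
// via -Dcassandra.ring_delay_ms, as in the Cassandra versions discussed here.
final class RingDelay
{
    static final long DEFAULT_RING_DELAY_MILLIS = 30_000L;

    // Option 1: a hardcoded delay, e.g. the 10 s mentioned in the comment
    static final long HARDCODED_DELAY_MILLIS = 10_000L;

    // Option 2: use the (possibly overridden) ring delay as the excise delay
    static long resolve(String overrideMillis)
    {
        if (overrideMillis == null || overrideMillis.trim().isEmpty())
            return DEFAULT_RING_DELAY_MILLIS;
        return Long.parseLong(overrideMillis.trim());
    }

    public static void main(String[] args)
    {
        long ringDelay = resolve(System.getProperty("cassandra.ring_delay_ms"));
        System.out.println("option 1 (hardcoded): " + HARDCODED_DELAY_MILLIS + " ms");
        System.out.println("option 2 (RING_DELAY): " + ringDelay + " ms");
    }
}
```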
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400994#comment-16400994 ]

Aleksey Yeschenko commented on CASSANDRA-13740:
---

And the changes to {{deleteAllHints()}} I don't fully understand. I don't think it's the responsibility of that method to close any writers. The contract is (as I understand it): please remove all written hints files at this point. One problem is that for catalogues that aren't loaded, any remaining files won't be deleted, but this change doesn't address that. More importantly, I think that with excise fixed, this would be less of a problem and probably wouldn't need fixing.

So let's leave {{deleteAllHints()}} alone. And also leave excise more or less alone, but instead call it at the end of {{StorageProxy.excise()}}, with a delay. Not sure why you picked {{RING_DELAY}} for the delay, though. Can you explain?
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400985#comment-16400985 ]

Aleksey Yeschenko commented on CASSANDRA-13740:
---

Sure. So what the current patch does is run excise, and then, in essence, schedule another excise after {{RING_DELAY}}? In other words, we simply don't trust the immediate excise, and rely on the follow-up one. In that case, to make sure that hints for writes that will time out at some later point get deleted from disk, shouldn't we just say "alright, let's delay the first excise, instead of doing an immediate one and then another one"?
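Aleksey's argument can be checked with a toy timeline. If each excise deletes every hint written up to the moment it runs, then only the time of the last excise decides whether an orphan remains. Once a delayed excise exists, the immediate one adds nothing. The sketch below is hypothetical; {{ExciseScheduleSim}} is not a Cassandra class. The hint times are made-up inputs, not measurements. The model also assumes no hint for the departed node is written after the delay elapses.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy timeline model (hypothetical): all times are ms after the node was removed.
// An excise deletes every hint written at or before it, so only hints written
// after the last excise survive as orphans.
final class ExciseScheduleSim
{
    static long orphanedHints(List<Long> hintWriteTimes, List<Long> exciseTimes)
    {
        long lastExcise = exciseTimes.isEmpty() ? Long.MIN_VALUE : Collections.max(exciseTimes);
        return hintWriteTimes.stream().filter(t -> t > lastExcise).count();
    }

    public static void main(String[] args)
    {
        // Made-up times at which writes to the departed node time out and produce hints
        List<Long> lateHints = Arrays.asList(500L, 1_500L, 1_900L);
        long delay = 30_000L; // e.g. an assumed RING_DELAY default of 30 s
        System.out.println("immediate only:        " + orphanedHints(lateHints, Arrays.asList(0L)));
        System.out.println("immediate + follow-up: " + orphanedHints(lateHints, Arrays.asList(0L, delay)));
        System.out.println("delayed only:          " + orphanedHints(lateHints, Arrays.asList(delay)));
    }
}
```

With these inputs, {{main}} reports 3 orphans for the immediate-only schedule and 0 for the other two schedules.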
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387326#comment-16387326 ]

Jaydeepkumar Chovatia commented on CASSANDRA-13740:
---

Hi [~iamaleksey]
A gentle ping. If you have some time, could you please help me close this?
Jaydeep
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164951#comment-16164951 ] Aleksey Yeschenko commented on CASSANDRA-13740: --- A bit busy currently, sorry. Will have a look as soon as I can. > Orphan hint file gets created while node is being removed from cluster > -- > > Key: CASSANDRA-13740 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13740 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Jaydeepkumar Chovatia >Assignee: Jaydeepkumar Chovatia >Priority: Minor > Fix For: 3.0.x, 3.11.x > > Attachments: 13740-3.0.15.txt, gossip_hang_test.py > > > I have found this new issue during my test, whenever node is being removed > then hint file for that node gets written and stays inside the hint directory > forever. I debugged the code and found that it is due to the race condition > between [HintsWriteExecutor.java::flush | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195] > and [HintsWriteExecutor.java::closeWriter | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L106] > . 
> > *Time t1* Node is down, as a result Hints are being written by > [HintsWriteExecutor.java::flush | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195] > *Time t2* Node is removed from cluster as a result it calls > [HintsService.java-exciseStore | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L327] > which removes hint files for the node being removed > *Time t3* Mutation stage keeps pumping Hints through [HintService.java::write > | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L145] > which again calls [HintsWriteExecutor.java::flush | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215] > and new orphan file gets created > I was writing a new dtest for {CASSANDRA-13562, CASSANDRA-13308} and that > helped me reproduce this new bug. I will submit patch for this new dtest > later. > I also tried following to check how this orphan hint file responds: > 1. I tried {{nodetool truncatehints }} but it fails as node is no > longer part of the ring > 2. I then tried {{nodetool truncatehints}}, that still doesn’t remove hint > file because it is not yet included in the [dispatchDequeue | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsStore.java#L53] > Reproducible steps: > Please find dTest python file {{gossip_hang_test.py}} attached which > reproduces this bug. > Solution: > This is due to race condition as mentioned above. Since > {{HintsWriteExecutor.java}} creates thread pool with only 1 worker, so > solution becomes little simple. 
Whenever we [HintsService.java::excise | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L303] > a host, just store it in memory, and check for an already evicted host inside > [HintsWriteExecutor.java::flush | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215]. > If an already evicted host is found, then ignore its hints. > Jaydeep -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
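The in-memory approach proposed under "Solution" can be sketched roughly as follows. This is a minimal, hypothetical model, not Cassandra's actual {{HintsWriteExecutor}}: the class and method names are assumptions, host IDs are plain {{UUID}}s, and a list of host IDs stands in for hint files on disk. It leans on the single worker thread mentioned in the description: because the deletion is also submitted to that thread, no flush can run between the excised-host check and the write.

```java
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the proposed fix; not Cassandra's real HintsWriteExecutor.
public class ExcisedHostFilter
{
    // Hosts whose hint stores have been excised; remembered in memory only.
    private final Set<UUID> excised = ConcurrentHashMap.newKeySet();
    // A single worker, as in the description, so all writes and deletes are serialized.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    // Stand-in for hint files on disk: one entry per hint actually written.
    private final List<UUID> written = new CopyOnWriteArrayList<>();

    public void excise(UUID hostId)
    {
        excised.add(hostId);
        // Delete on the writer thread, so this cannot interleave with a flush.
        writer.execute(() -> written.removeIf(hostId::equals));
    }

    public void flush(UUID hostId)
    {
        writer.execute(() -> {
            if (excised.contains(hostId))
                return; // host already excised: drop the hint instead of creating an orphan file
            written.add(hostId);
        });
    }

    public List<UUID> shutdownAndGetWritten() throws InterruptedException
    {
        writer.shutdown();
        writer.awaitTermination(10, TimeUnit.SECONDS);
        return written;
    }
}
```

If the deletion ran on the caller's thread instead, a flush could pass the {{excised}} check just before the host is added and then write after the files were deleted, which is the same orphan-file race in miniature.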
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162052#comment-16162052 ] Jaydeepkumar Chovatia commented on CASSANDRA-13740: --- Hi [~iamaleksey] Can you please review my latest patch? Jaydeep
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144419#comment-16144419 ] Jaydeepkumar Chovatia commented on CASSANDRA-13740: --- Hi [~tuxslayer] Isn’t the race condition as follows: *Thread T1 - Time 1:* Removes the {{HintsStore}} by calling [HintsService.instance.excise |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L2268], but at this point the node has not yet been removed from [tokenMetadata |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L133] *Thread T2 - Time 2:* The mutation stage looks up [tokenMetadata | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageProxy.java#L2617], still finds the node valid, and therefore writes a hint for it. *Thread T1 - Time 3:* Now removes the node from [tokenMetadata | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L2270] Hence I decided it is safer to sleep for {{StorageService.RING_DELAY}} and then schedule the optional {{removeOrphanHintFiles}} task. Please let me know your comments.
I have also incorporated your review comment; please find the updated patch attached, and also here: https://github.com/jaydeepkumar1984/cassandra/commit/16d4ab3316ab71a9a3b96ab67944384092be40ca Jaydeep
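The three-step T1/T2 interleaving described in the comment above can be replayed deterministically in a toy model. Everything here is hypothetical: a {{Set}} stands in for {{tokenMetadata}}, a {{Map}} of per-host hint counts stands in for the hints directory, and the steps are called in the stated order instead of from real threads. It shows why a file reappears after {{excise()}}, and why a cleanup that runs after the node has left the ring removes it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy replay of the T1/T2 interleaving; not Cassandra code.
public class ExciseRaceReplay
{
    final Set<String> ring = new HashSet<>();               // stand-in for tokenMetadata
    final Map<String, Integer> hintFiles = new HashMap<>(); // host -> hints on disk

    // T1, time 1: excise the store while the node is still in the ring.
    void excise(String host)
    {
        hintFiles.remove(host);
    }

    // T2, time 2: the mutation path checks the ring and, if the node is still there, writes a hint.
    void writeHintIfMember(String host)
    {
        if (ring.contains(host))
            hintFiles.merge(host, 1, Integer::sum);
    }

    // T1, time 3: the node finally leaves the ring.
    void leaveRing(String host)
    {
        ring.remove(host);
    }

    // Delayed cleanup: drop hint files for hosts that are no longer ring members.
    void removeOrphanHintFiles()
    {
        hintFiles.keySet().retainAll(ring);
    }

    public static void main(String[] args)
    {
        ExciseRaceReplay r = new ExciseRaceReplay();
        r.ring.add("node2");
        r.hintFiles.put("node2", 3);
        r.excise("node2");
        r.writeHintIfMember("node2");
        r.leaveRing("node2");
        System.out.println("after removal: " + r.hintFiles); // prints "after removal: {node2=1}"
        r.removeOrphanHintFiles();
        System.out.println("after cleanup: " + r.hintFiles); // prints "after cleanup: {}"
    }
}
```

The model only says the cleanup must run after the node has left the ring; it says nothing about how long to wait, which is exactly the question the fixed {{RING_DELAY}} sleep tries to answer.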
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140263#comment-16140263 ] Jaydeepkumar Chovatia commented on CASSANDRA-13740: --- Thanks for the review comments; I will make the changes and send an updated patch soon.
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135814#comment-16135814 ] Aleksey Yeschenko commented on CASSANDRA-13740: --- Hey. There are a few [code style|https://wiki.apache.org/cassandra/CodeStyle] issues: we don't use {{final}} for arguments and local variables, and brackets always go on new lines. Also, the patch doesn't wait for the {{closeWriter}} future to complete. A more interesting issue is that of the delay. {{RING_DELAY}} doesn't have anything to do with hints. What does is write timeouts, and the {{MessagingService}} callbacks expiring map firing its timeout reporter; that's where the race ultimately is. Also, we aren't fixing the issue of {{nodetool truncatehints}} not being able to clean up after we excise. The more I think about it, the more I'm inclined to just fix that last issue and leave everything else as is (and also commit your unit tests, thanks for those).
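The review point about waiting for the {{closeWriter}} future can be illustrated with plain {{java.util.concurrent}}. In this hypothetical sketch, closing a writer is a task on a single-threaded executor that may still write a final file, and {{excise()}} blocks on the returned {{Future}} before deleting anything, so the deletion cannot overtake the close. The class, its fields and the "<hostId>-<suffix>.hints" naming scheme are assumptions, not Cassandra's API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names only loosely mirror HintsService / HintsWriteExecutor.
public class WaitForCloseSketch
{
    private final ExecutorService writeExecutor = Executors.newSingleThreadExecutor();
    // Stand-in for the hints directory: file names are "<hostId>-<suffix>.hints".
    final Set<String> hintFiles = ConcurrentHashMap.newKeySet();

    // Closing the writer runs on the writer thread and may still flush buffered hints to a file.
    Future<?> closeWriter(String hostId)
    {
        return writeExecutor.submit(() -> {
            hintFiles.add(hostId + "-final.hints");
        });
    }

    void excise(String hostId) throws InterruptedException, ExecutionException
    {
        // Block until the close (and its final write) has completed on the writer thread;
        // deleting before get() returns could leave that final file behind as an orphan.
        closeWriter(hostId).get();
        hintFiles.removeIf(name -> name.startsWith(hostId + "-"));
    }

    void shutdown()
    {
        writeExecutor.shutdown();
    }
}
```

Without the {{get()}} call, {{removeIf}} could run while the close task is still queued, and the "-final" file would survive the excise.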
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124249#comment-16124249 ] Jaydeepkumar Chovatia commented on CASSANDRA-13740: --- Hi [~iamaleksey] I have modified the code as per your review comments; please find it attached as "13740-2_3.0.15.txt". The same patch is also here: https://github.com/jaydeepkumar1984/cassandra/commit/173fce0362246595d26b24196d6690223d132d5e I will create a patch for 3.11 and run {{circleci}} after receiving your review comments. Jaydeep
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120714#comment-16120714 ] Jaydeepkumar Chovatia commented on CASSANDRA-13740: --- Thanks [~iamaleksey] for the code review. I will change it as per your suggestion and provide an updated patch.
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119799#comment-16119799 ] Aleksey Yeschenko commented on CASSANDRA-13740: --- The patch likely works, but I think we can do better. Some of the issues I have with it: 1. It makes {{HintsWriteExecutor}} depend on {{HintsService}} and {{StorageService}} 2. It makes {{HintsStore}} depend on {{HintsService}} When designing the current iteration of hints I was very careful to design the system in a top-down way, without any avoidable interleaving. Each class is as dumb as possible on its own, and as you go up, you just compose dumb classes that by themselves know nothing of the layers above them. As for the problem itself, we do acknowledge in the comments on {{excise()}} that “The worst that can happen if we don't get everything right is a hints file (or two) remaining undeleted.”, so it’s more of a known limitation than a bug. But of course we can improve on it. What is a problem, however, is the inability to programmatically remove those orphan files via JMX. {{nodetool truncatehints}} should work no matter what, and that should be fixed. If we want to deal with the orphans for sure (and I don’t see why not improve this as well), I suggest you do so in a different way. Perhaps, as the last step of {{excise()}}, schedule a task on {{ScheduledExecutors.optionalTasks}} to clean up any orphans after some delay.
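The suggestion to finish {{excise()}} by scheduling a delayed cleanup could look roughly like this. It is a hypothetical sketch: a plain {{ScheduledExecutorService}} stands in for {{ScheduledExecutors.optionalTasks}}, an in-memory set stands in for the hints directory, and the method names, file-naming scheme and delay are assumptions. The first deletion runs immediately, as {{excise()}} already does; the scheduled one sweeps up any file a late hint recreated in the meantime.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; the executor stands in for ScheduledExecutors.optionalTasks.
public class DelayedOrphanCleanup
{
    final ScheduledExecutorService optionalTasks = Executors.newSingleThreadScheduledExecutor();
    // Stand-in for the hints directory: file names are "<hostId>-<suffix>.hints".
    final Set<String> hintFiles = ConcurrentHashMap.newKeySet();

    ScheduledFuture<?> excise(String hostId, long delay, TimeUnit unit)
    {
        deleteFilesFor(hostId); // the immediate deletion excise() performs today
        // Last step: sweep again later, in case a late hint recreated a file for this host.
        return optionalTasks.schedule(() -> deleteFilesFor(hostId), delay, unit);
    }

    void deleteFilesFor(String hostId)
    {
        hintFiles.removeIf(name -> name.startsWith(hostId + "-"));
    }
}
```

The delay is left as a parameter on purpose: a hint written after the sweep would still be orphaned, so the cleanup narrows the window rather than closing it.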
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119106#comment-16119106 ]

Jaydeepkumar Chovatia commented on CASSANDRA-13740:
---

Please find the pull request here: https://github.com/apache/cassandra/pull/136/commits/993d5891e69f5cda402ae2158e06914f653a644d

> Orphan hint file gets created while node is being removed from cluster
> --
>
> Key: CASSANDRA-13740
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13740
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jaydeepkumar Chovatia
> Assignee: Jaydeepkumar Chovatia
> Priority: Minor
> Fix For: 3.0.x
>
> Attachments: 13740-2_3.0.15.txt, 13740-3.0.15.txt, gossip_hang_test.py
>
>
> I found this new issue during my testing: whenever a node is being removed, a hint file for that node gets written and stays in the hints directory forever. I debugged the code and found that it is caused by a race condition between [HintsWriteExecutor.java::flush | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195] and [HintsWriteExecutor.java::closeWriter | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L106].
>
> *Time t1* The node is down, so hints are being written by [HintsWriteExecutor.java::flush | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195]
> *Time t2* The node is removed from the cluster, which calls [HintsService.java::exciseStore | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L327] and removes the hint files for the node being removed
> *Time t3* The mutation stage keeps pumping hints through [HintsService.java::write | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L145], which again calls [HintsWriteExecutor.java::flush | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215], and a new orphan file gets created
>
> I was writing a new dtest for {CASSANDRA-13562, CASSANDRA-13308} and that helped me reproduce this new bug. I will submit a patch for this new dtest later.
> I also tried the following to check how this orphan hint file responds:
> 1. I tried {{nodetool truncatehints }}, but it fails because the node is no longer part of the ring
> 2. I then tried {{nodetool truncatehints}}, but that still doesn't remove the hint file because it is not yet included in the [dispatchDequeue | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsStore.java#L53]
> Reproducible steps:
> Please find the attached dTest python file {{gossip_hang_test.py}}, which reproduces this bug.
> Solution:
> This is caused by the race condition mentioned above. Since {{HintsWriteExecutor.java}} creates a thread pool with only 1 worker, the solution becomes a little simpler. Whenever we [HintsService.java::excise | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L303] a host, just store it in memory, and check for an already-evicted host inside [HintsWriteExecutor.java::flush | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215]. If an already-evicted host is found, ignore its hints.
> Jaydeep

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
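The fix proposed in the Solution paragraph can be modelled in a few lines. This is a minimal, self-contained sketch and not Cassandra's code. `EvictedHostSketch`, its `flush`/`excise`/`hasHintFile` methods, and the in-memory map that stands in for the hints directory are all hypothetical. The sketch assumes two things. First, as the description states, all writes go through a single worker thread. Second, excising a host records it as evicted *before* the file deletion is queued on that same thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical model of the proposed fix; not Cassandra's HintsWriteExecutor/HintsService API. */
public class EvictedHostSketch {
    // One worker, mirroring the single-threaded write executor described in the issue.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    // Hosts that have been excised; checked by every flush (hypothetical name).
    private final Set<UUID> evictedHostIds = ConcurrentHashMap.newKeySet();
    // Stand-in for the hints directory; only ever touched on the writer thread.
    private final Map<UUID, List<String>> hintFiles = new HashMap<>();

    /** Queues a flush of one hint for a host onto the writer thread. */
    public Future<?> flush(UUID hostId, String hint) {
        return writer.submit(() -> {
            if (evictedHostIds.contains(hostId))
                return; // host already removed: drop the hint instead of creating an orphan file
            hintFiles.computeIfAbsent(hostId, id -> new ArrayList<>()).add(hint);
        });
    }

    /** Marks the host evicted first, then deletes its "files" on the writer thread and waits. */
    public void excise(UUID hostId) throws InterruptedException, ExecutionException {
        evictedHostIds.add(hostId);
        writer.submit(() -> { hintFiles.remove(hostId); }).get();
    }

    /** Reads the state on the writer thread so the answer reflects all earlier tasks. */
    public boolean hasHintFile(UUID hostId) throws InterruptedException, ExecutionException {
        return writer.submit(() -> hintFiles.containsKey(hostId)).get();
    }

    public void shutdown() {
        writer.shutdown();
    }

    public static void main(String[] args) throws Exception {
        EvictedHostSketch sketch = new EvictedHostSketch();
        UUID removed = UUID.randomUUID();
        sketch.flush(removed, "hint-1");            // t1: node is down, hints are written
        sketch.excise(removed);                     // t2: node removed, its hint files deleted
        sketch.flush(removed, "hint-2").get();      // t3: mutation stage keeps sending hints
        System.out.println(sketch.hasHintFile(removed)); // prints false: no orphan file
        sketch.shutdown();
    }
}
```

The ordering is what makes the single worker sufficient. Consider a flush that runs after the delete task: that task was queued after the host was marked, so the flush sees the mark and skips. A flush that ran earlier wrote a file that the delete task then removes. With several writer threads, a flush could pass the check and write after the deletion, which is the race described above.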
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119094#comment-16119094 ]

ASF GitHub Bot commented on CASSANDRA-13740:

GitHub user jaydeepkumar1984 opened a pull request:

https://github.com/apache/cassandra/pull/136

CASSANDRA-13740 orphan hint file get created

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaydeepkumar1984/cassandra 13740-3.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/136.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #136

commit 993d5891e69f5cda402ae2158e06914f653a644d
Author: Jaydeepkumar Chovatia
Date: 2017-08-03T22:34:26Z

CASSANDRA-13740 orphan hint file get created

> Orphan hint file gets created while node is being removed from cluster
> --
>
> Key: CASSANDRA-13740
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13740
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jaydeepkumar Chovatia
> Assignee: Jaydeepkumar Chovatia
> Priority: Minor
> Fix For: 3.0.x
>
> Attachments: 13740-2_3.0.15.txt, 13740-3.0.15.txt, gossip_hang_test.py
>
>
> I found this new issue during my testing: whenever a node is being removed, a hint file for that node gets written and stays in the hints directory forever. I debugged the code and found that it is caused by a race condition between [HintsWriteExecutor.java::flush | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195] and [HintsWriteExecutor.java::closeWriter | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L106].
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115027#comment-16115027 ]

Jaydeepkumar Chovatia commented on CASSANDRA-13740:
---

Thanks [~jay.zhuang] for the review comments.

A {{HintsStore}} object may get created multiple times for the same {{hostId}}, as follows:
*time t1* The removenode thread calls [catalog.exciseStore | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L327 ], which removes the store from [Map stores | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsCatalog.java#L41]
*time t2* The hints writer thread calls [HintsCatalog::get | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsCatalog.java#L85], which creates a new {{HintsStore}} object and writes hints again.

This is exactly the scenario that happens in this bug; please let me know your comments. I think we should at least move {{static final Set evictedHostIds}} from {{HintsStore.java}} to {{HintsService.java}}, which is a singleton?

> Orphan hint file gets created while node is being removed from cluster
> --
>
> Key: CASSANDRA-13740
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13740
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jaydeepkumar Chovatia
> Assignee: Jaydeepkumar Chovatia
> Priority: Minor
> Fix For: 3.0.x
>
> Attachments: 13740-3.0.15.txt, gossip_hang_test.py
>
>
> I found this new issue during my testing: whenever a node is being removed, a hint file for that node gets written and stays in the hints directory forever.
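The re-creation described in this comment can be reproduced with a plain `ConcurrentHashMap`. This is a hypothetical sketch. `StoreRecreationSketch`, its nested `Store` class, `get` and `exciseStore` only imitate the roles of the catalog and store discussed above. The sketch assumes, as the comment describes, that a lookup for a missing host lazily creates a new store. It puts a per-instance flag and a longer-lived set of evicted hosts side by side to show which one survives.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical model of the catalog/store interaction; names are illustrative only. */
public class StoreRecreationSketch {
    static final class Store {
        final UUID hostId;
        volatile boolean isEvicted = false; // per-instance flag
        Store(UUID hostId) { this.hostId = hostId; }
    }

    // Like the catalog's map of host id to store.
    final ConcurrentMap<UUID, Store> stores = new ConcurrentHashMap<>();
    // Evicted hosts tracked by the long-lived owner rather than by a store instance.
    final Set<UUID> evictedHostIds = ConcurrentHashMap.newKeySet();

    /** Lazily (re)creates a store, as the writer thread does at time t2. */
    Store get(UUID hostId) {
        return stores.computeIfAbsent(hostId, Store::new);
    }

    /** Time t1: removenode drops the store for the host. */
    void exciseStore(UUID hostId) {
        Store old = stores.remove(hostId);
        if (old != null)
            old.isEvicted = true;   // the flag lives only on the discarded instance
        evictedHostIds.add(hostId); // not tied to any store instance, so it survives
    }

    public static void main(String[] args) {
        StoreRecreationSketch catalog = new StoreRecreationSketch();
        UUID host = UUID.randomUUID();
        Store before = catalog.get(host);
        catalog.exciseStore(host);
        Store after = catalog.get(host); // a brand-new store object

        System.out.println(before == after);                       // false
        System.out.println(after.isEvicted);                       // false: flag was lost
        System.out.println(catalog.evictedHostIds.contains(host)); // true
    }
}
```

The flag is set on the store that was removed, so the object handed out afterwards starts unflagged. A set owned by an object that outlives individual stores is unaffected. That is the argument for keeping it in the singleton service.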
[jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114869#comment-16114869 ]

Jay Zhuang commented on CASSANDRA-13740:

Hi [~chovatia.jayd...@gmail.com], nice catch.
One comment about your patch: I don't think it's a good idea to have a {{static final Set evictedHostIds}} in {{HintsStore}}, as a HintsStore instance is for one hints file. How about adding a field {{private boolean isEvicted = false;}} to indicate whether the target host is evicted?

> Orphan hint file gets created while node is being removed from cluster
> --
>
> Key: CASSANDRA-13740
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13740
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jaydeepkumar Chovatia
> Assignee: Jaydeepkumar Chovatia
> Priority: Minor
> Fix For: 3.0.x
>
> Attachments: 13740-3.0.15.txt, gossip_hang_test.py
>
>
> I found this new issue during my testing: whenever a node is being removed, a hint file for that node gets written and stays in the hints directory forever. I debugged the code and found that it is caused by a race condition between [HintsWriteExecutor.java::flush | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195] and [HintsWriteExecutor.java::closeWriter | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L106].
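The reviewer's suggestion, a flag on the store itself instead of a static set, can be sketched as follows. `EvictionFlagSketch`, `markEvicted`, `write` and `pendingCount` are hypothetical names, and the list of pending hints stands in for the store's files. The `volatile` modifier is an assumption added here; the comment proposes a plain `private boolean`. It makes a flag set by the removenode thread visible to the writer thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Hypothetical sketch of a per-store eviction flag; not Cassandra's actual HintsStore. */
public class EvictionFlagSketch {
    private final UUID hostId;
    private final List<String> pendingHints = new ArrayList<>();
    // volatile is an assumption beyond the suggestion: it publishes the flag across threads.
    private volatile boolean isEvicted = false;

    public EvictionFlagSketch(UUID hostId) {
        this.hostId = hostId;
    }

    public UUID hostId() {
        return hostId;
    }

    /** Called when the target host is removed from the ring. */
    public void markEvicted() {
        isEvicted = true;
    }

    /** Accepts the hint unless the target host has been evicted; returns whether it was kept. */
    public synchronized boolean write(String hint) {
        if (isEvicted)
            return false;
        pendingHints.add(hint);
        return true;
    }

    public synchronized int pendingCount() {
        return pendingHints.size();
    }

    public static void main(String[] args) {
        EvictionFlagSketch store = new EvictionFlagSketch(UUID.randomUUID());
        System.out.println(store.write("hint-1")); // true
        store.markEvicted();
        System.out.println(store.write("hint-2")); // false
        System.out.println(store.pendingCount());  // 1
    }
}
```

This only protects writers that still reach the same instance. Jaydeepkumar Chovatia's reply in this thread points out a gap: after the store is excised, the catalog can create a new one for the same hostId, and the new store would not carry the flag.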