[jira] [Updated] (CASSANDRA-7560) 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession

2014-08-18 Thread Yuki Morishita (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita updated CASSANDRA-7560:
--

Attachment: 0001-partial-backport-3569.patch

I took the IRepairJobEventListener part from the original patch so that a 
snapshot failure will abort the repair session.
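
To illustrate the idea of a job event listener that turns a failure into an abort rather than an indefinite wait, here is a minimal Python sketch. It is not the attached patch and not Cassandra code: the class name, method names, and the timeout are hypothetical, and only the notion of a listener that aborts the session on snapshot failure comes from the comment above.

```python
import threading


class RepairSessionSketch:
    """Toy stand-in for a repair session (hypothetical, not Cassandra's RepairSession)."""

    def __init__(self):
        self._done = threading.Event()
        self.error = None

    # Listener callbacks; the names are assumptions made for this sketch.
    def on_job_finished(self):
        self._done.set()

    def on_job_failed(self, reason):
        # Record the failure and wake up the waiter so it cannot hang.
        self.error = reason
        self._done.set()

    def wait(self, timeout=None):
        """Block until a callback fires; raise if the job failed or nothing arrived."""
        if not self._done.wait(timeout):
            raise TimeoutError("no completion or failure was reported")
        if self.error is not None:
            raise RuntimeError(f"repair session aborted: {self.error}")
        return "completed"
```

Without the failure callback, a waiter that only listens for success would block forever when a snapshot fails, which matches the symptom described in this issue.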

 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession
 --

 Key: CASSANDRA-7560
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7560
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Vladimir Avram
Assignee: Yuki Morishita
 Fix For: 2.0.10

 Attachments: 0001-backport-CASSANDRA-6747.patch, 
 0001-partial-backport-3569.patch, cassandra_daemon.log, 
 cassandra_daemon_rep1.log, cassandra_daemon_rep2.log, nodetool_command.log


 Running {{nodetool repair -pr}} will sometimes hang on one of the resulting 
 AntiEntropySessions.
 The system logs will show the repair command starting
 {noformat}
  INFO [Thread-3079] 2014-07-15 02:22:56,514 StorageService.java (line 2569) 
 Starting repair command #1, repairing 256 ranges for keyspace x
 {noformat}
 You can then see a few AntiEntropySessions completing with:
 {noformat}
 INFO [AntiEntropySessions:2] 2014-07-15 02:28:12,766 RepairSession.java (line 
 282) [repair #eefb3c30-0bc6-11e4-83f7-a378978d0c49] session completed 
 successfully
 {noformat}
 Eventually we reach an AntiEntropySession that hangs just before requesting 
 the merkle trees for the next column family in line for repair. The log shows 
 the previous CF being finished, and then the whole repair session hangs with 
 no visible progress or errors on this or any of the related nodes.
 {noformat}
 INFO [AntiEntropyStage:1] 2014-07-15 02:38:20,325 RepairSession.java (line 
 221) [repair #8f85c1b0-0bc8-11e4-83f7-a378978d0c49] previous_cf is fully 
 synced
 {noformat}
 Notes:
 * Single DC 6 node cluster with an average load of 86 GB per node.
 * This appears to be random; it does not always happen on the same CF or on 
 the same session.
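
The log lines quoted above suggest a simple way to spot a stalled session: collect every repair session id that appears in the log and report those that never logged "session completed successfully". The following Python sketch does that. The function name is hypothetical, it assumes each log entry sits on a single line (the entries above are wrapped only by the mail formatting), and it will also report sessions that ended with an error, since it checks for the success message alone.

```python
import re

# Matches the "[repair #<uuid>]" tag seen in the RepairSession log lines.
REPAIR_TAG = re.compile(r"\[repair #([0-9a-f-]+)\]")
COMPLETED = "session completed successfully"


def find_unfinished_sessions(lines):
    """Map each repair session id that never logged completion to its last log line."""
    last_seen = {}    # session id -> most recent line mentioning it
    finished = set()  # session ids that logged the completion message
    for line in lines:
        match = REPAIR_TAG.search(line)
        if not match:
            continue
        session_id = match.group(1)
        last_seen[session_id] = line.strip()
        if COMPLETED in line:
            finished.add(session_id)
    return {sid: text for sid, text in last_seen.items() if sid not in finished}
```

Passing an open log file (for example `open("system.log")`, path assumed) as `lines` yields the ids of candidate hung sessions together with the last thing each one logged, such as the "is fully synced" line shown above.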



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-7560) 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession

2014-08-18 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-7560:
---

Reviewer: Joshua McKenzie  (was: Marcus Eriksson)



[jira] [Updated] (CASSANDRA-7560) 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession

2014-07-24 Thread Yuki Morishita (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita updated CASSANDRA-7560:
--

Attachment: 0001-backport-CASSANDRA-6747.patch

Attaching CASSANDRA-6747 backport.
It turns out that the logic uses a custom message property and does not bump 
the messaging version, so we are able to backport the whole feature to the 2.0 branch.

 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession
 --

 Key: CASSANDRA-7560
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7560
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Vladimir Avram
 Fix For: 2.0.10

 Attachments: 0001-backport-CASSANDRA-6747.patch, 
 cassandra_daemon.log, cassandra_daemon_rep1.log, cassandra_daemon_rep2.log, 
 nodetool_command.log




[jira] [Updated] (CASSANDRA-7560) 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession

2014-07-24 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-7560:
--

Reviewer: Marcus Eriksson

[~krummas] to review

 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession
 --

 Key: CASSANDRA-7560
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7560
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Vladimir Avram
Assignee: Yuki Morishita
 Fix For: 2.0.10

 Attachments: 0001-backport-CASSANDRA-6747.patch, 
 cassandra_daemon.log, cassandra_daemon_rep1.log, cassandra_daemon_rep2.log, 
 nodetool_command.log




[jira] [Updated] (CASSANDRA-7560) 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession

2014-07-18 Thread Vladimir Avram (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Avram updated CASSANDRA-7560:
--

Attachment: cassandra_daemon.log

jstack output from JVM running C*

 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession
 --

 Key: CASSANDRA-7560
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7560
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Vladimir Avram
 Attachments: cassandra_daemon.log




[jira] [Updated] (CASSANDRA-7560) 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession

2014-07-18 Thread Vladimir Avram (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Avram updated CASSANDRA-7560:
--

Attachment: nodetool_command.log

jstack output of the JVM running nodetool

 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession
 --

 Key: CASSANDRA-7560
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7560
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Vladimir Avram
 Attachments: cassandra_daemon.log, nodetool_command.log




[jira] [Updated] (CASSANDRA-7560) 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession

2014-07-18 Thread Vladimir Avram (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Avram updated CASSANDRA-7560:
--

Attachment: cassandra_daemon_rep1.log

There is also a stalled AntiEntropySession on this node.

 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession
 --

 Key: CASSANDRA-7560
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7560
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Vladimir Avram
 Attachments: cassandra_daemon.log, cassandra_daemon_rep1.log, 
 nodetool_command.log




[jira] [Updated] (CASSANDRA-7560) 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession

2014-07-18 Thread Vladimir Avram (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Avram updated CASSANDRA-7560:
--

Attachment: cassandra_daemon_rep2.log

 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession
 --

 Key: CASSANDRA-7560
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7560
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Vladimir Avram
 Attachments: cassandra_daemon.log, cassandra_daemon_rep1.log, 
 cassandra_daemon_rep2.log, nodetool_command.log




[jira] [Updated] (CASSANDRA-7560) 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession

2014-07-16 Thread Vladimir Avram (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Avram updated CASSANDRA-7560:
--

Description: 
Running {{nodetool repair -pr}} will sometimes hang on one of the resulting 
AntiEntropySessions.

The system logs will show the repair command starting

{noformat}
 INFO [Thread-3079] 2014-07-15 02:22:56,514 StorageService.java (line 2569) 
Starting repair command #1, repairing 256 ranges for keyspace x
{noformat}

You can then see a few AntiEntropySessions completing with:

{noformat}
INFO [AntiEntropySessions:2] 2014-07-15 02:28:12,766 RepairSession.java (line 
282) [repair #eefb3c30-0bc6-11e4-83f7-a378978d0c49] session completed 
successfully
{noformat}

Finally we reach an AntiEntropySession at some point that hangs just before 
requesting the merkle trees for the next column family in line for repair. So 
we first see the previous CF being finished and the whole repair session hangs 
here with no visible progress or errors on this or any of the related nodes.

{noformat}
INFO [AntiEntropyStage:1] 2014-07-15 02:38:20,325 RepairSession.java (line 221) 
[repair #8f85c1b0-0bc8-11e4-83f7-a378978d0c49] previous_cf is fully synced
{noformat}

  was:
Running {{nodetool repair -pr}} will sometimes hang on one of the resulting 
AntiEntropySessions.

The system logs will show the repair command starting

{panel}
 INFO [Thread-3079] 2014-07-15 02:22:56,514 StorageService.java (line 2569) 
Starting repair command #1, repairing 256 ranges for keyspace x
{panel}

You can then see a few AntiEntropySessions completing with:

{panel}
INFO [AntiEntropySessions:2] 2014-07-15 02:28:12,766 RepairSession.java (line 
282) [repair #eefb3c30-0bc6-11e4-83f7-a378978d0c49] session completed 
successfully
{panel}

Finally we reach an AntiEntropySession at some point that hangs just before 
requesting the merkle trees for the next column family in line for repair. So 
we first see the previous CF being finished and the whole repair session hangs 
here with no visible progress or errors on this or any of the related nodes.

{panel}
INFO [AntiEntropyStage:1] 2014-07-15 02:38:20,325 RepairSession.java (line 221) 
[repair #8f85c1b0-0bc8-11e4-83f7-a378978d0c49] previous_cf is fully synced
{panel}


 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession
 --

 Key: CASSANDRA-7560
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7560
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Vladimir Avram



[jira] [Updated] (CASSANDRA-7560) 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession

2014-07-16 Thread Vladimir Avram (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Avram updated CASSANDRA-7560:
--

Description: 
Running {{nodetool repair -pr}} will sometimes hang on one of the resulting 
AntiEntropySessions.

The system logs will show the repair command starting

{noformat}
 INFO [Thread-3079] 2014-07-15 02:22:56,514 StorageService.java (line 2569) 
Starting repair command #1, repairing 256 ranges for keyspace x
{noformat}

You can then see a few AntiEntropySessions completing with:

{noformat}
INFO [AntiEntropySessions:2] 2014-07-15 02:28:12,766 RepairSession.java (line 
282) [repair #eefb3c30-0bc6-11e4-83f7-a378978d0c49] session completed 
successfully
{noformat}

Finally we reach an AntiEntropySession at some point that hangs just before 
requesting the merkle trees for the next column family in line for repair. So 
we first see the previous CF being finished and the whole repair session hangs 
here with no visible progress or errors on this or any of the related nodes.

{noformat}
INFO [AntiEntropyStage:1] 2014-07-15 02:38:20,325 RepairSession.java (line 221) 
[repair #8f85c1b0-0bc8-11e4-83f7-a378978d0c49] previous_cf is fully synced
{noformat}

Notes:
* Single DC 6 node cluster with an average load of 86 GB per node.
* This appears to be random; it does not always happen on the same CF or on the 
same session.

  was:
Running {{nodetool repair -pr}} will sometimes hang on one of the resulting 
AntiEntropySessions.

The system logs will show the repair command starting

{noformat}
 INFO [Thread-3079] 2014-07-15 02:22:56,514 StorageService.java (line 2569) 
Starting repair command #1, repairing 256 ranges for keyspace x
{noformat}

You can then see a few AntiEntropySessions completing with:

{noformat}
INFO [AntiEntropySessions:2] 2014-07-15 02:28:12,766 RepairSession.java (line 
282) [repair #eefb3c30-0bc6-11e4-83f7-a378978d0c49] session completed 
successfully
{noformat}

Finally we reach an AntiEntropySession at some point that hangs just before 
requesting the merkle trees for the next column family in line for repair. So 
we first see the previous CF being finished and the whole repair session hangs 
here with no visible progress or errors on this or any of the related nodes.

{noformat}
INFO [AntiEntropyStage:1] 2014-07-15 02:38:20,325 RepairSession.java (line 221) 
[repair #8f85c1b0-0bc8-11e4-83f7-a378978d0c49] previous_cf is fully synced
{noformat}


 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession
 --

 Key: CASSANDRA-7560
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7560
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Vladimir Avram
