[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-05-01 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646519#comment-13646519
 ] 

Brandon Williams commented on CASSANDRA-5432:
-

I never thought CASSANDRA-5171 was a really big gain anyway, but it looked 
innocuous enough at the time. +1 on reverting it.

 Repair Freeze/Gossip Invisibility Issues 1.2.4
 --

 Key: CASSANDRA-5432
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5432
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.2.4
 Environment: Ubuntu 10.04.1 LTS
 C* 1.2.3
 Sun Java 6 u43
 JNA Enabled
 Not using VNodes
Reporter: Arya Goudarzi
Assignee: Vijay
Priority: Critical
 Attachments: 0001-CASSANDRA-5432.patch


 Read comment 6. This description summarizes the repair issue only, but I 
 believe there is a bigger problem going on with networking as described on 
 that comment. 
 Since I upgraded our sandbox cluster, I have been unable to run repair on any
 node, and we will reach our gc_grace_seconds this weekend. Please help. So
 far, I have tried the following suggestions:
 - nodetool scrub
 - offline scrub
 - running repair on each CF separately. It didn't matter; all of them got stuck
 the same way.
 The repair command just gets stuck and the machine is idle. Only the
 following logs are printed for the repair job:
  INFO [Thread-42214] 2013-04-05 23:30:27,785 StorageService.java (line 2379) Starting repair command #4, repairing 1 ranges for keyspace cardspring_production
  INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,789 AntiEntropyService.java (line 652) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] new session: will sync /X.X.X.190, /X.X.X.43, /X.X.X.56 on range (1808575600,42535295865117307932921825930779602032] for keyspace_production.[comma separated list of CFs]
  INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,790 AntiEntropyService.java (line 858) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] requesting merkle trees for BusinessConnectionIndicesEntries (to [/X.X.X.43, /X.X.X.56, /X.X.X.190])
  INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,086 AntiEntropyService.java (line 214) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from /X.X.X.43
  INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,147 AntiEntropyService.java (line 214) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from /X.X.X.56
 Please advise. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-29 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644558#comment-13644558
 ] 

Jonathan Ellis commented on CASSANDRA-5432:
---

Why does "let's use the last-known location of this node" cause problems?


[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-29 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644658#comment-13644658
 ] 

Vijay commented on CASSANDRA-5432:
--

The problem is that the private IP, which we need to communicate within a
DC/region, is not available until we have gossiped with the other nodes.
Since we don't have the private IP but we do have the rest (DC/rack), we try
to connect via the public IP.

Removing that optimization forces us to assume the node is in another DC and
hence to use the public IP and SSL port; eventually, when we receive the
private IP, we reset the connection to use the right (private IP) address.
You may ask why not store the private IP? We could, but currently the logic
that resets the connection to the private IP lives in the snitch.
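A minimal Java sketch of the address/port choice described in the comment above, written only to make the rule concrete. It is not Cassandra's implementation: the class `ReconnectSketch`, its methods, and the `trustCachedLocation` flag are hypothetical, and ports 7000/7001 are assumed to be the plain and SSL internode ports (the usual `storage_port`/`ssl_storage_port` defaults). The flag models, as an interpretation of this thread, a shortcut that trusts the cached DC/rack before gossip has delivered the private IP.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the connection-target choice discussed in CASSANDRA-5432.
 * Not Cassandra's code: names, the flag, and the port numbers are assumptions.
 */
public class ReconnectSketch {
    static final int STORAGE_PORT = 7000;     // assumed plain (non-SSL) internode port
    static final int SSL_STORAGE_PORT = 7001; // assumed SSL internode port

    /** Address and port an outbound connection would be opened to. */
    static final class Target {
        final String address;
        final int port;

        Target(String address, int port) {
            this.address = address;
            this.port = port;
        }

        @Override
        public String toString() {
            return address + ":" + port;
        }
    }

    private final String localDc;
    // true = trust a cached DC/rack even before the private IP is known (assumed shortcut)
    private final boolean trustCachedLocation;
    // peer public IP -> DC; may be known (e.g. cached) before gossip completes
    private final Map<String, String> dcByPublicIp = new HashMap<>();
    // peer public IP -> private IP; filled only once gossip delivers it
    private final Map<String, String> privateIpByPublicIp = new HashMap<>();

    public ReconnectSketch(String localDc, boolean trustCachedLocation) {
        this.localDc = localDc;
        this.trustCachedLocation = trustCachedLocation;
    }

    public void onDcKnown(String publicIp, String dc) {
        dcByPublicIp.put(publicIp, dc);
    }

    public void onPrivateIpLearned(String publicIp, String privateIp) {
        privateIpByPublicIp.put(publicIp, privateIp);
    }

    public Target select(String publicIp) {
        boolean sameDc = localDc.equals(dcByPublicIp.get(publicIp));
        String privateIp = privateIpByPublicIp.get(publicIp);
        if (sameDc && privateIp != null) {
            // Gossip has delivered the private IP: switch to the local-DC path.
            return new Target(privateIp, STORAGE_PORT);
        }
        if (sameDc && trustCachedLocation) {
            // Cached location says "same DC" but there is no private IP yet:
            // plain port on the public IP, which a security group that only
            // opens the SSL port on public IPs would block.
            return new Target(publicIp, STORAGE_PORT);
        }
        // Private IP unknown: assume another DC, use public IP and SSL port.
        return new Target(publicIp, SSL_STORAGE_PORT);
    }

    public static void main(String[] args) {
        ReconnectSketch s = new ReconnectSketch("us-east", false);
        s.onDcKnown("54.0.0.1", "us-east");
        System.out.println(s.select("54.0.0.1")); // 54.0.0.1:7001
        s.onPrivateIpLearned("54.0.0.1", "10.0.0.1");
        System.out.println(s.select("54.0.0.1")); // 10.0.0.1:7000
    }
}
```

With the flag off, a peer is reached on its public IP and SSL port until gossip supplies the private IP; with it on, the same peer is tried on the plain port at its public IP, which would match the need, reported in this thread, to open non-SSL ports on public IPs.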


[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641589#comment-13641589
 ] 

Ondřej Černoš commented on CASSANDRA-5432:
--

I have exactly the same issue as Arya.

I also had to open the non-SSL ports from within the datacenter in order to
create the cluster.

I was wondering whether it could be a networking issue (we use a mixed
AWS/private cloud setup), so it is good to see we are not alone with this.


[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641599#comment-13641599
 ] 

Ondřej Černoš commented on CASSANDRA-5432:
--

Please see also CASSANDRA-5493: the MessagingService also reports dropped
messages on _itself_ using its public IP. The output displays 3 public IPs and
2 private IPs (the private IP of the node itself is not included), while the
remote DC is reported correctly. This seems related.


[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-25 Thread Arya Goudarzi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642436#comment-13642436
 ] 

Arya Goudarzi commented on CASSANDRA-5432:
--

Sure, I should be able to get back to you either tonight or tomorrow.


[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-25 Thread Arya Goudarzi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642609#comment-13642609
 ] 

Arya Goudarzi commented on CASSANDRA-5432:
--

+1 works for me.


[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-24 Thread Arya Goudarzi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641178#comment-13641178
 ] 

Arya Goudarzi commented on CASSANDRA-5432:
--

So, I rolled back CASSANDRA-5171 and pushed it to my test cluster. The gossip
issue where nodes didn't see each other after a restart got fixed. Repair,
however, still tries to connect to the machine running the repair (itself) via
its public IP when requesting the Merkle tree, and gets stuck there, so that
issue remains. Some behavior did change, though: OutboundTcpConnection no longer
reported connecting to the other two replicas when requesting Merkle trees, so I
only saw the message for the connection attempt to itself. Here is the snippet:

 INFO [Thread-458] 2013-04-24 23:21:16,543 StorageService.java (line 2407) Starting repair command #1, repairing 1 ranges for keyspace app_production
DEBUG [Thread-458] 2013-04-24 23:21:16,580 StorageService.java (line 2547) computing ranges for 1808575600, 7089215977519551322153637656637080005, 14178431955039102644307275311465584410, 42535295865117307932921825930779602030, 49624511842636859255075463585608106435, 56713727820156410577229101240436610840, 85070591730234615865843651859750628460, 92159807707754167187997289514579132865, 99249023685273718510150927169407637270, 127605887595351923798765477788721654890, 134695103572871475120919115443550159295, 141784319550391026443072753098378663700
 INFO [AntiEntropySessions:1] 2013-04-24 23:21:16,587 AntiEntropyService.java (line 651) [repair #a9a87e40-ad35-11e2-945a-050d956ff11b] new session: will sync /107.20.98.11, /54.224.107.137, /54.224.133.163 on range (99249023685273718510150927169407637270,127605887595351923798765477788721654890] for cardspring_production.[App]
 INFO [AntiEntropySessions:1] 2013-04-24 23:21:16,598 AntiEntropyService.java (line 857) [repair #a9a87e40-ad35-11e2-945a-050d956ff11b] requesting merkle trees for App (to [/XX.YYY.107.137, /XX.YYY.133.163, /XXX.YY.98.11])
DEBUG [WRITE-/107.20.98.11] 2013-04-24 23:21:16,601 OutboundTcpConnection.java (line 260) attempting to connect to /XXX.YY.98.11
 INFO [AntiEntropyStage:1] 2013-04-24 23:21:19,111 AntiEntropyService.java (line 213) [repair #a9a87e40-ad35-11e2-945a-050d956ff11b] Received merkle tree for App from /XX.YYY.133.163
DEBUG [ScheduledTasks:1] 2013-04-24 23:21:19,409 GCInspector.java (line 121) GC for ParNew: 54 ms for 1 collections, 669806384 used; max is 4211081216
 INFO [AntiEntropyStage:1] 2013-04-24 23:21:20,408 AntiEntropyService.java (line 213) [repair #a9a87e40-ad35-11e2-945a-050d956ff11b] Received merkle tree for App from /XX.YYY.107.137

See the DEBUG line from OutboundTcpConnection: it is trying to connect to the
node's own public IP (XXX.YY.98.11), which is still an issue. Before this line I
expected to see two other consecutive lines, as before, showing
OutboundTcpConnection trying to connect to the other nodes as well. Those lines
did not appear, even though both nodes returned their Merkle trees, so the
connections to the other nodes were somehow made successfully.


[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-23 Thread Arya Goudarzi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638823#comment-13638823
 ] 

Arya Goudarzi commented on CASSANDRA-5432:
--

I was actually suspicious about that. I can roll back that patch and try it.
Give me until the end of the week; my hands are tied right now.


[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-21 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637687#comment-13637687
 ] 

Vijay commented on CASSANDRA-5432:
--

Priam opens a port for the other DCs to talk to each other but does nothing
within a DC. I still doubt the security group (SG) setup, because all IPs within
a security group should be open on both ports.
Maybe CASSANDRA-5171 created a side effect, but I am not sure.

[~jbrown] do you mind verifying it with 1.2.4? Verifying it with Priam is a
bigger undertaking for me right now :)


[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-20 Thread Arya Goudarzi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637454#comment-13637454
 ] 

Arya Goudarzi commented on CASSANDRA-5432:
--

Priam only opens one port, and that is the SSL port on public IPs (see line
74): http://goo.gl/vY8WX

I did not remove the IPs from the security group. I left the IP rules for the
SSL port as they were set by Priam. I only removed the non-SSL port rules on
public IPs, which I had added manually to work around this issue.


[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-19 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13636660#comment-13636660
 ] 

Vijay commented on CASSANDRA-5432:
--

Arya, 
When we first initiate communication with a node, we use its public IP; once we 
have its private IP, we switch to the local (private) IP.

I am confused by the analysis, because the nodes should already have been 
connected and communicating, and a tree request is just another message on the 
same channel as any other message. 
Are the nodes up in the first place?

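{code}
// Each Merkle tree request is sent as an ordinary one-way message,
// over the same MessagingService connection as any other message.
this.treeRequests = new RequestCoordinator<TreeRequest>(isSequential)
{
    public void send(TreeRequest r)
    {
        MessagingService.instance().sendOneWay(r.createMessage(), r.endpoint);
    }
};
{code}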
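To illustrate why a single undelivered tree request is enough to freeze a repair session, here is a simplified, hypothetical model of a request tracker. The class and method names are illustrative assumptions, not Cassandra's actual `RequestCoordinator` code. Each tree response marks its request completed, and the session can only move on when nothing is left outstanding. In the description's log, trees arrive from /X.X.X.43 and /X.X.X.56 but apparently never from /X.X.X.190, so in this model the remaining count would stay at one forever.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical, simplified model of how a repair session tracks Merkle tree requests.
// Illustrative only: this is not Cassandra's RequestCoordinator implementation.
abstract class TreeRequestTrackerSketch<R> {
    private final boolean sequential;
    private final Queue<R> pending = new ArrayDeque<>(); // added but not yet sent
    private final Set<R> outstanding = new HashSet<>();  // sent, awaiting a tree

    TreeRequestTrackerSketch(boolean sequential) {
        this.sequential = sequential;
    }

    // Transport hook, e.g. a one-way message to the endpoint.
    abstract void send(R request);

    void add(R request) {
        pending.add(request);
    }

    // Sequential: one request in flight at a time. Parallel: send everything at once.
    void start() {
        if (sequential) {
            sendNext();
        } else {
            while (!pending.isEmpty()) {
                sendNext();
            }
        }
    }

    private void sendNext() {
        R request = pending.poll();
        if (request != null) {
            outstanding.add(request);
            send(request);
        }
    }

    // Called when a tree arrives; returns how many requests are still unanswered.
    // If a request is never delivered, this never reaches zero and the session waits forever.
    int completed(R request) {
        outstanding.remove(request);
        if (sequential && outstanding.isEmpty()) {
            sendNext();
        }
        return outstanding.size() + pending.size();
    }
}
```

In sequential mode the stall is even more visible: a request that is never answered also prevents every later request from being sent.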

[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-19 Thread Arya Goudarzi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637036#comment-13637036
 ] 

Arya Goudarzi commented on CASSANDRA-5432:
--

Hey Vijay,

Good to see you here. Sorry if my analysis is unclear. Here is my take:

 The first time we start the communication to a node we try to Initiate 
 communications we use the public IP and eventually once we have the private 
 IP we will switch back to local ip's.

Has this always been the case? If you are using public IPs (not public DNS 
names), there have to be explicit security group rules on the public IPs to allow 
this. Otherwise, if the security group opens the ports to machines in the same 
group by referencing the security group name, it only allows traffic over their 
private IPs, so this won't work. 

We use Priam (your awesome tooling), and as you know, it opens only the SSL port 
on the public IPs for cross-region communication. From the operator's 
perspective, that is the correct thing to do. I only have the SSL port open on 
public IPs and don't want to open the non-SSL port for security reasons. All 
other ports (non-SSL, JMX, etc.) are opened the way I described, using security 
group names, which allows traffic on private IPs. That is just the way AWS has 
always worked. So, within the same region, if you try to connect to any machine 
using its public IP, it won't work. 

Here is how I reproduced the scenario above; I believe it is all related to your 
statement that all machines connect to public IPs first.

Set up a cluster as I described in my previous comment. It can be a single 
region. Restart all machines at the same time. Each machine only sees itself as 
UP; everyone else is reported as DOWN in nodetool ring. I am guessing this is 
because they are trying to send gossip to public IPs, but only the SSL port is 
open on public IPs. The cluster is configured to use SSL only across 
datacenters/regions, not within the same region. So now I am left with a bunch 
of nodes that only see themselves in the ring. I go to my AWS console and open 
the non-SSL port on every single public IP in that security group. Now all the 
nodes see each other. 

By now, I had a theory about nodes wanting to communicate through the public IP, 
which is not possible, so I moved on to troubleshooting repair. I know that with 
the current settings repair would succeed. Since the nodes now see each other, I 
go to security groups and remove the non-SSL public IP rules that I added in the 
previous step. I start the repair and end up with the log messages above. The 
public IP mentioned in the log belongs to the node that owns the log and is 
running repair, so it tried to communicate with itself using its own public IP. 

Does that make sense? I can call you to describe it over the phone, but basically 
this setup used to work on 1.1.10 and does not work on 1.2.4. I have attached the 
debugger to a node and am trying to trace the code. I'll let you know if I find 
something new.

[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-19 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637123#comment-13637123
 ] 

Vijay commented on CASSANDRA-5432:
--

Hi Arya, thanks; you can call me anytime, but it will help others if we keep the 
discussion here.

{quote}
Has this always been the case? 
{quote}
As far as I know, yes.

{quote}
 I go to security groups and remove the non SSL on public IP rules that I added 
in previous step.
{quote}
Priam opens up ports for the local nodes and also the remote nodes within the 
security group (http://goo.gl/l9Q1T). It looks like you shouldn't do the above, 
because doing so prevents Cassandra from re-establishing its connections.

Also, the reason you see all the nodes as UP in a multi-region setup even though 
they cannot communicate within the DC is the issue mentioned in CASSANDRA-3533. 
I can almost bet that read/write requests will be failing in the local DC; if 
not, try again after restarting the nodes. :)

Let me know if you still have issues or disagree.


[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-18 Thread Arya Goudarzi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13636082#comment-13636082
 ] 

Arya Goudarzi commented on CASSANDRA-5432:
--

 non-ssl on the private IP within the same one [region]

OK, after a little more digging I found the root cause, which I believe is a bug, 
so I am reopening this.

See this log snippet for a repair session I triggered on nodes in a single 
region in AWS:

 INFO [AntiEntropySessions:1] 2013-04-19 04:28:16,587 AntiEntropyService.java 
(line 651) [repair #8e59b7c0-a8a9-11e2-ba85-d39d57f66b97] new session: will 
sync /54.242.X.YYY, /54.224.XX.YYY, /50.17.XXX.YYY on range 
(99249023685273718510150927169407637270,127605887595351923798765477788721654890]
 for cardspring_production.[App]
 INFO [AntiEntropySessions:1] 2013-04-19 04:28:16,591 AntiEntropyService.java 
(line 857) [repair #8e59b7c0-a8a9-11e2-ba85-d39d57f66b97] requesting merkle 
trees for App (to [/54.224.XX.YYY, /50.17.XXX.YYY, /54.242.X.YYY])
DEBUG [WRITE-/50.17.159.210] 2013-04-19 04:28:16,592 OutboundTcpConnection.java 
(line 260) attempting to connect to /10.170.XX.YYY
DEBUG [WRITE-/54.224.36.214] 2013-04-19 04:28:16,593 OutboundTcpConnection.java 
(line 260) attempting to connect to /10.121.XX.YYY
DEBUG [WRITE-/54.242.1.111] 2013-04-19 04:28:16,593 OutboundTcpConnection.java 
(line 260) attempting to connect to /54.242.X.YYY

Notice the last line. That is the public IP of the node running repair. Why does 
it pick its own public IP address to send the tree request to itself? This is the 
source of the problem. In AWS you cannot communicate through a public IP address 
when security group rules are defined by group name, which is a common setup. 
Hence the tree request the node sends to itself gets stuck.
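The pattern in this log (peers reached on their 10.x private addresses, the local node on its own public address) can be reproduced with a hypothetical model of the public-to-private switch described in this thread. In this sketch, private addresses are learned only from gossip about *other* endpoints in the same DC, so a lookup for the node's own public address falls back to that public address. All names and addresses here are made up for illustration; this is not Cassandra's `OutboundTcpConnection` or snitch code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of choosing the address to connect to (not Cassandra's code).
final class ConnectAddressSketch {
    private final String localPublicIp;
    private final String localDc;
    // Public (broadcast) IP -> private IP, filled in as gossip reveals peers' private addresses.
    private final Map<String, String> learnedPrivateIps = new HashMap<>();

    ConnectAddressSketch(String localPublicIp, String localDc) {
        this.localPublicIp = localPublicIp;
        this.localDc = localDc;
    }

    // In this model only remote endpoints in the same DC are recorded,
    // so the node never gets an entry for itself.
    void onPeerGossip(String peerPublicIp, String peerDc, String peerPrivateIp) {
        if (!peerPublicIp.equals(localPublicIp) && peerDc.equals(localDc)) {
            learnedPrivateIps.put(peerPublicIp, peerPrivateIp);
        }
    }

    // Start with the public IP; switch to the private IP once it is known.
    String connectAddress(String endpointPublicIp) {
        return learnedPrivateIps.getOrDefault(endpointPublicIp, endpointPublicIp);
    }
}
```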





[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-17 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633903#comment-13633903
 ] 

Jonathan Ellis commented on CASSANDRA-5432:
---

You said above that you had it configured this way in 1.1 as well:

{quote}
7100 from cluster1 (Configured Normal Storage)
7103 from cluster1 (Configured SSL Storage)
{quote}

In any case, it is not a bug for you to need both open; Cassandra will use SSL 
between datacenters (regions), and non-ssl on the private IP within the same 
one.
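The rule stated above can be written as a small decision function. This is a sketch under assumptions: it models `internode_encryption: dc` (encrypt only between datacenters), and it uses the 7100/7103 ports from the configuration quoted above; the class and method names are hypothetical, not Cassandra's API.

```java
// Hypothetical sketch of endpoint selection under dc-only internode encryption
// (names are illustrative; this is not Cassandra's implementation).
final class InternodeRouteSketch {
    static final int STORAGE_PORT = 7100;     // non-SSL storage port (from the quoted config)
    static final int SSL_STORAGE_PORT = 7103; // SSL storage port (from the quoted config)

    static final class Route {
        final String address;
        final int port;

        Route(String address, int port) {
            this.address = address;
            this.port = port;
        }

        @Override
        public String toString() {
            return address + ":" + port;
        }
    }

    // Same DC: plain storage port on the private IP. Different DC: SSL port on the public IP.
    static Route route(String localDc, String peerDc, String peerPrivateIp, String peerPublicIp) {
        if (localDc.equals(peerDc)) {
            return new Route(peerPrivateIp, STORAGE_PORT);
        }
        return new Route(peerPublicIp, SSL_STORAGE_PORT);
    }
}
```

Under this rule both ports have to be reachable: 7100 over private IPs within a region and 7103 over public IPs across regions.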

[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-16 Thread Arya Goudarzi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633492#comment-13633492
 ] 

Arya Goudarzi commented on CASSANDRA-5432:
--

I have already used the IRC channel. It was suggested that I open a JIRA ticket, 
as no one there could help.

[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4

2013-04-16 Thread Arya Goudarzi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633502#comment-13633502
 ] 

Arya Goudarzi commented on CASSANDRA-5432:
--

I added a correction. It is not JMX, Jonathan; you are right. It is opening the 
non-SSL storage port on public IPs that fixes it. We didn't have to do this on 
1.1.10.
