[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646519#comment-13646519 ]

Brandon Williams commented on CASSANDRA-5432:

I never thought CASSANDRA-5171 was a really big gain anyway, but it looked innocuous enough at the time. +1 on reverting it.

Repair Freeze/Gossip Invisibility Issues 1.2.4

Key: CASSANDRA-5432
URL: https://issues.apache.org/jira/browse/CASSANDRA-5432
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.2.4
Environment: Ubuntu 10.04.1 LTS C* 1.2.3 Sun Java 6 u43 JNA Enabled Not using VNodes
Reporter: Arya Goudarzi
Assignee: Vijay
Priority: Critical
Attachments: 0001-CASSANDRA-5432.patch

Read comment 6. This description summarizes the repair issue only, but I believe there is a bigger problem going on with networking as described on that comment.

Since I have upgraded our sandbox cluster, I am unable to run repair on any node and I am reaching our gc_grace seconds this weekend. Please help. So far, I have tried the following suggestions:
- nodetool scrub
- offline scrub
- running repair on each CF separately. Didn't matter. All got stuck the same way.

The repair command just gets stuck and the machine is idling. Only the following logs are printed for repair job:

INFO [Thread-42214] 2013-04-05 23:30:27,785 StorageService.java (line 2379) Starting repair command #4, repairing 1 ranges for keyspace cardspring_production
INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,789 AntiEntropyService.java (line 652) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] new session: will sync /X.X.X.190, /X.X.X.43, /X.X.X.56 on range (1808575600,42535295865117307932921825930779602032] for keyspace_production.[comma separated list of CFs]
INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,790 AntiEntropyService.java (line 858) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] requesting merkle trees for BusinessConnectionIndicesEntries (to [/X.X.X.43, /X.X.X.56, /X.X.X.190])
INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,086 AntiEntropyService.java (line 214) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from /X.X.X.43
INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,147 AntiEntropyService.java (line 214) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle tree for ColumnFamilyName from /X.X.X.56

Please advise.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644558#comment-13644558 ]

Jonathan Ellis commented on CASSANDRA-5432:

Why does "let's use the last-known location of this node" cause problems?
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644658#comment-13644658 ]

Vijay commented on CASSANDRA-5432:

The problem is that the private IP we need in order to communicate within a DC/region is not available until we have gossiped with the nodes. Since we don't have the private IP but we do have the rest (DC/rack), we try to connect via the public IP. Removing that optimization forces us to assume the node is in another DC and hence to use the public IP and SSL port; eventually, when we receive the private IP, we reset the connection to use the right (private IP) address.

You may ask, why not store the private IP? Well, we could, but currently the reset-connection (to private IP) logic is in the snitch.
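To make the address choice Vijay describes concrete, here is a minimal Java sketch. It is not Cassandra's code: the class `EndpointChoiceSketch`, the method `choose`, and its parameters are hypothetical, and the assumption that a peer believed to be in the same DC is dialed on the non-SSL port is inferred from the workarounds reported in this thread. Ports 7000 and 7001 are Cassandra's default `storage_port` and `ssl_storage_port`.

```java
/** Hypothetical model of the connection choice discussed above; not Cassandra's actual logic. */
final class EndpointChoiceSketch {
    static final int STORAGE_PORT = 7000;     // Cassandra's default storage_port (plain)
    static final int SSL_STORAGE_PORT = 7001; // Cassandra's default ssl_storage_port

    /**
     * @param publicIp    public address of the peer, always known
     * @param privateIp   private address, null until gossip has delivered it
     *                    (in this sketch it is only ever set for same-DC peers)
     * @param knownSameDc true if saved DC/rack info says the peer is in our DC;
     *                    false models "assume it is in another DC"
     * @return "ip:port" that would be dialed under these assumptions
     */
    static String choose(String publicIp, String privateIp, boolean knownSameDc) {
        if (privateIp != null) {
            // After gossip: reset the connection to the private address.
            return privateIp + ":" + STORAGE_PORT;
        }
        // Before gossip: only the public IP is available. Assumption: a peer
        // believed to be local gets the plain port, a remote one the SSL port.
        int port = knownSameDc ? STORAGE_PORT : SSL_STORAGE_PORT;
        return publicIp + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(choose("203.0.113.10", null, true));       // 203.0.113.10:7000
        System.out.println(choose("203.0.113.10", null, false));      // 203.0.113.10:7001
        System.out.println(choose("203.0.113.10", "10.0.0.5", true)); // 10.0.0.5:7000
    }
}
```

Under these assumptions, saved DC/rack info without a private IP leads to the public IP on the plain port, which is the traffic that had to be opened manually in the security group. Assuming a remote DC instead yields the public IP on the SSL port until the private IP arrives.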
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641589#comment-13641589 ]

Ondřej Černoš commented on CASSANDRA-5432:

I have exactly the same issue as Arya. I also had to open non-SSL ports from within the datacenter in order to create the cluster. I was wondering if it could be a networking issue (we use mixed aws-private cloud setup), so it is good to see we are not alone with this.
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641599#comment-13641599 ]

Ondřej Černoš commented on CASSANDRA-5432:

Please see also CASSANDRA-5493: the MessagingService also reports dropped messages on _itself_ using its public IP. The output displays 3 public IPs and 2 private (the private IP of the node itself is not included), while the remote DC is reported correctly. This seems related.
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642436#comment-13642436 ]

Arya Goudarzi commented on CASSANDRA-5432:

Sure, I should be able to get back to you either tonight or tomorrow.
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642609#comment-13642609 ]

Arya Goudarzi commented on CASSANDRA-5432:

+1 works for me.
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641178#comment-13641178 ]

Arya Goudarzi commented on CASSANDRA-5432:

So, I rolled back CASSANDRA-5171 and pushed it to my test cluster. The gossip issue, where nodes didn't see each other after a restart, got fixed. The repair still tried to connect to the machine running repair (self) with its public IP for requesting the MerkleTree, where it gets stuck, so it has the same issue. Some behavior changed though: the OutboundTcpConnection didn't report connecting to the other 2 replicas for requesting the MerkleTree, so I only saw the message for the attempt to connect to self. Here is the snippet:

INFO [Thread-458] 2013-04-24 23:21:16,543 StorageService.java (line 2407) Starting repair command #1, repairing 1 ranges for keyspace app_production
DEBUG [Thread-458] 2013-04-24 23:21:16,580 StorageService.java (line 2547) computing ranges for 1808575600, 7089215977519551322153637656637080005, 14178431955039102644307275311465584410, 42535295865117307932921825930779602030, 49624511842636859255075463585608106435, 56713727820156410577229101240436610840, 85070591730234615865843651859750628460, 92159807707754167187997289514579132865, 99249023685273718510150927169407637270, 127605887595351923798765477788721654890, 134695103572871475120919115443550159295, 141784319550391026443072753098378663700
INFO [AntiEntropySessions:1] 2013-04-24 23:21:16,587 AntiEntropyService.java (line 651) [repair #a9a87e40-ad35-11e2-945a-050d956ff11b] new session: will sync /107.20.98.11, /54.224.107.137, /54.224.133.163 on range (99249023685273718510150927169407637270,127605887595351923798765477788721654890] for cardspring_production.[App]
INFO [AntiEntropySessions:1] 2013-04-24 23:21:16,598 AntiEntropyService.java (line 857) [repair #a9a87e40-ad35-11e2-945a-050d956ff11b] requesting merkle trees for App (to [/XX.YYY.107.137, /XX.YYY.133.163, /XXX.YY.98.11])
DEBUG [WRITE-/107.20.98.11] 2013-04-24 23:21:16,601 OutboundTcpConnection.java (line 260) attempting to connect to /XXX.YY.98.11
INFO [AntiEntropyStage:1] 2013-04-24 23:21:19,111 AntiEntropyService.java (line 213) [repair #a9a87e40-ad35-11e2-945a-050d956ff11b] Received merkle tree for App from /XX.YYY.133.163
DEBUG [ScheduledTasks:1] 2013-04-24 23:21:19,409 GCInspector.java (line 121) GC for ParNew: 54 ms for 1 collections, 669806384 used; max is 4211081216
INFO [AntiEntropyStage:1] 2013-04-24 23:21:20,408 AntiEntropyService.java (line 213) [repair #a9a87e40-ad35-11e2-945a-050d956ff11b] Received merkle tree for App from /XX.YYY.107.137

See the debug line with OutboundTcpConnection. It is trying to connect to the public IP of self (XXX.YY.98.11), which is still an issue. What I was expecting to see before this line was two other consecutive lines, like before, where OutboundTcpConnection showed it was trying to connect to the other nodes as well. Although the other nodes returned their MerkleTrees, those log lines did not show. So, the connection was made successfully to the other nodes somehow.
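The check Arya does by eye in this comment, looking for which endpoints OutboundTcpConnection tried to dial, can be sketched as a small log scan. This is an illustrative helper, not part of Cassandra or nodetool: `ConnectAttemptScan` and `targets` are hypothetical names, and the regular expression assumes the DEBUG line format shown in the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists the addresses that OutboundTcpConnection tried to dial. */
final class ConnectAttemptScan {
    // Assumes lines of the form:
    // "... OutboundTcpConnection.java (line 260) attempting to connect to /<address>"
    private static final Pattern ATTEMPT = Pattern.compile(
            "OutboundTcpConnection\\.java \\(line \\d+\\) attempting to connect to /(\\S+)");

    /** Returns the dialed addresses in the order they appear in the log. */
    static List<String> targets(List<String> logLines) {
        List<String> out = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = ATTEMPT.matcher(line);
            if (m.find()) {
                out.add(m.group(1)); // address without the leading slash
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("DEBUG [WRITE-/107.20.98.11] 2013-04-24 23:21:16,601 "
                + "OutboundTcpConnection.java (line 260) attempting to connect to /XXX.YY.98.11");
        System.out.println(targets(lines));
    }
}
```

Comparing the returned addresses against the node's own public and private IPs would show whether the node is dialing itself on its public address, as reported here.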
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638823#comment-13638823 ]

Arya Goudarzi commented on CASSANDRA-5432:

I was actually suspicious about that. I can roll back that patch and try it. Give me till end of the week. My hands are tied up right now.
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637687#comment-13637687 ]

Vijay commented on CASSANDRA-5432:

Priam opens ports for other DCs to talk to each other but does nothing within a DC. I still doubt the SG setup, because all IPs within a security group should be open for both ports. Maybe CASSANDRA-5171 created a side effect; I am not sure.

[~jbrown] do you mind verifying it with 1.2.4? Verifying it with Priam is a bigger undertaking for me now :)
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637454#comment-13637454 ]

Arya Goudarzi commented on CASSANDRA-5432:

Priam only opens one port, and that is the SSL port on public IPs (see line 74): http://goo.gl/vY8WX

I did not remove the IPs from the security group. I left the IP rules for the SSL port as they were set by Priam. I only removed the non-SSL port rules on public IPs, which I had added manually to work around this issue.
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636660#comment-13636660 ] Vijay commented on CASSANDRA-5432: --
Arya, the first time we initiate communication with a node we use the public IP, and eventually, once we have the private IP, we switch back to the local IPs. I am confused by the analysis, because the nodes should already have been connected and communicating, and a tree request is just another message on the same channel as any other message. Are the nodes up in the first place?
{code}
this.treeRequests = new RequestCoordinator<TreeRequest>(isSequential)
{
    public void send(TreeRequest r)
    {
        MessagingService.instance().sendOneWay(r.createMessage(), r.endpoint);
    }
};
{code}
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637036#comment-13637036 ] Arya Goudarzi commented on CASSANDRA-5432: --
Hey Vijay, good to see you here. Sorry if my analysis is unclear. Here is my take:
{quote} The first time we initiate communication with a node we use the public IP, and eventually, once we have the private IP, we switch back to the local IPs. {quote}
Has this always been the case? If you are using public IPs (not public DNS names), there have to be explicit security rules on the public IPs to allow this. Otherwise, if the security group opens the ports to machines in the same group by security group name, traffic is allowed only between their private IPs, so this won't work. We use Priam (your awesome tooling), and as you know, it opens up only the SSL port on the public IPs for cross-region communication. From the operator's perspective, that is the correct thing to do. I only have the SSL port open on public IPs and don't want to open the non-SSL port for security reasons. All other ports (non-SSL, JMX, etc.) are opened the way I described, using security group names, which allows traffic on private IPs. That is just the way AWS has been. So if, within the same region, you try to connect to any machine using its public IP, it won't work.
Here is how I reproduced the scenario above. I believe it all correlates with your statement that all machines connect to public IPs first.
- Set up a cluster as I described in my previous comment. It can be a single region.
- Restart all machines at the same time. Each machine sees only itself as UP. Everyone else is reported DOWN in nodetool ring. I am guessing that this is because the nodes are trying to send gossip to public IPs, but only the SSL port is open on public IPs. The cluster is configured to use SSL only across datacenters/regions, not within the same region. So now I am left with a bunch of nodes that only see themselves in the ring.
- I go to my AWS console and open up the non-SSL port on every single public IP in that security group. Now all the nodes see each other.
- By now I had a theory about nodes wanting to communicate through the public IP, which is not possible, so I moved on to troubleshooting repairs. I know that with the current settings repair would succeed.
- Since the nodes see each other now, I go to the security groups and remove the non-SSL rules on public IPs that I added in the previous step.
- I start the repair and end up with the log message above. The public IP mentioned in the log belongs to the node that owns the log and is running repair, so it tried to communicate with itself using its own public IP.
Did I make sense? I can call you to describe it over the phone, but basically this setup used to work on 1.1.10 and does not work on 1.2.4. I have attached the debugger to a node and am trying to trace the code. I'll let you know if I find something new.
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637123#comment-13637123 ] Vijay commented on CASSANDRA-5432: --
Hi Arya, thanks. You can call me anytime, but it will help others if we keep the discussion here.
{quote} Has this always been the case? {quote}
As far as I know, yes.
{quote} I go to security groups and remove the non SSL on public IP rules that I added in previous step. {quote}
Priam opens up ports for the local nodes and also the remote nodes within the security group (http://goo.gl/l9Q1T). It looks like you shouldn't do the above, because you are then preventing Cassandra from re-establishing its connections. Also, the reason you see all the nodes as UP in the multi-region case, even though they cannot communicate within the DC, is the issue mentioned in CASSANDRA-3533. I can almost bet that read/write requests will be failing in the local DC. If not, try again after restarting the nodes. :) Let me know if you still have issues or disagree.
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636082#comment-13636082 ] Arya Goudarzi commented on CASSANDRA-5432: --
{quote} non-ssl on the private IP within the same one [region] {quote}
OK, after a little more digging I found the root cause, which I believe is a bug, so I am re-opening this. See this log snippet for a repair session I triggered on nodes in a single region in AWS:
INFO [AntiEntropySessions:1] 2013-04-19 04:28:16,587 AntiEntropyService.java (line 651) [repair #8e59b7c0-a8a9-11e2-ba85-d39d57f66b97] new session: will sync /54.242.X.YYY, /54.224.XX.YYY, /50.17.XXX.YYY on range (99249023685273718510150927169407637270,127605887595351923798765477788721654890] for cardspring_production.[App]
INFO [AntiEntropySessions:1] 2013-04-19 04:28:16,591 AntiEntropyService.java (line 857) [repair #8e59b7c0-a8a9-11e2-ba85-d39d57f66b97] requesting merkle trees for App (to [/54.224.XX.YYY, /50.17.XXX.YYY, /54.242.X.YYY])
DEBUG [WRITE-/50.17.159.210] 2013-04-19 04:28:16,592 OutboundTcpConnection.java (line 260) attempting to connect to /10.170.XX.YYY
DEBUG [WRITE-/54.224.36.214] 2013-04-19 04:28:16,593 OutboundTcpConnection.java (line 260) attempting to connect to /10.121.XX.YYY
DEBUG [WRITE-/54.242.1.111] 2013-04-19 04:28:16,593 OutboundTcpConnection.java (line 260) attempting to connect to /54.242.X.YYY
Notice the last line. This is the public IP of the node running repair. Why is it picking up its own public IP address to send the tree request to itself? This is the source of the problem. In AWS you cannot communicate through public IP addresses when the security group rules are defined by group name, which is a common use case. Hence the tree request gets stuck at the point of sending to itself.
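The address choice described in the comment above can be illustrated with a minimal sketch. It assumes a node keeps a map from each peer's public IP to its private IP and prefers the private one whenever it is known, including for itself. The class `PreferredAddressSketch`, its methods, and the example IPs are all hypothetical; this is not taken from the Cassandra source or from the patch attached to the issue, and only models the behaviour the reporter expected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: class and method names are illustrative, not Cassandra's API.
final class PreferredAddressSketch {
    private final String localPublicIp;
    private final String localPrivateIp;
    // Public IP -> private IP for same-region peers, filled in as peers are discovered.
    private final Map<String, String> privateByPublic = new HashMap<String, String>();

    PreferredAddressSketch(String localPublicIp, String localPrivateIp) {
        this.localPublicIp = localPublicIp;
        this.localPrivateIp = localPrivateIp;
    }

    // Record the private IP that a same-region peer has advertised.
    void learnPrivateIp(String publicIp, String privateIp) {
        privateByPublic.put(publicIp, privateIp);
    }

    // Returns the address to open a connection to for the given endpoint.
    String connectAddress(String endpointPublicIp) {
        if (endpointPublicIp.equals(localPublicIp)) {
            // Messages to ourselves should not leave through the public IP.
            return localPrivateIp;
        }
        String privateIp = privateByPublic.get(endpointPublicIp);
        // Fall back to the public IP until the peer's private IP is known.
        return privateIp != null ? privateIp : endpointPublicIp;
    }

    public static void main(String[] args) {
        // Example addresses are from documentation/private ranges, not from the logs.
        PreferredAddressSketch node = new PreferredAddressSketch("203.0.113.10", "10.0.0.10");
        node.learnPrivateIp("203.0.113.11", "10.0.0.11");
        System.out.println(node.connectAddress("203.0.113.11")); // known same-region peer
        System.out.println(node.connectAddress("203.0.113.10")); // the node itself
        System.out.println(node.connectAddress("203.0.113.12")); // peer not yet resolved
    }
}
```

In the log above, the third `attempting to connect` line corresponds to the "node itself" case resolving to the public address, which a group-name-based security rule would not allow.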
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633903#comment-13633903 ] Jonathan Ellis commented on CASSANDRA-5432: ---
You said above that you had it configured this way in 1.1 as well:
{quote}
7100 from cluster1 (Configured Normal Storage)
7103 from cluster1 (Configured SSL Storage)
{quote}
In any case, it is not a bug for you to need both open; Cassandra will use SSL between datacenters (regions), and non-SSL on the private IP within the same one.
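For orientation, the two-port setup discussed above would correspond roughly to the following cassandra.yaml fragment. The port numbers come from the quoted security group rules. The thread does not show the reporter's actual configuration file, so the `internode_encryption: dc` value is an assumption inferred from the statement that SSL is used only between datacenters, and the keystore and truststore settings are omitted.

```yaml
# Sketch only: ports are taken from the quoted rules, the rest is assumed.
storage_port: 7100        # unencrypted internode traffic (within a datacenter)
ssl_storage_port: 7103    # encrypted internode traffic (between datacenters)

server_encryption_options:
    internode_encryption: dc   # assumption: encrypt only cross-datacenter traffic
```

With a setting like this, both ports need to be reachable: 7100 between nodes in the same region and 7103 across regions.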
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633492#comment-13633492 ] Arya Goudarzi commented on CASSANDRA-5432: --
I have used the IRC channel already. It was suggested to me to open a JIRA ticket as no one could help.
[jira] [Commented] (CASSANDRA-5432) Repair Freeze/Gossip Invisibility Issues 1.2.4
[ https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633502#comment-13633502 ] Arya Goudarzi commented on CASSANDRA-5432: --
I added a correction. It is not JMX; Jonathan, you are right. It is opening the non-SSL storage port on public IPs that fixes it. We didn't have to do this on 1.1.10.