[jira] [Commented] (CASSANDRA-5605) Crash caused by insufficient disk space to flush
[ https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705442#comment-13705442 ]

Ananth Gundabattula commented on CASSANDRA-5605:
-------------------------------------------------

I am not sure if the following information helps, but we too hit this issue in production today. We were running Cassandra 1.2.4 with two patches, CASSANDRA-5554 and CASSANDRA-5418, with RF=3 and LCS. We cross-checked via JMX whether directory blacklisting was the cause of this bug, and it definitely does not appear to be. We did, however, see a pile-up of pending compactions: roughly 1,800 pending compactions per node when the node crashed.

The surprising thing is that the "Insufficient disk space to write ... bytes" error appears well before the node crashes; for us it started appearing approximately 3 hours before the crash. The cluster showing this behavior was under a heavy write load (we were using multiple SSTableLoaders to stream data into it), and we pushed in almost 15 TB of data (including RF=3) in about 16 hours. We were not serving any reads from this cluster, as we were still migrating data to it. Another interesting observation was that the affected nodes were neighbors most of the time.

I am not sure if the above information helps, but I wanted to add it to the context of the ticket.

Crash caused by insufficient disk space to flush
------------------------------------------------

Key: CASSANDRA-5605
URL: https://issues.apache.org/jira/browse/CASSANDRA-5605
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.2.3, 1.2.5
Environment: java version 1.7.0_15
Reporter: Dan Hendry

A few times now I have seen our Cassandra nodes crash by running themselves out of memory. It starts with the following exception:

{noformat}
ERROR [FlushWriter:13000] 2013-05-31 11:32:02,350 CassandraDaemon.java (line 164) Exception in thread Thread[FlushWriter:13000,5,main]
java.lang.RuntimeException: Insufficient disk space to write 8042730 bytes
	at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:42)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
{noformat}

After which, it seems the MemtablePostFlusher stage gets stuck and no further memtables get flushed:

{noformat}
INFO [ScheduledTasks:1] 2013-05-31 11:59:12,467 StatusLogger.java (line 68) MemtablePostFlusher    132    0
INFO [ScheduledTasks:1] 2013-05-31 11:59:12,469 StatusLogger.java (line 73) CompactionManager        1    2
{noformat}

What makes this ridiculous is that, at the time, the data directory on this node had 981 GB of free disk space (as reported by du). We primarily use STCS, and at the time the aforementioned exception occurred, at least one compaction task was executing which could easily have involved 981 GB (or more) worth of input SSTables. Correct me if I am wrong, but Cassandra counts data currently being compacted against available disk space. In our case, this is a significant overestimation of the space required by compaction, since a large portion of the data being compacted has expired or is an overwrite. More to the point, Cassandra should not crash because it is out of disk space unless it is really, actually out of disk space (i.e., don't count 'phantom' compaction disk usage when flushing). I have seen one of our nodes die in this way before our alerts for disk space even went off.
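To make the accounting concrete: if the flush-time free-space check subtracts the full input size of in-flight compactions from the filesystem's reported free space, a flush can be rejected even when the disk is nowhere near full, which is the behavior described above. The following is a hypothetical, simplified sketch of that kind of check; it is not the actual DiskAwareRunnable code, and the class name and figures are made up for illustration.

{noformat}
// Hypothetical, simplified sketch of an over-conservative free-space check
// at flush time. Not Cassandra's actual implementation; names and figures
// are illustrative only.
public class FlushSpaceCheckSketch {

    // Free bytes the filesystem reports for the data directory (~981 GB).
    static final long REPORTED_FREE_BYTES = 981L * 1024 * 1024 * 1024;

    // Full input size of compactions currently in flight, reserved up front
    // even though much of that data is expired or overwritten and will never
    // be written back out (~1.2 TB in this made-up scenario).
    static final long RESERVED_BY_RUNNING_COMPACTIONS = 1200L * 1024 * 1024 * 1024;

    static boolean hasRoomToFlush(long flushSizeBytes) {
        long usable = REPORTED_FREE_BYTES - RESERVED_BY_RUNNING_COMPACTIONS;
        return usable >= flushSizeBytes;
    }

    public static void main(String[] args) {
        long flushSize = 8042730L; // size from the exception in the report
        if (!hasRoomToFlush(flushSize)) {
            // This is the path the reported node takes: the flush fails even
            // though ~981 GB is genuinely free on disk.
            throw new RuntimeException("Insufficient disk space to write " + flushSize + " bytes");
        }
        System.out.println("Flush would proceed");
    }
}
{noformat}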
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (CASSANDRA-5605) Crash caused by insufficient disk space to flush
[ https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705442#comment-13705442 ]

Ananth Gundabattula edited comment on CASSANDRA-5605 at 7/11/13 5:54 AM:
--------------------------------------------------------------------------

I am not sure if the following information helps, but we too hit this issue in production today. We were running Cassandra 1.2.4 with two patches, CASSANDRA-5554 and CASSANDRA-5418, with RF=3 and LCS. We ran into this issue while using sstableloader to push data from remote 1.2.4 cluster nodes to another cluster.

We cross-checked via JMX whether directory blacklisting was the cause of this bug, and it definitely does not appear to be. We did, however, see a pile-up of pending compactions: roughly 1,800 pending compactions per node when the node crashed. The surprising thing is that the "Insufficient disk space to write ... bytes" error appears well before the node crashes; for us it started appearing approximately 3 hours before the crash. The cluster showing this behavior was under a heavy write load (we were using multiple SSTableLoaders to stream data into it), and we pushed in almost 15 TB of data (including RF=3) in about 16 hours. We were not serving any reads from this cluster, as we were still migrating data to it. Another interesting observation was that the affected nodes were neighbors most of the time.

I am not sure if the above information helps, but I wanted to add it to the context of the ticket.
[jira] [Created] (CASSANDRA-5684) Multi-DC not working between 1.1.10 and 1.2.4 version
Ananth Gundabattula created CASSANDRA-5684:
---------------------------------------------

Summary: Multi-DC not working between 1.1.10 and 1.2.4 version
Key: CASSANDRA-5684
URL: https://issues.apache.org/jira/browse/CASSANDRA-5684
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.2.4
Environment: Linux
Reporter: Ananth Gundabattula

We wanted to upgrade from our current Cassandra version, 1.1.10, to a higher version (a custom build of 1.2.4). Our plan was to build a new cluster, enable multi-DC replication, rebuild, and then switch the application layer over to the new cluster. This would give us the least downtime.

However, I was not able to get multi-DC replication working between these two versions. For the 1.2.4 cluster I followed the instructions at http://www.datastax.com/documentation/cassandra/1.2/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html, and for the 1.1.10 cluster I tried enabling multi-DC using the instructions at http://www.datastax.com/docs/1.0/initialize/cluster_init_multi_dc. Mixing both approaches did not work either.

Specifying a 1.2.4 node as one of the 1.1.10 cluster's seed nodes caused errors during startup of the 1.2.4 node. The error was not an exception but a message in the logs that looked like this:

ClusterName mismatch from /10.xxx.xx.xx DC1!=DC2
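The log line quoted above is consistent with a gossip-level sanity check in which a node compares its own configured cluster name with the one a peer advertises and ignores the peer on mismatch. Below is a rough, hypothetical sketch of such a check, not the actual Gossiper code; the class, method, and address values are illustrative.

{noformat}
// Hypothetical sketch of a cluster-name sanity check during gossip exchange.
// Not Cassandra's actual Gossiper code; names are illustrative.
public class ClusterNameCheckSketch {

    // Cluster name this node was started with (cluster_name in cassandra.yaml).
    private final String localClusterName;

    public ClusterNameCheckSketch(String localClusterName) {
        this.localClusterName = localClusterName;
    }

    // Returns true if the peer belongs to the same cluster; otherwise logs the
    // mismatch and ignores the peer, matching the message seen in the report.
    boolean acceptGossipFrom(String peerAddress, String peerClusterName) {
        if (!localClusterName.equals(peerClusterName)) {
            System.err.printf("ClusterName mismatch from %s %s!=%s%n",
                              peerAddress, peerClusterName, localClusterName);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // A multi-DC deployment is still one cluster, so every node must share
        // the same cluster_name; "DC1" vs "DC2" here triggers the message.
        ClusterNameCheckSketch node = new ClusterNameCheckSketch("DC2");
        node.acceptGossipFrom("/10.0.0.1", "DC1");
    }
}
{noformat}

The "DC1!=DC2" in the message suggests the two clusters were configured with different cluster_name values, which is what the later resolution of this ticket points to.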
[jira] [Commented] (CASSANDRA-5684) Multi-DC not working between 1.1.10 and 1.2.4 version
[ https://issues.apache.org/jira/browse/CASSANDRA-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690913#comment-13690913 ]

Ananth Gundabattula commented on CASSANDRA-5684:
-------------------------------------------------

Here is the situation which prompted us to take that path (setting up a new cluster and migrating from the old one rather than performing a rolling upgrade):

We have two small clusters, one running 1.1.4 and one running 1.1.10. We are having lots of issues with the 1.1.10 cluster (primarily, we believe, because of the load patterns it experiences; our experiments have also shown that the 1.1.10 cluster degraded in performance over time, possibly because of hardware as well). This led us to the decision to move to new hardware. Our new hardware is LCS-capable (we were running with hard disks on the older clusters), and at the same time we cannot afford a gradual conversion from Size-Tiered Compaction to LCS-based SSTables because it would impact our latencies to a very great extent. Unfortunately we have no backup clusters right now.

Our other tests also made us consider 1.2.4 because of the off-heap data structures it provides. GCs have become a real headache for us, and our application requires DB calls to return in under roughly 15 ms (as shown in OpsCenter). In our tests we saw about a 125% gain in performance compared to the hard-disk-based 1.1.x versions with the same number of nodes.

In this context, we want to merge both of the old clusters into a single cluster on the new hardware, running 1.2.4, with the least amount of downtime and application-layer impact.
[jira] [Resolved] (CASSANDRA-5684) Multi-DC not working between 1.1.10 and 1.2.4 version
[ https://issues.apache.org/jira/browse/CASSANDRA-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ananth Gundabattula resolved CASSANDRA-5684.
---------------------------------------------

Resolution: Invalid
[jira] [Commented] (CASSANDRA-5684) Multi-DC not working between 1.1.10 and 1.2.4 version
[ https://issues.apache.org/jira/browse/CASSANDRA-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691025#comment-13691025 ]

Ananth Gundabattula commented on CASSANDRA-5684:
-------------------------------------------------

It looks like I got my definitions of cluster and datacenter mixed up. All along I was thinking that a datacenter can host multiple clusters, while the opposite is actually the case: a single cluster can span multiple datacenters. Hence the confusion. Closing this ticket.
[jira] [Commented] (CASSANDRA-4773) CQL shell not reflecting latest data when timestamp is passed as part of insert statements
[ https://issues.apache.org/jira/browse/CASSANDRA-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471095#comment-13471095 ]

Ananth Gundabattula commented on CASSANDRA-4773:
-------------------------------------------------

Hello Brandon,

Thanks a lot for responding to the ticket. So if I understand you correctly, even using a timestamp later than the first timestamp should result in undefined behavior? I did try step 5 as given above, where the second insert used a timestamp much later than the first. In spite of this, the select returns the row inserted first, not the one with the later timestamp. As per my current understanding of Cassandra, that does not sound right. Maybe I am missing something here?

CQL shell not reflecting latest data when timestamp is passed as part of insert statements
--------------------------------------------------------------------------------------------

Key: CASSANDRA-4773
URL: https://issues.apache.org/jira/browse/CASSANDRA-4773
Project: Cassandra
Issue Type: Bug
Affects Versions: 1.1.4
Environment: CentOS
Reporter: Ananth Gundabattula
Priority: Critical

While using cqlsh, I try inserting a row using a timestamp and TTL along with a consistency level. The insert works fine the first time. When I try to insert using the same key but a different value and then issue a select, the value of the returned row is always the first value inserted, not the value that was inserted later.

Here are the details:
1. Replication Factor = 1
2. Consistency Level = ONE
3. TTL = 14 days
4. Timestamp = a value which reflects 10 days before the current day
5. Cassandra 1.1.4
6. CQL shell version 2

It may be noted that if the timestamp is not used while issuing the insert statement, the second insert on the same key works fine. Here are the details:
1. INSERT INTO Table1 (col1,col2,col3,col4,col5,col6) VALUES ('abcde', 0, 87, 1345603159, 222, '65DE') USING TIMESTAMP 1349476620 AND TTL 1209600; This works fine.
2. Issuing a select works fine for the above row.
3. INSERT INTO Table1 (col1,col2,col3,col4,col5,col6) VALUES ('abcde', 0, 87, 1345603159, 222, 'a2134') USING TIMESTAMP 1349476620 AND TTL 1209600; There is no error on this statement.
4. Issuing a select returns the data inserted as given in step 1.
5. Giving a timestamp later than the above timestamp also does not change things.
6. However, if I issue an insert without specifying the timestamp, the select statement always returns the latest data.
[jira] [Commented] (CASSANDRA-4773) CQL shell not reflecting latest data when timestamp is passed as part of insert statements
[ https://issues.apache.org/jira/browse/CASSANDRA-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471140#comment-13471140 ]

Ananth Gundabattula commented on CASSANDRA-4773:
-------------------------------------------------

Thanks, Jonathan. That was indeed the cause. Closing this issue.
[jira] [Closed] (CASSANDRA-4773) CQL shell not reflecting latest data when timestamp is passed as part of insert statements
[ https://issues.apache.org/jira/browse/CASSANDRA-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ananth Gundabattula closed CASSANDRA-4773.
-------------------------------------------

Not a bug; the timestamp format used was wrong.
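For context on that resolution: client-supplied CQL timestamps are conventionally expressed in microseconds since the Unix epoch, so a seconds-precision value such as 1349476620 sorts as a moment only minutes after 1970 and loses to any write stamped with the server's current time. A minimal sketch of producing a correctly scaled value for USING TIMESTAMP follows; it reuses the ticket's example table and assumes no particular driver API (the statement is only printed).

{noformat}
import java.util.concurrent.TimeUnit;

// Sketch: build a CQL INSERT whose USING TIMESTAMP value is expressed in
// microseconds since the Unix epoch (the conventional unit for client-supplied
// timestamps), instead of the seconds-precision value used in the report.
// Table and column names are copied from the ticket's example.
public class CqlTimestampSketch {
    public static void main(String[] args) {
        long secondsSinceEpoch = 1349476620L; // value from the report (seconds)
        long microsSinceEpoch = TimeUnit.SECONDS.toMicros(secondsSinceEpoch);

        // Read as microseconds, the original value is only ~22 minutes after
        // the 1970 epoch, far older than any server-generated timestamp.
        System.out.println("Original value, read as micros: "
                + TimeUnit.MICROSECONDS.toMinutes(secondsSinceEpoch) + " minutes after epoch");

        String cql = "INSERT INTO Table1 (col1,col2,col3,col4,col5,col6) "
                + "VALUES ('abcde', 0, 87, 1345603159, 222, 'a2134') "
                + "USING TIMESTAMP " + microsSinceEpoch + " AND TTL 1209600;";
        System.out.println(cql);
    }
}
{noformat}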
[jira] [Created] (CASSANDRA-4773) CQL shell not reflecting latest data when timestamp is passed as part of insert statements
Ananth Gundabattula created CASSANDRA-4773:
---------------------------------------------

Summary: CQL shell not reflecting latest data when timestamp is passed as part of insert statements
Key: CASSANDRA-4773
URL: https://issues.apache.org/jira/browse/CASSANDRA-4773
Project: Cassandra
Issue Type: Bug
Affects Versions: 1.1.4
Environment: CentOS
Reporter: Ananth Gundabattula
Priority: Critical