Re: Impact of removing compactions_in_progress folder
Any comments on exceptions related to unfinished compactions on Cassandra start up? What is the best way to deal with them? Are there side effects of deleting the compactions_in_progress folder to resolve the issue?

Thanks
Anuj Wadehra

From: Anuj Wadehra anujw_2...@yahoo.co.in
Date: Mon, 13 Apr, 2015 at 12:32 am
Subject: Impact of removing compactions_in_progress folder

Often we face errors on Cassandra start regarding unfinished compactions, particularly when Cassandra was abruptly shut down. The problem gets resolved when we delete the /var/lib/cassandra/data/system/compactions_in_progress folder. Does deletion of the folder have any impact on the integrity of data, or any other aspect?

Thanks
Anuj Wadehra
Re: Impact of removing compactions_in_progress folder
On Sun, Apr 12, 2015 at 12:02 PM, Anuj Wadehra anujw_2...@yahoo.co.in wrote:

> Often we face errors on Cassandra start regarding unfinished compactions, particularly when Cassandra was abruptly shut down. The problem gets resolved when we delete the /var/lib/cassandra/data/system/compactions_in_progress folder. Does deletion of the folder have any impact on the integrity of data, or any other aspect?

While I have no specific knowledge about this case, it is difficult to imagine how canceling a compaction could have any meaningful negative effect other than the normal penalty one pays for uncompacted data. nodetool stop can also cancel compactions, probably by approximately the same mechanism as removing an in-progress-compactions file.

However, if you can reproduce this reliably, you should:

1) file a ticket on http://issues.apache.org
2) respond to this mail letting the list know the JIRA # of the ticket

=Rob
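For anyone scripting the workaround described in this thread, here is a minimal sketch of a cleanup helper. It is an illustration, not a Cassandra tool: the function name is hypothetical, the default path matches the one quoted above, and the node must be fully stopped before it is called.

```python
import os
import shutil

def clear_unfinished_compactions(data_dir="/var/lib/cassandra/data", dry_run=True):
    """Remove the system compactions_in_progress folder.

    Only safe to run while the Cassandra process is stopped. Returns the
    path that was (or, with dry_run=True, would be) removed, or None if
    the folder does not exist.
    """
    target = os.path.join(data_dir, "system", "compactions_in_progress")
    if not os.path.isdir(target):
        return None
    if not dry_run:
        shutil.rmtree(target)  # discard the unfinished-compaction markers
    return target
```

The dry_run default is deliberate, so the script reports what it would delete before anything is actually removed.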
Re: Delete-only work loads crash Cassandra
Unfortunately, I've switched email systems and don't have my emails from that time period. I did not file a Jira, and I don't remember who made the patch for me or whether he filed a Jira on my behalf. I vaguely recall seeing the fix in the Cassandra change logs, but I just went and read them and I don't see it, so I'm probably remembering wrong. My suspicion is that the original patch did not make it into the main branch, and I have simply always had enough concurrent writing to keep Cassandra happy. Hopefully the author of the patch will read this and be able to chime in.

This issue is very reproducible. I'll try to come up with some time to write a simple program that illustrates the problem and file a Jira.

Thanks
Robert

On Apr 13, 2015, at 10:39 AM, Philip Thompson philip.thomp...@datastax.com wrote:

> Did the original patch make it into upstream? That's unclear. If so, what was the JIRA #? Have you filed a JIRA for the new problem?
>
> On Mon, Apr 13, 2015 at 12:21 PM, Robert Wille rwi...@fold3.com wrote:
>
>> Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If I did lots of deletes and no upserts, Cassandra would report that the memtable was 0 bytes because of an accounting error. The memtable would never flush and Cassandra would eventually die. Someone was kind enough to create a patch, which seemed to have fixed the problem, but last night it reared its ugly head.
>>
>> I'm now running 2.0.14. I ran a cleanup process on my cluster (10 nodes, RF=3, CL=1). The workload was pretty light, because this cleanup process is single-threaded and does everything synchronously. It was performing 4 reads per second and about 3000 deletes per second. Over the course of many hours, heap slowly grew on all nodes. CPU utilization also increased as GC consumed an ever-increasing amount of time. Eventually a couple of nodes shed 3.5 GB of their 7.5 GB. Other nodes weren't so fortunate and started flapping due to 30 second GC pauses.
>>
>> The workaround is pretty simple. This cleanup process can simply write a dummy record with a TTL periodically so that Cassandra can flush its memtables and function properly. However, I think this probably ought to be fixed. Delete-only workloads can't be that rare. I can't be the only one that needs to go through and clean up their tables.
>>
>> Robert
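The periodic dummy write described as the workaround could look like the following CQL; the keyspace name, table name, column, and TTL value are placeholders, not taken from the thread.

```sql
-- Hypothetical heartbeat write so the memtable sees non-delete traffic;
-- the row expires on its own after 60 seconds.
INSERT INTO my_keyspace.cleanup_heartbeat (id) VALUES (0) USING TTL 60;
```

Issued every minute or so alongside the delete stream, this gives the memtable real data to account for, so it can flush normally.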
Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists
On Mon, Apr 13, 2015 at 10:52 AM, Anuj Wadehra anujw_2...@yahoo.co.in wrote:

> Any comments on side effects of major compaction, especially when the sstable generated is 100+ GB?

I have no idea how this interacts with the automatic tombstone compaction stuff; if you find out, let us know? But if you want to do a major compaction and don't want to deal with One Big SSTable afterwards, stop the node and then run the sstablesplit utility.

=Rob
Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists
Any comments on side effects of major compaction, especially when the sstable generated is 100+ GB?

After Cassandra 1.2, automatic tombstone compaction occurs even on a single sstable if the tombstone percentage exceeds the tombstone_threshold sub-property specified in the compaction strategy. So, even if the huge sstable is never compacted with any new sstable, tombstones will still be collected. Is there any other disadvantage of having a giant sstable of hundreds of GB? I understand that sstables have a summary and an index which help locate the correct data blocks directly in a large data file. Even so, are there any disadvantages?

Thanks
Anuj Wadehra

From: Anuj Wadehra anujw_2...@yahoo.co.in
Date: Mon, 13 Apr, 2015 at 12:33 am
Subject: Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

No.

Anuj Wadehra

On Monday, 13 April 2015 12:23 AM, Sebastian Estevez sebastian.este...@datastax.com wrote:

> Have you tried user defined compactions via JMX?
>
> On Apr 12, 2015 1:40 PM, Anuj Wadehra anujw_2...@yahoo.co.in wrote:
>
>> Recently we faced an issue where every repair operation caused the addition of hundreds of sstables (CASSANDRA-9146). In order to bring the situation under control and make sure reads were not impacted, we were left with no option but to run major compaction to ensure that thousands of tiny sstables were compacted.
>>
>> Queries:
>>
>> Does major compaction have any drawback now that automatic tombstone compaction is implemented (in 1.2, via the tombstone_threshold sub-property, CASSANDRA-3442)? I understand that the huge SSTable created by major compaction won't be compacted with new data any time soon, but is that a problem if purged data is removed via automatic tombstone compaction?
>>
>> If major compaction results in a huge file, say 500 GB, what are its drawbacks? If one big sstable is a problem, is there any way of solving it? We tried running sstablesplit after major compaction to split the big sstable, but as the new sstables were all the same size, they were compacted back into a single huge table once Cassandra was restarted after executing sstablesplit.
>>
>> Thanks
>> Anuj Wadehra
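The single-sstable tombstone compaction discussed above is triggered by an estimated droppable-tombstone ratio. A minimal sketch of that check (the function and argument names are illustrative, not Cassandra's internals; 0.2 is the documented default for tombstone_threshold):

```python
def needs_tombstone_compaction(droppable_tombstones, total_cells,
                               tombstone_threshold=0.20):
    """Return True when the estimated droppable-tombstone ratio of a
    single sstable exceeds the threshold, making it a candidate for a
    single-sstable tombstone compaction."""
    if total_cells == 0:
        return False
    return droppable_tombstones / float(total_cells) > tombstone_threshold
```

So even a 500 GB sstable that never takes part in a size-tiered compaction becomes a compaction candidate on its own once roughly a fifth of its cells are droppable tombstones.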
Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists
Rob,

Does that mean that once you split it back into small ones, automatic compaction will continue to happen on a more frequent basis, now that it's no longer a single large monolith?

Rahul

On Apr 13, 2015, at 3:23 PM, Robert Coli rc...@eventbrite.com wrote:

> On Mon, Apr 13, 2015 at 10:52 AM, Anuj Wadehra anujw_2...@yahoo.co.in wrote:
>
>> Any comments on side effects of major compaction, especially when the sstable generated is 100+ GB?
>
> I have no idea how this interacts with the automatic tombstone compaction stuff; if you find out, let us know? But if you want to do a major compaction and don't want to deal with One Big SSTable afterwards, stop the node and then run the sstablesplit utility.
>
> =Rob
Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists
On Mon, Apr 13, 2015 at 12:26 PM, Rahul Neelakantan ra...@rahul.be wrote:

> Does that mean that once you split it back into small ones, automatic compaction will continue to happen on a more frequent basis, now that it's no longer a single large monolith?

That's what the words "size tiered" mean in the phrase "size tiered compaction", yes.

=Rob
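Rob's point is that size-tiered compaction only groups sstables of similar size, so a lone giant sstable never finds peers, while the split pieces do. A rough sketch of the bucketing idea (simplified; real STCS also applies bucket_low, min_sstable_size, and hotness rules):

```python
def bucket_by_size(sizes, bucket_high=1.5, min_threshold=4):
    """Group sstable sizes into buckets whose members stay within
    bucket_high of the bucket's running average; return only buckets
    with enough members to be eligible for a minor compaction."""
    buckets = []
    for size in sorted(sizes):
        for b in buckets:
            avg = sum(b) / float(len(b))
            if size <= avg * bucket_high:
                b.append(size)  # similar enough: join this tier
                break
        else:
            buckets.append([size])  # start a new tier
    return [b for b in buckets if len(b) >= min_threshold]
```

With sizes like [10, 11, 12, 13, 500], the four small sstables form an eligible bucket while the 500 GB one sits alone, which is why splitting the monolith restores normal minor-compaction behavior.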
Re: Do I need to run repair and compaction every node?
What about incremental repair and sequential repair?

I ran nodetool repair -- keyspace table on one node, and found repair sessions running on different nodes. Will this command repair the whole table?

This page: http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html#concept_ds_ebj_d3q_gk__opsRepairPrtRng says:

> Using the nodetool repair -pr (–partitioner-range) option repairs only the first range returned by the partitioner for a node. Other replicas for that range still have to perform the Merkle tree calculation, causing a validation compaction.

Does that mean -pr runs on one node? I still don't understand "the first range returned by the partitioner for a node".

On Mon, Apr 13, 2015 at 1:40 PM, Robert Coli rc...@eventbrite.com wrote:

> On Mon, Apr 13, 2015 at 1:36 PM, Benyi Wang bewang.t...@gmail.com wrote:
>
>> - I need to run compaction on each node.
>
> In general, there is no requirement to manually run compaction. Minor compaction occurs in the background, automatically.
>
>> - To repair a table (column family), I only need to run repair on any one of the nodes.
>
> It depends on whether you are doing -pr or non -pr repair. If you are doing -pr repair, you run repair on all nodes. If you do non -pr repair, you have to figure out what set of nodes to run it on. That's why -pr exists, to simplify this.
>
> =Rob
Re: Do I need to run repair and compaction every node?
On Mon, Apr 13, 2015 at 1:36 PM, Benyi Wang bewang.t...@gmail.com wrote:

> - I need to run compaction on each node.

In general, there is no requirement to manually run compaction. Minor compaction occurs in the background, automatically.

> - To repair a table (column family), I only need to run repair on any one of the nodes.

It depends on whether you are doing -pr or non -pr repair. If you are doing -pr repair, you run repair on all nodes. If you do non -pr repair, you have to figure out what set of nodes to run it on. That's why -pr exists, to simplify this.

=Rob
Do I need to run repair and compaction every node?
I have read the documentation several times, but I'm still not quite sure how to run repair and compaction. To my understanding:

- I need to run compaction on each node.
- To repair a table (column family), I only need to run repair on any one of the nodes.

Am I right? Thanks.
Re: Do I need to run repair and compaction every node?
On Mon, Apr 13, 2015 at 3:33 PM, Jeff Ferland j...@tubularlabs.com wrote:

> Nodetool repair -par: covers all nodes, computes merkle trees for each node at the same time. Much higher IO load as every copy of a key range is scanned at once. Can be totally OK with SSDs and throughput limits. Only need to run the command on one node.

No? -par is just a performance (of repair) de-optimization, intended to improve service time during repair. Doing -par without -pr on a single node doesn't repair your entire cluster.

Consider the following 7 node cluster, without vnodes:

A B C D E F G

RF=3. You run a repair on node D, without -pr.

D is repaired against B's tertiary replicas.
D is repaired against C's secondary replicas.
E is repaired against D's secondary replicas.
F is repaired against D's tertiary replicas.

Nodes A and G are completely unaffected and unrepaired, because D does not share any ranges with them. repair, with or without -par, only covers all *replica* nodes. Even with vnodes, you still have to run it on almost all nodes in most cases, which is why most users should save themselves the complexity and just do a rolling -par -pr on all nodes, one by one.

=Rob
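Rob's A..G example can be checked with a tiny model of SimpleStrategy-style placement, where each range is replicated on its primary node and the next RF-1 nodes clockwise; the function names here are illustrative only. A non -pr repair of a node can involve exactly the nodes that share at least one replica set with it:

```python
def replicas(nodes, primary_index, rf=3):
    """Nodes holding the range whose primary is nodes[primary_index]:
    the primary plus the next rf-1 nodes clockwise around the ring."""
    n = len(nodes)
    return [nodes[(primary_index + i) % n] for i in range(rf)]

def touched_by_repair(nodes, target, rf=3):
    """All nodes sharing at least one replica set with `target`,
    i.e. the nodes a non -pr repair of `target` can involve."""
    touched = set()
    for i in range(len(nodes)):
        r = replicas(nodes, i, rf)
        if target in r:
            touched.update(r)
    return touched
```

Running this for node D in the ring A..G with RF=3 yields {B, C, D, E, F}, matching the post: A and G share no ranges with D and are untouched.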
Re: Do I need to run repair and compaction every node?
Or use Spotify's reaper and forget about it: https://github.com/spotify/cassandra-reaper

On Apr 13, 2015, at 3:45 PM, Robert Coli rc...@eventbrite.com wrote:

> On Mon, Apr 13, 2015 at 3:33 PM, Jeff Ferland j...@tubularlabs.com wrote:
>
>> Nodetool repair -par: covers all nodes, computes merkle trees for each node at the same time. Much higher IO load as every copy of a key range is scanned at once. Can be totally OK with SSDs and throughput limits. Only need to run the command on one node.
>
> No? -par is just a performance (of repair) de-optimization, intended to improve service time during repair. Doing -par without -pr on a single node doesn't repair your entire cluster.
>
> Consider the following 7 node cluster, without vnodes:
>
> A B C D E F G
>
> RF=3. You run a repair on node D, without -pr.
>
> D is repaired against B's tertiary replicas.
> D is repaired against C's secondary replicas.
> E is repaired against D's secondary replicas.
> F is repaired against D's tertiary replicas.
>
> Nodes A and G are completely unaffected and unrepaired, because D does not share any ranges with them. repair, with or without -par, only covers all *replica* nodes. Even with vnodes, you still have to run it on almost all nodes in most cases, which is why most users should save themselves the complexity and just do a rolling -par -pr on all nodes, one by one.
>
> =Rob
Keyspace Replication changes not synchronized after adding Datacenter
Hi guys,

We have recently added two datacenters to our existing 2.0.6 cluster. We followed the process here pretty much exactly: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html

We are using GossipingPropertyFileSnitch and NetworkTopologyStrategy across the board. All property files are identical in each of the three datacenters, and we use two nodes from each DC in the seed list.

However, when we came to step 7.a. we ran the ALTER KEYSPACE command on one of the new datacenters (to add it as a replica). This change was reflected on the new datacenter where it ran, as returned by DESCRIBE KEYSPACE. However, the change was NOT propagated to either of the other two datacenters. We effectively had to run the ALTER KEYSPACE command 3 times, once in each datacenter.

Is this expected? I could find no documentation stating that this needed to be done, nor any documentation around how the system keyspace is kept in sync across datacenters in general. If this is indicative of a larger problem with our installation, how would we go about troubleshooting it?

Thanks in advance!
Thunder
Binary Protocol Version and CQL version supported in 2.0.14
Hello,

I was trying to find out what protocol versions are supported in Cassandra 2.0.14, and after reading multiple links I am very confused. Please correct me if my understanding is wrong:

- The binary protocol version and the CQL spec version are different things?
- Cassandra 2.0.x supports CQL 3?
- Is there a different binary protocol version between 2.0.x and 2.1.x?

Is there some link which states which version of Cassandra supports which binary protocol version and CQL spec version? (Additionally, showing which drivers support what would be great too.) The link http://www.datastax.com/dev/blog/java-driver-2-1-2-native-protocol-v3 shows some info, but I am not sure whether the supported protocol versions it refers to are binary or CQL spec.

Thanks
Anishek
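To the first question: yes, the native ("binary") protocol version and the CQL spec version are separate version numbers. A hedged summary from memory (worth verifying against each release's NEWS file before relying on it):

```python
# Native protocol versions each Cassandra line can speak, and the CQL
# spec version it ships. Compiled from memory of the 1.2-2.1 era docs;
# verify against the official release notes.
NATIVE_PROTOCOL = {
    "1.2": [1],        # protocol v1 only
    "2.0": [1, 2],     # adds v2 (batches, paging, lightweight txns)
    "2.1": [1, 2, 3],  # adds v3 (UDTs, larger frames)
}
CQL_SPEC = {
    "1.2": "3.0",
    "2.0": "3.1",
    "2.1": "3.2",
}

def max_protocol(cassandra_line):
    """Highest native protocol version a given Cassandra line speaks."""
    return max(NATIVE_PROTOCOL[cassandra_line])
```

So under these assumptions, 2.0.14 would negotiate at most protocol v2 while speaking CQL 3 (spec 3.1), and a driver advertising "native protocol v3" needs 2.1.x on the server side.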
Re: Understanding Read after update
Yes, it will look in each sstable that, according to the bloom filter, may have data for that partition key, and use timestamps to figure out the latest version (or none, in the case of a newer tombstone) to return for each clustering key.

On Apr 12, 2015, at 11:18 PM, Anishek Agarwal anis...@gmail.com wrote:

> Thanks Tyler for the validations. I have a follow-up question.
>
>> One SSTable doesn't have precedence over another. Instead, when the same cell exists in both sstables, the one with the higher write timestamp wins.
>
> If my table has 5 non-partition-key columns and I update only 1 of them, then the new SSTable should have only that entry. Does that mean that if I query everything for that partition key, Cassandra has to compare timestamps per column for that partition key across SSTables to get me the data?
>
> On Fri, Apr 10, 2015 at 10:52 PM, Tyler Hobbs ty...@datastax.com wrote:
>
>>> SSTable-level bloom filters have details as to what partition keys are in that table. So to clear up my understanding: if I insert and then update the same row after some time (assuming both go to different SSTables), then during a read Cassandra will read data from both SSTables and merge them in time order, with data in the second SSTable for the row taking precedence over the first SSTable, and return the result?
>>
>> That's approximately correct. The only part that's incorrect is how merging works. One SSTable doesn't have precedence over another. Instead, when the same cell exists in both sstables, the one with the higher write timestamp wins.
>>
>>> Does it mark the old column as a tombstone in the previous SSTable, or wait for compaction to remove the old data?
>>
>> It just waits for compaction to remove the old data; there's no tombstone.
>>
>>> When the data is in the memtable, does it also keep track of unique keys in that memtable, so that when it writes to disk it can derive the right size of bloom filter for that SSTable?
>>
>> That's correct, it knows the number of keys before the bloom filter is created.
>>
>> --
>> Tyler Hobbs
>> DataStax
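Tyler's merge rule, where the cell with the highest write timestamp wins across sstables rather than one sstable taking precedence, can be sketched like this (the cell-map representation is a simplification for illustration, not Cassandra's storage format):

```python
def merge_sstables(*sstables):
    """Merge cell maps {(clustering_key, column): (timestamp, value)}
    from several sstables/memtables. For each cell the highest write
    timestamp wins; a value of None models a tombstone, which suppresses
    any older value for that cell."""
    merged = {}
    for table in sstables:
        for cell, (ts, value) in table.items():
            if cell not in merged or ts > merged[cell][0]:
                merged[cell] = (ts, value)
    # drop cells whose newest version is a tombstone
    return {cell: v for cell, (ts, v) in merged.items() if v is not None}
```

Note the merge is per cell, which matches the follow-up question in the thread: updating 1 of 5 columns puts only that one cell in the newer sstable, and the read path stitches the row together column by column.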
Re: Delete-only work loads crash Cassandra
Did the original patch make it into upstream? That's unclear. If so, what was the JIRA #? Have you filed a JIRA for the new problem?

On Mon, Apr 13, 2015 at 12:21 PM, Robert Wille rwi...@fold3.com wrote:

> Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If I did lots of deletes and no upserts, Cassandra would report that the memtable was 0 bytes because of an accounting error. The memtable would never flush and Cassandra would eventually die. Someone was kind enough to create a patch, which seemed to have fixed the problem, but last night it reared its ugly head.
>
> I'm now running 2.0.14. I ran a cleanup process on my cluster (10 nodes, RF=3, CL=1). The workload was pretty light, because this cleanup process is single-threaded and does everything synchronously. It was performing 4 reads per second and about 3000 deletes per second. Over the course of many hours, heap slowly grew on all nodes. CPU utilization also increased as GC consumed an ever-increasing amount of time. Eventually a couple of nodes shed 3.5 GB of their 7.5 GB. Other nodes weren't so fortunate and started flapping due to 30 second GC pauses.
>
> The workaround is pretty simple. This cleanup process can simply write a dummy record with a TTL periodically so that Cassandra can flush its memtables and function properly. However, I think this probably ought to be fixed. Delete-only workloads can't be that rare. I can't be the only one that needs to go through and clean up their tables.
>
> Robert
Delete-only work loads crash Cassandra
Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If I did lots of deletes and no upserts, Cassandra would report that the memtable was 0 bytes because of an accounting error. The memtable would never flush and Cassandra would eventually die. Someone was kind enough to create a patch, which seemed to have fixed the problem, but last night it reared its ugly head.

I'm now running 2.0.14. I ran a cleanup process on my cluster (10 nodes, RF=3, CL=1). The workload was pretty light, because this cleanup process is single-threaded and does everything synchronously. It was performing 4 reads per second and about 3000 deletes per second. Over the course of many hours, heap slowly grew on all nodes. CPU utilization also increased as GC consumed an ever-increasing amount of time. Eventually a couple of nodes shed 3.5 GB of their 7.5 GB. Other nodes weren't so fortunate and started flapping due to 30 second GC pauses.

The workaround is pretty simple. This cleanup process can simply write a dummy record with a TTL periodically so that Cassandra can flush its memtables and function properly. However, I think this probably ought to be fixed. Delete-only workloads can't be that rare. I can't be the only one that needs to go through and clean up their tables.

Robert