Re: tombstones problem with 1.0.8
Hi Jonathan,

Thanks for your response.

We were running a compact at least once a day over the keyspace. The gc_grace was set to only 1 hour, so from what you said I would expect tombstones to be deleted after at most 3 days. When I inspected the data in the SSTables after a compact, some rows contained millions of tombstones, many with timestamps indicating they were more than 2 weeks old.

We have recently migrated to a new schema design that avoids deleting columns or rows. I ran another compact once data was not being added to the new keyspace (it only ever added new columns, never modified existing or deleted columns). That compact deleted all of the existing tombstones, reducing our data from ~250G down to ~30G. I assume there must have been something strange in our keyspace that prevented tombstones from being deleted only while data was being added.

We no longer delete columns, so the issue is no longer critical for us, but I am still curious as to what the issue was and why it was occurring, just in case we start deleting columns again ;-)

Thanks,
Ross

On 4 April 2012 09:10, Jonathan Ellis jbel...@gmail.com wrote:
> Removing expired columns actually requires two compaction passes: one to turn the expired column into a tombstone; one to remove the tombstone after gc_grace_seconds. (See https://issues.apache.org/jira/browse/CASSANDRA-1537.)
>
> Perhaps CASSANDRA-2786 was causing things to (erroneously) be cleaned up early enough that this helped you out in 0.8.2?
>
> On Wed, Mar 21, 2012 at 8:38 PM, Ross Black ross.w.bl...@gmail.com wrote:
>> Hi,
>>
>> We recently moved from 0.8.2 to 1.0.8 and the behaviour seems to have changed so that tombstones are now not being deleted.
>>
>> Our application continually adds and removes columns from Cassandra. We have set a short gc_grace time (3600) since our application would automatically delete zombies if they appear. Under 0.8.2, the tombstones remained at a relatively constant number. Under 1.0.8, the tombstones have been continually increasing so that they exceed the size of our real data (at this stage we have over 100G of tombstones). Even after running a full compact the new compacted SSTable contains a massive number of tombstones, many that are several weeks old.
>>
>> Have I missed some new configuration option to allow deletion of tombstones?
>>
>> I also noticed that one of the changes between 0.8.2 and 1.0.8 was https://issues.apache.org/jira/browse/CASSANDRA-2786 which changed code to avoid dropping tombstones when they might still be needed to shadow data in another sstable. Could this be having an impact since we continually add and remove columns even while a major compact is executing?
>>
>> Thanks,
>> Ross
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
Re: tombstones problem with 1.0.8
Removing expired columns actually requires two compaction passes: one to turn the expired column into a tombstone; one to remove the tombstone after gc_grace_seconds. (See https://issues.apache.org/jira/browse/CASSANDRA-1537.)

Perhaps CASSANDRA-2786 was causing things to (erroneously) be cleaned up early enough that this helped you out in 0.8.2?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
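The two passes described above can be illustrated with a toy model. This is only a sketch, not Cassandra's code: the `Cell` type and `compact_cell` function are hypothetical names, and dating the tombstone at the column's expiry time is an assumption of the sketch. The gc_grace value of 3600 is the one quoted in the thread.

```python
from dataclasses import dataclass
from typing import Optional

GC_GRACE_SECONDS = 3600  # value from the schema quoted in the thread


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a column; all times are in seconds."""
    name: str
    written_at: int
    ttl: Optional[int] = None         # None means the column never expires
    deleted_at: Optional[int] = None  # set once the cell is a tombstone


def compact_cell(cell: Cell, now: int, gc_grace: int = GC_GRACE_SECONDS) -> Optional[Cell]:
    """One compaction pass over one cell; returns the cell to keep, or None."""
    if cell.deleted_at is not None:
        # Second pass: a tombstone is dropped only once gc_grace has elapsed.
        return None if now >= cell.deleted_at + gc_grace else cell
    if cell.ttl is not None and now >= cell.written_at + cell.ttl:
        # First pass: an expired column is rewritten as a tombstone, not removed.
        # Assumption of this sketch: the tombstone is dated at the expiry time.
        return Cell(cell.name, cell.written_at, None, cell.written_at + cell.ttl)
    return cell


if __name__ == "__main__":
    column = Cell("c", written_at=0, ttl=600)
    after_first = compact_cell(column, now=10_000)
    after_second = compact_cell(after_first, now=10_000)
    print(after_first)
    print(after_second)
```

Even when both the TTL and gc_grace have long passed, the first pass in this model still leaves a tombstone behind, and only a second pass removes it.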
Re: tombstones problem with 1.0.8
Radim,

We are only deleting columns. *Rows are never deleted.*

We are continually adding new columns that are then deleted. *Existing columns (deleted or otherwise) are never updated.*

Ross
Re: tombstones problem with 1.0.8
Hi Radim,

I am hunting for what I believe is a bug in Cassandra's tombstone handling that may be triggered by our particular application usage. I appreciate your attempt to help, but without you actually knowing what our application is doing and why, your advice to change our application is pointless.

Thanks,
Ross

On 28 March 2012 23:13, Radim Kolar h...@filez.com wrote:
> On 28.3.2012 13:14, Ross Black wrote:
>> Radim, We are only deleting columns. *Rows are never deleted.*
>
> I suggest changing the app to delete rows. Try composite keys.
Re: tombstones problem with 1.0.8
Any pointers on what I should be looking for in our application that would be stopping the deletion of tombstones?

Thanks,
Ross
Re: tombstones problem with 1.0.8
On 27.3.2012 11:13, Ross Black wrote:
> Any pointers on what I should be looking for in our application that would be stopping the deletion of tombstones?

Do not delete already deleted rows. On read, Cassandra returns deleted rows as empty in range slices.
Re: tombstones problem with 1.0.8
(Radim: I'm assuming you mean do not delete already deleted columns, as Ross doesn't delete his rows.)

Just to be clear about Ross' situation: he continually inserts columns and later deletes columns from the same set of rows. As long as he *doesn't keep deleting already-deleted columns* (which refreshes the tombstone on them), the deleted columns *should* get cleaned up, right? (Even though the row itself continually gets new columns inserted and other columns deleted?)

Thanks,
John
Re: tombstones problem with 1.0.8
Scenario 4
T1 write column
T2 flush memtable to S1
T3 delete row
T4 flush memtable to S5
T5 tombstone in S5 expires
T6 S5 is compacted, but not with S1

Result?
RE: tombstones problem with 1.0.8
Upon read from S1 and S6, rows are merged and the T3 timestamp wins. The T1 column will be deleted when S1 is compacted with S6, or on manual cleanup.

We run major compactions nightly, with a lot of inserts per day using TTL and some deletes from the app, and have no problems with tombstones.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.
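Scenario 4 can be sketched as a toy model showing why an expired tombstone may still need to be kept when the sstable holding the older data is not part of the compaction, which is the concern CASSANDRA-2786 is described as addressing elsewhere in this thread. Everything here is an assumption for illustration: the dict-based sstables, the `read` and `compact_one` helpers, and the plain key-membership check, which stands in for whatever check Cassandra really performs.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, int]      # ("column" or "tombstone", timestamp)
SSTable = Dict[str, Entry]   # row key -> newest entry in that sstable


def read(key: str, sstables: List[SSTable]) -> Optional[Entry]:
    """Merge a key across sstables: the newest entry wins, a tombstone reads as no data."""
    entries = [table[key] for table in sstables if key in table]
    if not entries:
        return None
    newest = max(entries, key=lambda entry: entry[1])
    return None if newest[0] == "tombstone" else newest


def compact_one(table: SSTable, others: List[SSTable], now: int,
                gc_grace: int, check_others: bool) -> SSTable:
    """Compact a single sstable, optionally keeping tombstones that may shadow other sstables."""
    out: SSTable = {}
    for key, (kind, ts) in table.items():
        expired = kind == "tombstone" and now >= ts + gc_grace
        may_shadow = check_others and any(key in other for other in others)
        if expired and not may_shadow:
            continue  # safe to drop: expired and nothing left to shadow
        out[key] = (kind, ts)
    return out


if __name__ == "__main__":
    s1 = {"row": ("column", 1)}      # T1 write, flushed to S1 at T2
    s5 = {"row": ("tombstone", 3)}   # T3 delete, flushed to S5 at T4
    naive = compact_one(s5, [s1], now=20, gc_grace=10, check_others=False)
    careful = compact_one(s5, [s1], now=20, gc_grace=10, check_others=True)
    print(read("row", [s1, naive]))    # the column written at T1 is visible again
    print(read("row", [s1, careful]))  # the tombstone still shadows it
```

In this model the tombstone can only go away once it is compacted together with S1, which matches the answer above that the T1 column is removed when S1 is compacted with S6.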
RE: tombstones problem with 1.0.8
Yes, continued deletions of the same columns/rows will prevent them from being removed from the final sstable upon compaction, because each delete carries a new timestamp. In that case you get a sliding tombstone gc grace period.

During compaction of selected sstables, Cassandra checks the whole Column Family for the latest timestamp of the column/row, including other sstables and the memtable.

You need to review your application logic.

Best regards / Pagarbiai
Viktor Jevdokimov
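The "sliding" grace period described above comes down to simple arithmetic: if the newest delete timestamp is what counts, every repeated delete pushes the purge time forward by another gc_grace. A minimal sketch, assuming that rule; the helper names are hypothetical, and the 30-minute re-delete interval is an invented example value, not something reported in the thread.

```python
from typing import List

GC_GRACE_SECONDS = 3600  # value from the schema quoted in the thread


def purgeable_at(delete_times: List[int], gc_grace: int = GC_GRACE_SECONDS) -> int:
    """Earliest time at which the tombstone may be dropped: newest delete plus gc_grace."""
    return max(delete_times) + gc_grace


def is_purgeable(delete_times: List[int], now: int,
                 gc_grace: int = GC_GRACE_SECONDS) -> bool:
    return now >= purgeable_at(delete_times, gc_grace)


if __name__ == "__main__":
    two_weeks = 14 * 24 * 3600
    deleted_once = [0]
    # Hypothetical application bug: the same column is re-deleted every 30 minutes.
    deleted_repeatedly = list(range(0, two_weeks, 1800))
    print(is_purgeable(deleted_once, now=two_weeks))
    print(is_purgeable(deleted_repeatedly, now=two_weeks))
```

With a single delete the tombstone becomes purgeable one hour later; with repeated deletes arriving more often than gc_grace, it never does in this model.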
Re: tombstones problem with 1.0.8
> During compaction of selected sstables Cassandra checks the whole Column Family for the latest timestamp of the column/row, including other sstables and memtable.

Are you saying that if I have an expired row tombstone and a later timestamp exists on this row, the tombstone is not deleted? If it works that way, it will never be deleted.
RE: tombstones problem with 1.0.8
> You are explaining that if i have expired row tombstone and there exists later timestamp on this row that tombstone is not deleted? If this works that way, it will be never deleted.

Exactly. It is merged with the new one.

Example 1: a row with 1 column in an sstable. Delete the row, not the column. After compaction or cleanup, the sstable will contain an empty row key with a tombstone.

Example 2: a row with 1 column in an sstable. Delete the column. After compaction or cleanup, the sstable will contain a row with 1 column with a tombstone.

Question: why does the application request a delete for a row/column that is already deleted (and can't be returned by get)?

Best regards / Pagarbiai
Viktor Jevdokimov
Re: tombstones problem with 1.0.8
Example with three times, T1, T2, T3:
at T1: write column
at T2: delete row
at T3: tombstone expires

Do a compaction (T1 + T2) and drop the expired tombstone. Will the column from T1 be alive again?
RE: tombstones problem with 1.0.8
Should not.

Scenario 1, write and delete in one memtable
T1 write column
T2 delete row
T3 flush memtable, sstable 1 contains empty row tombstone
T4 row tombstone expires
T5 compaction/cleanup, row disappears from sstable 2

Scenario 2, write and delete in different sstables
T1 write column
T2 flush memtable, sstable 1 contains row with column
T3 delete row
T4 flush memtable, sstable 2 contains empty row tombstone
T5 row tombstone expires
T6 compaction, rows from sstables 1 and 2 merged, not saved to sstable 3

Scenario 3, alive tombstone
T1 write column
T2 flush memtable, sstable 1 contains row with column
T3 delete row
T4 flush memtable, sstable 2 contains empty row tombstone
T5 delete row (present in memtable)
T6 row tombstone for T3 expected to be expired
T7 compaction, sstable 3 row tombstone appears because of T5

Best regards / Pagarbiai
Viktor Jevdokimov
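Scenario 2 can be walked through with a small merge model: rows from the compacted sstables are merged by key, a row tombstone hides columns written at or before it, and the tombstone itself is dropped once gc_grace has passed. This is a sketch under those stated assumptions, not the real compaction code; `Row`, `compact`, and the integer clock shared by writes and deletes are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Row:
    columns: Dict[str, int] = field(default_factory=dict)  # column name -> write time
    deleted_at: Optional[int] = None                        # row tombstone time, if any


SSTable = Dict[str, Row]  # row key -> row


def compact(sstables: List[SSTable], now: int, gc_grace: int) -> SSTable:
    """Merge the given sstables into one, dropping shadowed columns and expired tombstones."""
    merged: SSTable = {}
    for table in sstables:
        for key, row in table.items():
            out = merged.setdefault(key, Row())
            for name, ts in row.columns.items():
                out.columns[name] = max(ts, out.columns.get(name, ts))
            if row.deleted_at is not None:
                previous = out.deleted_at if out.deleted_at is not None else row.deleted_at
                out.deleted_at = max(previous, row.deleted_at)
    result: SSTable = {}
    for key, row in merged.items():
        if row.deleted_at is not None:
            # The row tombstone shadows every column written at or before it.
            row.columns = {n: ts for n, ts in row.columns.items() if ts > row.deleted_at}
            if now >= row.deleted_at + gc_grace:
                row.deleted_at = None  # expired tombstone is not written out
        if row.columns or row.deleted_at is not None:
            result[key] = row
    return result


if __name__ == "__main__":
    sstable1 = {"row": Row(columns={"c": 1})}   # T1 write, T2 flush
    sstable2 = {"row": Row(deleted_at=3)}       # T3 delete, T4 flush
    print(compact([sstable1, sstable2], now=5, gc_grace=10))   # before T5: tombstone kept
    print(compact([sstable1, sstable2], now=20, gc_grace=10))  # at T6: nothing saved
```

Because both sstables take part in the compaction, the column and the tombstone cancel out and nothing is written to sstable 3, as in Scenario 2.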
RE: tombstones problem with 1.0.8
Just tested 1.0.8 before upgrading from 1.0.7: tombstones created by TTL or by a delete operation are deleted correctly after either compaction or cleanup.

I have no idea about any settings other than gc_grace_seconds; check your schema from cassandra-cli.

Best regards / Pagarbiai
Viktor Jevdokimov
Re: tombstones problem with 1.0.8
Hi Viktor,

Thanks for your response.

Is there a possibility that continual deletions during compact could be blocking removal of the tombstones? The full manual compact takes about 4 hours per node for our data, so there is a large number of deletes occurring during that time.

This is the description from cassandra-cli:

  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:3]
  Column Families:
    ColumnFamily: weekly
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period in seconds / keys to save : 0.0/0/all
      Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
      Key cache size / save period in seconds: 20.0/14400
      GC grace seconds: 3600
      Compaction min/max thresholds: 3/8
      Read repair chance: 1.0
      Replicate on write: true
      Bloom Filter FP chance: default
      Built indexes: []
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy

Ross