Re: LCS not removing rows with all TTL expired columns

2013-01-22 Thread Bryan Talbot
Re: LCS not removing rows with all TTL expired columns

2013-01-22 Thread Derek Williams

RE: LCS not removing rows with all TTL expired columns

2013-01-17 Thread Viktor Jevdokimov
@Bryan,

To keep data size as low as possible with TTL columns we still use STCS and 
nightly major compactions.
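
The nightly major compaction Viktor describes is typically just a scheduled `nodetool compact`. A minimal sketch (the install path, schedule, and log location are illustrative; the keyspace and CF names are the ones used elsewhere in this thread):

```
# crontab entry: force a major compaction of metrics.request_summary at 03:00 nightly
0 3 * * * /opt/cassandra/bin/nodetool -h localhost compact metrics request_summary >> /var/log/cassandra/major-compact.log 2>&1
```

Note that on STCS a major compaction merges everything into one large SSTable, which is what gives tombstones a chance to meet all fragments of their row.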

Experience with LCS was not successful in our case: data size stayed too high, 
as did the number of compactions.

IMO, before 1.2, LCS was only a good fit for CFs without TTLs or a high delete 
rate. I have not tested 1.2 LCS behavior; we're still on 1.0.x


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider (http://twitter.com/#!/adforminsider)
Take a ride with Adform's Rich Media Suite (http://vimeo.com/adform/richmedia)

[Adform News] http://www.adform.com
[Adform awarded the Best Employer 2012]
http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/


Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, January 17, 2013 06:24
To: user@cassandra.apache.org
Subject: Re: LCS not removing rows with all TTL expired columns

Minor compaction (with Size Tiered) will only purge tombstones if all fragments 
of a row are contained in the SSTables being compacted. So if you have a 
long-lived row that is present in many size tiers, the columns will not be purged.
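
That containment rule can be sketched as a predicate: a tombstone may only be dropped if no SSTable outside the compaction set also holds a fragment of the row. A toy model (not Cassandra's actual code; names are illustrative):

```python
def can_purge(row_key, compacting_ids, sstables):
    """sstables: dict of sstable id -> set of row keys it contains.
    A tombstone for row_key may be dropped only if every SSTable
    holding a fragment of that row is part of this compaction."""
    holders = {sid for sid, rows in sstables.items() if row_key in rows}
    return holders <= set(compacting_ids)

# Row "a" has fragments in three size tiers; compacting only two of them
sstables = {"tier0": {"a"}, "tier1": {"a", "b"}, "tier2": {"a"}}
print(can_purge("a", ["tier0", "tier1"], sstables))           # False: tier2 still holds a fragment
print(can_purge("a", ["tier0", "tier1", "tier2"], sstables))  # True: all fragments present
```

This is why a long-lived row spread across many tiers keeps its expired columns: minor compactions almost never include every tier at once.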

 (thus compacted) 3 days after all columns for that row had expired
Tombstones have to get to disk, even if you set gc_grace_seconds to 0; otherwise 
they never get a chance to delete previous versions of the column that already 
exist on disk. So when the compaction ran, your ExpiringColumn was turned into a 
DeletedColumn and placed on disk.

I would expect the next round of compaction to remove these columns.
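
The lifecycle Aaron describes can be sketched as a toy timeline (a simplified model, not Cassandra internals): an ExpiringColumn becomes a tombstone (DeletedColumn) at the first compaction after its TTL passes, and can be dropped only at a later compaction once gc_grace_seconds has also elapsed.

```python
EXPIRING, TOMBSTONE, PURGED = "expiring", "tombstone", "purged"

def compact(state, now, write_time, ttl, gc_grace):
    """One compaction pass over a single column (toy model; assumes the
    whole row is contained in the SSTables being compacted)."""
    expires_at = write_time + ttl
    if state == EXPIRING and now >= expires_at:
        return TOMBSTONE   # expired column is written back to disk as a tombstone
    if state == TOMBSTONE and now >= expires_at + gc_grace:
        return PURGED      # tombstone old enough to be dropped for good
    return state

s = EXPIRING
s = compact(s, now=100, write_time=0, ttl=72, gc_grace=0)  # first compaction after expiry
s = compact(s, now=101, write_time=0, ttl=72, gc_grace=0)  # a later compaction drops it
print(s)  # purged
```

So even with gc_grace=0 it takes at least two compactions that each contain the whole row: one to write the tombstone, one to remove it.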

There is a new feature in 1.2 that may help you here. It will do a special 
compaction of individual SSTables when they have a certain proportion of dead 
columns: https://issues.apache.org/jira/browse/CASSANDRA-3442

Also interested to know if LCS helps.

Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/01/2013, at 2:55 PM, Bryan Talbot btal...@aeriagames.com wrote:


According to the timestamps (see original post), the SSTable was written (thus 
compacted) 3 days after all columns for that row had expired and 6 days after 
the row was created; yet all columns are still showing up in the SSTable.  Note 
that a get for that key now shows no rows, so that's working correctly, but the 
data is lugged around far longer than it should be -- maybe forever.


-Bryan

On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh ailin...@gmail.com wrote:
To get a column removed you have to meet two requirements:
1. the column must be expired
2. after that, the CF gets compacted

I guess your expired columns are propagated to a higher tier, which gets 
compacted rarely. So you have to wait until that higher tier gets compacted.

Andrey


On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.com wrote:
On Cassandra 1.1.5 with a write-heavy workload, we're having problems getting 
rows compacted away (removed) even though all of their columns have expired 
TTLs.  We've tried size-tiered and now leveled compaction and are seeing the 
same symptom: the data stays around essentially forever.

Currently we write all columns with a TTL of 72 hours (259200 seconds) and 
expect to add 10 GB of data to this CF per day per node.  Each node currently 
has 73 GB for the affected CF and shows no indications that old rows will be 
removed on their own.
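
A quick sanity check on those numbers, using only the figures from the post: with a 72-hour TTL and 10 GB/day of ingest per node, the CF should level off at roughly three days of data if expired rows were purged promptly, so 73 GB is well over double the expected steady state.

```python
ttl_seconds = 259_200   # 72 hours, as written
daily_ingest_gb = 10    # per node per day, per the post
observed_gb = 73        # current on-disk size per node

# Steady state if TTL-expired data were removed on schedule: ttl (in days) * daily ingest
expected_steady_state_gb = (ttl_seconds / 86_400) * daily_ingest_gb
print(expected_steady_state_gb)                # 30.0
print(observed_gb / expected_steady_state_gb)  # ~2.43x the expected size
```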

Why aren't rows being removed?  Below is some data from a sample row which 
should have been removed several days ago but is still around even though it 
has been involved in numerous compactions since being expired.

$ ./bin/nodetool -h localhost getsstables metrics request_summary 
459fb460-5ace-11e2-9b92-11d67b6163b4
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$ ls -alF 
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
-rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$ ./bin/sstable2json 
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -k $(echo -n 459fb460-5ace-11e2-9b92

Re: LCS not removing rows with all TTL expired columns

2013-01-17 Thread Bryan Talbot
We are using LCS, and the particular row I've referenced has been involved in
several compactions after all of its columns' TTLs expired.  The most recent
one was again this morning, and the row is still there -- TTL expired for
several days now, with gc_grace=0, and several compactions later ...


$ ./bin/nodetool -h localhost getsstables metrics request_summary
459fb460-5ace-11e2-9b92-11d67b6163b4
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db

$ ls -alF
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
-rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db


$ ./bin/sstable2json
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
-k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 %x')
{
34353966623436302d356163652d313165322d396239322d313164363762363136336234:
[[app_name,50f21d3d,1357785277207001,d],
[client_ip,50f21d3d,1357785277207001,d],
[client_req_id,50f21d3d,1357785277207001,d],
[mysql_call_cnt,50f21d3d,1357785277207001,d],
[mysql_duration_us,50f21d3d,1357785277207001,d],
[mysql_failure_call_cnt,50f21d3d,1357785277207001,d],
[mysql_success_call_cnt,50f21d3d,1357785277207001,d],
[req_duration_us,50f21d3d,1357785277207001,d],
[req_finish_time_us,50f21d3d,1357785277207001,d],
[req_method,50f21d3d,1357785277207001,d],
[req_service,50f21d3d,1357785277207001,d],
[req_start_time_us,50f21d3d,1357785277207001,d],
[success,50f21d3d,1357785277207001,d]]
}
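
The long row key in that JSON is not mysterious: it is just the hex encoding of the ASCII UUID string, which is exactly what the `hexdump -e '36/1 %x'` pipeline in the command produces. A quick check in Python:

```python
key = "459fb460-5ace-11e2-9b92-11d67b6163b4"
hex_key = key.encode("ascii").hex()  # hex of the ASCII characters, matching hexdump's output
print(hex_key)
# 34353966623436302d356163652d313165322d396239322d313164363762363136336234
```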


My experience with TTL columns so far has been pretty similar to Viktor's, in
that the only way to keep the row count under control is to force major
compactions.  In real-world use, STCS and LCS both leave TTL-expired rows
around forever as far as I can tell.  When testing with minimal data, removal
of TTL-expired rows seems to work as expected, so there seems to be some
divergence between real-life workloads and test samples.

-Bryan




Re: LCS not removing rows with all TTL expired columns

2013-01-17 Thread Bryan Talbot

Re: LCS not removing rows with all TTL expired columns

2013-01-17 Thread Derek Williams

Re: LCS not removing rows with all TTL expired columns

2013-01-17 Thread Bryan Talbot
.

 ** **

 IMO, before 1.2, LCS was good for CFs without TTL or high delete rate.
 I have not tested 1.2 LCS behavior, we’re still on 1.0.x

 ** **

 ** **
Best regards / Pagarbiai
 *Viktor Jevdokimov*
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
 Follow us on Twitter: @adforminsiderhttp://twitter.com/#!/adforminsider
 Take a ride with Adform's Rich Media 
 Suitehttp://vimeo.com/adform/richmedia
  [image: Adform News] http://www.adform.com
 [image: Adform awarded the Best Employer 2012]
 http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/

 Disclaimer: The information contained in this message and attachments
 is intended solely for the attention and use of the named addressee and may
 be confidential. If you are not the intended recipient, you are reminded
 that the information remains the property of the sender. You must not use,
 disclose, distribute, copy, print or rely on this e-mail. If you have
 received this message in error, please contact the sender immediately and
 irrevocably delete this message and any copies.

   *From:* aaron morton [mailto:aa...@thelastpickle.com]
 *Sent:* Thursday, January 17, 2013 06:24
 *To:* user@cassandra.apache.org
 *Subject:* Re: LCS not removing rows with all TTL expired columns

 Minor compaction (with Size Tiered) will only purge tombstones if all
 fragments of a row are contained in the SSTables being compacted. So if you
 have a long lived row, that is present in many size tiers, the columns will
 not be purged.

   (thus compacted) 3 days after all columns for that row had expired

 Tombstones have to get on disk, even if you set gc_grace_seconds to
 0. If they did not, they would not get a chance to delete previous versions
 of the column which already exist on disk. So when the compaction ran your
 ExpiringColumn was turned into a DeletedColumn and placed on disk.

 I would expect the next round of compaction to remove these columns.

 There is a new feature in 1.2 that may help you here. It will do a
 special compaction of individual sstables when they have a certain
 proportion of dead columns
 https://issues.apache.org/jira/browse/CASSANDRA-3442

 Also interested to know if LCS helps.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 17/01/2013, at 2:55 PM, Bryan Talbot btal...@aeriagames.com wrote:

 According to the timestamps (see original post) the SSTable was written
 (thus compacted) 3 days after all columns for that row had
 expired and 6 days after the row was created; yet all columns are still
 showing up in the SSTable.  Note that a get for that key correctly returns
 no rows, so reads work as expected, but the data is lugged around far
 longer than it should be -- maybe forever.

 -Bryan

 On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh ailin...@gmail.com
 wrote:

 To get a column removed, two requirements have to be met:

 1. the column should be expired

 2. after that, the CF gets compacted

 I guess your expired columns are propagated to a high tier CF, which gets
 compacted rarely.
 So, you have to wait until the high tier CF gets compacted.

 Andrey

 On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.com
 wrote:

 On cassandra 1.1.5 with a write heavy workload, we're having problems
 getting rows to be compacted away (removed) even though all columns have
 expired TTL.  We've tried size tiered and now leveled and are seeing the
 same symptom: the data stays around essentially forever.

 Currently we write all columns with a TTL of 72 hours (259200 seconds)
 and expect to add 10 GB of data to this CF per day per node.  Each node
 currently has 73 GB for the affected CF and shows no indications that old
 rows will be removed on their own.

 Why aren't rows being removed?  Below is some data from a sample row
 which should have been removed several days ago but is still around even
 though it has been involved in numerous compactions since being expired.

 $ ./bin/nodetool -h localhost getsstables metrics request_summary
 459fb460-5ace-11e2-9b92-11d67b6163b4

 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

 $ ls -alF
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

 -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

 $ ./bin/sstable2json
 /virtual/cassandra/data/data

Re: LCS not removing rows with all TTL expired columns

2013-01-16 Thread Andrey Ilinykh
To get a column removed, two requirements have to be met:
1. the column should be expired
2. after that, the CF gets compacted

I guess your expired columns are propagated to a high tier CF, which gets
compacted rarely.
So, you have to wait until the high tier CF gets compacted.

Andrey



On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.comwrote:

 On cassandra 1.1.5 with a write heavy workload, we're having problems
 getting rows to be compacted away (removed) even though all columns have
 expired TTL.  We've tried size tiered and now leveled and are seeing the
 same symptom: the data stays around essentially forever.

 Currently we write all columns with a TTL of 72 hours (259200 seconds) and
 expect to add 10 GB of data to this CF per day per node.  Each node
 currently has 73 GB for the affected CF and shows no indications that old
 rows will be removed on their own.
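
 A back-of-the-envelope check using only the figures above shows how much
 of that 73 GB should already have expired; this is a hypothetical
 steady-state estimate, not a measurement:

```python
# With a fixed TTL, live data per node should plateau at write_rate * TTL.
ttl_seconds = 259200            # 72 hour TTL stated above
write_rate_gb_per_day = 10      # stated ingest per node
observed_gb = 73                # actual on-disk size per node

steady_state_gb = write_rate_gb_per_day * ttl_seconds / 86400
print(steady_state_gb)          # 30.0 GB of live (unexpired) data expected
print(observed_gb - steady_state_gb)  # ~43 GB of expired data never purged
```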

 Why aren't rows being removed?  Below is some data from a sample row which
 should have been removed several days ago but is still around even though
 it has been involved in numerous compactions since being expired.

 $ ./bin/nodetool -h localhost getsstables metrics request_summary
 459fb460-5ace-11e2-9b92-11d67b6163b4

 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

 $ ls -alF
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

 $ ./bin/sstable2json
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 %x')
 {
 34353966623436302d356163652d313165322d396239322d313164363762363136336234:
 [[app_name,50f21d3d,1357785277207001,d],
 [client_ip,50f21d3d,1357785277207001,d],
 [client_req_id,50f21d3d,1357785277207001,d],
 [mysql_call_cnt,50f21d3d,1357785277207001,d],
 [mysql_duration_us,50f21d3d,1357785277207001,d],
 [mysql_failure_call_cnt,50f21d3d,1357785277207001,d],
 [mysql_success_call_cnt,50f21d3d,1357785277207001,d],
 [req_duration_us,50f21d3d,1357785277207001,d],
 [req_finish_time_us,50f21d3d,1357785277207001,d],
 [req_method,50f21d3d,1357785277207001,d],
 [req_service,50f21d3d,1357785277207001,d],
 [req_start_time_us,50f21d3d,1357785277207001,d],
 [success,50f21d3d,1357785277207001,d]]
 }
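
 Incidentally, the hexdump pipe above simply hex-encodes the ASCII bytes of
 the UUID string, which is why sstable2json shows the row key in that opaque
 form; a quick sketch of the round trip:

```python
# sstable2json prints row keys as hex of the raw key bytes; the key here
# is the ASCII UUID string, so hex-encoding it reproduces the JSON key.
key = "459fb460-5ace-11e2-9b92-11d67b6163b4"
hex_key = key.encode("ascii").hex()
print(hex_key)
# 34353966623436302d356163652d313165322d396239322d313164363762363136336234
```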


 Decoding the column timestamps shows that the columns were written at
 Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan
 2013 02:34:37 GMT.  The date of the SSTable shows that it was generated on
 Jan 16 which is 3 days after all columns have TTL-ed out.
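
 The decoding referred to above can be reproduced directly from the
 sstable2json output: the long value is the write timestamp in microseconds
 and the "50f21d3d" field is the local expiration time in seconds (a
 minimal sketch):

```python
from datetime import datetime, timezone

write_ts_us = 1357785277207001    # column timestamp (microseconds)
expires_at = int("50f21d3d", 16)  # local expiration time (seconds)

written = datetime.fromtimestamp(write_ts_us // 1_000_000, tz=timezone.utc)
expired = datetime.fromtimestamp(expires_at, tz=timezone.utc)
print(written)                    # 2013-01-10 02:34:37+00:00
print(expired)                    # 2013-01-13 02:34:37+00:00
print(expires_at - write_ts_us // 1_000_000)  # 259200, i.e. the 72h TTL
```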


 The schema shows that gc_grace is set to 0 since this data is write-once,
 read-seldom and is never updated or deleted.

 create column family request_summary
   with column_type = 'Standard'
   and comparator = 'UTF8Type'
   and default_validation_class = 'UTF8Type'
   and key_validation_class = 'UTF8Type'
   and read_repair_chance = 0.1
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 0
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = true
   and compaction_strategy =
 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
   and caching = 'NONE'
   and bloom_filter_fp_chance = 1.0
   and compression_options = {'chunk_length_kb' : '64',
 'sstable_compression' :
 'org.apache.cassandra.io.compress.SnappyCompressor'};


 Thanks in advance for help in understanding why rows such as this are not
 removed!

 -Bryan




Re: LCS not removing rows with all TTL expired columns

2013-01-16 Thread Bryan Talbot
According to the timestamps (see original post) the SSTable was written
(thus compacted) 3 days after all columns for that row had
expired and 6 days after the row was created; yet all columns are still
showing up in the SSTable.  Note that a get for that key correctly returns
no rows, so reads work as expected, but the data is lugged around far
longer than it should be -- maybe forever.


-Bryan


On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 To get a column removed, two requirements have to be met:
 1. the column should be expired
 2. after that, the CF gets compacted

 I guess your expired columns are propagated to a high tier CF, which gets
 compacted rarely.
 So, you have to wait until the high tier CF gets compacted.

 Andrey



 On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.comwrote:

 On cassandra 1.1.5 with a write heavy workload, we're having problems
 getting rows to be compacted away (removed) even though all columns have
 expired TTL.  We've tried size tiered and now leveled and are seeing the
 same symptom: the data stays around essentially forever.

 Currently we write all columns with a TTL of 72 hours (259200 seconds)
 and expect to add 10 GB of data to this CF per day per node.  Each node
 currently has 73 GB for the affected CF and shows no indications that old
 rows will be removed on their own.

 Why aren't rows being removed?  Below is some data from a sample row
 which should have been removed several days ago but is still around even
 though it has been involved in numerous compactions since being expired.

 $ ./bin/nodetool -h localhost getsstables metrics request_summary
 459fb460-5ace-11e2-9b92-11d67b6163b4

 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

 $ ls -alF
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

 $ ./bin/sstable2json
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 %x')
 {
 34353966623436302d356163652d313165322d396239322d313164363762363136336234:
 [[app_name,50f21d3d,1357785277207001,d],
 [client_ip,50f21d3d,1357785277207001,d],
 [client_req_id,50f21d3d,1357785277207001,d],
 [mysql_call_cnt,50f21d3d,1357785277207001,d],
 [mysql_duration_us,50f21d3d,1357785277207001,d],
 [mysql_failure_call_cnt,50f21d3d,1357785277207001,d],
 [mysql_success_call_cnt,50f21d3d,1357785277207001,d],
 [req_duration_us,50f21d3d,1357785277207001,d],
 [req_finish_time_us,50f21d3d,1357785277207001,d],
 [req_method,50f21d3d,1357785277207001,d],
 [req_service,50f21d3d,1357785277207001,d],
 [req_start_time_us,50f21d3d,1357785277207001,d],
 [success,50f21d3d,1357785277207001,d]]
 }


 Decoding the column timestamps shows that the columns were written at
 Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan
 2013 02:34:37 GMT.  The date of the SSTable shows that it was generated on
 Jan 16 which is 3 days after all columns have TTL-ed out.


 The schema shows that gc_grace is set to 0 since this data is write-once,
 read-seldom and is never updated or deleted.

 create column family request_summary
   with column_type = 'Standard'
   and comparator = 'UTF8Type'
   and default_validation_class = 'UTF8Type'
   and key_validation_class = 'UTF8Type'
   and read_repair_chance = 0.1
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 0
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = true
   and compaction_strategy =
 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
   and caching = 'NONE'
   and bloom_filter_fp_chance = 1.0
   and compression_options = {'chunk_length_kb' : '64',
 'sstable_compression' :
 'org.apache.cassandra.io.compress.SnappyCompressor'};


 Thanks in advance for help in understanding why rows such as this are not
 removed!

 -Bryan





Re: LCS not removing rows with all TTL expired columns

2013-01-16 Thread aaron morton
Minor compaction (with Size Tiered) will only purge tombstones if all fragments 
of a row are contained in the SSTables being compacted. So if you have a long 
lived row, that is present in many size tiers, the columns will not be purged. 

  (thus compacted) 3 days after all columns for that row had expired
Tombstones have to get on disk, even if you set gc_grace_seconds to 0. If
they did not, they would not get a chance to delete previous versions of the
column which already exist on disk. So when the compaction ran your
ExpiringColumn was turned into a DeletedColumn and placed on disk.

I would expect the next round of compaction to remove these columns. 

There is a new feature in 1.2 that may help you here. It will do a special 
compaction of individual sstables when they have a certain proportion of dead 
columns https://issues.apache.org/jira/browse/CASSANDRA-3442 
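
The sub-property that CASSANDRA-3442 adds can be tuned per column family
once on 1.2; a hedged CQL3 sketch (the 0.2 threshold shown is the default
value, and the table name is taken from this thread):

```sql
-- Compact an SSTable on its own once ~20% of its data is estimated to be
-- droppable tombstones (Cassandra 1.2+ only).
ALTER TABLE request_summary
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'tombstone_threshold': '0.2'};
```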

Also interested to know if LCS helps. 

Cheers
 

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/01/2013, at 2:55 PM, Bryan Talbot btal...@aeriagames.com wrote:

 According to the timestamps (see original post) the SSTable was written (thus
 compacted) 3 days after all columns for that row had expired and 6
 days after the row was created; yet all columns are still showing up in the
 SSTable.  Note that a get for that key correctly returns no rows, so reads
 work as expected, but the data is lugged around far longer than it should
 be -- maybe forever.
 
 
 -Bryan
 
 
 On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh ailin...@gmail.com wrote:
 To get a column removed, two requirements have to be met:
 1. the column should be expired
 2. after that, the CF gets compacted

 I guess your expired columns are propagated to a high tier CF, which gets
 compacted rarely.
 So, you have to wait until the high tier CF gets compacted.
 
 Andrey
 
 
 
 On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.com wrote:
 On cassandra 1.1.5 with a write heavy workload, we're having problems getting 
 rows to be compacted away (removed) even though all columns have expired TTL. 
  We've tried size tiered and now leveled and are seeing the same symptom: the 
 data stays around essentially forever.  
 
 Currently we write all columns with a TTL of 72 hours (259200 seconds) and 
 expect to add 10 GB of data to this CF per day per node.  Each node currently 
 has 73 GB for the affected CF and shows no indications that old rows will be 
 removed on their own.
 
 Why aren't rows being removed?  Below is some data from a sample row which 
 should have been removed several days ago but is still around even though it 
 has been involved in numerous compactions since being expired.
 
 $ ./bin/nodetool -h localhost getsstables metrics request_summary 
 459fb460-5ace-11e2-9b92-11d67b6163b4
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 
 $ ls -alF 
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 
 $ ./bin/sstable2json 
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
  -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 %x')
 {
 34353966623436302d356163652d313165322d396239322d313164363762363136336234: 
 [[app_name,50f21d3d,1357785277207001,d], 
 [client_ip,50f21d3d,1357785277207001,d], 
 [client_req_id,50f21d3d,1357785277207001,d], 
 [mysql_call_cnt,50f21d3d,1357785277207001,d], 
 [mysql_duration_us,50f21d3d,1357785277207001,d], 
 [mysql_failure_call_cnt,50f21d3d,1357785277207001,d], 
 [mysql_success_call_cnt,50f21d3d,1357785277207001,d], 
 [req_duration_us,50f21d3d,1357785277207001,d], 
 [req_finish_time_us,50f21d3d,1357785277207001,d], 
 [req_method,50f21d3d,1357785277207001,d], 
 [req_service,50f21d3d,1357785277207001,d], 
 [req_start_time_us,50f21d3d,1357785277207001,d], 
 [success,50f21d3d,1357785277207001,d]]
 }
 
 
 Decoding the column timestamps shows that the columns were written at
 Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan 
 2013 02:34:37 GMT.  The date of the SSTable shows that it was generated on 
 Jan 16 which is 3 days after all columns have TTL-ed out.
 
 
 The schema shows that gc_grace is set to 0 since this data is write-once, 
 read-seldom and is never updated or deleted.
 
 create column family request_summary
   with column_type = 'Standard'
   and comparator = 'UTF8Type'
   and default_validation_class = 'UTF8Type'
   and key_validation_class = 'UTF8Type'
   and read_repair_chance = 0.1
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 0
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = true
   and compaction_strategy =