Re: Upserting the same values multiple times

2014-01-21 Thread Robert Wille
No tombstones, just many copies of the same data until compaction occurs.

From:  Sanjeeth Kumar 
Reply-To:  
Date:  Tuesday, January 21, 2014 at 8:37 PM
To:  
Subject:  Upserting the same values multiple times

Hi,
   I have a table A, one of the fields of which is a text column called
body.
 This text's length could vary somewhere between 120 characters to say 400
characters. The contents of this column can be the same for millions of
rows.

To avoid repeating the same data, I thought I would add another
table B, which stores the digest together with the body.

TABLE A (
  some fields,
  digest text,
  ...
)
  

TABLE B (
  digest text,
  body text,
  PRIMARY KEY (digest)
)

Whenever I insert into table A, I calculate the digest of body and blindly
issue an insert into table B as well. I'm not doing any read on B. This could
result in the same (digest, body) pair being inserted millions of times in a
short span of time.
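A minimal sketch of the scheme above, assuming SHA-256 as the digest function (the post doesn't say which one is used) and modeling table B as a plain dict: a blind insert with the same key and value is an idempotent upsert, so repeating it millions of times just rewrites the same row.

```python
import hashlib

def digest(body: str) -> str:
    # Stand-in for the digest the post computes; SHA-256 is an assumption.
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

# Table B modeled as a dict: digest -> body.
table_b = {}

def blind_insert_b(body: str) -> str:
    d = digest(body)
    table_b[d] = body  # no read-before-write, exactly as described in the post
    return d

d1 = blind_insert_b("hello world")
d2 = blind_insert_b("hello world")  # same body, same digest, same row
```

Since the digest is derived from the body, every duplicate insert targets the same partition key, so B never grows beyond one row per distinct body.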

A couple of questions:

1) Would this cause an issue due to the number of tombstones created in a
short span of time? I'm assuming that for every insert, a tombstone would be
created for the previous record.
2) Or should I just replicate the same data in table A itself multiple times?
(With compression, space isn't that big an issue.)


- Sanjeeth




RE: Upserting the same values multiple times

2014-01-21 Thread Viktor Jevdokimov
It's not about tombstones. Tombstones are effectively markers for deleted
columns (via DELETE or TTL expiry); after compaction they are kept in the new
sstables for the gc_grace_seconds period.

Updates do not create tombstones for previous records. The version with the
latest timestamp wins, whether it is served from the memtable or merged from
sstables during compaction.

While the data is in the memtable, the latest timestamp wins and only the
latest version is flushed to disk. After that, everything depends on how fast
you flush memtables and how compaction runs thereafter. Do not expect any
tombstones from updates, only from deletes.
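The last-write-wins reconciliation described above can be sketched as a merge by timestamp (a toy model, not Cassandra's actual code; each cell is a (timestamp, value) pair and each dict stands in for one memtable or sstable version of the row):

```python
# Toy model of Cassandra reconciliation: merging row versions keeps the
# cell with the highest timestamp per column -- no tombstones involved.
def merge_cells(*versions):
    merged = {}
    for cells in versions:  # one dict per memtable/sstable version
        for col, (ts, val) in cells.items():
            if col not in merged or ts > merged[col][0]:
                merged[col] = (ts, val)
    return merged

# Three upserts of the same row; the newest timestamp wins on read/compaction.
v1 = {"body": (100, "first")}
v2 = {"body": (200, "second")}
v3 = {"body": (150, "stale")}
merged = merge_cells(v1, v2, v3)
```

The stale version is simply discarded during the merge, which is why repeated upserts cost only the transient duplicate copies until compaction, never tombstones.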


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-03163 Vilnius, Lithuania
Follow us on Twitter: @adforminsider
Experience Adform DNA



