On Wed, Jan 19, 2011 at 8:41 PM, Germán Kondolf <german.kond...@gmail.com> wrote:
> On Wed, Jan 19, 2011 at 12:59 AM, Zhu Han <schumi....@gmail.com> wrote:
> >
> >
> > On Wed, Jan 19, 2011 at 11:35 AM, Germán Kondolf <german.kond...@gmail.com>
> > wrote:
> >>
> >> Yes, that's what I meant, but correct me if I'm wrong: when a deletion
> >> comes after another deletion for the same row or column, gc-before
> >> will count against the last one, won't it?
> >>
> > IIRC, after compaction, even if the row key is not wiped, all the CFs are
> > replaced by the youngest tombstone. I do not understand very clearly the
> > benefit of wiping out the whole row as early as possible.
> >
>
The only problem I see is that the bloom filter might fill up if too many tombstones are inserted for rows that never existed.

> I think it is not a "benefit" but a potential issue: if you delete
> columns or rows without checking them first, you could keep them alive
> for as long as you keep issuing deletions. Maybe it's a strange use case,
> but Cassandra certainly provides new, non-traditional ways of
> processing high volumes of information.
>
> As the original example depicted clearly:
> day 1 -> insert Row1.Col1
> day 2 -> delete Row1.Col1
> day 11 (before gc-grace-seconds) -> delete Row1.Col1
>
> With the last command I've extended the life of a tombstone. The
> check before the deletion could have a performance impact on the
> process, so I think it might be better handled server-side than
> client-side.
>
> //GK
> http://twitter.com/germanklf
> http://code.google.com/p/seide/
>
> >>
> >> Maybe, knowing that all the subsequent versions of a deletion are deletions
> >> too, it could take the first timestamp against gc-grace-seconds when it
> >> is reducing & compacting.
> >>
> >> // Germán Kondolf
> >> http://twitter.com/germanklf
> >> http://code.google.com/p/seide/
> >> // @i4
> >>
> >> On 19/01/2011, at 00:16, Jonathan Ellis <jbel...@gmail.com> wrote:
> >>
> >> > If you mean that multiple tombstones for the same row or column should
> >> > be merged into a single one at compaction time, then yes, that is what
> >> > happens.
> >> >
> >> > On Tue, Jan 18, 2011 at 7:53 PM, Germán Kondolf
> >> > <german.kond...@gmail.com> wrote:
> >> >> Maybe it could be taken into account when the compaction is executed,
> >> >> if I only have a consecutive list of uninterrupted tombstones it could
> >> >> only care about the first. It sounds like the-way-it-should-be, maybe
> >> >> as a part of the "row-reduce" process.
> >> >>
> >> >> Is it feasible? Looking into the CASSANDRA-1074 sounds like it should.
> >> >>
> >> >> //GK
> >> >> http://twitter.com/germanklf
> >> >> http://code.google.com/p/seide/
> >> >>
> >> >> On Tue, Jan 18, 2011 at 10:55 AM, Sylvain Lebresne
> >> >> <sylv...@riptano.com> wrote:
> >> >>> On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn <da...@lookin2.com>
> >> >>> wrote:
> >> >>>> Thanks, Aaron, but I'm not 100% clear.
> >> >>>>
> >> >>>> My situation is this: My use case spins off rows (not columns) that I
> >> >>>> no longer need and want to delete. It is possible that these rows were
> >> >>>> never created in the first place, or were already deleted. This is a
> >> >>>> very large cleanup task that normally deletes a lot of rows, and the
> >> >>>> last thing that I want to do is create tombstones for rows that didn't
> >> >>>> exist in the first place, or lengthen the life on disk of tombstones
> >> >>>> of rows that are already deleted.
> >> >>>>
> >> >>>> So the question is: before I delete, do I have to retrieve the row to
> >> >>>> see if it exists in the first place?
> >> >>>
> >> >>> Yes, in your situation you do.
> >> >>>
> >> >>>>
> >> >>>> On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton
> >> >>>> <aa...@thelastpickle.com> wrote:
> >> >>>>>
> >> >>>>> AFAIK that's not necessary, there is no need to worry about previous
> >> >>>>> deletes. You can delete stuff that does not even exist, neither
> >> >>>>> batch_mutate or remove are going to throw an error.
> >> >>>>> All the columns that were (roughly speaking) present at your first
> >> >>>>> deletion will be available for GC at the end of the first tombstones
> >> >>>>> life.
> >> >>>>> Same for the second.
> >> >>>>> Say you were to write a col between the two deletes with the same
> >> >>>>> name as one present at the start. The first version of the col is
> >> >>>>> avail for GC after tombstone 1, and the second after tombstone 2.
> >> >>>>> Hope that helps
> >> >>>>> Aaron
> >> >>>>> On 18/01/2011, at 9:37 PM, David Boxenhorn <da...@lookin2.com>
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>> Thanks. In other words, before I delete something, I should check to
> >> >>>>> see whether it exists as a live row in the first place.
> >> >>>>>
> >> >>>>> On Tue, Jan 18, 2011 at 9:24 AM, Ryan King <r...@twitter.com> wrote:
> >> >>>>>>
> >> >>>>>> On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn
> >> >>>>>> <da...@lookin2.com> wrote:
> >> >>>>>>> If I delete a row, and later on delete it again, before
> >> >>>>>>> GCGraceSeconds has elapsed, does the tombstone live longer?
> >> >>>>>>
> >> >>>>>> Each delete is a new tombstone, which should answer your question.
> >> >>>>>>
> >> >>>>>> -ryan
> >> >>>>>>
> >> >>>>>>> In other words, if I have the following scenario:
> >> >>>>>>>
> >> >>>>>>> GCGraceSeconds = 10 days
> >> >>>>>>> On day 1 I delete a row
> >> >>>>>>> On day 5 I delete the row again
> >> >>>>>>>
> >> >>>>>>> Will the tombstone be removed on day 10 or day 15?
> >> >>>>>>>
> >> >
> >> > --
> >> > Jonathan Ellis
> >> > Project Chair, Apache Cassandra
> >> > co-founder of Riptano, the source for professional Cassandra support
> >> > http://riptano.com
> >> >
> >
> //GK
> http://twitter.com/germanklf
> http://code.google.com/p/seide/
>
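
To make the arithmetic in the two scenarios above concrete, here is a minimal sketch in Python. It is a simplified model of what the thread describes (each delete writes a new tombstone, compaction keeps only the youngest one, and that survivor becomes collectable once gc-grace has passed since it was written). It is not Cassandra's code, and the names Tombstone, merge_tombstones and earliest_purge_day are hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

GC_GRACE_DAYS = 10  # GCGraceSeconds from the thread, expressed in days


@dataclass(frozen=True)
class Tombstone:
    """Hypothetical stand-in for a delete marker on one row or column."""
    deleted_on_day: int  # day on which the delete was issued


def merge_tombstones(tombstones: Iterable[Tombstone]) -> Tombstone:
    """Model of the merge at compaction: the youngest tombstone wins."""
    return max(tombstones, key=lambda t: t.deleted_on_day)


def earliest_purge_day(tombstones: Iterable[Tombstone],
                       gc_grace_days: int = GC_GRACE_DAYS) -> int:
    """First day on which the merged tombstone is old enough to be dropped."""
    return merge_tombstones(tombstones).deleted_on_day + gc_grace_days


# David's scenario: deletes on day 1 and day 5.
print(earliest_purge_day([Tombstone(1), Tombstone(5)]))   # 15

# Germán's scenario: deletes on day 2 and day 11.
print(earliest_purge_day([Tombstone(2), Tombstone(11)]))  # 21
```

Under this model every repeated delete pushes the purge day forward, which is why checking for a live row before deleting, or handling it server-side as suggested above, matters for the cleanup use case.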