On Wed, Jan 19, 2011 at 8:41 PM, Germán Kondolf <german.kond...@gmail.com> wrote:

> On Wed, Jan 19, 2011 at 12:59 AM, Zhu Han <schumi....@gmail.com> wrote:
> >
> >
> > On Wed, Jan 19, 2011 at 11:35 AM, Germán Kondolf <german.kond...@gmail.com>
> > wrote:
> >>
> >> Yes, that's what I meant, but correct me if I'm wrong: when a deletion
> >> comes after another deletion for the same row or column, the gc-before
> >> will count against the last one, won't it?
> >>
> > IIRC, after compaction, even if the row key is not wiped, all the CFs are
> > replaced by the youngest tombstone. I do not clearly understand the
> > benefit of wiping out the whole row as early as possible.
> >
>

The only problem I see is that the bloom filter might fill up if too many
tombstones are inserted for rows that never existed.
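
To make the timeline discussed in this thread concrete, here is a toy model
in Python. It is only a sketch under assumptions, not Cassandra code: the
names Tombstone, merge and purgeable_at are hypothetical, and it assumes
(as the replies in this thread suggest) that compaction keeps the youngest
tombstone and that gc grace is counted from that tombstone's timestamp.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds in a day


@dataclass(frozen=True)
class Tombstone:
    """Hypothetical stand-in for a deletion marker."""
    deleted_at: int  # seconds since an arbitrary epoch


def merge(tombstones):
    """Assumed compaction behaviour: only the youngest tombstone survives."""
    return max(tombstones, key=lambda t: t.deleted_at)


def purgeable_at(tombstone, gc_grace_seconds):
    """Earliest time at which this toy model allows the tombstone to be dropped."""
    return tombstone.deleted_at + gc_grace_seconds


gc_grace = 10 * DAY
first = Tombstone(deleted_at=1 * DAY)   # delete on day 1
second = Tombstone(deleted_at=5 * DAY)  # delete again on day 5

# A single delete would be purgeable 10 days after day 1.
print(purgeable_at(first, gc_grace) // DAY)                    # 11
# After the second delete, the merged tombstone lives until day 15.
print(purgeable_at(merge([first, second]), gc_grace) // DAY)   # 15
```

In this model every repeated delete pushes the purge time forward, which is
the concern raised about deleting rows without checking them first.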

>
> I think it is not a "benefit" but a potential issue: if you delete
> columns or rows without checking them first, you could keep their
> tombstones alive for as long as you keep issuing deletions. Maybe it's
> a strange use case, but Cassandra certainly provides new,
> non-traditional ways of processing high volumes of information.
>
> As the original example depicted clearly:
> day 1 -> insert Row1.Col1
> day 2 -> delete Row1.Col1
> day 11 (before gc-grace-seconds) -> delete Row1.Col1
>
> With the last command I've extended the life of the tombstone. A
> check before each deletion could have a performance impact on the
> process, so I think this might be better handled server-side than
> client-side.
>
> //GK
> http://twitter.com/germanklf
> http://code.google.com/p/seide/
>
> >>
> >> Maybe, knowing that all the subsequent versions of a deletion are
> >> deletions too, it could take the first timestamp against the
> >> gc-grace-seconds when it is reducing & compacting.
> >>
> >> // Germán Kondolf
> >> http://twitter.com/germanklf
> >> http://code.google.com/p/seide/
> >> // @i4
> >>
> >> On 19/01/2011, at 00:16, Jonathan Ellis <jbel...@gmail.com> wrote:
> >>
> >> > If you mean that multiple tombstones for the same row or column should
> >> > be merged into a single one at compaction time, then yes, that is what
> >> > happens.
> >> >
> >> > On Tue, Jan 18, 2011 at 7:53 PM, Germán Kondolf
> >> > <german.kond...@gmail.com> wrote:
> >> >> Maybe it could be taken into account when the compaction is executed:
> >> >> if I only have a consecutive list of uninterrupted tombstones, it could
> >> >> care only about the first. It sounds like the-way-it-should-be, maybe
> >> >> as a part of the "row-reduce" process.
> >> >>
> >> >> Is it feasible? Looking into CASSANDRA-1074, it sounds like it should be.
> >> >>
> >> >> //GK
> >> >> http://twitter.com/germanklf
> >> >> http://code.google.com/p/seide/
> >> >>
> >> >> On Tue, Jan 18, 2011 at 10:55 AM, Sylvain Lebresne
> >> >> <sylv...@riptano.com> wrote:
> >> >>> On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn <da...@lookin2.com>
> >> >>> wrote:
> >> >>>> Thanks, Aaron, but I'm not 100% clear.
> >> >>>>
> >> >>>> My situation is this: My use case spins off rows (not columns) that I no
> >> >>>> longer need and want to delete. It is possible that these rows were never
> >> >>>> created in the first place, or were already deleted. This is a very large
> >> >>>> cleanup task that normally deletes a lot of rows, and the last thing that I
> >> >>>> want to do is create tombstones for rows that didn't exist in the first
> >> >>>> place, or lengthen the life on disk of tombstones of rows that are already
> >> >>>> deleted.
> >> >>>>
> >> >>>> So the question is: before I delete, do I have to retrieve the row to see
> >> >>>> if it exists in the first place?
> >> >>>
> >> >>> Yes, in your situation you do.
> >> >>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton
> >> >>>> <aa...@thelastpickle.com>
> >> >>>> wrote:
> >> >>>>>
> >> >>>>> AFAIK that's not necessary; there is no need to worry about previous
> >> >>>>> deletes. You can delete stuff that does not even exist; neither
> >> >>>>> batch_mutate nor remove is going to throw an error.
> >> >>>>> All the columns that were (roughly speaking) present at your first
> >> >>>>> deletion will be available for GC at the end of the first tombstone's
> >> >>>>> life. Same for the second.
> >> >>>>> Say you were to write a col between the two deletes with the same name as
> >> >>>>> one present at the start. The first version of the col is available for
> >> >>>>> GC after tombstone 1, and the second after tombstone 2.
> >> >>>>> Hope that helps
> >> >>>>> Aaron
> >> >>>>> On 18/01/2011, at 9:37 PM, David Boxenhorn <da...@lookin2.com>
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>> Thanks. In other words, before I delete something, I should check to see
> >> >>>>> whether it exists as a live row in the first place.
> >> >>>>>
> >> >>>>> On Tue, Jan 18, 2011 at 9:24 AM, Ryan King <r...@twitter.com>
> wrote:
> >> >>>>>>
> >> >>>>>> On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn
> >> >>>>>> <da...@lookin2.com>
> >> >>>>>> wrote:
> >> >>>>>>> If I delete a row, and later on delete it again, before GCGraceSeconds
> >> >>>>>>> has elapsed, does the tombstone live longer?
> >> >>>>>>
> >> >>>>>> Each delete is a new tombstone, which should answer your question.
> >> >>>>>>
> >> >>>>>> -ryan
> >> >>>>>>
> >> >>>>>>> In other words, if I have the following scenario:
> >> >>>>>>>
> >> >>>>>>> GCGraceSeconds = 10 days
> >> >>>>>>> On day 1 I delete a row
> >> >>>>>>> On day 5 I delete the row again
> >> >>>>>>>
> >> >>>>>>> Will the tombstone be removed on day 10 or day 15?
> >> >>>>>>>
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Jonathan Ellis
> >> > Project Chair, Apache Cassandra
> >> > co-founder of Riptano, the source for professional Cassandra support
> >> > http://riptano.com
> >>
> >
> >
>
> //GK
> http://twitter.com/germanklf
> http://code.google.com/p/seide/
>
