Thanks for the replies.  My table is set to store only one version, but I'd
probably delete all previous versions to be safe.  I'd therefore use one of
these 2 methods:

- Delete.deleteColumns(byte[] family, byte[] qualifier)
- Delete.deleteColumns(byte[] family, byte[] qualifier, long timestamp)

The problem is that both have the client generate the timestamp.  If you
don't specify it, it uses the HConstants.LATEST_TIMESTAMP which causes the
get-before-put (10x slowdown in my use case).  If you do specify it, which
is required because the method takes a primitive long, then you're relying
on the client's clock to be perfect.  I chose the latter option for better
performance, but was surprised to see there's not an option to let the
server generate the currentTimeMillis, since that is what happens on a Put
operation.  Not a big deal, but I wanted to see if there was a technical reason
behind it or if it's just that nobody's needed that functionality.
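
To make the clock concern concrete, here is a toy model in plain Java.  It is
not HBase code: the class TombstoneSketch and its methods are made up for
illustration, and the timestamps are arbitrary.  It assumes only the documented
deleteColumns behaviour, namely that the marker hides every version of the
column whose timestamp is less than or equal to the marker's timestamp.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one column's versions and its delete markers. Not HBase code. */
final class TombstoneSketch {
    // Timestamps of the stored versions of a single column.
    private final List<Long> versions = new ArrayList<>();
    // Timestamps of deleteColumns-style markers ("hide everything <= ts").
    private final List<Long> tombstones = new ArrayList<>();

    void put(long ts) {
        versions.add(ts);
    }

    void deleteColumns(long ts) {
        tombstones.add(ts);
    }

    /** Returns the version timestamps a read would still see. */
    List<Long> visibleVersions() {
        List<Long> visible = new ArrayList<>();
        for (long v : versions) {
            boolean masked = false;
            for (long t : tombstones) {
                if (v <= t) {
                    masked = true;
                    break;
                }
            }
            if (!masked) {
                visible.add(v);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        long serverPutTs = 1_000_000L; // hypothetical server-assigned Put timestamp

        // Client clock 5 seconds slow: the marker lands "before" the Put,
        // so the value stays visible.
        TombstoneSketch slow = new TombstoneSketch();
        slow.put(serverPutTs);
        slow.deleteColumns(serverPutTs - 5_000L);
        System.out.println("slow client clock: " + slow.visibleVersions());

        // Client clock 5 seconds fast: the old value is hidden, but so is a
        // later Put whose server timestamp is still below the marker.
        TombstoneSketch fast = new TombstoneSketch();
        fast.put(serverPutTs);
        fast.deleteColumns(serverPutTs + 5_000L);
        fast.put(serverPutTs + 2_000L);
        System.out.println("fast client clock: " + fast.visibleVersions());
    }
}
```

In this model a slow client clock leaves the old value readable, and a fast
one hides writes that arrive shortly after the delete.  As I understand it,
the real marker keeps masking such writes until a compaction removes it, which
is why a server-generated timestamp would be the safer default.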

Thanks again,
Matt

On Fri, Jan 21, 2011 at 6:41 PM, Bill Graham <[email protected]> wrote:

> Thanks Ryan, that clears it up.
>
>
> On Fri, Jan 21, 2011 at 3:29 PM, Ryan Rawson <[email protected]> wrote:
> > No, the storage model does not work like that.  The storage model
> > revolves around the KeyValue, which is roughly:
> >
> > rowid/family/qualifier/timestamp/data
> >
> > and we store sequences of these in sorted order in HFiles.
> >
> > Note, we store the row with every single version of every column/cell.
> >
> > Therefore there is no such thing as "removing the bytes that represent
> > the actual row key", they are part of every cell, and once those cells
> > go away, then so does the row key.
> >
> > I hope this helps,
> > -ryan
> >
> > On Fri, Jan 21, 2011 at 3:26 PM, Bill Graham <[email protected]> wrote:
> >> I follow the tombstone/compact/delete cycle of the column values, but
> >> I'm still unclear of the row key life cycle.
> >>
> >> Is it that the bytes that represent the actual row key are associated
> >> with and removed with each column value? Or are they removed upon
> >> compaction when no column values exist for a given row key?
> >>
> >>
> >>
> >> On Fri, Jan 21, 2011 at 2:26 PM, Ryan Rawson <[email protected]> wrote:
> >>> Any of the deletes merely insert a 'tombstone' which doesn't delete the
> >>> data immediately but does mark it so queries no longer return it.
> >>>
> >>> During the compactions we prune these delete values and they disappear
> >>> for good.  (Barring other backups of course)
> >>>
> >>> Because of our variable length storage model, we don't store rows in
> >>> particular blocks and rewrite said blocks, so notions of rows
> >>> 'existing' or not don't even apply to HBase as they do to RDBMS
> >>> systems.
> >>>
> >>> -ryan
> >>>
> >>> On Fri, Jan 21, 2011 at 2:21 PM, Bill Graham <[email protected]> wrote:
> >>>> If you use some combination of delete requests and leave a row without
> >>>> any column data will the row/rowkey still exist? I'm thinking of the
> >>>> use case where you want to prune all old data, including row keys,
> >>>> from a table.
> >>>>
> >>>>
> >>>> On Fri, Jan 21, 2011 at 2:04 PM, Ryan Rawson <[email protected]> wrote:
> >>>>> There are 3 kinds of deletes (with a 4th for win):
> >>>>>
> >>>>> - Delete.deleteFamily(byte [] family, [long])
> >>>>> -- This removes all data from the given family before the given
> >>>>> timestamp, or if none is given, System.currentTimeMillis()
> >>>>> - Delete.deleteColumns(byte[] family, byte[] qualifier, [long])
> >>>>> -- This removes all data from the given qualifier, before the given
> >>>>> timestamp, or if none is given, System.currentTimeMillis()
> >>>>> - Delete.deleteColumn(byte[] family, byte[] qualifier, [long])
> >>>>> -- This removes A SINGLE VERSION at the given time, or if none is
> >>>>> given, the most recent version is Get'ed and deleted.
> >>>>> - Delete()
> >>>>> -- Calls deleteFamily() on server side on every family.
> >>>>>
> >>>>> Stack is talking about the LAST delete form.
> >>>>>
> >>>>> I think what you want is probably deleteColumns() (plural!), or
> >>>>> perhaps deleteFamily().
> >>>>>
> >>>>> One rarely wants to call deleteColumn(), since it removes just a
> >>>>> single version, thus exposing older versions, which MAY be what you
> >>>>> want, but I'm guessing probably isn't.
> >>>>>
> >>>>> Only the last form (deleteColumn (singular!)) calls Get, the rest do
> >>>>> not call Get and are very fast.
> >>>>>
> >>>>> -ryan
> >>>>>
> >>>>> On Fri, Jan 21, 2011 at 1:51 PM, Stack <[email protected]> wrote:
> >>>>>> On Fri, Jan 21, 2011 at 12:30 PM, Matt Corgan <[email protected]> wrote:
> >>>>>>> Is there a way to issue a delete using the server's current timestamp?  I
> >>>>>>> see methods using HConstants.LATEST_TIMESTAMP which is extremely expensive
> >>>>>>> since it triggers a Get call.
> >>>>>>
> >>>>>> Yes.  Deleting the latest version involves a Get to figure out the most
> >>>>>> recent timestamp.  And yes, in src code it says this is 'expensive'.
> >>>>>> Seems like it does this lookup anytime LATEST_TIMESTAMP is passed,
> >>>>>> whether column, columns, or family, only to ensure the delete goes in
> >>>>>> ahead of whatever is currently in the Store.
> >>>>>>
> >>>>>> St.Ack
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
