Thanks for the replies. My table is set to store only one version, but I'd probably delete all previous versions to be safe. I'd therefore use one of these 2 methods:
- Delete.deleteColumns(byte[] family, byte[] qualifier)
- Delete.deleteColumns(byte[] family, byte[] qualifier, long timestamp)

The problem is that both have the client generate the timestamp. If you don't specify it, it uses HConstants.LATEST_TIMESTAMP, which causes the get-before-put (a 10x slowdown in my use case). If you do specify it, which is required because the method takes a primitive long, then you're relying on the client's clock to be perfect.

I chose the latter option for better performance, but was surprised to see there's no option to let the server generate the currentTimeMillis, since that is what happens on a Put operation. Not a big deal, but I wanted to see if there was a technical reason behind it or if it's just that nobody's needed that functionality.

Thanks again,
Matt

On Fri, Jan 21, 2011 at 6:41 PM, Bill Graham <[email protected]> wrote:

> Thanks Ryan, that clears it up.
>
>
> On Fri, Jan 21, 2011 at 3:29 PM, Ryan Rawson <[email protected]> wrote:
> > No, the storage model does not work like that. The storage model
> > revolves around the KeyValue, which is roughly:
> >
> > rowid/family/qualifier/timestamp/data
> >
> > and we store sequences of these in sorted order in HFiles.
> >
> > Note, we store the row with every single version of every column/cell.
> >
> > Therefore there is no such thing as "removing the bytes that represent
> > the actual row key", they are part of every cell, and once those cells
> > go away, then so does the row key.
> >
> > I hope this helps,
> > -ryan
> >
> > On Fri, Jan 21, 2011 at 3:26 PM, Bill Graham <[email protected]> wrote:
> >> I follow the tombstone/compact/delete cycle of the column values, but
> >> I'm still unclear of the row key life cycle.
> >>
> >> Is it that the bytes that represent the actual row key are associated
> >> with and removed with each column value? Or are they removed upon
> >> compaction when no column values exist for a given row key?
> >>
> >>
> >>
> >> On Fri, Jan 21, 2011 at 2:26 PM, Ryan Rawson <[email protected]> wrote:
> >>> Any of the deletes merely insert a 'tombstone' which doesn't delete the
> >>> data immediately but does mark it so queries no longer return it.
> >>>
> >>> During the compactions we prune these delete values and they disappear
> >>> for good. (Barring other backups of course)
> >>>
> >>> Because of our variable length storage model, we don't store rows in
> >>> particular blocks and rewrite said blocks, so notions of rows
> >>> 'existing' or not, don't even apply to HBase as they do to RDBMS
> >>> systems.
> >>>
> >>> -ryan
> >>>
> >>> On Fri, Jan 21, 2011 at 2:21 PM, Bill Graham <[email protected]> wrote:
> >>>> If you use some combination of delete requests and leave a row without
> >>>> any column data will the row/rowkey still exist? I'm thinking of the
> >>>> use case where you want to prune all old data, including row keys,
> >>>> from a table.
> >>>>
> >>>>
> >>>> On Fri, Jan 21, 2011 at 2:04 PM, Ryan Rawson <[email protected]> wrote:
> >>>>> There are 3 kinds of deletes (with a 4th for win):
> >>>>>
> >>>>> - Delete.deleteFamily(byte [] family, [long])
> >>>>> -- This removes all data from the given family before the given
> >>>>> timestamp, or if none is given, System.currentTimeMillis()
> >>>>> - Delete.deleteColumns(byte[] family, byte[]qualifier, [long])
> >>>>> -- This removes all data from the given qualifier, before the given
> >>>>> timestamp, or if none is given, System.currentTimeMillis()
> >>>>> - Delete.deleteColumn(byte[]family, byte[]qualifier, [long])
> >>>>> -- This removes A SINGLE VERSION at the given time, or if none is
> >>>>> given, the most recent version is Get'ed and deleted.
> >>>>> - Delete()
> >>>>> -- Calls deleteFamily() on server side on every family.
> >>>>>
> >>>>> Stack is talking about the LAST delete form.
> >>>>>
> >>>>> I think what you want is probably deleteColumns() (plural!), or
> >>>>> perhaps deleteFamily().
> >>>>>
> >>>>> One rarely wants to call deleteColumn(), since it removes just a
> >>>>> single version, thus exposing older versions, which MAY be what you
> >>>>> want, but I'm guessing probably isn't.
> >>>>>
> >>>>> Only the last form (deleteColumn (singular!)) calls Get, the rest do
> >>>>> not call Get and are very fast.
> >>>>>
> >>>>> -ryan
> >>>>>
> >>>>> On Fri, Jan 21, 2011 at 1:51 PM, Stack <[email protected]> wrote:
> >>>>>> On Fri, Jan 21, 2011 at 12:30 PM, Matt Corgan <[email protected]> wrote:
> >>>>>>> Is there a way to issue a delete using the server's current timestamp? I
> >>>>>>> see methods using HConstants.LATEST_TIMESTAMP which is extremely expensive
> >>>>>>> since it triggers a Get call.
> >>>>>>
> >>>>>> Yes. Deleting latest version involves a Get to figure the most
> >>>>>> recent timestamp. And yes, in src code it says this is 'expensive'.
> >>>>>> Seems like it does this lookup anytime LATEST_TIMESTAMP is passed
> >>>>>> whether column, columns, or family only to ensure the delete goes in
> >>>>>> ahead of whatever is currently in the Store.
> >>>>>>
> >>>>>> St.Ack
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
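For anyone reading this thread later, here is a rough sketch of the client-clock concern and the tombstone/compaction cycle discussed above. It is a toy in-memory model, not HBase code: ToyStore and all of its methods are made-up names, a single string key stands in for rowid/family/qualifier, and it assumes a deleteColumns-style marker hides every version whose timestamp is at or below the marker's timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Toy in-memory model of versioned cells and delete markers. Not HBase code.
final class ToyStore {
    private static final class Entry {
        final String key;      // stands in for rowid/family/qualifier
        final long timestamp;
        final String value;    // null means this entry is a delete marker

        Entry(String key, long timestamp, String value) {
            this.key = key;
            this.timestamp = timestamp;
            this.value = value;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Stores one version of a cell at the given timestamp.
    void put(String key, long timestamp, String value) {
        entries.add(new Entry(key, timestamp, Objects.requireNonNull(value)));
    }

    // Models a deleteColumns-style call: it only appends a marker.
    void deleteColumns(String key, long timestamp) {
        entries.add(new Entry(key, timestamp, null));
    }

    // Highest marker timestamp for the key, or Long.MIN_VALUE if none exists.
    private long maskUpTo(String key) {
        long mask = Long.MIN_VALUE;
        for (Entry e : entries) {
            if (e.isTombstone() && e.key.equals(key)) {
                mask = Math.max(mask, e.timestamp);
            }
        }
        return mask;
    }

    // Returns the newest value not hidden by a marker, or null if none.
    String get(String key) {
        long mask = maskUpTo(key);
        Entry newest = null;
        for (Entry e : entries) {
            if (e.isTombstone() || !e.key.equals(key) || e.timestamp <= mask) {
                continue;
            }
            if (newest == null || e.timestamp > newest.timestamp) {
                newest = e;
            }
        }
        return newest == null ? null : newest.value;
    }

    // Models a compaction: drops hidden cells and the markers themselves.
    void compact() {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.isTombstone() && e.timestamp > maskUpTo(e.key)) {
                kept.add(e);
            }
        }
        entries.clear();
        entries.addAll(kept);
    }

    // Number of stored entries, markers included.
    int size() {
        return entries.size();
    }
}
```

In this model, a put at timestamp 1000 followed by deleteColumns with timestamp 900 (a client whose clock runs behind) leaves get() returning the value, while a marker at 1000 or later hides it. Once compact() runs, the key no longer appears anywhere in the store, which matches the point above that the row key lives and dies with its cells.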
