I follow the tombstone/compact/delete cycle of the column values, but
I'm still unclear on the row key life cycle.

Is it that the bytes that represent the actual row key are associated
with and removed with each column value? Or are they removed upon
compaction when no column values exist for a given row key?



On Fri, Jan 21, 2011 at 2:26 PM, Ryan Rawson <[email protected]> wrote:
> Each of the deletes merely inserts a 'tombstone', which doesn't delete the
> data immediately but does mark it so queries no longer return it.
>
> During the compactions we prune these delete values and they disappear
> for good.  (Barring other backups of course)
>
> Because of our variable-length storage model, we don't store rows in
> particular blocks and rewrite said blocks, so notions of rows
> 'existing' or not don't even apply to HBase as they do to RDBMS
> systems.
>
> -ryan
>
> On Fri, Jan 21, 2011 at 2:21 PM, Bill Graham <[email protected]> wrote:
>> If you use some combination of delete requests and leave a row without
>> any column data, will the row/rowkey still exist? I'm thinking of the
>> use case where you want to prune all old data, including row keys,
>> from a table.
>>
>>
>> On Fri, Jan 21, 2011 at 2:04 PM, Ryan Rawson <[email protected]> wrote:
>>> There are 3 kinds of deletes (with a 4th for win):
>>>
>>> - Delete.deleteFamily(byte[] family, [long])
>>> -- This removes all data from the given family at or before the given
>>> timestamp, or if none is given, System.currentTimeMillis()
>>> - Delete.deleteColumns(byte[] family, byte[] qualifier, [long])
>>> -- This removes all data from the given qualifier at or before the given
>>> timestamp, or if none is given, System.currentTimeMillis()
>>> - Delete.deleteColumn(byte[] family, byte[] qualifier, [long])
>>> -- This removes A SINGLE VERSION at the given time, or if none is
>>> given, the most recent version is looked up with a Get and deleted.
>>> - Delete()
>>> -- Calls deleteFamily() on server side on every family.
>>>
>>> Stack is talking about the LAST delete form.
>>>
>>> I think what you want is probably deleteColumns() (plural!), or
>>> perhaps deleteFamily().
>>>
>>> One rarely wants to call deleteColumn(), since it removes just a
>>> single version, thus exposing older versions, which MAY be what you
>>> want, but I'm guessing probably isn't.
>>>
>>> Only the single-version form (deleteColumn (singular!)) calls Get; the
>>> rest do not call Get and are very fast.
>>>
>>> -ryan
>>>
>>> On Fri, Jan 21, 2011 at 1:51 PM, Stack <[email protected]> wrote:
>>>> On Fri, Jan 21, 2011 at 12:30 PM, Matt Corgan <[email protected]> wrote:
>>>>> Is there a way to issue a delete using the server's current timestamp?  I
>>>>> see methods using HConstants.LATEST_TIMESTAMP which is extremely expensive
>>>>> since it triggers a Get call.
>>>>
>>>> Yes.  Deleting the latest version involves a Get to figure out the most
>>>> recent timestamp.  And yes, in the src code it says this is 'expensive'.
>>>> Seems like it does this lookup anytime LATEST_TIMESTAMP is passed,
>>>> whether column, columns, or family, only to ensure the delete goes in
>>>> ahead of whatever is currently in the Store.
>>>>
>>>> St.Ack
>>>>
>>>
>>
>
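
To make the tombstone/compaction behaviour quoted above concrete, here is a
small toy model in plain Java. It is only a sketch and not HBase code: the
class TombstoneSketch and all of its method names are made up for
illustration, and it assumes (as the description of the storage model above
suggests) that the row key is stored as part of every cell rather than as a
separate record. Under that assumption a row key is present exactly as long
as at least one cell carrying it survives compaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model only, NOT HBase code: every stored cell carries its own row key. */
final class TombstoneSketch {
    enum Type { PUT, DELETE_VERSION, DELETE_COLUMN }

    /** One stored entry; there is no separate "row" record anywhere. */
    static final class Cell {
        final String row; final String column; final long ts;
        final Type type; final String value;
        Cell(String row, String column, long ts, Type type, String value) {
            this.row = row; this.column = column; this.ts = ts;
            this.type = type; this.value = value;
        }
    }

    private final List<Cell> cells = new ArrayList<>();

    void put(String row, String column, long ts, String value) {
        cells.add(new Cell(row, column, ts, Type.PUT, value));
    }

    /** Analogue of deleteColumn (singular): tombstone for exactly one version. */
    void deleteVersion(String row, String column, long ts) {
        cells.add(new Cell(row, column, ts, Type.DELETE_VERSION, null));
    }

    /** Analogue of deleteColumns (plural): tombstone for all versions at or before ts. */
    void deleteAllVersions(String row, String column, long ts) {
        cells.add(new Cell(row, column, ts, Type.DELETE_COLUMN, null));
    }

    /** True if some tombstone for the same row and column hides this put. */
    private boolean masked(Cell put) {
        for (Cell c : cells) {
            if (!c.row.equals(put.row) || !c.column.equals(put.column)) continue;
            if (c.type == Type.DELETE_VERSION && c.ts == put.ts) return true;
            if (c.type == Type.DELETE_COLUMN && put.ts <= c.ts) return true;
        }
        return false;
    }

    /** Read path: newest value not hidden by a tombstone, or null if none. */
    String get(String row, String column) {
        Cell best = null;
        for (Cell c : cells) {
            if (c.type != Type.PUT || !c.row.equals(row) || !c.column.equals(column)) continue;
            if (masked(c)) continue;
            if (best == null || c.ts > best.ts) best = c;
        }
        return best == null ? null : best.value;
    }

    /** Compaction analogue: drop hidden puts together with the tombstones. */
    void compact() {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            if (c.type == Type.PUT && !masked(c)) kept.add(c);
        }
        cells.clear();
        cells.addAll(kept);
    }

    /** Row keys physically present, derived purely from the stored cells. */
    TreeSet<String> storedRowKeys() {
        TreeSet<String> rows = new TreeSet<>();
        for (Cell c : cells) rows.add(c.row);
        return rows;
    }

    public static void main(String[] args) {
        TombstoneSketch s = new TombstoneSketch();
        s.put("row1", "cf:a", 1L, "old");
        s.put("row1", "cf:a", 2L, "new");
        s.deleteVersion("row1", "cf:a", 2L);
        System.out.println(s.get("row1", "cf:a"));   // the older version is exposed
        s.deleteAllVersions("row1", "cf:a", 2L);
        System.out.println(s.get("row1", "cf:a"));   // nothing visible any more
        System.out.println(s.storedRowKeys());       // row key bytes still stored
        s.compact();
        System.out.println(s.storedRowKeys());       // no cells left, so no row key
    }
}
```

In this model, deleting a single version exposes the older one, deleting all
versions hides the column from reads while the cells (and so the row key)
are still stored, and only compaction makes the row key go away, because
nothing that carries it is left.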
