Re: Get on a row with multiple columns

Ted Fri, 08 Feb 2013 22:30:22 -0800

How often do you need to perform this delete operation?

Is there a way to use TTL so that you can avoid deletions?


Pardon me for not knowing your use case very well. 

On Feb 8, 2013, at 10:16 PM, Varun Sharma <[email protected]> wrote:

> Using HBase 0.94.3. I tried that too, but ran into performance issues from
> having to retrieve the entire row first (this got slow when one particular
> row was hammered), since a row can be big (a few MB, sometimes tens of MB),
> then finding the columns, and then doing a delete.
> 
> To me, it looks like the current implementation of deleteColumn is
> suboptimal because of the 300 gets vs doing 1.
> 
> Thanks
> Varun
> 
> On Fri, Feb 8, 2013 at 10:09 PM, Ted Yu <[email protected]> wrote:
> 
>> Which HBase version are you using ?
>> 
>> Is there a way to place 10 delete markers from application side instead of
>> 300 ?
>> 
>> Thanks
>> 
>> On Fri, Feb 8, 2013 at 10:05 PM, Varun Sharma <[email protected]> wrote:
>> 
>>> We are given a set of 300 columns to delete. I tested two cases:
>>> 
>>> 1) deleteColumns() - with the 's'
>>> 
>>> This function simply adds delete markers for all 300 columns. In our case,
>>> typically only a fraction of these columns are actually present - about 10.
>>> After starting to use deleteColumns, we started seeing a drop in cluster-wide
>>> random read performance - 90th percentile latency worsened, and so did 99th,
>>> probably because of having to traverse delete markers. I attribute this to
>>> the profusion of delete markers in the cluster. Major compactions slowed down
>>> by almost 50 percent, probably because of having to clean out significantly
>>> more delete markers.
>>> 
>>> 2) deleteColumn()
>>> 
>>> Ended up with intolerable 15 second calls, which clogged all the handlers,
>>> making the cluster pretty much unresponsive.
>>> 
>>> On Fri, Feb 8, 2013 at 9:55 PM, Ted Yu <[email protected]> wrote:
>>> 
>>>> For the 300 column deletes, can you show us how the Delete(s) are
>>>> constructed ?
>>>> 
>>>> Do you use this method ?
>>>> 
>>>>  public Delete deleteColumns(byte [] family, byte [] qualifier) {
>>>> 
>>>> Thanks
>>>> 
>>>> On Fri, Feb 8, 2013 at 9:44 PM, Varun Sharma <[email protected]> wrote:
>>>> 
>>>>> So a Get call with multiple columns on a single row should be much faster
>>>>> than independent Get(s) on each of those columns for that row. I am
>>>>> basically seeing severely poor performance (~ 15 seconds) for certain
>>>>> deleteColumn() calls, and I see that there is a
>>>>> prepareDeleteTimestamps() function in HRegion.java which first tries to
>>>>> locate the column by doing individual gets on each column you want to
>>>>> delete (I am doing 300 column deletes). Now, I think this should ideally be
>>>>> 1 get call with the batch of 300 columns, so that one scan can retrieve the
>>>>> columns, and only the columns that are found are then deleted.
>>>>> 
>>>>> Before I try this fix, I wanted to get an opinion on whether it will make a
>>>>> difference to batch the get(), and it seems from your answer that it should.
>>>>> 
>>>>> On Fri, Feb 8, 2013 at 9:34 PM, lars hofhansl <[email protected]> wrote:
>>>>> 
>>>>>> Everything is stored as a KeyValue in HBase.
>>>>>> The Key part of a KeyValue contains the row key, column family, column
>>>>>> name, and timestamp, in that order.
>>>>>> Each column family has its own store and store files.
>>>>>> 
>>>>>> So in a nutshell, a get is executed by starting a scan at the row key
>>>>>> (which is a prefix of the key) in each store (CF) and then scanning forward
>>>>>> in each store until the next row key is reached. (In reality it is a bit
>>>>>> more complicated due to multiple versions, skipping columns, etc.)
>>>>>> 
>>>>>> 
>>>>>> -- Lars
>>>>>> ________________________________
>>>>>> From: Varun Sharma <[email protected]>
>>>>>> To: [email protected]
>>>>>> Sent: Friday, February 8, 2013 9:22 PM
>>>>>> Subject: Re: Get on a row with multiple columns
>>>>>> 
>>>>>> Sorry, I was a little unclear with my question.
>>>>>> 
>>>>>> Let's say you have
>>>>>> 
>>>>>> Get get = new Get(row);
>>>>>> get.addColumn(family, Bytes.toBytes("1"));
>>>>>> get.addColumn(family, Bytes.toBytes("2"));
>>>>>> .
>>>>>> .
>>>>>> .
>>>>>> 
>>>>>> When HBase internally executes the batch get, it will seek to column "1".
>>>>>> Since data is lexicographically sorted, it does not need to seek from
>>>>>> the beginning to get to "2"; it can continue seeking forward, since
>>>>>> column "2" will always be after column "1". I want to know whether this
>>>>>> is how a multicolumn get on a row works or not.
>>>>>> 
>>>>>> Thanks
>>>>>> Varun
>>>>>> 
>>>>>> On Fri, Feb 8, 2013 at 9:08 PM, Marcos Ortiz <[email protected]> wrote:
>>>>>> 
>>>>>>> Like Ishan said, a get gives you an instance of the Result class.
>>>>>>> The utility methods that you can use are:
>>>>>>> byte[] getValue(byte[] family, byte[] qualifier)
>>>>>>> byte[] value()
>>>>>>> byte[] getRow()
>>>>>>> int size()
>>>>>>> boolean isEmpty()
>>>>>>> KeyValue[] raw()  (like Ishan said, all data here is sorted)
>>>>>>> List<KeyValue> list()
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 02/08/2013 11:29 PM, Ishan Chhabra wrote:
>>>>>>> 
>>>>>>>> Based on what I read in Lars' book, a get will return a Result,
>>>>>>>> which is internally a KeyValue[]. This KeyValue[] is sorted by the key,
>>>>>>>> and you access this array using the raw or list methods on the Result
>>>>>>>> object.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Fri, Feb 8, 2013 at 5:40 PM, Varun Sharma <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> +user
>>>>>>>>> 
>>>>>>>>> On Fri, Feb 8, 2013 at 5:38 PM, Varun Sharma <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> When I do a Get on a row with multiple column qualifiers, do we sort
>>>>>>>>>> the column qualifiers and make use of the sorted order when we get the
>>>>>>>>>> results?
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> Varun
>>>>>>> --
>>>>>>> Marcos Ortiz Valmaseda,
>>>>>>> Product Manager && Data Scientist at UCI
>>>>>>> Blog: http://marcosluis2186.posterous.com
>>>>>>> Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
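
The question running through the thread above (whether one Get over many sorted columns can beat many independent Gets) can be illustrated with a toy model. The sketch below is not HBase code: the class and method names are hypothetical, a row is modelled as a plain sorted list of qualifiers, and every independent lookup is assumed to walk forward from the start of the row, following the "in a nutshell" description quoted above. Real HBase uses block indexes and other shortcuts, so the absolute numbers mean little; only the shape of the comparison is of interest.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: a "row" is a sorted list of column qualifiers.
class SortedRowScanDemo {

    /** Builds a row whose qualifiers are zero-padded, hence already sorted. */
    static List<String> buildRow(int columnCount) {
        List<String> row = new ArrayList<>();
        for (int i = 0; i < columnCount; i++) {
            row.add(String.format("col%06d", i));
        }
        return row;
    }

    /**
     * One lookup per wanted column, each starting again at the first cell
     * of the row. Returns the total number of cells visited.
     */
    static long independentLookups(List<String> row, List<String> wanted) {
        long visited = 0;
        for (String w : wanted) {
            for (String cell : row) {
                visited++;
                if (cell.compareTo(w) >= 0) {
                    break; // reached or passed the wanted column
                }
            }
        }
        return visited;
    }

    /**
     * A single forward pass over the row. Because both the row and the
     * wanted list are sorted, the scan never has to rewind. Columns that
     * exist are appended to 'found'. Returns the number of cells visited.
     */
    static long batchedScan(List<String> row, List<String> sortedWanted,
                            List<String> found) {
        long visited = 0;
        int w = 0;
        for (String cell : row) {
            if (w >= sortedWanted.size()) {
                break; // nothing left to look for
            }
            visited++;
            // Skip wanted columns that sort before this cell: they are absent.
            while (w < sortedWanted.size()
                    && sortedWanted.get(w).compareTo(cell) < 0) {
                w++;
            }
            if (w < sortedWanted.size() && sortedWanted.get(w).equals(cell)) {
                found.add(cell);
                w++;
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> row = buildRow(10_000);
        // 300 wanted columns, of which only every 30th one exists in the row.
        List<String> wanted = new ArrayList<>();
        for (int i = 0; i < 300; i++) {
            String base = String.format("col%06d", i * 30);
            wanted.add(i % 30 == 0 ? base : base + "-missing");
        }
        Collections.sort(wanted);

        List<String> found = new ArrayList<>();
        System.out.println("independent: " + independentLookups(row, wanted));
        System.out.println("batched:     " + batchedScan(row, wanted, found));
        System.out.println("present:     " + found.size());
    }
}
```

In this model the batched scan visits each cell at most once, while the independent lookups repeat the walk for every wanted column, which is the intuition behind batching the 300 lookups into one.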
