I created HBASE-6287 <https://issues.apache.org/jira/browse/HBASE-6287> for porting HBASE-5941 to trunk.
Jeff:
What version of HBase are you using? Since HBASE-5941 is an improvement, a vote may be raised for porting it to other branches.

On Wed, Jun 27, 2012 at 4:15 PM, Jeff Whiting <[email protected]> wrote:

> Looking at HBASE-6284 it seems that deletes are not batched at the
> regionserver level, so that is the reason for the performance degradation.
> Additionally, the locking issue fixed by HBASE-5941 is also contributing
> to the performance degradation.
>
> So until those changes get into an HBase release I just have to live with
> the slower performance. Is there anything I need to do on my end?
>
> Just as a sanity check, I tried setting a timestamp in the Delete object,
> but it made no difference. I'll batch my deletes at the end as you
> suggested (as memory allows).
>
> Thanks,
> ~Jeff
>
> On 6/27/2012 4:11 PM, Ted Yu wrote:
>
>> Amit:
>> Can you point us to the JIRA or changelist in 0.89-fb?
>>
>> Thanks
>>
>> On Wed, Jun 27, 2012 at 3:05 PM, Amitanand Aiyer <[email protected]>
>> wrote:
>>
>>> There was some difference in the way locks are taken for batched
>>> deletes and puts. This was fixed for 0.89.
>>>
>>> I wonder if the same could be the issue here.
>>>
>>> Sent from my iPhone
>>>
>>> On Jun 27, 2012, at 2:04 PM, "Jeff Whiting" <[email protected]> wrote:
>>>
>>>> I'm struggling to understand why my deletes are taking longer than my
>>>> inserts. My understanding is that a delete is just an insertion of a
>>>> tombstone, and I'm deleting the entire row.
>>>>
>>>> I do a simple loop (pseudo code) and insert the 100-byte rows:
>>>>
>>>> for (int i = 0; i < 50000; i++)
>>>> {
>>>>   puts.add(new Put(rowkey[i], oneHundredBytes[i]));
>>>>
>>>>   if (puts.size() % 1000 == 0)
>>>>   {
>>>>     Benchmark.start();
>>>>     table.batch(puts);
>>>>     Benchmark.stop();
>>>>     puts.clear();
>>>>   }
>>>> }
>>>>
>>>> The above takes about 8282ms total.
>>>>
>>>> However, the deletes take more than twice as long:
>>>>
>>>> Iterator<Result> it = table.getScanner(new Scan(rowkey[0],
>>>>     rowkey[50000 - 1])).iterator();
>>>> while (it.hasNext())
>>>> {
>>>>   Result r = it.next();
>>>>   deletes.add(new Delete(r.getRow()));
>>>>   if (deletes.size() % 1000 == 0)
>>>>   {
>>>>     Benchmark.start();
>>>>     table.batch(deletes);
>>>>     Benchmark.stop();
>>>>     deletes.clear();
>>>>   }
>>>> }
>>>>
>>>> The above takes 17369ms total.
>>>>
>>>> I'm only benchmarking the deletion time, not the scan time.
>>>> Additionally, if I batch the deletes into one big batch at the end
>>>> (rather than while I'm scanning) it takes about the same amount of
>>>> time. I am deleting the entire row, so I wouldn't think it would be
>>>> doing a read before the delete (
>>>> http://mail-archives.apache.org/mod_mbox/hbase-user/201206.mbox/%3CE83D30E8F408F94A96F992785FC29D82063395D6@s2k3mntaexc1.mentacapital.local%3E
>>>> ).
>>>>
>>>> Any thoughts on why it is slower and how I can speed it up?
>>>>
>>>> Thanks,
>>>> ~Jeff
>>>>
>>>> --
>>>> Jeff Whiting
>>>> Qualtrics Senior Software Engineer
>>>> [email protected]
>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> [email protected]
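
For anyone who wants to reproduce the comparison, below is a minimal, self-contained sketch of the benchmark described in the thread, written against the 0.92/0.94-era HBase client API that was current at the time. The table name ("benchmark"), column family ("f"), qualifier ("q"), and row-key format are illustrative assumptions, not details from Jeff's setup; the table is assumed to already exist with that family. As in the thread, only the batch() round-trips are timed, not the scan:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Row;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PutVsDeleteBenchmark {
      private static final byte[] FAMILY = Bytes.toBytes("f");    // assumed column family
      private static final byte[] QUALIFIER = Bytes.toBytes("q"); // assumed qualifier
      private static final int ROWS = 50000;
      private static final int BATCH_SIZE = 1000;                 // ROWS is a multiple, so no leftover batch

      private static byte[] row(int i) {
        return Bytes.toBytes(String.format("row-%08d", i));       // fixed-width keys keep the range scan simple
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "benchmark");             // assumed pre-existing table

        byte[] value = new byte[100];                             // the 100-byte payload from the thread

        // Phase 1: write ROWS rows, flushing a batch of Puts every BATCH_SIZE rows.
        // Only the batch() calls are timed, mirroring the original loop.
        List<Row> puts = new ArrayList<Row>();
        long putMillis = 0;
        for (int i = 0; i < ROWS; i++) {
          Put p = new Put(row(i));
          p.add(FAMILY, QUALIFIER, value);
          puts.add(p);
          if (puts.size() == BATCH_SIZE) {
            long t0 = System.currentTimeMillis();
            table.batch(puts);
            putMillis += System.currentTimeMillis() - t0;
            puts.clear();
          }
        }

        // Phase 2: scan the same key range back and issue batched whole-row Deletes.
        // The scan itself is deliberately excluded from the timing.
        Scan scan = new Scan(row(0), row(ROWS));                  // stop row is exclusive
        ResultScanner scanner = table.getScanner(scan);
        List<Row> deletes = new ArrayList<Row>();
        long deleteMillis = 0;
        for (Result r : scanner) {
          deletes.add(new Delete(r.getRow()));
          if (deletes.size() == BATCH_SIZE) {
            long t0 = System.currentTimeMillis();
            table.batch(deletes);
            deleteMillis += System.currentTimeMillis() - t0;
            deletes.clear();
          }
        }
        scanner.close();
        table.close();

        System.out.println("batched puts:    " + putMillis + " ms");
        System.out.println("batched deletes: " + deleteMillis + " ms");
      }
    }

On the versions discussed here, a gap like the one Jeff reports (roughly 8.3s of puts vs. 17.4s of deletes) is consistent with the two causes identified above: deletes not being batched at the regionserver level (HBASE-6284) and the lock handling addressed by HBASE-5941.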
