Re: [RFC] Implementing auto increment for row id

Ryan Rawson Tue, 19 Oct 2010 22:16:09 -0700

To answer a previous question, we use incrementColumnValue (ICV) to
keep track of counts here at StumbleUpon.  On a single column it
scales to 2000-3000 ops/sec easily.


Glad to hear your code is faster though!
-ryan

On Tue, Oct 19, 2010 at 10:11 PM, Imran M Yousuf <[email protected]> wrote:
> Hi,
>
> Just wanted to update that the fixes suggested are in hbase branch.
> Please let me know if further improvements are possible. Changes made
> as per suggestions-
> * Use concurrent hash map
> * Use AtomicLong
> * Remove lock
>
> Result: using the HBase test util, 3000 rows of auto-increment+put+get
> take 4~4.5s (performance improved by 90% by using the concurrent API),
> versus 1.5s without auto-increment.
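
For readers following along, the in-process approach listed above (concurrent hash map, AtomicLong, no lock) might look roughly like the sketch below. This is not the code from the hbase branch; the class name SequenceGenerator and the method next() are hypothetical, and the counters are assumed to start at 1 to match the sequence-table suggestion further down the thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration; not the code in the hbase branch.
public class SequenceGenerator {
    // One counter per table name. The concurrent map makes first-time
    // creation of a counter safe without an explicit lock.
    private final ConcurrentMap<String, AtomicLong> counters =
            new ConcurrentHashMap<String, AtomicLong>();

    /** Returns the next id for the given table; the first id is 1. */
    public long next(String tableName) {
        AtomicLong counter = counters.get(tableName);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong(0);
            // putIfAbsent returns the existing counter if another thread
            // registered one first, or null if ours was stored.
            AtomicLong existing = counters.putIfAbsent(tableName, fresh);
            counter = (existing != null) ? existing : fresh;
        }
        // Atomic increment; no two callers in this JVM get the same value.
        return counter.incrementAndGet();
    }

    public static void main(String[] args) {
        SequenceGenerator gen = new SequenceGenerator();
        System.out.println(gen.next("users"));   // prints 1
        System.out.println(gen.next("users"));   // prints 2
        System.out.println(gen.next("orders"));  // prints 1
    }
}
```

Note that a counter like this is only unique within one JVM and resets on restart, so it does not by itself cover the case raised earlier in the thread where two instances of the web service assign the same ID; that is the case incrementColumnValue or checkAndPut on the HBase side would handle.
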
>
> Thanks a lot,
>
> Imran
>
> On Wed, Oct 20, 2010 at 9:10 AM, Imran M Yousuf <[email protected]> wrote:
>> Hi Ryan,
>>
>> Thanks a lot for your feedback, please find some clarifications and
>> queries inline below.
>>
>> On Wed, Oct 20, 2010 at 6:41 AM, Ryan Rawson <[email protected]> wrote:
>>> One should never* call lockRow(), and prefer to do something else
>>> instead.  CheckAndPut works like CompareAndSet (we just call it Put
>>> since that is what you are doing in our API, putting), and there is
>>> also the incrementColumnValue() call.
>>>
>>
>> I did see the incrementColumnValue in HTableInterface, but I wanted to
>> avoid needing to perform an additional operation on HBase for an
>> insert not knowing its performance issues; have you used it and would
>> you recommend it? Singleton with AtomicLong would solve the problem
>> without any HBase operation being needed, what do you think about
>> that?
>>
>>> I'm not really following your code (I'm also sick), but why not just
>>> do something like this:
>>
>> Praying for your speedy recovery.
>>
>>> - Table: Sequences
>>> rowid: table_name  column: id  value: sequence
>>>
>>> So you just call:
>>> table.incrementColumnValue("Sequences",
>>> tableNameThatYouWantSequenceFor, "id", 1);
>>>
>>> and the result is your sequence id to use as a primary key.  No need
>>> to worry about nonexistent values, the call creates the value, so the
>>> sequence starts at 1 always.
>>>
>>> -ryan
>>>
>>> * OK, you can call lockRow, but be aware that your mileage may vary, you
>>> reduce the performance of HBase, and generally can cause a lot of
>>> problems. E.g., you can DOS yourself!
>>>
>>
>> Yes, the DOS is a worrying issue, since I faced it due to a bug in my
>> test code where I did not unlock a row upon PUT, so I am planning to
>> avoid it altogether.
>>
>> Thank you,
>>
>> Imran
>>
>>> On Tue, Oct 19, 2010 at 12:39 PM, tsuna <[email protected]> wrote:
>>>> I would like to add that you can probably get rid of RowLock and use
>>>> checkAndPut instead to atomically create the row if it doesn't already
>>>> exist.  This would probably solve the last problem I outlined where 2
>>>> different instances of your web service attempt to assign the same ID
>>>> at the same time.  The code would also be simpler and more efficient.
>>>>
>>>> --
>>>> Benoit "tsuna" Sigoure
>>>> Software Engineer @ www.StumbleUpon.com
>>>>
>>>
>>
>>
>>
>> --
>> Imran M Yousuf
>> Entrepreneur & CEO
>> Smart IT Engineering Ltd.
>> Dhaka, Bangladesh
>> Twitter: @imyousuf - http://twitter.com/imyousuf
>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>> Mobile: +880-1711402557
>>
>
>
>
> --
> Imran M Yousuf
> Twitter: @imyousuf - http://twitter.com/imyousuf
> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
> Mobile: +880-1711402557
>
