To answer a previous question: we use incrementColumnValue (ICV) to keep track of counts here at StumbleUpon. On a single column it easily scales to 2000-3000 ops/sec.
Glad to hear your code is faster though!

-ryan

On Tue, Oct 19, 2010 at 10:11 PM, Imran M Yousuf <[email protected]> wrote:
> Hi,
>
> Just wanted to update that the suggested fixes are in the hbase branch.
> Please let me know if further improvements are possible. Changes made
> as per the suggestions:
> * Use concurrent hash map
> * Use AtomicLong
> * Remove lock
>
> Result: using the HBase test util, 3000 rows of auto-increment+put+get
> take 4~4.5s (performance increased by 90% by using the concurrent API),
> and 1.5s without auto-increment.
>
> Thanks a lot,
>
> Imran
>
> On Wed, Oct 20, 2010 at 9:10 AM, Imran M Yousuf <[email protected]> wrote:
>> Hi Ryan,
>>
>> Thanks a lot for your feedback, please find some clarifications and
>> queries inline below.
>>
>> On Wed, Oct 20, 2010 at 6:41 AM, Ryan Rawson <[email protected]> wrote:
>>> One should never* call lockRow(), and prefer to do something else
>>> instead. CheckAndPut works like CompareAndSet (we just call it Put
>>> since that is what you are doing in our API, putting), and there is
>>> also the incrementColumnValue() call.
>>>
>>
>> I did see incrementColumnValue in HTableInterface, but I wanted to
>> avoid performing an additional operation on HBase for an insert
>> without knowing its performance implications; have you used it and
>> would you recommend it? A singleton with AtomicLong would solve the
>> problem without needing any HBase operation, what do you think about
>> that?
>>
>>> I'm not really following your code (I'm also sick), but why not just
>>> do something like this:
>>
>> Praying for your speedy recovery.
>>
>>> - Table: Sequences
>>> rowid: table_name column: id value: sequence
>>>
>>> So you just call:
>>> table.incrementColumnValue("Sequences",
>>> tableNameThatYouWantSequenceFor, "id", 1);
>>>
>>> and the result is your sequence id to use as a primary key. No need
>>> to worry about nonexistent values, the call creates the value, so the
>>> sequence always starts at 1.
>>>
>>> -ryan
>>>
>>> * ok you can call lockRow, but be aware that your mileage may vary, you
>>> reduce the performance of HBase, and generally can cause a lot of
>>> problems. Eg: you can DOS yourself!
>>>
>>
>> Yes, the DOS is a worrying issue: I faced it due to a bug in my test
>> code where I did not unlock a row upon PUT, so I am planning to avoid
>> it altogether.
>>
>> Thank you,
>>
>> Imran
>>
>>> On Tue, Oct 19, 2010 at 12:39 PM, tsuna <[email protected]> wrote:
>>>> I would like to add that you can probably get rid of RowLock and use
>>>> checkAndPut instead to atomically create the row if it doesn't already
>>>> exist. This would probably solve the last problem I outlined where 2
>>>> different instances of your web service attempt to assign the same ID
>>>> at the same time. The code would also be simpler and more efficient.
>>>>
>>>> --
>>>> Benoit "tsuna" Sigoure
>>>> Software Engineer @ www.StumbleUpon.com
>>>>
>>>
>>
>> --
>> Imran M Yousuf
>> Entrepreneur & CEO
>> Smart IT Engineering Ltd.
>> Dhaka, Bangladesh
>> Twitter: @imyousuf - http://twitter.com/imyousuf
>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>> Mobile: +880-1711402557
>
> --
> Imran M Yousuf
> Twitter: @imyousuf - http://twitter.com/imyousuf
> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
> Mobile: +880-1711402557
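
For readers following the thread, the changes Imran lists (concurrent hash map, AtomicLong, no lock) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from his branch and not the HBase client API: the class name InMemorySequences and its methods are hypothetical, and the counter lives in one JVM. It mimics the behaviour Ryan describes for incrementColumnValue(): the counter is created on first use and the call returns the value after the increment, so a sequence starts at 1.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-process stand-in for a "Sequences" table: one counter per table name.
// It does not talk to HBase; it only mirrors the create-on-first-use,
// return-the-new-value behaviour described in the thread.
class InMemorySequences {
    private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Atomically adds `amount` to the counter for `tableName`, creating the
    // counter at 0 first if it does not exist, and returns the updated value.
    long increment(String tableName, long amount) {
        return counters.computeIfAbsent(tableName, k -> new AtomicLong()).addAndGet(amount);
    }

    // Next id for the given table; the first call for a table returns 1.
    long next(String tableName) {
        return increment(tableName, 1L);
    }

    public static void main(String[] args) throws InterruptedException {
        InMemorySequences sequences = new InMemorySequences();
        // Two threads each take 1000 ids without any explicit lock.
        Runnable worker = () -> {
            for (int i = 0; i < 1000; i++) {
                sequences.next("users");
            }
        };
        Thread a = new Thread(worker);
        Thread b = new Thread(worker);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(sequences.next("users")); // prints 2001
    }
}
```

Unlike a counter stored in HBase, this one is not shared between processes and is lost on restart, so on its own it does not cover the case tsuna raises where two instances of the web service assign the same ID.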
