Hi Ian,

Thank you very much, that pretty much answers it.

Best regards,
Andre Medeiros
________________________________________
From: Ian Varley [[email protected]]
Sent: Wednesday, April 18, 2012 17:11
To: [email protected]
Subject: Re: Performance issues of prepending a table

I would guess that this approach would be susceptible to the same kind of "hot 
spotting" as inserting sequential keys; if you're prepending globally (i.e. 
there's one global "first" row), then all activity will be taking place on the 
same region server, so you wouldn't be taking advantage of the natural 
parallelism of a clustered system like HBase.
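
To make the hot-spotting point concrete, here is a rough toy model in plain Python (not the HBase API). The split points, the key range, and the names region_for and writes_per_region are all made up for illustration; real region boundaries depend on how the table is split.

```python
import bisect
import random
from collections import Counter

# Hypothetical split points dividing the key space into five regions.
SPLITS = [200, 400, 600, 800]


def region_for(key):
    # Region 0 holds keys below SPLITS[0]; region i holds keys in
    # [SPLITS[i-1], SPLITS[i]); the last region holds everything above.
    return bisect.bisect_right(SPLITS, key)


def writes_per_region(keys):
    # Count how many writes each region would receive.
    return Counter(region_for(k) for k in keys)


# Prepending: every new key is one below the previous first key.
prepend_keys = range(99, -1, -1)

# Random inserts over the whole key space (fixed seed for repeatability).
rng = random.Random(0)
random_keys = [rng.randrange(1000) for _ in range(100)]

prepend_load = writes_per_region(prepend_keys)
random_load = writes_per_region(random_keys)
```

In this model every prepend lands in region 0, whereas the random keys are spread over several regions, which is the difference in parallelism described above.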

That aside, I can't think of anything architectural about HBase that would 
make it perform poorly when continually inserting rows that sort before 
other rows. I think the log-structured merge trees that HBase uses for storage 
will handle any kind of insert activity more or less identically, and write to 
the WAL and the memstore with equal speed regardless of row key position 
(flushes to storefiles on disk are based on the sorted arrangement in memory, 
which has already taken place by that point). There may be some smaller-order 
differences in the speed of inserting into the memstore, depending on position, 
but that'd be something you'd have to benchmark, and my guess is you'd get 
nothing discernible. But as always, the best way to know is to try it. :)

Ian

On Apr 18, 2012, at 8:59 AM, de Souza Medeiros Andre wrote:

Hi all,

For a specific reason, I have an HBase table that needs to be prepended to 
frequently. The row keys in this table are long integers (converted to bytes, 
of course). "Prepend" is an operation that does the following:
1. Scans the table just for the purpose of getting the row key X of the first 
row, then stops the scan.
2. CheckAndSet on X-1, checking if row X-1 is null and putting data at row key 
X-1.
3. If the CAS failed, try CAS on X-2, etc.
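
The three steps above can be sketched roughly as follows. This is a toy in-memory model in plain Python, not HBase client code: ToyTable, first_key, put_if_absent, prepend, and the start_key default are hypothetical names and values chosen for illustration, and put_if_absent only stands in for an atomic check-and-put on the server.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a table with sorted integer row keys."""

    def __init__(self):
        self._keys = []  # row keys, kept sorted
        self._rows = {}  # row key -> value

    def first_key(self):
        # Step 1: like opening a scan, reading the first row key, and stopping.
        return self._keys[0] if self._keys else None

    def put_if_absent(self, key, value):
        # Steps 2 and 3: stand-in for a CAS that succeeds only if the row
        # does not exist yet.
        if key in self._rows:
            return False
        bisect.insort(self._keys, key)
        self._rows[key] = value
        return True


def prepend(table, value, start_key=2**62):
    """Store value under a key below the current first row; return that key."""
    first = table.first_key()
    # start_key is an arbitrary assumption for the empty-table case.
    candidate = start_key if first is None else first - 1
    while not table.put_if_absent(candidate, value):
        candidate -= 1  # the key was taken in the meantime; try the next one
    return candidate
```

For example, three calls to prepend on an empty ToyTable return 2**62, 2**62 - 1, and 2**62 - 2 in that order. The model compares keys numerically; how the real table orders them depends on how the longs are converted to bytes.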

I'd like to know if there are any obvious performance drawbacks with this 
approach, compared to inserting rows randomly in the table. With "obvious 
performance drawbacks" I mean something that doesn't need to be benchmarked to 
know its effects.

I am aware that scanning plus CAS will be slower than a simple Put, but I'd 
like to know if prepending has any negative effect on region management 
and the like.

Thank you,
Andre Medeiros
