HBase does version cells.

But I saw something of interest:
"
>>> In my test, there are 10,000 customers, each customer has 600 orders and 
>>> each order has 10 columns. The tall table approach results in 6 mil rows of 
>>> 10 columns. The wide table approach results in 10,000 rows of 6,000 
>>> columns. I'm using hbase 0.89-20100924 and hadoop 0.20.2. I am adding the 
>>> orders using a Put for each order, submitted in batches of 1000 as a list 
>>> of Puts.
>>> 
>>> Are there techniques to speed up inserts with the wide table approach that 
>>> I am perhaps overlooking?
>>> 
>> 
> "

Ok, so you have 10K by 600 by 10. So the 'tall' design has a row key of 
customer_id and order_id with 10 columns in a single column family.
So you get 6 million rows, each with 10 column puts.
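The tall layout can be sketched as plain data to make the shape concrete. (This is just an illustration of the key/column shape, not the HBase client API; the function names, the `f` family, and the zero-padded key format are all my own assumptions.)

```python
# Sketch of the 'tall' layout: one row per order, keyed by
# customer_id + order_id, with the order's 10 columns in one family.
def tall_row_key(customer_id: int, order_id: int) -> bytes:
    # Fixed-width, zero-padded components keep rows sorted by
    # customer first, then order, under HBase's byte-wise key ordering.
    return f"{customer_id:08d}-{order_id:08d}".encode()

def tall_rows(num_customers: int, orders_per_customer: int, cols_per_order: int):
    # Yields (row_key, {column_qualifier: value}) pairs; values are placeholders.
    for c in range(num_customers):
        for o in range(orders_per_customer):
            cols = {f"f:col{i}".encode(): b"..." for i in range(cols_per_order)}
            yield tall_row_key(c, o), cols
```

With 10,000 customers x 600 orders x 10 columns, this is exactly the 6 million rows of 10 columns from the test.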

Now if you do a 'wide' table...
Your row key is the 'customer_id' only. Each column is an order, so you 
write one column per order, and you have to figure out how to represent 
the order's fields within that column. 
(For example, your order of 10 fields could be represented as a string 
with a 'special character' used as a field separator.)
So you're doing one column write per order, and you have a total of 10K 
rows.
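The wide layout can be sketched the same way. (Again illustrative only, not the HBase API; the `|` separator, the `f:order...` qualifier format, and the function names are assumptions.)

```python
# Sketch of the 'wide' layout: one row per customer, one column per order,
# with the order's 10 fields packed into a single cell value.
SEP = "|"  # hypothetical 'special character' separating the order's fields

def wide_cell(order_fields) -> bytes:
    # Pack the whole order into one column value.
    return SEP.join(order_fields).encode()

def wide_row(customer_id: int, orders):
    # orders is an iterable of (order_id, [field, ...]) pairs.
    # One row key per customer; one column per order, qualified by order_id.
    key = f"{customer_id:08d}".encode()
    cols = {f"f:order{oid:08d}".encode(): wide_cell(fields)
            for oid, fields in orders}
    return key, cols
```

With 10,000 customers x 600 orders, this gives the 10,000 rows of ~600 order columns (6,000 logical values) from the test.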

Unless I'm missing something, part of the 'slowness' could be how you're 
writing your orders in the wide table. There are a couple of other 
unknowns. Are you hashing your keys? 
I mean, are you getting a bit of 'randomness' in your keys?
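One common way to get that randomness is to prefix the row key with a small hash-derived salt so sequential customer_ids spread across regions instead of hammering one region server. A minimal sketch (the bucket count, MD5 choice, and key format are assumptions, and salting makes straight key-range scans by customer harder):

```python
import hashlib

def salted_key(customer_id: int, buckets: int = 16) -> bytes:
    # Derive a deterministic salt from the key itself, so reads can
    # recompute it. The salt prefix distributes writes across up to
    # `buckets` key ranges.
    raw = f"{customer_id:08d}".encode()
    salt = int(hashlib.md5(raw).hexdigest(), 16) % buckets
    return f"{salt:02d}-".encode() + raw
```

Because the salt is a pure function of the customer_id, a reader can rebuild the full key for point lookups; only full-table ordering by customer is lost.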

So what am I missing?

-Mike


> Subject: Re: Insert into tall table 50% faster than wide table
> From: [email protected]
> Date: Wed, 22 Dec 2010 18:24:05 -0800
> To: [email protected]
> 
> Actually I don't think this is the problem as HBase versions cells, not rows, 
> if I understand correctly.
> 
> On Dec 22, 2010, at 5:03 PM, Bryan Keller wrote:
> 
> > Perhaps slow wide table insert performance is related to row versioning? If 
> > I have a customer row and keep adding order columns one by one, I'm 
> > thinking that there might be a version kept of the row for every order I 
> > add? If I am simply inserting a new row for every order, there is no 
> > versioning going on. Could this be causing performance problems?
> > 
> > On Dec 22, 2010, at 4:16 PM, Bryan Keller wrote:
> > 
> >> It appears to be the same or better. Not to derail my original question, 
> >> but the much slower write performance will cause problems for me unless 
> >> I can resolve that.
> >> 
> >> On Dec 22, 2010, at 3:52 PM, Peter Haidinyak wrote:
> >> 
> >>> Interesting, do you know what the time difference would be on the other 
> >>> side, doing a lookup/scan?
> >>> 
> >>> Thanks
> >>> 
> >>> -Pete
> >>> 
> >>> -----Original Message-----
> >>> From: Bryan Keller [mailto:[email protected]] 
> >>> Sent: Wednesday, December 22, 2010 3:41 PM
> >>> To: [email protected]
> >>> Subject: Insert into tall table 50% faster than wide table
> >>> 
> >>> I have been testing a couple of different approaches to storing customer 
> >>> orders. One is a tall table, where each order is a row. The other is a 
> >>> wide table where each customer is a row, and orders are columns in the 
> >>> row. I am finding that inserts into the tall table, i.e. adding rows for 
> >>> every order, is roughly 50% faster than inserts into the wide table, i.e. 
> >>> adding a row for a customer and then adding columns for orders.
> >>> 
> >>> In my test, there are 10,000 customers, each customer has 600 orders and 
> >>> each order has 10 columns. The tall table approach results in 6 mil rows 
> >>> of 10 columns. The wide table approach results in 10,000 rows of 6,000 
> >>> columns. I'm using hbase 0.89-20100924 and hadoop 0.20.2. I am adding the 
> >>> orders using a Put for each order, submitted in batches of 1000 as a list 
> >>> of Puts.
> >>> 
> >>> Are there techniques to speed up inserts with the wide table approach 
> >>> that I am perhaps overlooking?
> >>> 
> >> 
> > 
> 