Perhaps the slow wide-table insert performance is related to row versioning? If I 
have a customer row and keep adding order columns one by one, is a version of the 
row kept for every order I add? If I simply insert a new row for every order, 
there is no versioning going on. Could this be what is causing the performance 
problems?
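
If it is versioning, I suppose I could try to rule that out by creating the table 
with max versions set to 1 on the column family, something like this (untested 
sketch, and the table/family names here are just placeholders):

  HBaseConfiguration conf = new HBaseConfiguration();
  HBaseAdmin admin = new HBaseAdmin(conf);
  HTableDescriptor desc = new HTableDescriptor("customers");
  HColumnDescriptor family = new HColumnDescriptor("orders");
  family.setMaxVersions(1);   // keep only the latest version of each cell
  desc.addFamily(family);
  admin.createTable(desc);

If inserts into the wide table are still slow when only one version is kept per 
cell, then versioning probably isn't the culprit.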

On Dec 22, 2010, at 4:16 PM, Bryan Keller wrote:

> Lookup/scan performance appears to be the same or better. Not to derail my 
> original question, though: the much slower write performance will cause 
> problems for me unless I can resolve it.
> 
> On Dec 22, 2010, at 3:52 PM, Peter Haidinyak wrote:
> 
>> Interesting. Do you know what the time difference would be on the other 
>> side, doing a lookup/scan?
>> 
>> Thanks
>> 
>> -Pete
>> 
>> -----Original Message-----
>> From: Bryan Keller [mailto:[email protected]] 
>> Sent: Wednesday, December 22, 2010 3:41 PM
>> To: [email protected]
>> Subject: Insert into tall table 50% faster than wide table
>> 
>> I have been testing a couple of different approaches to storing customer 
>> orders. One is a tall table, where each order is a row. The other is a wide 
>> table where each customer is a row, and orders are columns in the row. I am 
>> finding that inserts into the tall table, i.e. adding a row for every order, 
>> are roughly 50% faster than inserts into the wide table, i.e. adding a row 
>> for a customer and then adding columns for orders.
>> 
>> In my test, there are 10,000 customers, each customer has 600 orders and 
>> each order has 10 columns. The tall table approach results in 6 million rows 
>> of 10 columns. The wide table approach results in 10,000 rows of 6,000 columns. 
>> I'm using hbase 0.89-20100924 and hadoop 0.20.2. I am adding the orders 
>> using a Put for each order, submitted in batches of 1000 as a list of Puts.
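>> 
>> Roughly, the two insert paths look like this, using the usual client classes 
>> (HTable, Put, Bytes). The column family "o", the row key formats, and the 
>> values below are just illustrative, not exactly what I use:
>> 
>>   HBaseConfiguration conf = new HBaseConfiguration();
>>   HTable table = new HTable(conf, "orders");
>>   List<Put> puts = new ArrayList<Put>();
>>   byte[] fam = Bytes.toBytes("o");
>>   byte[] value = Bytes.toBytes("x");          // dummy cell value
>>   String customerId = "cust0001", orderId = "order0001";
>> 
>>   // tall table: one row per order, row key = customerId + orderId
>>   Put tall = new Put(Bytes.toBytes(customerId + "-" + orderId));
>>   for (int c = 0; c < 10; c++)
>>     tall.add(fam, Bytes.toBytes("col" + c), value);
>> 
>>   // wide table: one row per customer, order id folded into the qualifier
>>   Put wide = new Put(Bytes.toBytes(customerId));
>>   for (int c = 0; c < 10; c++)
>>     wide.add(fam, Bytes.toBytes(orderId + ":col" + c), value);
>> 
>>   // either way, the Puts are collected and flushed 1000 at a time
>>   puts.add(tall);                             // or puts.add(wide)
>>   if (puts.size() == 1000) { table.put(puts); puts.clear(); }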
>> 
>> Are there techniques to speed up inserts with the wide table approach that I 
>> am perhaps overlooking?
>> 
> 
