> Each column is the order so you write one column for each order

As stated earlier, the wide table has 6,000 columns, not 600. :-)
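As an aside, the column-count arithmetic behind the correction above can be checked with a small sketch. This is plain Java with illustrative names, not code from the test itself; it just shows that the two layouts write the same total number of cells, only partitioned differently into rows.

```java
public class LayoutArithmetic {
    public static void main(String[] args) {
        int customers = 10_000;
        int ordersPerCustomer = 600;
        int columnsPerOrder = 10;

        // Tall table: one row per order, 10 columns each.
        long tallRows = (long) customers * ordersPerCustomer;              // 6,000,000 rows
        long tallColumnsPerRow = columnsPerOrder;                          // 10 columns

        // Wide table: one row per customer, one column per order field.
        long wideRows = customers;                                         // 10,000 rows
        long wideColumnsPerRow = (long) ordersPerCustomer * columnsPerOrder; // 6,000 columns

        // Either way, the total number of cells written is identical;
        // only the row/column partitioning differs.
        System.out.println(tallRows * tallColumnsPerRow);
        System.out.println(wideRows * wideColumnsPerRow);
    }
}
```

So the wide table has 600 orders × 10 fields = 6,000 columns per row, and both designs push 60 million cells in total.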
Bryan: Can you describe how you form row keys in each case?

On Wed, Dec 22, 2010 at 6:53 PM, Michael Segel <[email protected]> wrote:

> HBase does version cells.
>
> But I saw something of interest:
> "
> >>> In my test, there are 10,000 customers, each customer has 600 orders
> >>> and each order has 10 columns. The tall table approach results in 6
> >>> million rows of 10 columns. The wide table approach results in 10,000
> >>> rows of 6,000 columns. I'm using hbase 0.89-20100924 and hadoop 0.20.2.
> >>> I am adding the orders using a Put for each order, submitted in batches
> >>> of 1000 as a list of Puts.
> >>>
> >>> Are there techniques to speed up inserts with the wide table approach
> >>> that I am perhaps overlooking?
> "
>
> Ok, so you have 10K by 600 by 10. The 'tall' design has a row key of
> customer_id and order_id with 10 columns in a single column family,
> so you get 6 million rows and 10 column puts.
>
> Now if you do a 'wide' table...
> Your row key is the 'customer_id' only. Each column is the order, so you
> write one column for each order, and you have to figure out how you
> represent your columns in the order.
> (An example... your order of 10 items is represented by a string with a
> 'special character' used as a column separator in the order.)
> So you're doing one column write for each order and you have a total of
> 10K rows.
>
> Unless I'm missing something, part of the 'slowness' could be how you're
> writing your orders in your wide table. There are a couple of other
> unknowns. Are you hashing your keys?
> I mean, are you getting a bit of 'randomness' in your keys?
>
> So what am I missing?
>
> -Mike
>
> > Subject: Re: Insert into tall table 50% faster than wide table
> > From: [email protected]
> > Date: Wed, 22 Dec 2010 18:24:05 -0800
> > To: [email protected]
> >
> > Actually I don't think this is the problem, as HBase versions cells,
> > not rows, if I understand correctly.
> >
> > On Dec 22, 2010, at 5:03 PM, Bryan Keller wrote:
> >
> > > Perhaps slow wide table insert performance is related to row
> > > versioning? If I have a customer row and keep adding order columns
> > > one by one, I'm thinking that there might be a version kept of the
> > > row for every order I add. If I am simply inserting a new row for
> > > every order, there is no versioning going on. Could this be causing
> > > performance problems?
> > >
> > > On Dec 22, 2010, at 4:16 PM, Bryan Keller wrote:
> > >
> > >> It appears to be the same or better, not to derail my original
> > >> question. The much slower write performance will cause problems for
> > >> me unless I can resolve that.
> > >>
> > >> On Dec 22, 2010, at 3:52 PM, Peter Haidinyak wrote:
> > >>
> > >>> Interesting, do you know what the time difference would be on the
> > >>> other side, doing a lookup/scan?
> > >>>
> > >>> Thanks
> > >>>
> > >>> -Pete
> > >>>
> > >>> -----Original Message-----
> > >>> From: Bryan Keller [mailto:[email protected]]
> > >>> Sent: Wednesday, December 22, 2010 3:41 PM
> > >>> To: [email protected]
> > >>> Subject: Insert into tall table 50% faster than wide table
> > >>>
> > >>> I have been testing a couple of different approaches to storing
> > >>> customer orders. One is a tall table, where each order is a row.
> > >>> The other is a wide table, where each customer is a row and orders
> > >>> are columns in the row. I am finding that inserts into the tall
> > >>> table, i.e. adding rows for every order, are roughly 50% faster
> > >>> than inserts into the wide table, i.e. adding a row for a customer
> > >>> and then adding columns for orders.
> > >>>
> > >>> In my test, there are 10,000 customers, each customer has 600
> > >>> orders, and each order has 10 columns. The tall table approach
> > >>> results in 6 million rows of 10 columns. The wide table approach
> > >>> results in 10,000 rows of 6,000 columns. I'm using hbase
> > >>> 0.89-20100924 and hadoop 0.20.2. I am adding the orders using a Put
> > >>> for each order, submitted in batches of 1000 as a list of Puts.
> > >>>
> > >>> Are there techniques to speed up inserts with the wide table
> > >>> approach that I am perhaps overlooking?
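To make the row-key question in the thread concrete, here is a minimal sketch of the two key layouts Mike describes, plus the kind of hash "salt" he asks about when he mentions getting 'randomness' into the keys. This is plain Java with no HBase client dependency; the delimiter, the two-byte MD5 prefix, and all names are illustrative assumptions, not what Bryan actually used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowKeys {
    // Tall design: composite key of customer_id + order_id, one row per order.
    // The "|" delimiter is an assumption; any separator that cannot appear in
    // the ids themselves works.
    static byte[] tallKey(String customerId, String orderId) {
        return (customerId + "|" + orderId).getBytes(StandardCharsets.UTF_8);
    }

    // Wide design: the key is the customer_id alone; each order becomes one
    // (or more) columns in that single row.
    static byte[] wideKey(String customerId) {
        return customerId.getBytes(StandardCharsets.UTF_8);
    }

    // Optional salt: prefix the key with a couple of hash bytes so that
    // sequentially generated ids don't all land on the same region. The
    // prefix must be derived from the key itself so reads can recompute it.
    static byte[] saltedKey(byte[] key) {
        try {
            byte[] hash = MessageDigest.getInstance("MD5").digest(key);
            byte[] out = new byte[2 + key.length];
            out[0] = hash[0];
            out[1] = hash[1];
            System.arraycopy(key, 0, out, 2, key.length);
            return out;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 ships with every JRE
        }
    }

    public static void main(String[] args) {
        System.out.println(new String(tallKey("cust42", "order0007"),
                StandardCharsets.UTF_8));
        System.out.println(saltedKey(wideKey("cust42")).length);
    }
}
```

Note the trade-off the salt implies: it randomizes writes across regions, but it destroys the sorted locality that makes a scan over one customer's orders cheap in the tall design.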
