> Each column is the order so you write one column for each order

As stated earlier, the wide table has 6,000 columns, not 600. :-)
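As an aside, the column-count arithmetic behind the correction above can be checked with a small sketch. This is plain Java with illustrative names, not code from the test itself; it just shows that the two layouts write the same total number of cells, only partitioned differently into rows.

```java
public class LayoutArithmetic {
    public static void main(String[] args) {
        int customers = 10_000;
        int ordersPerCustomer = 600;
        int columnsPerOrder = 10;

        // Tall table: one row per order, 10 columns each.
        long tallRows = (long) customers * ordersPerCustomer;              // 6,000,000 rows
        long tallColumnsPerRow = columnsPerOrder;                          // 10 columns

        // Wide table: one row per customer, one column per order field.
        long wideRows = customers;                                         // 10,000 rows
        long wideColumnsPerRow = (long) ordersPerCustomer * columnsPerOrder; // 6,000 columns

        // Either way, the total number of cells written is identical;
        // only the row/column partitioning differs.
        System.out.println(tallRows * tallColumnsPerRow);
        System.out.println(wideRows * wideColumnsPerRow);
    }
}
```

So the wide table has 600 orders × 10 fields = 6,000 columns per row, and both designs push 60 million cells in total.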
Bryan: Can you describe how you form row keys in each case?

On Wed, Dec 22, 2010 at 6:53 PM, Michael Segel <[email protected]> wrote:

> HBase does version cells.
>
> But I saw something of interest:
> "
> >>> In my test, there are 10,000 customers, each customer has 600 orders
> >>> and each order has 10 columns. The tall table approach results in 6
> >>> million rows of 10 columns. The wide table approach results in 10,000
> >>> rows of 6,000 columns. I'm using hbase 0.89-20100924 and hadoop 0.20.2.
> >>> I am adding the orders using a Put for each order, submitted in batches
> >>> of 1000 as a list of Puts.
> >>>
> >>> Are there techniques to speed up inserts with the wide table approach
> >>> that I am perhaps overlooking?
> "
>
> Ok, so you have 10K by 600 by 10. The 'tall' design has a row key of
> customer_id and order_id with 10 columns in a single column family,
> so you get 6 million rows and 10 column puts.
>
> Now if you do a 'wide' table...
> Your row key is the 'customer_id' only. Each column is the order, so you
> write one column for each order, and you have to figure out how you
> represent your columns in the order.
> (An example... your order of 10 items is represented by a string with a
> 'special character' used as a column separator in the order.)
> So you're doing one column write for each order and you have a total of
> 10K rows.
>
> Unless I'm missing something, part of the 'slowness' could be how you're
> writing your orders in your wide table. There are a couple of other
> unknowns. Are you hashing your keys?
> I mean, are you getting a bit of 'randomness' in your keys?
>
> So what am I missing?
>
> -Mike
>
> > Subject: Re: Insert into tall table 50% faster than wide table
> > From: [email protected]
> > Date: Wed, 22 Dec 2010 18:24:05 -0800
> > To: [email protected]
> >
> > Actually I don't think this is the problem, as HBase versions cells,
> > not rows, if I understand correctly.
> >
> > On Dec 22, 2010, at 5:03 PM, Bryan Keller wrote:
> >
> > > Perhaps slow wide table insert performance is related to row
> > > versioning? If I have a customer row and keep adding order columns
> > > one by one, I'm thinking that there might be a version kept of the
> > > row for every order I add. If I am simply inserting a new row for
> > > every order, there is no versioning going on. Could this be causing
> > > performance problems?
> > >
> > > On Dec 22, 2010, at 4:16 PM, Bryan Keller wrote:
> > >
> > >> It appears to be the same or better, not to derail my original
> > >> question. The much slower write performance will cause problems for
> > >> me unless I can resolve that.
> > >>
> > >> On Dec 22, 2010, at 3:52 PM, Peter Haidinyak wrote:
> > >>
> > >>> Interesting, do you know what the time difference would be on the
> > >>> other side, doing a lookup/scan?
> > >>>
> > >>> Thanks
> > >>>
> > >>> -Pete
> > >>>
> > >>> -----Original Message-----
> > >>> From: Bryan Keller [mailto:[email protected]]
> > >>> Sent: Wednesday, December 22, 2010 3:41 PM
> > >>> To: [email protected]
> > >>> Subject: Insert into tall table 50% faster than wide table
> > >>>
> > >>> I have been testing a couple of different approaches to storing
> > >>> customer orders. One is a tall table, where each order is a row.
> > >>> The other is a wide table, where each customer is a row and orders
> > >>> are columns in the row. I am finding that inserts into the tall
> > >>> table, i.e. adding rows for every order, are roughly 50% faster
> > >>> than inserts into the wide table, i.e. adding a row for a customer
> > >>> and then adding columns for orders.
> > >>>
> > >>> In my test, there are 10,000 customers, each customer has 600
> > >>> orders, and each order has 10 columns. The tall table approach
> > >>> results in 6 million rows of 10 columns. The wide table approach
> > >>> results in 10,000 rows of 6,000 columns. I'm using hbase
> > >>> 0.89-20100924 and hadoop 0.20.2. I am adding the orders using a Put
> > >>> for each order, submitted in batches of 1000 as a list of Puts.
> > >>>
> > >>> Are there techniques to speed up inserts with the wide table
> > >>> approach that I am perhaps overlooking?
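To make the row-key question in the thread concrete, here is a minimal sketch of the two key layouts Mike describes, plus the kind of hash "salt" he asks about when he mentions getting 'randomness' into the keys. This is plain Java with no HBase client dependency; the delimiter, the two-byte MD5 prefix, and all names are illustrative assumptions, not what Bryan actually used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowKeys {
    // Tall design: composite key of customer_id + order_id, one row per order.
    // The "|" delimiter is an assumption; any separator that cannot appear in
    // the ids themselves works.
    static byte[] tallKey(String customerId, String orderId) {
        return (customerId + "|" + orderId).getBytes(StandardCharsets.UTF_8);
    }

    // Wide design: the key is the customer_id alone; each order becomes one
    // (or more) columns in that single row.
    static byte[] wideKey(String customerId) {
        return customerId.getBytes(StandardCharsets.UTF_8);
    }

    // Optional salt: prefix the key with a couple of hash bytes so that
    // sequentially generated ids don't all land on the same region. The
    // prefix must be derived from the key itself so reads can recompute it.
    static byte[] saltedKey(byte[] key) {
        try {
            byte[] hash = MessageDigest.getInstance("MD5").digest(key);
            byte[] out = new byte[2 + key.length];
            out[0] = hash[0];
            out[1] = hash[1];
            System.arraycopy(key, 0, out, 2, key.length);
            return out;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 ships with every JRE
        }
    }

    public static void main(String[] args) {
        System.out.println(new String(tallKey("cust42", "order0007"),
                StandardCharsets.UTF_8));
        System.out.println(saltedKey(wideKey("cust42")).length);
    }
}
```

Note the trade-off the salt implies: it randomizes writes across regions, but it destroys the sorted locality that makes a scan over one customer's orders cheap in the tall design.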
