Re: Insert into tall table 50% faster than wide table

Lars George Thu, 23 Dec 2010 02:56:13 -0800

Writing data only hits the WAL and MemStore, so that should result in
the same performance for both models. One thing that Mike mentioned is
how you distribute the load. How many servers are you using? How are
you inserting your data (sequentially or randomly)? Why do you use Puts,
since this sounds like a bulk insert and hence would be much better done
with an HFileOutputFormat-based MapReduce job?


You do have some row locking happening, as mentioned earlier, which may
block concurrent updates to the same row. Are you sending all updates
for one row in a single Put instance? Or are you creating many Puts,
one for each order, that all target the same row?
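
To make the one-Put-per-row question concrete, here is a minimal sketch in
plain Python. It is not the HBase client API: the function names, row key
format, and qualifier format are hypothetical, and the customer count is
scaled down from the 10,000 in the test. It only shows how per-order cells
could be grouped by row key so that each customer row is sent once. Whether
this accounts for the measured difference is an assumption, not something
the thread confirms.

```python
from collections import defaultdict


def wide_table_cells(customers, orders_per_customer, columns_per_order):
    """Yield (row_key, qualifier, value) cells for the wide layout.

    Row key = customer, qualifier = order id plus column name.
    The key and qualifier formats are illustrative only.
    """
    for c in range(customers):
        for o in range(orders_per_customer):
            for col in range(columns_per_order):
                yield (f"customer-{c:05d}", f"order-{o:03d}:col-{col}", b"v")


def group_cells_by_row(cells):
    """Merge cells into one batch per row key (one write per row)."""
    batches = defaultdict(dict)
    for row_key, qualifier, value in cells:
        batches[row_key][qualifier] = value
    return dict(batches)


# Scaled-down version of the test in the thread: 10 customers instead of 10,000.
customers, orders, columns = 10, 600, 10
cells = list(wide_table_cells(customers, orders, columns))

per_order_puts = customers * orders          # one Put per order, as in the test
per_row_batches = group_cells_by_row(cells)  # one batch per customer row

print("Puts when writing one per order:", per_order_puts)
print("Puts when grouped by row:", len(per_row_batches))
```

With a Put per order, the same customer row is touched 600 times. Grouped by
row, each row is touched once, carrying all 6,000 of its columns.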

Lars

On Thu, Dec 23, 2010 at 9:57 AM, Andrey Stepachev <[email protected]> wrote:
> 2010/12/23 Ted Dunning <[email protected]>
>
>> But the tall table is FASTER than the wide table.
>>
>
> Oops. :)
>
> Maybe you are writing more data? Are you using compression? (With prefixed
> qualifiers you write more data, since the uuid can have a length comparable
> to that of an order row.)
>
>
>>
>> On Wed, Dec 22, 2010 at 11:14 PM, Andrey Stepachev <[email protected]>
>> wrote:
>>
>> > I think row locks slow things down here. Each row you insert tries to
>> > acquire a lock and then release it. The wide table has significantly
>> > fewer rows, so far fewer locks are acquired during the insert.
>> >
>> >
>> > 2010/12/23 Bryan Keller <[email protected]>
>> >
>> > > I have been testing a couple of different approaches to storing customer
>> > > orders. One is a tall table, where each order is a row. The other is a wide
>> > > table where each customer is a row, and orders are columns in the row. I am
>> > > finding that inserts into the tall table, i.e. adding rows for every order,
>> > > are roughly 50% faster than inserts into the wide table, i.e. adding a row
>> > > for a customer and then adding columns for orders.
>> > >
>> > > In my test, there are 10,000 customers, each customer has 600 orders and
>> > > each order has 10 columns. The tall table approach results in 6 million
>> > > rows of 10 columns. The wide table approach results in 10,000 rows of
>> > > 6,000 columns. I'm using hbase 0.89-20100924 and hadoop 0.20.2. I am
>> > > adding the orders using a Put for each order, submitted in batches of
>> > > 1000 as a list of Puts.
>> > >
>> > > Are there techniques to speed up inserts with the wide table approach that
>> > > I am perhaps overlooking?
>> > >
>> > >
>> >
>>
>
