For a total of 1.5 KB with 4 columns = 384 bytes/column:
bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:384:100
-num_keys 1000000
13/03/25 14:54:45 INFO util.MultiThreadedAction: [W:100] Keys=991664,
cols=3,8m, time=00:03:55 Overall: [keys/s= 4218, latency=23 ms]
Current: [keys/s=4097, latency=24 ms], insertedUpTo=-1
For a total of 1.5 KB with 100 columns = 15 bytes/column:
bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:15:100
-num_keys 1000000
13/03/25 16:27:44 INFO util.MultiThreadedAction: [W:100] Keys=999721,
cols=95,3m, time=01:27:46 Overall: [keys/s= 189, latency=525 ms]
Current: [keys/s=162, latency=616 ms], insertedUpTo=-1
So overall, the per-column write speed is about the same; it is even a bit
faster with 100 columns than with 4 (about 18,900 versus 16,900 cells/s).
I don't think there is any negative impact on the HBase side from all
those columns. Might be interesting to test the same thing over Thrift...
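The per-cell arithmetic behind that conclusion can be double-checked quickly (a small sketch using the keys/s figures reported by the two runs above):

```python
# Cell (column) write rates derived from the two LoadTestTool runs above.
run_4_cols = {"cols_per_key": 4, "keys_per_s": 4218}
run_100_cols = {"cols_per_key": 100, "keys_per_s": 189}

def cells_per_second(run):
    # Every key carries cols_per_key cells, so cell rate = keys/s * cols/key.
    return run["keys_per_s"] * run["cols_per_key"]

print(cells_per_second(run_4_cols))    # 16872 cells/s with 4 columns
print(cells_per_second(run_100_cols))  # 18900 cells/s with 100 columns
```

Per cell, the 100-column run is in fact slightly faster, which is why the overall speed reads as "the same" despite the much lower keys/s figure.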
JM
2013/3/25 Pankaj Misra <[email protected]>:
> Yes Ted, we have been observing the Thrift API clearly outperform the native
> Java HBase API at higher loads, due to its binary communication protocol.
>
> Tariq, the specs of the machine on which we are performing these tests are as
> given below.
>
> Processor: Intel Core i7-3770K, 8 logical cores (4 physical, 2 logical per
> physical core), 3.5 GHz clock speed
> RAM: 32 GB DDR3
> HDD: One 2 TB SATA disk and two 250 GB SATA disks - a total of 3 disks
> HDFS and HBase are deployed in pseudo-distributed mode.
> We are having 4 parallel streams writing to HBase.
>
> We used the same setup for the previous tests as well, and to be very frank,
> we did expect a bit of a drop in performance when we had to test with 40
> columns, but did not expect to get half the performance. When we tested with
> 20 columns, we were consistently getting a write throughput of 200 Mbps.
> But with 40 columns we are getting only 90 Mbps of throughput on the same
> setup.
>
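One structural effect worth keeping in mind here: HBase stores each column as its own KeyValue, which repeats the row key, family and qualifier and adds a fixed header per cell, so spreading the same 1.5 KB over more columns increases both the bytes actually written and the number of cells the region server must process. A rough sketch of the size effect (the 20-byte fixed header follows the 0.94 KeyValue layout; the row-key, family and qualifier lengths are made-up values for illustration):

```python
def approx_row_bytes(num_cols, payload=1536, row_key=16, family=2, qualifier=4):
    # Per-cell fixed header in the 0.94 KeyValue layout:
    #   4 (key length) + 4 (value length) + 2 (row length)
    #   + 1 (family length) + 8 (timestamp) + 1 (key type) = 20 bytes,
    # and the row key, family name and column qualifier are repeated per cell.
    per_cell_overhead = 20 + row_key + family + qualifier
    return payload + num_cols * per_cell_overhead

print(approx_row_bytes(20))  # 2376 bytes for 20 columns
print(approx_row_bytes(40))  # 3216 bytes for 40 columns
```

With these assumed sizes, 40 columns writes roughly 35% more bytes per row than 20 columns, so size amplification alone would not explain a 2x drop; the per-cell CPU and Thrift serialization cost, which scales directly with cell count, is the more likely remainder.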
> Thanks and Regards
> Pankaj Misra
>
>
> ________________________________________
> From: Ted Yu [[email protected]]
> Sent: Tuesday, March 26, 2013 1:09 AM
> To: [email protected]
> Subject: Re: HBase Writes With Large Number of Columns
>
> bq. These records are being written using batch mutation with thrift API
> This is important information, I think.
>
> Batch mutation through the Java API would incur lower overhead.
>
> On Mon, Mar 25, 2013 at 11:40 AM, Pankaj Misra
> <[email protected]> wrote:
>
>> Firstly, thanks a lot Jean and Ted for your extended help; it is very much
>> appreciated.
>>
>> Yes Ted, I am writing to all 40 columns, and the 1.5 KB of record data is
>> distributed across these columns.
>>
>> Jean, some columns store as little as a single byte, while a few columns
>> store as much as 80-125 bytes of data. The overall record size is 1.5 KB.
>> These records are written using batch mutation over the Thrift API, where
>> we write 100 records per batch mutation.
>>
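For reference, the batching described above can be sketched as follows. The chunking helper is plain Python; the commented-out call names the HBase Thrift1 `mutateRows` entry point, and the client/table names and the `to_batch_mutations` helper are placeholders, not working code against a live cluster:

```python
def batches(records, batch_size=100):
    # Yield successive batches of batch_size records; the last may be smaller.
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

# Hypothetical 1.5 KB rows split across 40 columns (~38 bytes per column).
rows = [{"cf:col%02d" % c: b"x" * 38 for c in range(40)} for _ in range(250)]

sizes = [len(b) for b in batches(rows)]
print(sizes)  # [100, 100, 50] -> one Thrift batch mutation per chunk
# Each chunk would then be sent in a single call, e.g.:
# client.mutateRows(table_name, to_batch_mutations(chunk))
```

Batching 100 rows per call amortizes the per-request Thrift round-trip, which is presumably why the per-row latency stays tolerable even over Thrift.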
>> Thanks and Regards
>> Pankaj Misra
>>
>>
>> ________________________________________
>> From: Jean-Marc Spaggiari [[email protected]]
>> Sent: Monday, March 25, 2013 11:57 PM
>> To: [email protected]
>> Subject: Re: HBase Writes With Large Number of Columns
>>
>> I just ran some LoadTestTool runs to see if I can reproduce that.
>>
>> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100
>> -num_keys 1000000
>> 13/03/25 14:18:25 INFO util.MultiThreadedAction: [W:100] Keys=997172,
>> cols=3,8m, time=00:03:55 Overall: [keys/s= 4242, latency=23 ms]
>> Current: [keys/s=4413, latency=22 ms], insertedUpTo=-1
>>
>> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:512:100
>> -num_keys 1000000
>>
>> This one crashed because I don't have enough disk space, so I'm
>> re-running it, but just before it crashed it was showing about 24.5x
>> slower, which is consistent since it's writing 25x as many columns.
>>
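The disk-space crash is no surprise given the volume the second run generates; counting raw cell values only (before per-cell overhead, the WAL, and any replication):

```python
NUM_KEYS = 1_000_000  # -num_keys from the LoadTestTool invocations above

def raw_value_gb(cols_per_key, bytes_per_col):
    # Total bytes of cell values written by one run, in decimal gigabytes.
    return NUM_KEYS * cols_per_key * bytes_per_col / 1e9

print(raw_value_gb(4, 512))    # ~2.0 GB of values for the 4:512 run
print(raw_value_gb(100, 512))  # ~51.2 GB of values for the 100:512 run
```

So the 100-column run writes about 25x the data of the 4-column run, and HBase's per-cell overhead and store files inflate that further on disk.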
>> What size of data do you have? Big cells? Small cells? I will retry
>> the test above with more rows and keep you posted.
>>
>> 2013/3/25 Pankaj Misra <[email protected]>:
>> > Yes Ted, you are right, we have the table regions pre-split, and we see
>> > that both regions are almost evenly filled in both tests.
>> >
>> > This does not seem to be a regression though, since we were getting good
>> > write rates when we had fewer columns.
>> >
>> > Thanks and Regards
>> > Pankaj Misra
>> >
>> >
>> > ________________________________________
>> > From: Ted Yu [[email protected]]
>> > Sent: Monday, March 25, 2013 11:15 PM
>> > To: [email protected]
>> > Cc: [email protected]
>> > Subject: Re: HBase Writes With Large Number of Columns
>> >
>> > Copying Ankit, who raised the same question soon after Pankaj's initial
>> > question.
>> >
>> > On one hand I wonder if this was a regression in 0.94.5 (though
>> > unlikely).
>> >
>> > Did the region servers receive (relatively) the same write load in the
>> > second test case? I assume you have pre-split your tables in both cases.
>> >
>> > Cheers
>> >
>> > On Mon, Mar 25, 2013 at 10:18 AM, Pankaj Misra
>> > <[email protected]> wrote:
>> >
>> >> Hi Ted,
>> >>
>> >> Sorry for missing that detail, we are using HBase version 0.94.5
>> >>
>> >> Regards
>> >> Pankaj Misra
>> >>
>> >>
>> >> ________________________________________
>> >> From: Ted Yu [[email protected]]
>> >> Sent: Monday, March 25, 2013 10:29 PM
>> >> To: [email protected]
>> >> Subject: Re: HBase Writes With Large Number of Columns
>> >>
>> >> If you give us the version of HBase you're using, that would give us
>> >> some more information to help you.
>> >>
>> >> Cheers
>> >>
>> >> On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra
>> >> <[email protected]> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > The issue that I am facing is a performance drop in HBase when going
>> >> > from 20 columns in a column family to 40 columns in a column family.
>> >> > The number of columns has doubled and the ingestion/write speed has
>> >> > dropped by half. I am writing 1.5 KB of data per row across 40 columns.
>> >> >
>> >> > Are there any settings that I should look into for tweaking HBase to
>> >> > write a larger number of columns faster?
>> >> >
>> >> > I would appreciate the community's help in learning how I can write
>> >> > efficiently to a column family with a large number of columns.
>> >> >
>> >> > Would greatly appreciate any help /clues around this issue.
>> >> >
>> >> > Thanks and Regards
>> >> > Pankaj Misra
>> >> >
>> >> > ________________________________
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > NOTE: This message may contain information that is confidential,
>> >> > proprietary, privileged or otherwise protected by law. The message is
>> >> > intended solely for the named addressee. If received in error, please
>> >> > destroy and notify the sender. Any use of this email is prohibited when
>> >> > received in error. Impetus does not represent, warrant and/or guarantee,
>> >> > that the integrity of this communication has been maintained nor that
>> >> > the communication is free of errors, virus, interception or interference.
>> >>
>> >
>>
>>
>