Re: Poor HBase map-reduce scan performance

lars hofhansl Tue, 30 Apr 2013 23:22:19 -0700

If you can, try 0.94.4+; it should significantly reduce the amount of bytes 
copied around in RAM during scanning, especially if you have wide rows and/or 
large key portions. That in turns makes scans scale better across cores, since 
RAM is shared resource between cores (much like disk).



It's not hard to build the latest HBase against Cloudera's version of Hadoop. I 
can send along a simple patch to pom.xml to do that.

-- Lars



________________________________
 From: Bryan Keller <brya...@gmail.com>
To: user@hbase.apache.org 
Sent: Tuesday, April 30, 2013 11:02 PM
Subject: Re: Poor HBase map-reduce scan performance
 

The table has hashed keys so rows are evenly distributed amongst the 
regionservers, and load on each regionserver is pretty much the same. I also 
have per-table balancing turned on. I get mostly data local mappers with only a 
few rack local (maybe 10 of the 250 mappers).

Currently the table is a wide table schema, with lists of data structures 
stored as columns with column prefixes grouping the data structures (e.g. 
1_name, 1_address, 1_city, 2_name, 2_address, 2_city). I was thinking of moving 
those data structures to protobuf which would cut down on the number of 
columns. The downside is I can't filter on one value with that, but it is a 
tradeoff I would make for performance. I was also considering restructuring the 
table into a tall table.

Something interesting is that my old regionserver machines had five 15k SCSI 
drives instead of 2 SSDs, and performance was about the same. Also, my old 
network was 1gbit, now it is 10gbit. So neither network nor disk I/O appear to 
be the bottleneck. The CPU is rather high for the regionserver so it seems like 
the best candidate to investigate. I will try profiling it tomorrow and will 
report back. I may revisit compression on vs off since that is adding load to 
the CPU.

I'll also come up with a sample program that generates data similar to my table.


On Apr 30, 2013, at 10:01 PM, lars hofhansl <la...@apache.org> wrote:

> Your average row is 35k so scanner caching would not make a huge difference, 
> although I would have expected some improvements by setting it to 10 or 50 
> since you have a wide 10ge pipe.
> 
> I assume your table is split sufficiently to touch all RegionServer... Do you 
> see the same load/IO on all region servers?
> 
> A bunch of scan improvements went into HBase since 0.94.2.
> I blogged about some of these changes here: 
> http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
> 
> In your case - since you have many columns, each of which carry the rowkey - 
> you might benefit a lot from HBASE-7279.
> 
> In the end HBase *is* slower than straight HDFS for full scans. How could it 
> not be?
> So I would start by looking at HDFS first. Make sure Nagle's is disbaled in 
> both HBase and HDFS.
> 
> And lastly SSDs are somewhat new territory for HBase. Maybe Andy Purtell is 
> listening, I think he did some tests with HBase on SSDs.
> With rotating media you typically see an improvement with compression. With 
> SSDs the added CPU needed for decompression might outweigh the benefits.
> 
> At the risk of starting a larger discussion here, I would posit that HBase's 
> LSM based design, which trades random IO with sequential IO, might be a bit 
> more questionable on SSDs.
> 
> If you can, it would be nice to run a profiler against one of the 
> RegionServers (or maybe do it with the single RS setup) and see where it is 
> bottlenecked.
> (And if you send me a sample program to generate some data - not 700g, though 
> :) - I'll try to do a bit of profiling during the next days as my day job 
> permits, but I do not have any machines with SSDs).
> 
> -- Lars
> 
> 
> 
> 
> ________________________________
> From: Bryan Keller <brya...@gmail.com>
> To: user@hbase.apache.org 
> Sent: Tuesday, April 30, 2013 9:31 PM
> Subject: Re: Poor HBase map-reduce scan performance
> 
> 
> Yes, I have tried various settings for setCaching() and I have 
> setCacheBlocks(false)
> 
> On Apr 30, 2013, at 9:17 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> 
>> From http://hbase.apache.org/book.html#mapreduce.example :
>> 
>> scan.setCaching(500);        // 1 is the default in Scan, which will
>> be bad for MapReduce jobs
>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>> 
>> I guess you have used the above setting.
>> 
>> 0.94.x releases are compatible. Have you considered upgrading to, say
>> 0.94.7 which was recently released ?
>> 
>> Cheers
>> 
>> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller <brya...@gmail.com> wrote:
>> 
>>> I have been attempting to speed up my HBase map-reduce scans for a while
>>> now. I have tried just about everything without much luck. I'm running out
>>> of ideas and was hoping for some suggestions. This is HBase 0.94.2 and
>>> Hadoop 2.0.0 (CDH4.2.1).
>>> 
>>> The table I'm scanning:
>>> 20 mil rows
>>> Hundreds of columns/row
>>> Column keys can be 30-40 bytes
>>> Column values are generally not large, 1k would be on the large side
>>> 250 regions
>>> Snappy compression
>>> 8gb region size
>>> 512mb memstore flush
>>> 128k block size
>>> 700gb of data on HDFS
>>> 
>>> My cluster has 8 datanodes which are also regionservers. Each has 8 cores
>>> (16 HT), 64gb RAM, and 2 SSDs. The network is 10gbit. I have a separate
>>> machine acting as namenode, HMaster, and zookeeper (single instance). I
>>> have disk local reads turned on.
>>> 
>>> I'm seeing around 5 gbit/sec on average network IO. Each disk is getting
>>> 400mb/sec read IO. Theoretically I could get 400mb/sec * 16 = 6.4gb/sec.
>>> 
>>> Using Hadoop's TestDFSIO tool, I'm seeing around 1.4gb/sec read speed. Not
>>> really that great compared to the theoretical I/O. However this is far
>>> better than I am seeing with HBase map-reduce scans of my table.
>>> 
>>> I have a simple no-op map-only job (using TableInputFormat) that scans the
>>> table and does nothing with data. This takes 45 minutes. That's about
>>> 260mb/sec read speed. This is over 5x slower than straight HDFS.
>>> Basically, with HBase I'm seeing read performance of my 16 SSD cluster
>>> performing nearly 35% slower than a single SSD.
>>> 
>>> Here are some things I have changed to no avail:
>>> Scan caching values
>>> HDFS block sizes
>>> HBase block sizes
>>> Region file sizes
>>> Memory settings
>>> GC settings
>>> Number of mappers/node
>>> Compressed vs not compressed
>>> 
>>> One thing I notice is that the regionserver is using quite a bit of CPU
>>> during the map reduce job. When dumping the jstack of the process, it seems
>>> like it is usually in some type of memory allocation or decompression
>>> routine which didn't seem abnormal.
>>> 
>>> I can't seem to pinpoint the bottleneck. CPU use by the regionserver is
>>> high but not maxed out. Disk I/O and network I/O are low, IO wait is low.
>>> I'm on the verge of just writing the dataset out to sequence files once a
>>> day for scan purposes. Is that what others are doing?

Re: Poor HBase map-reduce scan performance

Reply via email to