If you're redesigning your schema, I'd say go with Avro over protobufs.

With respect to CPUs, you don't say what your system looks like: Intel vs. AMD, 
number of physical cores, what else you're running on the machine (number of 
mapper/reducer slots), etc.

In terms of the schema... 

How are you accessing your data? 
You said that you want to filter on a column value... using Avro to store the 
address record in, let's say, a JSON string... write a custom filter?
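
To make the custom-filter question a bit more concrete, here is a rough sketch in plain Java. It is not HBase or Avro code: the "name|address|city" encoding, the class name, the method names, and the sample record are all made up for illustration. The only point it shows is that once a record is packed into a single cell value, a filter can no longer compare the raw bytes of a column; it has to decode the value first. In HBase that decoding logic would presumably live in a custom filter class deployed on the region servers.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;

public class PackedValueFilterSketch {

    // Hypothetical stand-in encoding (NOT Avro): the whole address record
    // is packed into one cell value as "name|address|city".
    static byte[] pack(String name, String address, String city) {
        return String.join("|", name, address, city).getBytes(StandardCharsets.UTF_8);
    }

    // Builds a predicate over the raw cell bytes. It has to decode the packed
    // record before it can compare the city field.
    static Predicate<byte[]> cityEquals(String wanted) {
        return cell -> {
            String[] fields = new String(cell, StandardCharsets.UTF_8).split("\\|", -1);
            return fields.length == 3 && fields[2].equals(wanted);
        };
    }

    public static void main(String[] args) {
        // Made-up sample record, for illustration only.
        byte[] cell = pack("Alice", "1 Main St", "Springfield");
        Predicate<byte[]> filter = cityEquals("Springfield");
        System.out.println("row kept: " + filter.test(cell));
    }
}
```

The cost to keep in mind is that every candidate cell gets decoded on the server side, which adds CPU work on the regionserver.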

And people laughed at me when I said that schema design was critical and often 
misunderstood. ;-) 
(Ok the truth was that they laughed at me because I thought I looked cool 
wearing a plaid suit.) 

HTH


On May 1, 2013, at 1:02 AM, Bryan Keller <[email protected]> wrote:

> The table has hashed keys so rows are evenly distributed amongst the 
> regionservers, and load on each regionserver is pretty much the same. I also 
> have per-table balancing turned on. I get mostly data local mappers with only 
> a few rack local (maybe 10 of the 250 mappers).
> 
> Currently the table is a wide table schema, with lists of data structures 
> stored as columns with column prefixes grouping the data structures (e.g. 
> 1_name, 1_address, 1_city, 2_name, 2_address, 2_city). I was thinking of 
> moving those data structures to protobuf which would cut down on the number 
> of columns. The downside is I can't filter on one value with that, but it is 
> a tradeoff I would make for performance. I was also considering restructuring 
> the table into a tall table.
> 
> Something interesting is that my old regionserver machines had five 15k SCSI 
> drives instead of 2 SSDs, and performance was about the same. Also, my old 
> network was 1gbit, now it is 10gbit. So neither network nor disk I/O appear 
> to be the bottleneck. The CPU is rather high for the regionserver so it seems 
> like the best candidate to investigate. I will try profiling it tomorrow and 
> will report back. I may revisit compression on vs off since that is adding 
> load to the CPU.
> 
> I'll also come up with a sample program that generates data similar to my 
> table.
> 
> 
> On Apr 30, 2013, at 10:01 PM, lars hofhansl <[email protected]> wrote:
> 
>> Your average row is 35k so scanner caching would not make a huge difference, 
>> although I would have expected some improvements by setting it to 10 or 50 
>> since you have a wide 10ge pipe.
>> 
>> I assume your table is split sufficiently to touch all RegionServers... Do 
>> you see the same load/IO on all region servers?
>> 
>> A bunch of scan improvements went into HBase since 0.94.2.
>> I blogged about some of these changes here: 
>> http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
>> 
>> In your case - since you have many columns, each of which carries the rowkey - 
>> you might benefit a lot from HBASE-7279.
>> 
>> In the end HBase *is* slower than straight HDFS for full scans. How could it 
>> not be?
>> So I would start by looking at HDFS first. Make sure Nagle's algorithm is 
>> disabled in both HBase and HDFS.
>> 
>> And lastly SSDs are somewhat new territory for HBase. Maybe Andy Purtell is 
>> listening, I think he did some tests with HBase on SSDs.
>> With rotating media you typically see an improvement with compression. With 
>> SSDs the added CPU needed for decompression might outweigh the benefits.
>> 
>> At the risk of starting a larger discussion here, I would posit that HBase's 
>> LSM based design, which trades random IO for sequential IO, might be a bit 
>> more questionable on SSDs.
>> 
>> If you can, it would be nice to run a profiler against one of the 
>> RegionServers (or maybe do it with the single RS setup) and see where it is 
>> bottlenecked.
>> (And if you send me a sample program to generate some data - not 700g, 
>> though :) - I'll try to do a bit of profiling over the next few days as my day 
>> job permits, but I do not have any machines with SSDs).
>> 
>> -- Lars
>> 
>> 
>> 
>> 
>> ________________________________
>> From: Bryan Keller <[email protected]>
>> To: [email protected] 
>> Sent: Tuesday, April 30, 2013 9:31 PM
>> Subject: Re: Poor HBase map-reduce scan performance
>> 
>> 
>> Yes, I have tried various settings for setCaching() and I have 
>> setCacheBlocks(false)
>> 
>> On Apr 30, 2013, at 9:17 PM, Ted Yu <[email protected]> wrote:
>> 
>>> From http://hbase.apache.org/book.html#mapreduce.example :
>>> 
>>> scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
>>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>>> 
>>> I guess you have used the above setting.
>>> 
>>> 0.94.x releases are compatible. Have you considered upgrading to, say,
>>> 0.94.7, which was recently released?
>>> 
>>> Cheers
>>> 
>>> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller <[email protected]> wrote:
>>> 
>>>> I have been attempting to speed up my HBase map-reduce scans for a while
>>>> now. I have tried just about everything without much luck. I'm running out
>>>> of ideas and was hoping for some suggestions. This is HBase 0.94.2 and
>>>> Hadoop 2.0.0 (CDH4.2.1).
>>>> 
>>>> The table I'm scanning:
>>>> 20 mil rows
>>>> Hundreds of columns/row
>>>> Column keys can be 30-40 bytes
>>>> Column values are generally not large, 1k would be on the large side
>>>> 250 regions
>>>> Snappy compression
>>>> 8gb region size
>>>> 512mb memstore flush
>>>> 128k block size
>>>> 700gb of data on HDFS
>>>> 
>>>> My cluster has 8 datanodes which are also regionservers. Each has 8 cores
>>>> (16 HT), 64gb RAM, and 2 SSDs. The network is 10gbit. I have a separate
>>>> machine acting as namenode, HMaster, and zookeeper (single instance). I
>>>> have disk local reads turned on.
>>>> 
>>>> I'm seeing around 5 gbit/sec on average network IO. Each disk is getting
>>>> 400mb/sec read IO. Theoretically I could get 400mb/sec * 16 = 6.4gb/sec.
>>>> 
>>>> Using Hadoop's TestDFSIO tool, I'm seeing around 1.4gb/sec read speed. Not
>>>> really that great compared to the theoretical I/O. However this is far
>>>> better than I am seeing with HBase map-reduce scans of my table.
>>>> 
>>>> I have a simple no-op map-only job (using TableInputFormat) that scans the
>>>> table and does nothing with data. This takes 45 minutes. That's about
>>>> 260mb/sec read speed. This is over 5x slower than straight HDFS.
>>>> Basically, with HBase, my 16 SSD cluster is reading nearly 35% slower
>>>> than a single SSD.
>>>> 
>>>> Here are some things I have changed to no avail:
>>>> Scan caching values
>>>> HDFS block sizes
>>>> HBase block sizes
>>>> Region file sizes
>>>> Memory settings
>>>> GC settings
>>>> Number of mappers/node
>>>> Compressed vs not compressed
>>>> 
>>>> One thing I notice is that the regionserver is using quite a bit of CPU
>>>> during the map reduce job. When dumping the jstack of the process, it seems
>>>> like it is usually in some type of memory allocation or decompression
>>>> routine which didn't seem abnormal.
>>>> 
>>>> I can't seem to pinpoint the bottleneck. CPU use by the regionserver is
>>>> high but not maxed out. Disk I/O and network I/O are low, IO wait is low.
>>>> I'm on the verge of just writing the dataset out to sequence files once a
>>>> day for scan purposes. Is that what others are doing?
> 
> 
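
For reference, the throughput figures quoted in the original post can be rechecked with a few lines of Java. This is only a sanity check on the arithmetic, assuming decimal units (1 gb = 1000 mb); the class and method names are made up for illustration, and all inputs are the numbers given in the post.

```java
public class ScanThroughputCheck {

    // Average throughput in MB/s, assuming decimal units (1 GB = 1000 MB).
    static double mbPerSec(double gigabytes, double minutes) {
        return gigabytes * 1000.0 / (minutes * 60.0);
    }

    public static void main(String[] args) {
        // All inputs below are the figures quoted in the original post.
        double scan = mbPerSec(700, 45);   // 700gb table scanned in 45 minutes
        double perSsd = 400.0;             // read rate of one SSD, MB/s
        double theoretical = perSsd * 16;  // 8 nodes x 2 SSDs
        double dfsio = 1400.0;             // TestDFSIO read rate, MB/s

        System.out.printf("HBase scan:       %.0f MB/s%n", scan);
        System.out.printf("16-SSD ceiling:   %.0f MB/s%n", theoretical);
        System.out.printf("TestDFSIO / scan: %.1fx%n", dfsio / scan);
        System.out.printf("Scan vs one SSD:  %.0f%% slower%n", (1 - scan / perSsd) * 100);
    }
}
```

With these inputs the scan works out to roughly 260 MB/s, a bit over 5x below the TestDFSIO figure and about 35% below a single SSD, which agrees with the numbers in the post.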
