Re: Poor HBase map-reduce scan performance

2013-07-01 Thread Bryan Keller
013 at 3:59 AM, lars hofhansl wrote: Absolutely. - Original Message - From: Ted Yu To: user@hbase.apache.org Sent: Sunday, June 30, 2013 9:32 PM Subject: Re: Poor HBase map-reduce scan pe

Re: Poor HBase map-reduce scan performance

2013-07-01 Thread Enis Söztutar
On Mon, Jul 1, 2013 at 3:59 AM, lars hofhansl wrote: Absolutely. - Original Message - From: Ted Yu To: user@hbase.apache.org Sent: Sunday, June 30, 2013 9:32 PM Subject: Re: Poor HBase map-reduce scan performance Looking a

Re: Poor HBase map-reduce scan performance

2013-07-01 Thread lars hofhansl
Absolutely. - Original Message - From: Ted Yu To: user@hbase.apache.org Sent: Sunday, June 30, 2013 9:32 PM Subject: Re: Poor HBase map-reduce scan performance Looking at the tail of HBASE-8369, there were some comments which are yet to be addressed. I think trunk patch should

Re: Poor HBase map-reduce scan performance

2013-06-30 Thread Ted Yu
To: user@hbase.apache.org; lars hofhansl Sent: Tuesday, June 25, 2013 1:56 AM Subject: Re: Poor HBase map-reduce scan performance I tweaked Enis's snapshot input format and backported it to 0.94.6 and have snapshot scanning fu

Re: Poor HBase map-reduce scan performance

2013-06-30 Thread Bryan Keller
From: Bryan Keller To: user@hbase.apache.org; lars hofhansl Sent: Tuesday, June 25, 2013 1:56 AM Subject: Re: Poor HBase map-reduce scan performance I tweaked Enis's snapshot input format and backported it to 0.94.6 and have snapshot scanning func

Re: Poor HBase map-reduce scan performance

2013-06-28 Thread lars hofhansl
From: Sandy Pratt To: "user@hbase.apache.org" Sent: Wednesday, June 5, 2013 10:58 AM Subject: Re: Poor HBase map-reduce scan performance Yong, As a thought experiment, imagine how it impacts the throughput of TCP

Re: Poor HBase map-reduce scan performance

2013-06-25 Thread Bryan Keller
From: Sandy Pratt To: "user@hbase.apache.org" Sent: Wednesday, June 5, 2013 10:58 AM Subject: Re: Poor HBase map-reduce scan performance Yong, As a thought experiment, imagine how it impacts the throughput of TCP

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread lars hofhansl
Lars From: Sandy Pratt To: "user@hbase.apache.org" Sent: Wednesday, June 5, 2013 10:58 AM Subject: Re: Poor HBase map-reduce scan performance Yong, As a thought experiment, imagine how it impacts the throughput of TCP to keep the window size at 1. That means there'

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread Sandy Pratt
That's my understanding of how the current scan API works, yes. The client calls next() to fetch a batch. While it's waiting for the response from the server, it blocks. After the server responds to the next() call, it does nothing for that scanner until the following next() call. That makes fo

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread yonghu
Dear Sandy, Thanks for your explanation. However, I don't understand your term "client": does "client" mean the MapReduce job? If I understand you correctly, the Map function processes the tuples, and during this processing time the regionserver does nothing? Regards, Yong On Wed, Jun

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread Sandy Pratt
Yong, As a thought experiment, imagine how it impacts the throughput of TCP to keep the window size at 1. That means there's only one packet in flight at a time, and total throughput is a fraction of what it could be. That's effectively what happens with RPC. The server sends a batch, then does
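Sandy's stop-and-wait argument can be put into numbers with a toy model. The sketch below is not HBase code: the class name, the fixed per-batch server and client times, and the absence of network latency are all simplifying assumptions. It compares a client that blocks on each `next()` (so the server idles while the client processes) with an idealized two-stage pipeline in which the server prepares the next batch while the client works.

```java
/**
 * Toy throughput model (hypothetical, not HBase code).
 * Assumptions: fixed per-batch server and client times, no network latency,
 * and an unbounded prefetch buffer in the pipelined case.
 */
final class ScanPipelineModel {
    private ScanPipelineModel() {}

    /** Stop-and-wait ("window size 1"): every batch pays server time, then client time. */
    static long stopAndWaitMillis(long batches, long serverMillis, long clientMillis) {
        return batches * (serverMillis + clientMillis);
    }

    /**
     * Two-stage pipeline: after the first batch fills the pipe,
     * throughput is limited only by the slower stage.
     */
    static long pipelinedMillis(long batches, long serverMillis, long clientMillis) {
        if (batches == 0) {
            return 0;
        }
        return serverMillis + clientMillis + (batches - 1) * Math.max(serverMillis, clientMillis);
    }

    public static void main(String[] args) {
        long n = 100, server = 10, client = 10;
        System.out.println("stop-and-wait: " + stopAndWaitMillis(n, server, client) + " ms"); // 2000 ms
        System.out.println("pipelined:     " + pipelinedMillis(n, server, client) + " ms");   // 1010 ms
    }
}
```

With equal 10 ms stages the model roughly halves the total time; when one side dominates (say 10 ms on the server and 1 ms on the client) the gain shrinks to about 10%, which is why overlap helps most when server and client costs are comparable.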

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread Ted Yu
bq. the Regionserver and Tasktracker are the same node when you use MapReduce to scan the HBase table. The scan performed by the Tasktracker on that node would very likely access data hosted by region server on other node(s). So there would be RPC involved. There is some discussion on providing s

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread yonghu
Can anyone explain why client + rpc + server will decrease the performance of scanning? I mean the Regionserver and Tasktracker are the same node when you use MapReduce to scan the HBase table. So, in my understanding, there will be no rpc cost. Thanks! Yong On Wed, Jun 5, 2013 at 10:09 AM, San

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread Sandy Pratt
https://issues.apache.org/jira/browse/HBASE-8691 On 6/4/13 6:11 PM, "Sandy Pratt" wrote: Haven't had a chance to write a JIRA yet, but I thought I'd pop in here with an update in the meantime. I tried a number of different approaches to eliminate latency and "bubbles" in the scan pipeline

Re: Poor HBase map-reduce scan performance

2013-06-04 Thread Sandy Pratt
Haven't had a chance to write a JIRA yet, but I thought I'd pop in here with an update in the meantime. I tried a number of different approaches to eliminate latency and "bubbles" in the scan pipeline, and eventually arrived at adding a streaming scan API to the region server, along with refactori

Re: Poor HBase map-reduce scan performance

2013-06-04 Thread Bryan Keller
not (yet) a strength of HBase. So with HDFS you get to 75% of the theoretical maximum read throughput; hence with HBase you get to 25% of the theoretical cluster-wide maximum disk throughput? -- Lars
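Taking the two percentages in Lars's question at face value, with $T_{\max}$ standing for the theoretical cluster-wide maximum disk throughput (a symbol introduced here), the implied HBase-to-HDFS ratio is:

```latex
\frac{T_{\text{HBase}}}{T_{\text{HDFS}}}
  = \frac{0.25\,T_{\max}}{0.75\,T_{\max}}
  = \frac{1}{3}
```

Under that reading, a full scan through HBase would take about three times as long as reading the same data directly from HDFS.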

Re: Poor HBase map-reduce scan performance

2013-05-29 Thread Enis Söztutar
o 75% of the theoretical maximum read throughput; hence with HBase you get to 25% of the theoretical cluster-wide maximum disk throughput? -- Lars - Original Message - From: Bryan Keller To: user@hbase.apache.org Sent: Friday

Re: Poor HBase map-reduce scan performance

2013-05-24 Thread lars hofhansl
to do some profiling myself if there is an easy way to generate data of similar shape. -- Lars From: Bryan Keller To: user@hbase.apache.org Sent: Friday, May 3, 2013 3:44 A

Re: Poor HBase map-reduce scan performance

2013-05-23 Thread Ted Yu
Thanks for the update, Sandy. If you can open a JIRA and attach your producer / consumer scanner there, that would be great. On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt wrote: I wrote myself a Scanner wrapper that uses a producer/consumer queue to keep the client fed with a full buffer as

Re: Poor HBase map-reduce scan performance

2013-05-23 Thread Sandy Pratt
I wrote myself a Scanner wrapper that uses a producer/consumer queue to keep the client fed with a full buffer as much as possible. When scanning my table with scanner caching at 100 records, I see about a 24% uplift in performance (~35k records/sec with the ClientScanner and ~44k records/sec with
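A minimal sketch of the producer/consumer idea, assuming a generic batch source rather than the real HBase client. `PrefetchingIterator` and its `fetchBatch` supplier are hypothetical names (in a real wrapper the supplier might call something like `ResultScanner.next(n)`), and this is not the wrapper Sandy wrote. A background thread keeps a bounded queue of batches full, so the consumer only waits when the producer falls behind.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/**
 * Hypothetical prefetching iterator. fetchBatch stands in for a scanner call;
 * a null or empty batch signals the end of the scan.
 */
final class PrefetchingIterator<T> implements Iterator<T> {
    private static final Object END = new Object(); // end-of-scan marker

    private final BlockingQueue<Object> queue;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean finished = false;

    PrefetchingIterator(Supplier<List<T>> fetchBatch, int maxBufferedBatches) {
        this.queue = new ArrayBlockingQueue<>(maxBufferedBatches);
        Thread producer = new Thread(() -> {
            try {
                while (true) {
                    List<T> batch = fetchBatch.get();
                    if (batch == null || batch.isEmpty()) {
                        break;
                    }
                    queue.put(batch); // blocks while the buffer is full (back-pressure)
                }
                putQuietly(END);
            } catch (InterruptedException e) {
                // Nothing interrupts this daemon thread in this sketch; just stop.
                Thread.currentThread().interrupt();
            } catch (RuntimeException e) {
                putQuietly(e); // hand the failure to the consumer
            }
        }, "scan-prefetcher");
        producer.setDaemon(true);
        producer.start();
    }

    private void putQuietly(Object o) {
        try {
            queue.put(o);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    @SuppressWarnings("unchecked")
    public boolean hasNext() {
        while (!current.hasNext()) {
            if (finished) {
                return false;
            }
            Object next;
            try {
                next = queue.take(); // waits only if the producer has fallen behind
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a batch", e);
            }
            if (next == END) {
                finished = true;
                return false;
            }
            if (next instanceof RuntimeException) {
                finished = true;
                throw (RuntimeException) next;
            }
            current = ((List<T>) next).iterator();
        }
        return true;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

The queue bound (`maxBufferedBatches`) caps client-side memory, and back-pressure comes from `ArrayBlockingQueue.put` blocking when the buffer is full.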

Re: Poor HBase map-reduce scan performance

2013-05-23 Thread Bryan Keller
e ends up being just barely larger than the table, either compressed or uncompressed. So in sum, compression slows down I/O 3x, but the file is 3x smaller so the time to scan is about the same. Adding in HBase slows things
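The arithmetic behind "compression slows down I/O 3x, but the file is 3x smaller" works out as follows, where $S$ is the uncompressed size and $R$ the uncompressed read rate (both symbols introduced here), assuming the 3x slowdown applies to the rate at which compressed bytes are consumed:

```latex
t_{\text{uncompressed}} = \frac{S}{R},
\qquad
t_{\text{compressed}} = \frac{S/3}{R/3} = \frac{S}{R}
```

The two factors of 3 cancel, which is why the scan time comes out about the same either way.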

Re: Poor HBase map-reduce scan performance

2013-05-22 Thread Ted Yu
I/O scanning an uncompressed sequence file vs scanning a compressed table. On May 8, 2013, at 10:15 AM, Bryan Keller wrote: Thanks for the offer Lars! I haven't made much progress speeding things up.

Re: Poor HBase map-reduce scan performance

2013-05-22 Thread Sandy Pratt
test program that populates a table that is similar to my production dataset. I have a readme that should describe things, hopefully enough to make it useable. There is code to populate a test table, code to scan the table, and code to scan sequence files from an export (to c

Re: Poor HBase map-reduce scan performance

2013-05-22 Thread Ted Yu
ou can find the code here: https://dl.dropboxusercontent.com/u/6880177/hbasetest.zip On May 4, 2013, at 6:33 PM, lars hofhansl wrote: The blockbuffers are not reused, but that by itself should not be a

Re: Poor HBase map-reduce scan performance

2013-05-22 Thread Sandy Pratt
g sessions). My offer still stands to do some profiling myself if there is an easy way to generate data of similar shape. -- Lars From: Brya

Re: Poor HBase map-reduce scan performance

2013-05-10 Thread Bryan Keller
myself if there is an easy way to generate data of similar shape. -- Lars From: Bryan Keller To: user@hbase.apache.org Sent: Friday, May 3, 2013 3:44 AM Subject: Re: Poor HBase map-reduce s

Re: Poor HBase map-reduce scan performance

2013-05-08 Thread Bryan Keller
lar shape. -- Lars From: Bryan Keller To: user@hbase.apache.org Sent: Friday, May 3, 2013 3:44 AM Subject: Re: Poor HBase map-reduce scan performance Actually I'm not too confident in my resul

Re: Poor HBase map-reduce scan performance

2013-05-05 Thread Michael Segel
except for blocks read from disk (and they should all be the same size, thus allocation should be cheap). -- Lars From: Bryan Keller To: user@hbase.apache.org Sent: Thursday,

Re: Poor HBase map-reduce scan performance

2013-05-04 Thread lars hofhansl
. -- Lars From: Bryan Keller To: user@hbase.apache.org Sent: Friday, May 3, 2013 3:44 AM Subject: Re: Poor HBase map-reduce scan performance Actually I'm not too confident in my results re block size, they may have been related to major compaction. I'

Re: Poor HBase map-reduce scan performance

2013-05-03 Thread Bryan Keller
k (and they should all be the same size, thus allocation should be cheap). -- Lars From: Bryan Keller To: user@hbase.apache.org Sent: Thursday, May 2, 2013 10:5

Re: Poor HBase map-reduce scan performance

2013-05-03 Thread Bryan Keller
From: Bryan Keller To: user@hbase.apache.org Sent: Thursday, May 2, 2013 10:54 AM Subject: Re: Poor HBase map-reduce scan performance I ran one of my regionservers through VisualVM. It looks like the top ho

Re: Poor HBase map-reduce scan performance

2013-05-02 Thread lars hofhansl
yan Keller To: user@hbase.apache.org Sent: Thursday, May 2, 2013 10:54 AM Subject: Re: Poor HBase map-reduce scan performance I ran one of my regionservers through VisualVM. It looks like the top hot spots are HFileReaderV2$ScannerV2.getKeyValue() and ByteBuffer.allocate(). It appears at fi
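Since `ByteBuffer.allocate()` shows up as a hot spot, one experiment is to reuse same-size block buffers instead of allocating one per block. The pool below is a hypothetical illustration, not HBase's block reader: `BlockBufferPool`, the block size used in the test, and the single-threaded design are assumptions.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/**
 * Hypothetical fixed-size buffer pool: hands out cleared buffers and only
 * calls ByteBuffer.allocate() when no pooled buffer is available.
 * Not thread-safe; one pool per scanner thread is assumed.
 */
final class BlockBufferPool {
    private final int blockSize;
    private final int maxPooled;
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private long allocations = 0;

    BlockBufferPool(int blockSize, int maxPooled) {
        this.blockSize = blockSize;
        this.maxPooled = maxPooled;
    }

    /** Returns a cleared buffer, allocating only when the pool is empty. */
    ByteBuffer acquire() {
        ByteBuffer buf = free.pollFirst();
        if (buf == null) {
            allocations++;
            return ByteBuffer.allocate(blockSize);
        }
        buf.clear(); // reset position/limit so the caller sees a fresh buffer
        return buf;
    }

    /** Hands a buffer back; wrong-sized buffers or overflow are left to the GC. */
    void release(ByteBuffer buf) {
        if (buf.capacity() == blockSize && free.size() < maxPooled) {
            free.addFirst(buf);
        }
    }

    long allocations() {
        return allocations;
    }
}
```

Whether this beats plain allocation is an open question: as Lars notes elsewhere in the thread, blocks read from disk should all be the same size, so allocation is expected to be cheap, and only a profiler comparison can settle it.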

Re: Poor HBase map-reduce scan performance

2013-05-02 Thread Nicolas Liochon
-- Lars - Original Message - From: Bryan Keller To: "user@hbase.apache.org" Sent: Wednesday, May 1, 2013 6:01 PM Subject: Re: Poor HBase

Re: Poor HBase map-reduce scan performance

2013-05-02 Thread Bryan Keller
- Original Message - From: Bryan Keller To: "user@hbase.apache.org" Sent: Wednesday, May 1, 2013 6:01 PM Subject: Re: Poor HBase map-reduce scan performance I tried running my test with 0.94.4, unfortunately performance was abou

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
nal Message - From: Bryan Keller To: "user@hbase.apache.org" Sent: Wednesday, May 1, 2013 6:01 PM Subject: Re: Poor HBase map-reduce scan performance I tried running my test with 0.94.4, unfortunately performance was about the same. I'

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread lars hofhansl
send along a simple patch to pom.xml to do that. -- Lars From: Bryan Keller To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 11:02 PM Subject: Re: Poor HBase map-reduce scan performance The table has hashed keys

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
a simple patch to pom.xml to do that. -- Lars From: Bryan Keller To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 11:02 PM Subject: Re: Poor HBase map-reduce scan performance The table has hashed keys so rows are eve

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
rsion of Hadoop. I can send along a simple patch to pom.xml to do that. -- Lars From: Bryan Keller To: user@hbase.apache.org Sent:

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
simple patch to pom.xml to do that. -- Lars From: Bryan Keller To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 11:02 PM Subject: Re: Poor HBase map-reduce scan performance The table has hashed keys

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Michael Segel
it with the single RS setup) and see where it is bottlenecked. (And if you send me a sample program to generate some data - not 700g, though :) - I'll try to do a bit of profiling during the next days as my day job permits, but I do not have any machines with SSDs).

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Jean-Marc Spaggiari
It's not hard to build the latest HBase against Cloudera's version of Hadoop. I can send along a simple patch to pom.xml to do that. -- Lars From:

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread ramkrishna vasudevan
In your case - since you have many columns, each of which carry the rowkey - you might benefit a lot from HBASE-7279. In the end HBase *is* slower than straight HDFS for full scans. How
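To see why many columns per row multiply the row-key overhead, here is a rough size estimate for the classic KeyValue layout used by 0.94 (no tags), in which every cell stores its own copy of the row key. The class name and the example column sizes are made up for illustration, and block, index, compression and encoding overheads are ignored.

```java
/**
 * Rough per-row size estimate for the classic (0.94-era, tag-less) KeyValue layout:
 *   keylength(4) valuelength(4) rowlength(2) row familylength(1) family
 *   qualifier timestamp(8) type(1) value
 * Every cell repeats the full row key.
 */
final class KeyValueSizeEstimate {
    private KeyValueSizeEstimate() {}

    /** Fixed bytes per cell, excluding row, family, qualifier and value. */
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1; // = 20

    static long cellBytes(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return FIXED_OVERHEAD + (long) rowLen + familyLen + qualifierLen + valueLen;
    }

    /** Bytes for one row made of `columns` cells of identical shape. */
    static long rowBytes(int columns, int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return columns * cellBytes(rowLen, familyLen, qualifierLen, valueLen);
    }

    public static void main(String[] args) {
        // Hypothetical row: 16-byte key, 1-byte family, 8-byte qualifiers, 8-byte values.
        int columns = 50;
        long total = rowBytes(columns, 16, 1, 8, 8);
        long payload = (long) columns * 8; // the values alone
        System.out.println("total=" + total + " bytes, payload=" + payload + " bytes");
        // prints "total=2650 bytes, payload=400 bytes"
    }
}
```

For this hypothetical 50-column row, the repeated row key alone accounts for 800 of the 2650 bytes, twice the 400 bytes of actual values, and every one of those copies is work for the scanner.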

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread ramkrishna vasudevan
ows and/or large key portions. That in turn makes scans scale better across cores, since RAM is a shared resource between cores (much like disk). It's not hard to build the latest HBase against Cloudera's version of

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Naidu MS
t hard to build the latest HBase against Cloudera's version of Hadoop. I can send along a simple patch to pom.xml to do that. -- Lars From: Bryan Keller To: user@hbas

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread Matt Corgan
From: Bryan Keller To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 11:02 PM Subject: Re: Poor HBase map-reduce scan performance The table has hashed keys so rows are evenly distributed amongst the regionservers, and load on each regionserver i
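The thread does not say how Bryan's keys are hashed. One common scheme, sketched here under that assumption, prefixes the natural key with the first few bytes of its MD5 digest so that sequential natural keys spread evenly across regions. `HashedRowKey`, the 4-byte prefix and the first-byte bucketing (a stand-in for evenly pre-split regions) are illustrative choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical hashed row key: MD5 prefix + natural key. */
final class HashedRowKey {
    private HashedRowKey() {}

    static final int PREFIX_BYTES = 4;

    static byte[] of(String naturalKey) {
        byte[] natural = naturalKey.getBytes(StandardCharsets.UTF_8);
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5").digest(natural);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
        byte[] key = new byte[PREFIX_BYTES + natural.length];
        System.arraycopy(digest, 0, key, 0, PREFIX_BYTES);               // hash prefix spreads keys
        System.arraycopy(natural, 0, key, PREFIX_BYTES, natural.length); // natural key keeps rows unique
        return key;
    }

    /** Bucket of a key when the key space is split evenly on its first byte. */
    static int bucket(byte[] key, int buckets) {
        return (key[0] & 0xFF) * buckets / 256;
    }
}
```

The trade-off is that range scans in natural-key order are no longer possible, which matters little for full-table MapReduce scans like the ones discussed here.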

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread lars hofhansl
(And if you send me a sample program to generate some data - not 700g, though :) - I'll try to do a bit of profiling during the next days as my day job permits, but I do not have any machines with SSDs). -- Lars

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread Bryan Keller
mple program to generate some data - not 700g, though :) - I'll try to do a bit of profiling during the next days as my day job permits, but I do not have any machines with SSDs). -- Lars From: Bryan Kelle

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread lars hofhansl
:) - I'll try to do a bit of profiling during the next days as my day job permits, but I do not have any machines with SSDs). -- Lars From: Bryan Keller To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 9:31 PM Subject: Re: Poor HBase map-reduce s

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread Bryan Keller
Yes, I have it enabled (forgot to mention that). On Apr 30, 2013, at 9:56 PM, Ted Yu wrote: Have you tried enabling short-circuit read? Thanks On Apr 30, 2013, at 9:31 PM, Bryan Keller wrote: Yes, I have tried various settings for setCaching() and I have setCacheBlocks(fa

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread Ted Yu
Have you tried enabling short-circuit read? Thanks On Apr 30, 2013, at 9:31 PM, Bryan Keller wrote: Yes, I have tried various settings for setCaching() and I have setCacheBlocks(false) On Apr 30, 2013, at 9:17 PM, Ted Yu wrote: From http://hbase.apache.org/book.html#mapreduce.

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread Bryan Keller
Yes, I have tried various settings for setCaching() and I have setCacheBlocks(false) On Apr 30, 2013, at 9:17 PM, Ted Yu wrote: From http://hbase.apache.org/book.html#mapreduce.example : scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs sc

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread Ted Yu
From http://hbase.apache.org/book.html#mapreduce.example :

    scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
    scan.setCacheBlocks(false);  // don't set to true for MR jobs

I guess you have used the above setting. 0.94.x releases are compatible. Have y
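A back-of-envelope count shows why the book warns against a caching value of 1: each scanner `next` RPC returns up to `caching` rows, so the number of data-carrying round trips is the ceiling of rows divided by caching. `ScanRoundTrips` is a hypothetical helper, and the count ignores scanner open/close calls and region boundaries.

```java
/** Hypothetical helper: data-carrying scanner RPCs needed for a scan. */
final class ScanRoundTrips {
    private ScanRoundTrips() {}

    static long rpcs(long rows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        System.out.println("caching=1   -> " + rpcs(rows, 1));   // 1000000
        System.out.println("caching=100 -> " + rpcs(rows, 100)); // 10000
        System.out.println("caching=500 -> " + rpcs(rows, 500)); // 2000
    }
}
```

Raising caching cuts the number of round trips but not the stop-and-wait pattern itself; each larger batch still has to be fetched before the client can continue.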