On Mon, Jul 1, 2013 at 3:59 AM, lars hofhansl wrote:

> Absolutely.
>
> ----- Original Message -----
> From: Ted Yu
> To: user@hbase.apache.org
> Sent: Sunday, June 30, 2013 9:32 PM
> Subject: Re: Poor HBase map-reduce scan performance
>
> Looking at the tail of HBASE-8369, there were some comments which are yet to be addressed.
> I think trunk patch should
> From: Bryan Keller
> To: user@hbase.apache.org; lars hofhansl
> Sent: Tuesday, June 25, 2013 1:56 AM
> Subject: Re: Poor HBase map-reduce scan performance
>
> I tweaked Enis's snapshot input format and backported it to 0.94.6 and have snapshot scanning func
-- Lars

________________________________
From: Sandy Pratt
To: "user@hbase.apache.org"
Sent: Wednesday, June 5, 2013 10:58 AM
Subject: Re: Poor HBase map-reduce scan performance

Yong,

As a thought experiment, imagine how it impacts the throughput of TCP to keep the window size at 1. That means there'
That's my understanding of how the current scan API works, yes. The
client calls next() to fetch a batch. While it's waiting for the response
from the server, it blocks. After the server responds to the next() call,
it does nothing for that scanner until the following next() call. That
makes fo
Dear Sandy,

Thanks for your explanation.

However, what I don't get is your term "client": does this "client" mean the MapReduce job? If I understand you correctly, the Map function will process the tuples, and during this processing time the regionserver does nothing?

regards!
Yong
On Wed, Jun
Yong,
As a thought experiment, imagine how it impacts the throughput of TCP to
keep the window size at 1. That means there's only one packet in flight
at a time, and total throughput is a fraction of what it could be.
That's effectively what happens with RPC. The server sends a batch, then
does
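The window-size analogy can be made concrete with a toy cost model (the numbers below are hypothetical, not HBase measurements): if every next() RPC pays one network round trip, total scan time is roughly round trips times RTT plus per-row transfer cost, so caching=1 is dominated by round trips while a large caching value amortizes them away.

```java
// Toy model of scanner caching vs. RPC round trips (hypothetical numbers,
// not real HBase measurements): with caching=1 every row pays a full round
// trip; with caching=500 the RTT cost is amortized over each batch.
public class ScanRpcModel {
    static double scanSeconds(long rows, int caching, double rttMs, double perRowMs) {
        long roundTrips = (rows + caching - 1) / caching; // one RPC per batch
        return (roundTrips * rttMs + rows * perRowMs) / 1000.0;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        double rttMs = 1.0;      // assumed network round-trip time
        double perRowMs = 0.001; // assumed per-row serialize/transfer cost
        System.out.printf("caching=1:   %.1f s%n", scanSeconds(rows, 1, rttMs, perRowMs));   // ~1001 s
        System.out.printf("caching=500: %.1f s%n", scanSeconds(rows, 500, rttMs, perRowMs)); // ~3 s
    }
}
```

With these assumed costs, the batched scan is hundreds of times faster even though the bytes transferred are identical, which is exactly the TCP-window effect described above.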
bq. the Regionserver and Tasktracker are the same node when you use
MapReduce to scan the HBase table.
The scan performed by the Tasktracker on that node would very likely access
data hosted by region server on other node(s). So there would be RPC
involved.
There is some discussion on providing s
Can anyone explain why client + rpc + server will decrease the performance
of scanning? I mean the Regionserver and Tasktracker are the same node when
you use MapReduce to scan the HBase table. So, in my understanding, there
will be no rpc cost.
Thanks!
Yong
On Wed, Jun 5, 2013 at 10:09 AM, San
https://issues.apache.org/jira/browse/HBASE-8691
On 6/4/13 6:11 PM, "Sandy Pratt" wrote:
Haven't had a chance to write a JIRA yet, but I thought I'd pop in here
with an update in the meantime.
I tried a number of different approaches to eliminate latency and
"bubbles" in the scan pipeline, and eventually arrived at adding a
streaming scan API to the region server, along with refactori
not (yet) a strength of HBase.

> So with HDFS you get to 75% of the theoretical maximum read throughput; hence with HBase you get to 25% of the theoretical cluster-wide maximum disk throughput?
>
> -- Lars
Thanks for the update, Sandy.
If you can open a JIRA and attach your producer / consumer scanner there,
that would be great.
On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt wrote:
I wrote myself a Scanner wrapper that uses a producer/consumer queue to
keep the client fed with a full buffer as much as possible. When scanning
my table with scanner caching at 100 records, I see about a 24% uplift in
performance (~35k records/sec with the ClientScanner and ~44k records/sec
with
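Sandy's wrapper itself is not attached here; the following is a minimal sketch of the pattern in plain Java, with a mock record iterator standing in for the HBase ClientScanner (class and method names are hypothetical). A background thread keeps a bounded queue full so the consumer overlaps fetching with processing instead of blocking on each batch.

```java
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of a producer/consumer scanner wrapper (mock source, not the
// actual HBase ClientScanner API). A background thread prefetches rows
// into a bounded queue so fetch and processing overlap.
public class PrefetchingScanner implements AutoCloseable {
    private static final String POISON = "__END__"; // end-of-scan sentinel
    private final BlockingQueue<String> queue;
    private final Thread producer;

    public PrefetchingScanner(Iterator<String> source, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.producer = new Thread(() -> {
            try {
                while (source.hasNext()) queue.put(source.next());
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // consumer closed early
            }
        });
        producer.start();
    }

    /** Returns the next row, or null when the scan is exhausted. */
    public String next() throws InterruptedException {
        String row = queue.take();
        return row == POISON ? null : row; // reference compare is safe for the sentinel
    }

    @Override public void close() { producer.interrupt(); }

    public static void main(String[] args) throws Exception {
        Iterator<String> rows = java.util.List.of("r1", "r2", "r3").iterator();
        try (PrefetchingScanner s = new PrefetchingScanner(rows, 2)) {
            for (String r; (r = s.next()) != null; ) System.out.println(r);
        }
    }
}
```

The bounded queue capacity plays the same role as scanner caching: it limits how far the producer runs ahead, trading memory for fewer stalls on the consumer side.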
…e ends up being just barely larger than the table, either compressed or uncompressed.

So in sum, compression slows down I/O 3x, but the file is 3x smaller, so the time to scan is about the same. Adding in HBase slows things
…I/O scanning an uncompressed sequence file vs scanning a compressed table.

On May 8, 2013, at 10:15 AM, Bryan Keller wrote:

> Thanks for the offer Lars! I haven't made much progress speeding things up.
…test program that populates a table that is similar to my production dataset. I have a readme that should describe things, hopefully enough to make it usable. There is code to populate a test table, code to scan the table, and code to scan sequence files from an export (to c
You can find the code here:

https://dl.dropboxusercontent.com/u/6880177/hbasetest.zip

On May 4, 2013, at 6:33 PM, lars hofhansl wrote:

> The blockbuffers are not reused, but that by itself should not be a
…g sessions).

My offer still stands to do some profiling myself if there is an easy way to generate data of similar shape.

-- Lars

________________________________
From: Brya
…except for blocks read from disk (and they should all be the same size, thus allocation should be cheap).

-- Lars

________________________________
From: Bryan Keller
To: user@hbase.apache.org
Sent: Thursday,
-- Lars

________________________________
From: Bryan Keller
To: user@hbase.apache.org
Sent: Friday, May 3, 2013 3:44 AM
Subject: Re: Poor HBase map-reduce scan performance

Actually I'm not too confident in my results re block size, they may have been related to major compaction. I'
From: Bryan Keller
To: user@hbase.apache.org
Sent: Thursday, May 2, 2013 10:54 AM
Subject: Re: Poor HBase map-reduce scan performance

I ran one of my regionservers through VisualVM. It looks like the top hot spots are HFileReaderV2$ScannerV2.getKeyValue() and ByteBuffer.allocate(). It appears at fi
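ByteBuffer.allocate() showing up as a hot spot is consistent with allocating a fresh buffer per block on the read path. The following is a standalone sketch (not HBase code) contrasting allocate-per-read with reusing one buffer; both variants compute the same result, but the first creates garbage on every iteration.

```java
import java.nio.ByteBuffer;

// Standalone sketch (not HBase code): allocating a ByteBuffer per "block
// read" does allocation work and generates garbage on every iteration,
// while reusing a single cleared buffer does not. Same result either way.
public class BufferReuseDemo {
    static final int BLOCK = 64 * 1024, BLOCKS = 1000;

    static long allocatePerRead() {
        long sum = 0;
        for (int i = 0; i < BLOCKS; i++) {
            ByteBuffer buf = ByteBuffer.allocate(BLOCK); // fresh buffer each block
            buf.putInt(0, i);
            sum += buf.getInt(0);
        }
        return sum;
    }

    static long reuseBuffer() {
        long sum = 0;
        ByteBuffer buf = ByteBuffer.allocate(BLOCK); // allocated once, reused
        for (int i = 0; i < BLOCKS; i++) {
            buf.clear();
            buf.putInt(0, i);
            sum += buf.getInt(0);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(allocatePerRead() == reuseBuffer()); // identical sums
    }
}
```

Under a profiler, the first variant spends its extra time in allocation and zeroing (allocate() returns a zero-filled buffer), which is the kind of cost the VisualVM session above would surface.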
----- Original Message -----
From: Bryan Keller
To: "user@hbase.apache.org"
Sent: Wednesday, May 1, 2013 6:01 PM
Subject: Re: Poor HBase map-reduce scan performance

I tried running my test with 0.94.4, unfortunately performance was about the same. I'
…it with the single RS setup) and see where it is bottlenecked.

(And if you send me a sample program to generate some data - not 700g, though :) - I'll try to do a bit of profiling during the next days as my day job permits, but I do not have any machines with SSDs).
> In your case - since you have many columns, each of which carry the rowkey - you might benefit a lot from HBASE-7279.
>
> In the end HBase *is* slower than straight HDFS for full scans. How
>
> …ows and/or large key portions. That in turn makes scans scale better across cores, since RAM is a shared resource between cores (much like disk).
>
> It's not hard to build the latest HBase against Cloudera's version of Hadoop. I can send along a simple patch to pom.xml to do that.
>
> -- Lars
________________________________
From: Bryan Keller
To: user@hbase.apache.org
Sent: Tuesday, April 30, 2013 11:02 PM
Subject: Re: Poor HBase map-reduce scan performance

The table has hashed keys so rows are evenly distributed amongst the regionservers, and load on each regionserver i
________________________________
From: Bryan Keller
To: user@hbase.apache.org
Sent: Tuesday, April 30, 2013 9:31 PM
Subject: Re: Poor HBase map-reduce scan performance
Yes, I have it enabled (forgot to mention that).
On Apr 30, 2013, at 9:56 PM, Ted Yu wrote:
> Have you tried enabling short circuit read ?
>
> Thanks
>
> On Apr 30, 2013, at 9:31 PM, Bryan Keller wrote:
>
>> Yes, I have tried various settings for setCaching() and I have
>> setCacheBlocks(fa
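For reference, short-circuit read is an HDFS client/datanode setting rather than an HBase one. A minimal hdfs-site.xml sketch follows; the property names are from later Hadoop 2.x documentation, and the CDH build discussed in this thread may instead use the older dfs.client.read.shortcircuit plus dfs.block.local-path-access.user form, so verify against the deployed HDFS version.

```xml
<!-- hdfs-site.xml on regionserver/datanode hosts; a sketch, verify the
     property names against the HDFS version actually deployed -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```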
Have you tried enabling short circuit read ?
Thanks
On Apr 30, 2013, at 9:31 PM, Bryan Keller wrote:
> Yes, I have tried various settings for setCaching() and I have
> setCacheBlocks(false)
>
> On Apr 30, 2013, at 9:17 PM, Ted Yu wrote:
>
>> From http://hbase.apache.org/book.html#mapreduce.
Yes, I have tried various settings for setCaching() and I have
setCacheBlocks(false)
On Apr 30, 2013, at 9:17 PM, Ted Yu wrote:
> From http://hbase.apache.org/book.html#mapreduce.example :
>
> scan.setCaching(500);// 1 is the default in Scan, which will
> be bad for MapReduce jobs
> sc
From http://hbase.apache.org/book.html#mapreduce.example :

scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs

I guess you have used the above setting.

0.94.x releases are compatible. Have y