Setting the log level to trace helps, but overall, the lack of "traditional" db
metrics has been a huge pain point for us as well.
On Wed, Sep 14, 2016 at 10:04 AM, Josh Elser wrote:
> Nope! My test harness (the github repo) doesn't show any noticeable
> difference between
the size of the table.
>>
>> -Original Message-
>> From: Josh Elser [mailto:josh.el...@gmail.com]
>> Sent: Monday, September 12, 2016 2:43 PM
>> To: user@accumulo.apache.org
>> Subject: Re: Accumulo Seek performance
>>
>> What does a "large
of the table.
>
> -Original Message-
> From: Josh Elser [mailto:josh.el...@gmail.com]
> Sent: Monday, September 12, 2016 2:43 PM
> To: user@accumulo.apache.org
> Subject: Re: Accumulo Seek performance
>
> What does a "large scan" mean here, Dan?
>
> Sven
: Accumulo Seek performance
What does a "large scan" mean here, Dan?
Sven's original problem statement was running many small/pointed Ranges
(e.g. point lookups). My observation was that BatchScanners were slower
than running each Range in its own Scanner when using multiple BatchScanners concurrently.
Dan
M
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance
I had increased the readahead thread pool to 32 (from 16). I had also
increased the minimum thread pool size from 20 to 40. I had 10 tablets
with the data block cache turned on (probably only 256M tho).
Each tablet had a single file
Note I was running a single tserver, datanode, and zookeeper on my workstation.
On Mon, Sep 12, 2016 at 2:02 PM, Keith Turner wrote:
> Josh helped me get up and running w/ YCSB and I am seeing very
> different results. I am going to make a pull req to Josh's GH repo
> to add
Josh helped me get up and running w/ YCSB and I am seeing very
different results. I am going to make a pull req to Josh's GH repo
to add a Readme w/ what I learned from Josh in IRC.
The link below is the Accumulo config I used for running a local 1.8.0 instance.
:42 PM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance
I had increased the readahead thread pool to 32 (from 16). I had also
increased the minimum thread pool size from 20 to 40. I had 10 tablets
with the data block cache turned on (probably only 256M tho).
Each tablet had
I had increased the readahead thread pool to 32 (from 16). I had also
increased the minimum thread pool size from 20 to 40. I had 10 tablets
with the data block cache turned on (probably only 256M tho).
Each tablet had a single file (manually compacted). Did not observe
cache rates.
I've
Sorry, Monday morning poor reading skills, I guess. :)
So, 3000 ranges in 40 seconds with the BatchScanner. In my past experience
HDFS seeks tend to take something like 10-100ms, and I would expect that
time to dominate here. With 60 client threads your bottleneck should be the
readahead pool,
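A back-of-the-envelope check of the numbers above, as a minimal sketch. It assumes seek time dominates and that lookups spread evenly over the worker threads; the pool sizes of 16 and 32 are the values mentioned elsewhere in this thread, and the object and function names are made up for illustration.

```scala
object SeekEstimate {
  /** Estimated wall-clock seconds to serve `lookups` seeks of `seekMs` each
    * on `threads` workers, assuming seek time dominates and work is spread evenly. */
  def estimatedSeconds(lookups: Int, seekMs: Double, threads: Int): Double =
    lookups * seekMs / 1000.0 / threads

  /** Observed throughput in lookups per second. */
  def lookupsPerSecond(lookups: Int, seconds: Double): Double =
    lookups / seconds

  def main(args: Array[String]): Unit = {
    val lookups = 3000
    println(f"observed: ${lookupsPerSecond(lookups, 40.0)}%.1f lookups/s")
    // Compare the reported 40 s against the seek-bound estimate
    for (seekMs <- Seq(10.0, 100.0); threads <- Seq(16, 32))
      println(f"seek=$seekMs%.0f ms, threads=$threads: ~${estimatedSeconds(lookups, seekMs, threads)}%.2f s")
  }
}
```

3000 ranges in 40 seconds is 75 lookups per second. Under these assumptions even 100 ms seeks on 16 threads would finish in under 20 seconds, so the observed 40 seconds is slower than a purely seek-bound estimate.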
5 iterations, figured that would be apparent from the log messages :)
The code is already posted in my original message.
Adam Fuchs wrote:
Josh,
Two questions:
1. How many iterations did you do? I would like to see an absolute
number of lookups per second to compare against other
Josh,
Two questions:
1. How many iterations did you do? I would like to see an absolute number
of lookups per second to compare against other observations.
2. Can you post your code somewhere so I can run it?
Thanks,
Adam
On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser
Keith Turner wrote:
On Mon, Sep 12, 2016 at 10:58 AM, Josh Elser wrote:
> Good call. I kind of forgot about BatchScanner threads and trying to factor
> those in :). I guess doing one thread in the BatchScanners would be more
> accurate.
>
> Although, I only had one
:01 PM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance
Sven, et al:
So, it would appear that I have been able to reproduce this one (better
late than never, I guess...). tl;dr Serially using Scanners to do point
lookups instead of a BatchScanner is ~20x faster. This sounds like
@accumulo.apache.org
Subject: Re: Accumulo Seek performance
Good call. I kind of forgot about BatchScanner threads and trying to
factor those in :). I guess doing one thread in the BatchScanners would
be more accurate.
Although, I only had one TServer, so I don't *think* there would be any
On Mon, Sep 12, 2016 at 10:58 AM, Josh Elser wrote:
> Good call. I kind of forgot about BatchScanner threads and trying to factor
> those in :). I guess doing one thread in the BatchScanners would be more
> accurate.
>
> Although, I only had one TServer, so I don't *think*
Good call. I kind of forgot about BatchScanner threads and trying to
factor those in :). I guess doing one thread in the BatchScanners would
be more accurate.
Although, I only had one TServer, so I don't *think* there would be any
difference. I don't believe we have concurrent requests from
Is this a problem specific to 1.8.0, or is it likely to affect earlier versions?
-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Saturday, September 10, 2016 6:01 PM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance
Sven, et al:
So, it would
Nice setup Josh. Thank you for putting together the tests. A few
questions:
The serial scanner implementation uses 6 threads: one for each thread in
the thread pool.
The batch scanner implementation uses 60 threads: 10 for each thread in the
thread pool, since the BatchScanner was configured
Sven, et al:
So, it would appear that I have been able to reproduce this one (better
late than never, I guess...). tl;dr Serially using Scanners to do point
lookups instead of a BatchScanner is ~20x faster. This sounds like a
pretty serious performance issue to me.
Here's a general outline
nt of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hod...@scai.fraunhofer.de
> www.scai.fraunhofer.de
>
> - Original Message -
> > From: "Keith Turner" <ke...@deenlo.com>
> > To: "user" <user@accumulo.apache.org>
h Turner" <ke...@deenlo.com>
> To: "user" <user@accumulo.apache.org>
> Sent: Monday, August 29, 2016 22:37:32
> Subject: Re: Accumulo Seek performance
> On Wed, Aug 24, 2016 at 9:22 AM, Sven Hodapp
> <sven.hod...@scai.fraunhofer.de> wrote:
>>
On Wed, Aug 24, 2016 at 9:22 AM, Sven Hodapp
wrote:
> Hi there,
>
> currently we're experimenting with a two node Accumulo cluster (two tablet
> servers) setup for document storage.
> These documents are decomposed down to the sentence level.
>
> Now I'm using a
Sven,
Strange results. BatchScanners most definitely can be processed in
parallel by the tabletservers.
There is a dynamically resizing thread pool in the TabletServers that
responds to load on the system. As the pool remains full, it will grow.
As it remains empty, it will shrink.
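The grow/shrink behaviour described above can be pictured with a toy policy like the one below. This is only an illustration and not the TabletServer's actual code: the one-thread step, the `min`/`max` bounds, and the names are all assumptions.

```scala
object PoolResize {
  /** Toy resize rule: grow by one when every thread is busy,
    * shrink by one when none are, otherwise keep the current size.
    * The result is always kept within [min, max]. */
  def nextSize(current: Int, active: Int, min: Int, max: Int): Int =
    if (active >= current) math.min(max, current + 1)   // pool is full: grow
    else if (active == 0) math.max(min, current - 1)    // pool is empty: shrink
    else current                                        // partially used: no change
}
```

Applied repeatedly, a pool that stays saturated climbs toward `max`, and an idle one drifts back toward `min`.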
A few
user@accumulo.apache.org>
> Sent: Thursday, August 25, 2016 9:42:00 AM
> Subject: Re: Accumulo Seek performance
>
> Hi dlmarion,
>
> toList should also call iterator(), and that is done independently for each
> batch scanner iterator in the context of the Future.
>
>
Birlinghoven, 53754 Sankt Augustin, Germany
sven.hod...@scai.fraunhofer.de
www.scai.fraunhofer.de
- Original Message -
> From: dlmar...@comcast.net
> To: "user" <user@accumulo.apache.org>
> Sent: Thursday, August 25, 2016 14:34:39
> Subject: Re: Accumulo
<user@accumulo.apache.org>
Sent: Thursday, August 25, 2016 4:53:41 AM
Subject: Re: Accumulo Seek performance
Hi,
I've changed the code a little bit, so that it uses a thread pool (via the
Future):
val ranges500 = ranges.asScala.grouped(500) // this means 6 BatchScanners will
be cr
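A minimal, self-contained sketch of the grouping idea, using only the Scala standard library. It is not the code from the original message: `scanGroup` is a hypothetical stand-in for running one BatchScanner over a group of ranges, and plain integers stand in for Accumulo `Range` objects.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object GroupedLookups {
  // Hypothetical stand-in for scanning one group of ranges with a BatchScanner
  def scanGroup(group: Seq[Int]): Seq[String] = group.map(r => s"row-$r")

  def lookupAll(ranges: Seq[Int], groupSize: Int): Seq[String] = {
    // grouped() returns a lazy Iterator; toList forces every group now, so all
    // Futures are started up front instead of one at a time during traversal
    val futures = ranges.grouped(groupSize).toList.map(g => Future(scanGroup(g)))
    // Wait for all groups and concatenate the results in group order
    Await.result(Future.sequence(futures), 1.minute).flatten
  }
}
```

With 3000 ranges and a group size of 500 this starts 6 Futures. Because `grouped` yields an `Iterator`, mapping over it without forcing it (for example with `toList`) would create the Futures lazily, one per step of the traversal.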
ankt Augustin, Germany
sven.hod...@scai.fraunhofer.de
www.scai.fraunhofer.de
- Original Message -
> From: "Josh Elser" <josh.el...@gmail.com>
> To: "user" <user@accumulo.apache.org>
> Sent: Wednesday, August 24, 2016 18:36:42
> Subject: Re:
Doesn't this use the 6 batch scanners serially?
- Original Message -
From: "Sven Hodapp" <sven.hod...@scai.fraunhofer.de>
To: "user" <user@accumulo.apache.org>
Sent: Wednesday, August 24, 2016 11:56:14 AM
Subject: Re: Accumulo Seek performance
ai.fraunhofer.de
- Original Message -
> From: "Josh Elser" <josh.el...@gmail.com>
> To: "user" <user@accumulo.apache.org>
> Sent: Wednesday, August 24, 2016 16:33:37
> Subject: Re: Accumulo Seek performance
> This reminded me of https://issues.a
This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710
I don't feel like 3000 ranges is too many, but this isn't quantitative.
IIRC, the BatchScanner will take each Range you provide, bin each Range
to the TabletServer(s) currently hosting the corresponding data, clip
Hi there,
currently we're experimenting with a two node Accumulo cluster (two tablet
servers) setup for document storage.
These documents are decomposed down to the sentence level.
Now I'm using a BatchScanner to assemble the full document like this:
val bscan =