As an aside, this is actually pretty relevant to the work I've been doing
for Presto/Accumulo integration. It isn't uncommon to have around a
million exact Ranges (that is, Ranges with a single row ID) spread across
the five Presto worker nodes we use for scanning Accumulo. Right now, these
I think the 450 ranges returned a total of about 7.5M entries, but the ranges
were in fact quite small relative to the size of the table.
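For a sense of scale, Dan's figures work out as follows. This is a minimal back-of-envelope sketch that treats "about 7.5M" as exactly 7,500,000 entries; the variable names are illustrative only.

```python
# Figures from Dan's message; "about 7.5M" is treated as exactly 7,500,000.
TOTAL_ENTRIES = 7_500_000
NUM_RANGES = 450

# Average number of key/value entries returned per Range.
avg_entries_per_range = TOTAL_ENTRIES / NUM_RANGES
print(f"~{avg_entries_per_range:,.0f} entries per range")  # ~16,667 entries per range
```

So each of those ranges did far more work than a point lookup on a single row ID.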
-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Monday, September 12, 2016 2:43 PM
To: user@accumulo.apache.org
Subject: Re:
What does a "large scan" mean here, Dan?
Sven's original problem statement was running many small/pointed Ranges
(e.g. point lookups). My observation was that BatchScanners were slower
than running each in a Scanner when using multiple BS's concurrently.
Note I was running a single tserver, datanode, and zookeeper on my workstation.
On Mon, Sep 12, 2016 at 2:02 PM, Keith Turner wrote:
Josh helped me get up and running w/ YCSB and I am seeing very
different results. I am going to make a pull req to Josh's GH repo
to add a Readme w/ what I learned from Josh in IRC.
The link below is the Accumulo config I used for running a local 1.8.0 instance.
I tested a large scan on a 1.6.2 cluster with 11 tablet servers - using
Scanners was much slower than using a BatchScanner with 11 threads, by about a
5:1 ratio. There were 450 ranges.
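One rough way to read the 5:1 result is to compare it with the ideal speedup of 11 BatchScanner threads. This sketch assumes, hypothetically, that the Scanner baseline handled the ranges one at a time on a single client thread (the message does not say how the Scanners were driven); `parallel_efficiency` is a made-up helper name.

```python
def parallel_efficiency(observed_speedup: float, batch_threads: int) -> float:
    """Fraction of ideal linear speedup achieved.

    Assumes (hypothetically) that the Scanner baseline ran ranges serially on
    one client thread, so N BatchScanner threads could at best be N times faster.
    """
    return observed_speedup / batch_threads


# Dan's numbers: Scanners ~5x slower than a BatchScanner with 11 threads.
eff = parallel_efficiency(observed_speedup=5.0, batch_threads=11)
print(f"{eff:.0%} of ideal")  # 45% of ideal
```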
-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Monday, September 12, 2016
I had increased the readahead thread pool to 32 (from 16). I had also
increased the minimum thread pool size from 20 to 40. I had 10 tablets
with the data block cache turned on (probably only 256M tho).
Each tablet had a single file (manually compacted). Did not observe
cache rates.
Sorry, Monday morning poor reading skills, I guess. :)
So, 3000 ranges in 40 seconds with the BatchScanner. In my past experience
HDFS seeks tend to take something like 10-100ms, and I would expect that
time to dominate here. With 60 client threads your bottleneck should be the
readahead pool,
5 iterations, figured that would be apparent from the log messages :)
The code is already posted in my original message.
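Adam's back-of-envelope reasoning can be written out as a small sketch. It takes his "3000 ranges in 40 seconds" at face value, assumes every lookup costs exactly one HDFS seek, and treats the 32-thread readahead pool Josh configured as the cap on concurrent seeks. `lookups_per_second` and `seek_bound` are hypothetical helper names, not Accumulo APIs.

```python
def lookups_per_second(ranges: int, seconds: float) -> float:
    """Observed throughput: completed lookups divided by wall-clock time."""
    return ranges / seconds


def seek_bound(pool_threads: int, seek_latency_s: float) -> float:
    """Upper bound on lookups/s if each lookup costs one seek and at most
    `pool_threads` seeks can be in flight on the tserver at once."""
    return pool_threads / seek_latency_s


measured = lookups_per_second(3000, 40.0)
print(f"measured: {measured:.0f} lookups/s")  # measured: 75 lookups/s

# Adam's 10-100 ms HDFS seek estimate, with Josh's 32-thread readahead pool.
for latency_s in (0.010, 0.100):
    bound = seek_bound(32, latency_s)
    print(f"{latency_s * 1000:.0f} ms seeks: at most {bound:,.0f} lookups/s")
```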
Adam Fuchs wrote:
Josh,
Two questions:
1. How many iterations did you do? I would like to see an absolute number
of lookups per second to compare against other observations.
2. Can you post your code somewhere so I can run it?
Thanks,
Adam
On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser
Keith Turner wrote:
I don't have enough context to say definitively, but I'd assume earlier
versions too.
Dan Blum wrote:
Is this a problem specific to 1.8.0, or is it likely to affect earlier versions?
-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Saturday, September 10, 2016
I am not sure - my recollection is that the 1.6.x code capped the number of
threads requested at 1 per tablet (covered by the requested ranges), not 1 per
tablet server.
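To illustrate why the distinction Dan raises could matter for a single-tserver test, here is a hypothetical model of the two capping policies. It is not Accumulo's actual thread-allocation code: `effective_threads` and its `cap_per` values are made up for illustration, and the inputs (10 tablets, 1 tserver, 10 threads per BatchScanner) come from Josh's setup as described elsewhere in the thread.

```python
def effective_threads(requested: int, tablets_covered: int, tservers: int,
                      cap_per: str) -> int:
    """Hypothetical model of capping BatchScanner threads.

    cap_per="tablet":  at most one thread per tablet covered by the ranges
                       (Dan's recollection of 1.6.x).
    cap_per="tserver": at most one thread per tablet server.
    """
    if cap_per == "tablet":
        return min(requested, tablets_covered)
    if cap_per == "tserver":
        return min(requested, tservers)
    raise ValueError(f"unknown cap_per: {cap_per!r}")


# Josh's local test: 10 tablets on one tserver, 10 threads per BatchScanner.
print(effective_threads(10, 10, 1, "tablet"))   # 10
print(effective_threads(10, 10, 1, "tserver"))  # 1
```

In this model, a per-tablet cap would still let one TServer receive ten concurrent requests from each BatchScanner, while a per-tserver cap would limit it to one.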
-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Monday, September 12, 2016 10:58 AM
To:
On Mon, Sep 12, 2016 at 10:58 AM, Josh Elser wrote:
Good call. I kind of forgot about BatchScanner threads and trying to
factor those in :). I guess doing one thread in the BatchScanners would
be more accurate.
Although, I only had one TServer, so I don't *think* there would be any
difference. I don't believe we have concurrent requests from
-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Saturday, September 10, 2016 6:01 PM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance
Sven, et al:
So, it would
Nice setup Josh. Thank you for putting together the tests. A few
questions:
The serial scanner implementation uses 6 threads: one for each thread in
the thread pool.
The batch scanner implementation uses 60 threads: 10 for each thread in the
thread pool, since the BatchScanner was configured with 10 threads.
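The thread counts multiply out as below. This is a small sketch in which `client_threads` is a hypothetical helper, and the 32-thread readahead pool is the tserver value Josh reported configuring.

```python
def client_threads(pool_size: int, threads_per_scanner: int) -> int:
    """Total scanning threads on the client: one scanner per pool thread."""
    return pool_size * threads_per_scanner


READAHEAD_POOL = 32  # tserver readahead pool size Josh reported using

serial = client_threads(pool_size=6, threads_per_scanner=1)   # Scanner
batch = client_threads(pool_size=6, threads_per_scanner=10)   # BatchScanner
print(serial, batch)               # 6 60
print(batch / READAHEAD_POOL)      # 1.875
```

If all 60 client threads had a request outstanding at once, about 28 of them would be waiting on the 32 readahead slots, which is consistent with Adam's point that the readahead pool should be the bottleneck.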