Re: Accumulo Seek performance

2016-09-12 Thread Adam J. Shook
As an aside, this is actually pretty relevant to the work I've been doing for Presto/Accumulo integration. It isn't uncommon to have around a million exact Ranges (that is, Ranges with a single row ID) spread across the five Presto worker nodes we use for scanning Accumulo. Right now, these

RE: Accumulo Seek performance

2016-09-12 Thread Dan Blum
I think the 450 ranges returned a total of about 7.5M entries, but the ranges were in fact quite small relative to the size of the table. -Original Message- From: Josh Elser [mailto:josh.el...@gmail.com] Sent: Monday, September 12, 2016 2:43 PM To: user@accumulo.apache.org Subject: Re:

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
What does a "large scan" mean here, Dan? Sven's original problem statement was running many small/pointed Ranges (e.g. point lookups). My observation was that BatchScanners were slower than running each in a Scanner when using multiple BS's concurrently. Dan Blum wrote: I tested a large

Re: Accumulo Seek performance

2016-09-12 Thread Keith Turner
Note I was running a single tserver, datanode, and zookeeper on my workstation. On Mon, Sep 12, 2016 at 2:02 PM, Keith Turner wrote: > Josh helped me get up and running w/ YCSB and I am seeing very > different results. I am going to make a pull req to Josh's GH repo > to add

Re: Accumulo Seek performance

2016-09-12 Thread Keith Turner
Josh helped me get up and running w/ YCSB and I am seeing very different results. I am going to make a pull req to Josh's GH repo to add a Readme w/ what I learned from Josh in IRC. The link below is the Accumulo config I used for running a local 1.8.0 instance.

RE: Accumulo Seek performance

2016-09-12 Thread Dan Blum
I tested a large scan on a 1.6.2 cluster with 11 tablet servers - using Scanners was much slower than using a BatchScanner with 11 threads, by about a 5:1 ratio. There were 450 ranges. -Original Message- From: Josh Elser [mailto:josh.el...@gmail.com] Sent: Monday, September 12, 2016

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
I had increased the readahead thread pool to 32 (from 16). I had also increased the minimum thread pool size from 20 to 40. I had 10 tablets with the data block cache turned on (probably only 256M tho). Each tablet had a single file (manually compacted). Did not observe cache rates. I've

Re: Accumulo Seek performance

2016-09-12 Thread Adam Fuchs
Sorry, Monday morning poor reading skills, I guess. :) So, 3000 ranges in 40 seconds with the BatchScanner. In my past experience HDFS seeks tend to take something like 10-100ms, and I would expect that time to dominate here. With 60 client threads your bottleneck should be the readahead pool,
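
A rough back-of-the-envelope sketch of the numbers above (3000 ranges, 40 seconds, 10-100ms per HDFS seek). The function names are hypothetical, and two things are assumptions rather than anything measured in this thread: that each range costs exactly one seek, and that seeks run 32 at a time (the readahead pool size Josh mentioned elsewhere in the thread).

```python
def lookups_per_second(ranges: int, seconds: float) -> float:
    """Observed throughput: ranges completed per second of wall time."""
    return ranges / seconds


def seek_bound_seconds(ranges: int, seek_ms: float, parallel_seeks: int) -> float:
    """Wall time if every range costs one seek of seek_ms milliseconds
    and parallel_seeks seeks are in flight at once (both are assumptions)."""
    return ranges * seek_ms / (1000.0 * parallel_seeks)


observed = lookups_per_second(3000, 40.0)      # figures quoted in the message
fast = seek_bound_seconds(3000, 10.0, 32)      # 10ms seeks, assumed pool of 32
slow = seek_bound_seconds(3000, 100.0, 32)     # 100ms seeks, assumed pool of 32

print(f"observed: {observed:.1f} lookups/s")
print(f"seek-bound estimate: {fast:.2f}s to {slow:.2f}s")
```

Under those assumptions the seek-bound estimate comes out well below the observed 40 seconds, which would suggest that seek time alone does not account for it. That is only as good as the one-seek-per-range assumption.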

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
5 iterations, figured that would be apparent from the log messages :) The code is already posted in my original message. Adam Fuchs wrote: Josh, Two questions: 1. How many iterations did you do? I would like to see an absolute number of lookups per second to compare against other

Re: Accumulo Seek performance

2016-09-12 Thread Adam Fuchs
Josh, Two questions: 1. How many iterations did you do? I would like to see an absolute number of lookups per second to compare against other observations. 2. Can you post your code somewhere so I can run it? Thanks, Adam On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
Keith Turner wrote: On Mon, Sep 12, 2016 at 10:58 AM, Josh Elser wrote: > Good call. I kind of forgot about BatchScanner threads and trying to factor > those in :). I guess doing one thread in the BatchScanners would be more > accurate. > > Although, I only had one

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
I don't have enough context to say definitively, but I'd assume earlier versions too. Dan Blum wrote: Is this a problem specific to 1.8.0, or is it likely to affect earlier versions? -Original Message- From: Josh Elser [mailto:josh.el...@gmail.com] Sent: Saturday, September 10, 2016

RE: Accumulo Seek performance

2016-09-12 Thread Dan Blum
I am not sure - my recollection is that the 1.6.x code capped the number of threads requested at 1 per tablet (covered by the requested ranges), not 1 per tablet server. -Original Message- From: Josh Elser [mailto:josh.el...@gmail.com] Sent: Monday, September 12, 2016 10:58 AM To:

Re: Accumulo Seek performance

2016-09-12 Thread Keith Turner
On Mon, Sep 12, 2016 at 10:58 AM, Josh Elser wrote: > Good call. I kind of forgot about BatchScanner threads and trying to factor > those in :). I guess doing one thread in the BatchScanners would be more > accurate. > > Although, I only had one TServer, so I don't *think*

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
Good call. I kind of forgot about BatchScanner threads and trying to factor those in :). I guess doing one thread in the BatchScanners would be more accurate. Although, I only had one TServer, so I don't *think* there would be any difference. I don't believe we have concurrent requests from

RE: Accumulo Seek performance

2016-09-12 Thread Dan Blum
Is this a problem specific to 1.8.0, or is it likely to affect earlier versions? -Original Message- From: Josh Elser [mailto:josh.el...@gmail.com] Sent: Saturday, September 10, 2016 6:01 PM To: user@accumulo.apache.org Subject: Re: Accumulo Seek performance Sven, et al: So, it would

Re: Accumulo Seek performance

2016-09-12 Thread Dylan Hutchison
Nice setup Josh. Thank you for putting together the tests. A few questions: The serial scanner implementation uses 6 threads: one for each thread in the thread pool. The batch scanner implementation uses 60 threads: 10 for each thread in the thread pool, since the BatchScanner was configured
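
The thread counts in the message above can be written out as a small sketch. The function name is hypothetical; the figures (6 pool threads, 10 threads per BatchScanner) are the ones quoted in the message, and a plain Scanner is assumed to use a single thread.

```python
def total_client_threads(pool_threads: int, threads_per_scanner: int) -> int:
    """Client-side threads in use when every pool thread drives one scanner."""
    return pool_threads * threads_per_scanner


serial_threads = total_client_threads(6, 1)    # one Scanner per pool thread
batch_threads = total_client_threads(6, 10)    # one 10-thread BatchScanner per pool thread

print(serial_threads, batch_threads)
```

The two test setups therefore differ in client-side concurrency by a factor of ten, which is worth keeping in mind when comparing their timings.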