Re: Accumulo Seek performance

2016-09-14 Thread Michael Moss
Setting the log level to trace helps, but overall, lack of "traditional" db metrics has been a huge pain point for us as well. On Wed, Sep 14, 2016 at 10:04 AM, Josh Elser wrote: > Nope! My test harness (the github repo) doesn't show any noticeable > difference between

Re: Accumulo Seek performance

2016-09-13 Thread Keith Turner
the size of the table. >> >> -Original Message- >> From: Josh Elser [mailto:josh.el...@gmail.com] >> Sent: Monday, September 12, 2016 2:43 PM >> To: user@accumulo.apache.org >> Subject: Re: Accumulo Seek performance >> >> What does a "large

Re: Accumulo Seek performance

2016-09-12 Thread Adam J. Shook
of the table. > > -Original Message- > From: Josh Elser [mailto:josh.el...@gmail.com] > Sent: Monday, September 12, 2016 2:43 PM > To: user@accumulo.apache.org > Subject: Re: Accumulo Seek performance > > What does a "large scan" mean here, Dan? > > Sven

RE: Accumulo Seek performance

2016-09-12 Thread Dan Blum
: Accumulo Seek performance What does a "large scan" mean here, Dan? Sven's original problem statement was running many small/pointed Ranges (e.g. point lookups). My observation was that BatchScanners were slower than running each in a Scanner when using multiple BS's concurrently. Dan

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
M To: user@accumulo.apache.org Subject: Re: Accumulo Seek performance I had increased the readahead thread pool to 32 (from 16). I had also increased the minimum thread pool size from 20 to 40. I had 10 tablets with the data block cache turned on (probably only 256M tho). Each tablet had a single file

Re: Accumulo Seek performance

2016-09-12 Thread Keith Turner
Note I was running a single tserver, datanode, and zookeeper on my workstation. On Mon, Sep 12, 2016 at 2:02 PM, Keith Turner wrote: > Josh helped me get up and running w/ YCSB and I am seeing very > different results. I am going to make a pull req to Josh's GH repo > to add

Re: Accumulo Seek performance

2016-09-12 Thread Keith Turner
Josh helped me get up and running w/ YCSB and I am seeing very different results. I am going to make a pull req to Josh's GH repo to add a Readme w/ what I learned from Josh in IRC. The link below is the Accumulo config I used for running a local 1.8.0 instance.

RE: Accumulo Seek performance

2016-09-12 Thread Dan Blum
:42 PM To: user@accumulo.apache.org Subject: Re: Accumulo Seek performance I had increased the readahead thread pool to 32 (from 16). I had also increased the minimum thread pool size from 20 to 40. I had 10 tablets with the data block cache turned on (probably only 256M tho). Each tablet had

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
I had increased the readahead thread pool to 32 (from 16). I had also increased the minimum thread pool size from 20 to 40. I had 10 tablets with the data block cache turned on (probably only 256M tho). Each tablet had a single file (manually compacted). Did not observe cache rates. I've

Re: Accumulo Seek performance

2016-09-12 Thread Adam Fuchs
Sorry, Monday morning poor reading skills, I guess. :) So, 3000 ranges in 40 seconds with the BatchScanner. In my past experience HDFS seeks tend to take something like 10-100ms, and I would expect that time to dominate here. With 60 client threads your bottleneck should be the readahead pool,
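To put the figures in the message above side by side, here is a small back-of-the-envelope sketch. The 3000 ranges, 40 seconds, and 10-100 ms seek times come from the message; the pool size of 16 is the readahead default mentioned elsewhere in the thread. The model itself (one seek per lookup, throughput capped at pool size divided by seek time) is an assumption for illustration, not something the thread establishes.

```scala
object SeekMath {
  // Observed rate: ranges completed per second of wall-clock time.
  def lookupsPerSecond(ranges: Int, elapsedSeconds: Double): Double =
    ranges / elapsedSeconds

  // Assumed ceiling: each pool thread finishes one seek every `seekMs` ms.
  def maxLookupsPerSecond(poolSize: Int, seekMs: Double): Double =
    poolSize * 1000.0 / seekMs

  def main(args: Array[String]): Unit = {
    println(lookupsPerSecond(3000, 40.0))      // observed BatchScanner rate
    println(maxLookupsPerSecond(16, 10.0))     // ceiling with fast seeks
    println(maxLookupsPerSecond(16, 100.0))    // ceiling with slow seeks
  }
}
```

Under these assumptions the observed rate works out to 75 lookups per second, while the ceiling ranges from 160 to 1600 per second depending on seek time.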

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
5 iterations, figured that would be apparent from the log messages :) The code is already posted in my original message. Adam Fuchs wrote: Josh, Two questions: 1. How many iterations did you do? I would like to see an absolute number of lookups per second to compare against other

Re: Accumulo Seek performance

2016-09-12 Thread Adam Fuchs
Josh, Two questions: 1. How many iterations did you do? I would like to see an absolute number of lookups per second to compare against other observations. 2. Can you post your code somewhere so I can run it? Thanks, Adam On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
Keith Turner wrote: On Mon, Sep 12, 2016 at 10:58 AM, Josh Elser wrote: > Good call. I kind of forgot about BatchScanner threads and trying to factor > those in:). I guess doing one thread in the BatchScanners would be more > accurate. > > Although, I only had one

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
:01 PM To: user@accumulo.apache.org Subject: Re: Accumulo Seek performance Sven, et al: So, it would appear that I have been able to reproduce this one (better late than never, I guess...). tl;dr Serially using Scanners to do point lookups instead of a BatchScanner is ~20x faster. This sounds like

RE: Accumulo Seek performance

2016-09-12 Thread Dan Blum
@accumulo.apache.org Subject: Re: Accumulo Seek performance Good call. I kind of forgot about BatchScanner threads and trying to factor those in :). I guess doing one thread in the BatchScanners would be more accurate. Although, I only had one TServer, so I don't *think* there would be any

Re: Accumulo Seek performance

2016-09-12 Thread Keith Turner
On Mon, Sep 12, 2016 at 10:58 AM, Josh Elser wrote: > Good call. I kind of forgot about BatchScanner threads and trying to factor > those in :). I guess doing one thread in the BatchScanners would be more > accurate. > > Although, I only had one TServer, so I don't *think*

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
Good call. I kind of forgot about BatchScanner threads and trying to factor those in :). I guess doing one thread in the BatchScanners would be more accurate. Although, I only had one TServer, so I don't *think* there would be any difference. I don't believe we have concurrent requests from

RE: Accumulo Seek performance

2016-09-12 Thread Dan Blum
Is this a problem specific to 1.8.0, or is it likely to affect earlier versions? -Original Message- From: Josh Elser [mailto:josh.el...@gmail.com] Sent: Saturday, September 10, 2016 6:01 PM To: user@accumulo.apache.org Subject: Re: Accumulo Seek performance Sven, et al: So, it would

Re: Accumulo Seek performance

2016-09-12 Thread Dylan Hutchison
Nice setup Josh. Thank you for putting together the tests. A few questions: The serial scanner implementation uses 6 threads: one for each thread in the thread pool. The batch scanner implementation uses 60 threads: 10 for each thread in the thread pool, since the BatchScanner was configured

Re: Accumulo Seek performance

2016-09-10 Thread Josh Elser
Sven, et al: So, it would appear that I have been able to reproduce this one (better late than never, I guess...). tl;dr Serially using Scanners to do point lookups instead of a BatchScanner is ~20x faster. This sounds like a pretty serious performance issue to me. Here's a general outline

Re: Accumulo Seek performance

2016-08-31 Thread Dylan Hutchison
nt of Bioinformatics > Schloss Birlinghoven, 53754 Sankt Augustin, Germany > sven.hod...@scai.fraunhofer.de > www.scai.fraunhofer.de > > - Original Message - > > From: "Keith Turner" <ke...@deenlo.com> > > To: "user" <user@accumulo.apache.org

Re: Accumulo Seek performance

2016-08-31 Thread Sven Hodapp
h Turner" <ke...@deenlo.com> > To: "user" <user@accumulo.apache.org> > Sent: Monday, 29 August 2016 22:37:32 > Subject: Re: Accumulo Seek performance > On Wed, Aug 24, 2016 at 9:22 AM, Sven Hodapp > <sven.hod...@scai.fraunhofer.de> wrote: >>

Re: Accumulo Seek performance

2016-08-29 Thread Keith Turner
On Wed, Aug 24, 2016 at 9:22 AM, Sven Hodapp wrote: > Hi there, > > currently we're experimenting with a two node Accumulo cluster (two tablet > servers) setup for document storage. > These documents are decomposed up to the sentence level. > > Now I'm using a

Re: Accumulo Seek performance

2016-08-25 Thread Josh Elser
Sven, Strange results. BatchScanners most definitely can be processed in parallel by the tabletservers. There is a dynamically resizing threadpool in the TabletServers that responds to load on the system. As the pool remains full, it will grow. As it remains empty, it will shrink. A few

Re: Accumulo Seek performance

2016-08-25 Thread Sven Hodapp
user@accumulo.apache.org> > Sent: Thursday, August 25, 2016 9:42:00 AM > Subject: Re: Accumulo Seek performance > > Hi dlmarion, > > toList should also call iterator(), and that is done independently for each > batch scanner iterator in the context of the Future. > >

Re: Accumulo Seek performance

2016-08-25 Thread Sven Hodapp
Birlinghoven, 53754 Sankt Augustin, Germany sven.hod...@scai.fraunhofer.de www.scai.fraunhofer.de - Original Message - > From: dlmar...@comcast.net > To: "user" <user@accumulo.apache.org> > Sent: Thursday, 25 August 2016 14:34:39 > Subject: Re: Accumulo

Re: Accumulo Seek performance

2016-08-25 Thread dlmarion
<user@accumulo.apache.org> Sent: Thursday, August 25, 2016 4:53:41 AM Subject: Re: Accumulo Seek performance Hi, I've changed the code a little bit, so that it uses a thread pool (via the Future): val ranges500 = ranges.asScala.grouped(500) // this means 6 BatchScanners will be cr
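The grouping idea in the snippet above can be sketched without a cluster. This is a minimal illustration, not the code from the message: `scanGroup` is a hypothetical stand-in for creating a BatchScanner over one group of ranges, and ranges are modeled as plain integers.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object GroupedLookups {
  // Hypothetical stand-in for one BatchScanner over a group of ranges.
  def scanGroup(group: Seq[Int]): List[String] =
    group.map(r => s"row-$r").toList

  def lookupAll(ranges: Seq[Int], groupSize: Int): List[String] = {
    // grouped() returns a lazy Iterator; toList forces every Future to be
    // created (and therefore started) before any result is awaited.
    val futures = ranges.grouped(groupSize).map(g => Future(scanGroup(g))).toList
    // Future.sequence preserves the order of the groups.
    Await.result(Future.sequence(futures), 1.minute).flatten
  }

  def main(args: Array[String]): Unit =
    println(lookupAll(1 to 3000, 500).size)
}
```

With 3000 ranges and a group size of 500 this creates 6 groups. If the mapped iterator were consumed lazily instead of being forced with `toList`, each Future would only be created when its element was requested; that is one way code of this shape can end up running groups one after another, though the thread does not confirm that this happened here.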

Re: Accumulo Seek performance

2016-08-25 Thread Sven Hodapp
ankt Augustin, Germany sven.hod...@scai.fraunhofer.de www.scai.fraunhofer.de - Original Message - > From: "Josh Elser" <josh.el...@gmail.com> > To: "user" <user@accumulo.apache.org> > Sent: Wednesday, 24 August 2016 18:36:42 > Subject: Re:

Re: Accumulo Seek performance

2016-08-24 Thread dlmarion
Doesn't this use the 6 batch scanners serially? - Original Message - From: "Sven Hodapp" <sven.hod...@scai.fraunhofer.de> To: "user" <user@accumulo.apache.org> Sent: Wednesday, August 24, 2016 11:56:14 AM Subject: Re: Accumulo Seek performance

Re: Accumulo Seek performance

2016-08-24 Thread Sven Hodapp
ai.fraunhofer.de - Original Message - > From: "Josh Elser" <josh.el...@gmail.com> > To: "user" <user@accumulo.apache.org> > Sent: Wednesday, 24 August 2016 16:33:37 > Subject: Re: Accumulo Seek performance > This reminded me of https://issues.a

Re: Accumulo Seek performance

2016-08-24 Thread Josh Elser
This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710 I don't feel like 3000 ranges is too many, but this isn't quantitative. IIRC, the BatchScanner will take each Range you provide, bin each Range to the TabletServer(s) currently hosting the corresponding data, clip
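The binning step described in the message above can be pictured with a toy model. This is a simplified sketch under loud assumptions: the tablet layout, the server names, and the `serverFor` and `bin` helpers are all hypothetical, each lookup is a single row rather than a Range, and nothing here is the actual Accumulo client code.

```scala
import scala.collection.immutable.TreeMap

object RangeBinning {
  // Hypothetical layout: tablet end row (inclusive) -> hosting tablet server.
  val tablets: TreeMap[String, String] =
    TreeMap("g" -> "tserver1", "p" -> "tserver2", "~" -> "tserver1")

  // The first tablet whose end row is >= the lookup row hosts that row.
  def serverFor(row: String): String = {
    val it = tablets.iteratorFrom(row)
    if (it.hasNext) it.next()._2 else tablets.last._2
  }

  // Group lookup rows by hosting server, so each server gets one batch.
  def bin(rows: Seq[String]): Map[String, Seq[String]] =
    rows.groupBy(serverFor)

  def main(args: Array[String]): Unit =
    println(bin(Seq("apple", "kiwi", "zebra")))
}
```

In this model the per-server batches are what would be sent out in parallel, which is why the number of distinct servers, rather than the number of ranges, bounds the fan-out.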

Accumulo Seek performance

2016-08-24 Thread Sven Hodapp
Hi there, currently we're experimenting with a two node Accumulo cluster (two tablet servers) setup for document storage. These documents are decomposed up to the sentence level. Now I'm using a BatchScanner to assemble the full document like this: val bscan =