Re: Accumulo Seek performance

2016-08-25 Thread Josh Elser

Sven,

Strange results. BatchScanners most definitely can be processed in 
parallel by the tabletservers.


There is a dynamically resizing threadpool in the TabletServers that 
responds to load on the system. As the pool remains full, it will grow. 
As it remains empty, it will shrink.


A few more questions: how many TabletServers do you have and did you run 
this benchmark multiple times in succession to see if the results 
changed? Also, have you tried increasing the number of threads per 
batchscanner to see if that makes a difference?
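
For repeated runs, a small timing helper keeps the numbers comparable. The following is a minimal sketch only: the `time` helper used in this thread is never shown, so the `time` and `repeat` definitions below are assumptions, and the workload is a placeholder rather than a real scan.

```scala
object TimingSketch {
  // Hypothetical stand-in for the `time` helper used in the thread;
  // the original definition is not shown there.
  def time[A](label: String)(block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"$label time: $elapsedMs%.3f ms")
    (result, elapsedMs)
  }

  // Run the same workload several times and keep each duration,
  // so that warm-up effects (caches, JIT) show up as a trend.
  def repeat[A](label: String, runs: Int)(block: => A): Seq[Double] =
    (1 to runs).map(i => time(s"$label run $i")(block)._2)

  def main(args: Array[String]): Unit = {
    // Placeholder workload; in the real benchmark this would be the scan.
    val timings = repeat("placeholder", 5) { (1 to 1000000).map(_.toLong).sum }
    println(s"min: ${timings.min} ms, max: ${timings.max} ms")
  }
}
```

Comparing the first run against later runs is usually more telling than a single measurement.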


I might have to try to run a similar benchmark later today. I am curious :)

Sven Hodapp wrote:

Hi,

I've changed the code a little bit, so that it uses a thread pool (via the 
Future):

 val ranges500 = ranges.asScala.grouped(500)  // this means 6 BatchScanners will be created

 for (ranges <- ranges500) {
   val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2)
   bscan.setRanges(ranges.asJava)
   Future {
     time("mult-scanner") {
       bscan.asScala.toList  // toList forces the iteration of the iterator
     }
   }
 }

Here are the results:

 background log: info: mult-scanner time: 4807.289358 ms
 background log: info: mult-scanner time: 4930.996522 ms
 background log: info: mult-scanner time: 9510.010808 ms
 background log: info: mult-scanner time: 11394.152391 ms
 background log: info: mult-scanner time: 13297.247295 ms
 background log: info: mult-scanner time: 14032.704837 ms

 background log: info: single-scanner time: 15322.624393 ms

Every Future completes independently, but in return every batch scanner iterator 
needs more time to complete. :(
Does this mean the batch scanners aren't really processed in parallel on the server 
side?
Should I reconfigure something? Maybe the tablet servers haven't allocated, or can't 
allocate, enough threads or memory? (Each of the two nodes has 8 cores, 64GB of memory, 
and storage with ~300MB/s...)

Regards,
Sven



Re: Accumulo Seek performance

2016-08-25 Thread Sven Hodapp
Hi Dave,

toList will exhaust the iterator. But all 6 iterators will be concurrently 
exhausted within the Future object 
(http://docs.scala-lang.org/overviews/core/futures.html).

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hod...@scai.fraunhofer.de
www.scai.fraunhofer.de

- Original Message -
> From: dlmar...@comcast.net
> To: "user" 
> Sent: Thursday, 25 August 2016 16:22:35
> Subject: Re: Accumulo Seek performance

> But does toList exhaust the first iterator() before going to the next?
> 
> - Dave
> 
> 

Re: Accumulo Seek performance

2016-08-25 Thread Sven Hodapp
Hi dlmarion,

toList should also call iterator(), and that is done independently for each 
batch scanner iterator in the context of the Future.

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hod...@scai.fraunhofer.de
www.scai.fraunhofer.de


Re: Accumulo Seek performance

2016-08-25 Thread dlmarion

Calling BatchScanner.iterator() is what starts the work on the server side. You 
should do this first for all 6 batch scanners, then iterate over all of them in 
parallel. 
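
A minimal sketch of that ordering, using only the Scala standard library. `fakeScanner` is a hypothetical in-memory stand-in for a BatchScanner (the real one is a Java Iterable of entries), so this shows the client-side pattern only and says nothing about server-side behaviour.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelDrainSketch {
  // Hypothetical stand-in for a BatchScanner: anything that hands out an iterator.
  def fakeScanner(id: Int, size: Int): Iterable[String] =
    (1 to size).map(n => s"scanner$id-entry$n")

  def drainAll[A](scanners: Seq[Iterable[A]]): Seq[List[A]] = {
    // Step 1: obtain every iterator up front, on the calling thread.
    val iterators = scanners.map(_.iterator)
    // Step 2: drain each iterator in its own Future.
    val futures = iterators.map(it => Future(it.toList))
    // Step 3: block until all Futures finish, so that total wall-clock
    // time can be measured around this call.
    Await.result(Future.sequence(futures), 1.minute)
  }

  def main(args: Array[String]): Unit = {
    val scanners = (1 to 6).map(id => fakeScanner(id, 1000))
    val results = drainAll(scanners)
    println(results.map(_.size).mkString(", "))
  }
}
```

Note that the default global execution context sizes its pool from the number of available processors, so on a small client machine not all six Futures necessarily run at once.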


Re: Accumulo Seek performance

2016-08-25 Thread Sven Hodapp
Hi,

I've changed the code a little bit, so that it uses a thread pool (via the 
Future):

val ranges500 = ranges.asScala.grouped(500)  // this means 6 BatchScanners will be created

for (ranges <- ranges500) {
  val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2)
  bscan.setRanges(ranges.asJava)
  Future {
    time("mult-scanner") {
      bscan.asScala.toList  // toList forces the iteration of the iterator
    }
  }
}

Here are the results:

background log: info: mult-scanner time: 4807.289358 ms
background log: info: mult-scanner time: 4930.996522 ms
background log: info: mult-scanner time: 9510.010808 ms
background log: info: mult-scanner time: 11394.152391 ms
background log: info: mult-scanner time: 13297.247295 ms
background log: info: mult-scanner time: 14032.704837 ms

background log: info: single-scanner time: 15322.624393 ms

Every Future completes independently, but in return every batch scanner iterator 
needs more time to complete. :(
Does this mean the batch scanners aren't really processed in parallel on the server 
side?
Should I reconfigure something? Maybe the tablet servers haven't allocated, or can't 
allocate, enough threads or memory? (Each of the two nodes has 8 cores, 64GB of memory, 
and storage with ~300MB/s...)

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hod...@scai.fraunhofer.de
www.scai.fraunhofer.de

- Original Message -
> From: "Josh Elser" 
> To: "user" 
> Sent: Wednesday, 24 August 2016 18:36:42
> Subject: Re: Accumulo Seek performance

> Ahh duh. Bad advice from me in the first place :)
> 
> Throw 'em in a threadpool locally.
> 
> dlmar...@comcast.net wrote:
>> Doesn't this use the 6 batch scanners serially?
>>
>> 
>> *From: *"Sven Hodapp" 
>> *To: *"user" 
>> *Sent: *Wednesday, August 24, 2016 11:56:14 AM
>> *Subject: *Re: Accumulo Seek performance
>>
>> Hi Josh,
>>
>> thanks for your reply!
>>
>> I've tested your suggestion with an implementation like this:
>>
>> val ranges500 = ranges.asScala.grouped(500) // this means 6 BatchScanners will be created
>>
>> time("mult-scanner") {
>>   for (ranges <- ranges500) {
>>     val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1)
>>     bscan.setRanges(ranges.asJava)
>>     for (entry <- bscan.asScala) yield {
>>       entry.getKey()
>>     }
>>   }
>> }
>>
>> And the result is a bit disappointing:
>>
>> background log: info: mult-scanner time: 18064.969281 ms
>> background log: info: single-scanner time: 6527.482383 ms
>>
>> Am I doing something wrong here?
>>
>>
>> Regards,
>> Sven
>>
>> --
>> Sven Hodapp, M.Sc.,
>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
>> Department of Bioinformatics
>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
>> sven.hod...@scai.fraunhofer.de
>> www.scai.fraunhofer.de
>>
>> - Original Message -
>>  > From: "Josh Elser" 
>>  > To: "user" 
>>  > Sent: Wednesday, 24 August 2016 16:33:37
>>  > Subject: Re: Accumulo Seek performance
>>
>>  > This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710
>>  >
>>  > I don't feel like 3000 ranges is too many, but this isn't quantitative.
>>  >
>>  > IIRC, the BatchScanner will take each Range you provide, bin each Range
>>  > to the TabletServer(s) currently hosting the corresponding data, clip
>>  > (truncate) each Range to match the Tablet boundaries, and then do an
>>  > RPC to each TabletServer with just the Ranges hosted there.
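
The binning and clipping described above can be illustrated with a toy model. This is a sketch under loud assumptions: rows are plain Ints, intervals are half-open, and `Interval`, `Tablet`, `clip` and `bin` are hypothetical names invented here. The real client works on Accumulo Keys and tablet extents and is not shown in this thread.

```scala
object RangeBinningSketch {
  // Simplified model: rows are Ints, intervals are half-open [start, end).
  final case class Interval(start: Int, end: Int)
  final case class Tablet(extent: Interval, server: String)

  // Clip a range to a tablet's extent; None when they do not overlap.
  def clip(range: Interval, tablet: Tablet): Option[Interval] = {
    val s = math.max(range.start, tablet.extent.start)
    val e = math.min(range.end, tablet.extent.end)
    if (s < e) Some(Interval(s, e)) else None
  }

  // Group the clipped ranges by the server hosting each tablet,
  // i.e. what would be sent to each server in one request.
  def bin(ranges: Seq[Interval], tablets: Seq[Tablet]): Map[String, Seq[Interval]] = {
    val clipped = for {
      r <- ranges
      t <- tablets
      c <- clip(r, t)
    } yield (t.server, c)
    clipped.groupBy(_._1).map { case (server, pairs) => server -> pairs.map(_._2) }
  }

  def main(args: Array[String]): Unit = {
    val tablets = Seq(
      Tablet(Interval(0, 100), "tserver1"),
      Tablet(Interval(100, 200), "tserver2"))
    val ranges = Seq(Interval(90, 110), Interval(150, 160))
    bin(ranges, tablets).foreach { case (server, rs) => println(s"$server -> $rs") }
  }
}
```

In this toy setup the range spanning the tablet boundary is split in two, one piece per server.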
>>  >
>>  > Inside the TabletServer, it will then have many Ranges, binned by Tablet
>>  > (KeyExtent, to be precise). This will spawn an
>>  > org.apache.accumulo.tserver.scan.LookupTask, which will start collecting
>>  > results to send back to the client.
>>  >
>>  > The caveat here is that those ranges are processed serially on a
>>  > TabletServer. Maybe you're swamping one TabletServer with lots of
>>  > Ranges that it could be processing in parallel.
>>  >
>>  > Could you experiment with using multiple BatchScanners and something
>>  > like Guava's Iterables.concat to make it appear like one Iterator?
>>  >
>>  > I'm curious if we should put an optimization into the BatchScanner
>>  > itself to limit the number of ranges we send in one RPC to a
>>  > TabletServer (e.g. one BatchScanner might open multiple
>>  > MultiScanSessions to a TabletServer).
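
A minimal sketch of the "several scanners that look like one Iterator" idea, using the Scala standard library instead of Guava. `scan` is a hypothetical stand-in for creating a BatchScanner over a chunk of ranges; here it only echoes the ranges it receives.

```scala
object ConcatSketch {
  // Hypothetical stand-in for "create a BatchScanner over these ranges and
  // return its results"; here it just echoes the ranges it was given.
  def scan(ranges: Seq[Int]): Iterator[String] =
    ranges.iterator.map(r => s"result-for-range-$r")

  // Split the ranges into chunks and expose all results as one Iterator.
  def scanInChunks(ranges: Seq[Int], chunkSize: Int): Iterator[String] =
    ranges.grouped(chunkSize).flatMap(chunk => scan(chunk))

  def main(args: Array[String]): Unit = {
    val ranges = 1 to 3000
    println(ranges.grouped(500).size)        // 6 chunks for 3000 ranges
    println(scanInChunks(ranges, 500).size)  // total number of results
  }
}
```

Note that `Iterator.flatMap` is lazy: the next chunk is only scanned once the previous one is exhausted, so this form consumes the chunks one after another rather than in parallel.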
>>  >
>>  > Sven Hodapp wrote:
>>  >> Hi there,
>>  >>
>>  >> currently we're experimenting with a two-node Accumulo cluster (two tablet
>>  >> servers) set up for document storage.
>>  >> These documents are decomposed down to the sentence level.
>>  >>
>>  >> Now I'm using a BatchScanner to assemble the