Re: Accumulo Seek performance

2016-09-14 Thread Michael Moss
Setting the log level to trace helps, but overall, lack of "traditional" db
metrics has been a huge pain point for us as well.

On Wed, Sep 14, 2016 at 10:04 AM, Josh Elser  wrote:

> Nope! My test harness (the github repo) doesn't show any noticeable
> difference between BatchScanner and Scanner. Would have to do more digging
> with Sven to figure out what's happening.
>
> One takeaway is that the lack of metrics to tell us what is actually
> happening is a major defect, imo.
>
> On Sep 14, 2016 9:53 AM, "Dylan Hutchison" 
> wrote:
>
>> Do we have a (hopefully reproducible) conclusion from this thread,
>> regarding Scanners and BatchScanners?
>>
>> On Sep 13, 2016 11:17 PM, "Josh Elser"  wrote:
>>
>>> Yeah, this seems to have been osx causing me grief.
>>>
>>> Spun up a 3-tserver cluster (on openstack, even) and reran the same
>>> experiment. I could not reproduce the issues, even without substantial
>>> config tweaking.
>>>
>>> Josh Elser wrote:
>>>
 I'm playing around with this a little more today and something is
 definitely weird on my local machine. I'm seeing insane spikes in
 performance using Scanners too.

 Coupled with Keith's inability to repro this, I am starting to think
 that these are not worthwhile numbers to put weight behind. Something I
 haven't been able to figure out is quite screwy for me.

 Josh Elser wrote:

> Sven, et al:
>
> So, it would appear that I have been able to reproduce this one (better
> late than never, I guess...). tl;dr Serially using Scanners to do point
> lookups instead of a BatchScanner is ~20x faster. This sounds like a
> pretty serious performance issue to me.
>
> Here's a general outline for what I did.
>
> * Accumulo 1.8.0
> * Created a table with 1M rows, each row with 10 columns using YCSB
> (workloada)
> * Split the table into 9 tablets
> * Computed the set of all rows in the table
>
> For a number of iterations:
> * Shuffle this set of rows
> * Choose the first N rows
> * Construct an equivalent set of Ranges from the set of Rows, choosing
> a
> random column (0-9)
> * Partition the N rows into X collections
> * Submit X tasks to query one partition of the N rows (to a thread pool
> with X fixed threads)
>
> I have two implementations of these tasks. One, where all ranges in a
> partition are executed via one BatchScanner. A second where each range
> is
> executed in serial using a Scanner. The numbers speak for themselves.
>
> ** BatchScanners **
> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled
> all
> rows
> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 40178 ms
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 42296 ms
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 46094 ms
> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 47704 ms
> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 49221 ms
>
> ** Scanners **
> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled
> all
> rows
> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2833 ms
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2536 ms
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2150 ms
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : 
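The harness Josh outlines above (shuffle the rows, choose N, build ranges, partition into X collections, submit X tasks to a fixed pool) can be sketched without an Accumulo dependency. This is illustrative only: the class and method names are made up, and the per-range lookup is a no-op stand-in for the Scanner/BatchScanner work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class SeekHarnessSketch {
    // Partition the N chosen rows into X collections, one per worker task.
    static <T> List<List<T>> partition(List<T> items, int x) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < x; i++) parts.add(new ArrayList<>());
        for (int i = 0; i < items.size(); i++) parts.get(i % x).add(items.get(i));
        return parts;
    }

    // Submit one task per partition to a fixed pool and time the whole batch.
    static long timeQueries(List<List<String>> parts, Consumer<List<String>> task)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(parts.size());
        long start = System.currentTimeMillis();
        for (List<String> p : parts) pool.submit(() -> task.accept(p));
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 3000; i++) rows.add("user" + i);
        Collections.shuffle(rows);
        List<List<String>> parts = partition(rows, 6);
        // In the real harness each task would either feed its whole partition
        // to one BatchScanner, or loop over it doing one Scanner lookup per
        // range. Here the lookup is a no-op stand-in.
        long ms = timeQueries(parts, p -> p.forEach(r -> { /* point lookup */ }));
        int total = parts.stream().mapToInt(List::size).sum();
        System.out.println(total + " ranges in " + ms + " ms");
    }
}
```

The interesting comparison then lives entirely inside the task body; the partitioning and pool scaffolding is identical for both implementations.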

Re: Accumulo Seek performance

2016-09-13 Thread Keith Turner
On Mon, Sep 12, 2016 at 5:50 PM, Adam J. Shook <adamjsh...@gmail.com> wrote:
> As an aside, this is actually pretty relevant to the work I've been doing
> for Presto/Accumulo integration.  It isn't uncommon to have around a million
> exact Ranges (that is, Ranges with a single row ID)  spread across the five
> Presto worker nodes we use for scanning Accumulo.  Right now, these ranges
> get packed into PrestoSplits, 10k ranges per split (an arbitrary number I
> chose), and each split is run in parallel (depending on the overall number
> of splits, they may be queued for execution).
>
> I'm curious to see the query impact of changing it to use a fixed thread
> pool of Scanners over the current BatchScanner implementation.  Maybe I'll
> play around with it sometime soon.

I added a README to Josh's GH repo w/ the info I learned from Josh on
IRC, so this should make it quicker for others to experiment.

>
> --Adam
>
> On Mon, Sep 12, 2016 at 2:47 PM, Dan Blum <db...@bbn.com> wrote:
>>
>> I think the 450 ranges returned a total of about 7.5M entries, but the
>> ranges were in fact quite small relative to the size of the table.
>>
>> -Original Message-
>> From: Josh Elser [mailto:josh.el...@gmail.com]
>> Sent: Monday, September 12, 2016 2:43 PM
>> To: user@accumulo.apache.org
>> Subject: Re: Accumulo Seek performance
>>
>> What does a "large scan" mean here, Dan?
>>
>> Sven's original problem statement was running many small/pointed Ranges
>> (e.g. point lookups). My observation was that BatchScanners were slower
>> than running each in a Scanner when using multiple BS's concurrently.
>>
>> Dan Blum wrote:
>> > I tested a large scan on a 1.6.2 cluster with 11 tablet servers - using
>> > Scanners was much slower than using a BatchScanner with 11 threads, by 
>> > about
>> > a 5:1 ratio. There were 450 ranges.
>> >
>> > -Original Message-
>> > From: Josh Elser [mailto:josh.el...@gmail.com]
>> > Sent: Monday, September 12, 2016 1:42 PM
>> > To: user@accumulo.apache.org
>> > Subject: Re: Accumulo Seek performance
>> >
>> > I had increased the readahead thread pool to 32 (from 16). I had also
>> > increased the minimum thread pool size from 20 to 40. I had 10 tablets
>> > with the data block cache turned on (probably only 256M tho).
>> >
>> > Each tablet had a single file (manually compacted). Did not observe
>> > cache rates.
>> >
>> > I've been working through this with Keith on IRC this morning too. Found
>> > that a single batchscanner (one partition) is faster than the Scanner.
>> > Two partitions and things started to slow down.
>> >
>> > Two interesting points to still pursue, IMO:
>> >
>> > 1. I saw that the tserver-side logging for MultiScanSess was near
>> > identical to the BatchScanner timings
>> > 2. The minimum server threads did not seem to be taking effect. Despite
>> > having the value set to 64, I only saw a few ClientPool threads in a
>> > jstack after running the test.
>> >
>> > Adam Fuchs wrote:
>> >> Sorry, Monday morning poor reading skills, I guess. :)
>> >>
>> >> So, 3000 ranges in 40 seconds with the BatchScanner. In my past
>> >> experience HDFS seeks tend to take something like 10-100ms, and I would
>> >> expect that time to dominate here. With 60 client threads your
>> >> bottleneck should be the readahead pool, which I believe defaults to 16
>> >> threads. If you get perfect index caching then you should be seeing
>> >> something like 3000/16*50ms = 9,375ms. That's in the right ballpark,
>> >> but
>> >> it assumes no data cache hits. Do you have any idea of how many files
>> >> you had per tablet after the ingest? Do you know what your cache hit
>> >> rate was?
>> >>
>> >> Adam
>> >>
>> >>
>> >> On Mon, Sep 12, 2016 at 9:14 AM, Josh Elser<josh.el...@gmail.com
>> >> <mailto:josh.el...@gmail.com>>  wrote:
>> >>
>> >>  5 iterations, figured that would be apparent from the log messages
>> >> :)
>> >>
>> >>  The code is already posted in my original message.
>> >>
>> >>  Adam Fuchs wrote:
>> >>
>> >>  Josh,
>> >>
>> >>  Two questions:
>> >>
>> >>  1. How many iterations did you do? I would like to see an
>> >&

Re: Accumulo Seek performance

2016-09-12 Thread Adam J. Shook
As an aside, this is actually pretty relevant to the work I've been doing
for Presto/Accumulo integration.  It isn't uncommon to have around a
million exact Ranges (that is, Ranges with a single row ID)  spread across
the five Presto worker nodes we use for scanning Accumulo.  Right now,
these ranges get packed into PrestoSplits, 10k ranges per split (an
arbitrary number I chose), and each split is run in parallel (depending on
the overall number of splits, they may be queued for execution).

I'm curious to see the query impact of changing it to use a fixed thread
pool of Scanners over the current BatchScanner implementation.  Maybe I'll
play around with it sometime soon.
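The packing Adam describes (around a million exact Ranges grouped 10k per split, with splits run in parallel and queued past a limit) boils down to chunking a list. A minimal sketch of that grouping, with made-up names rather than the actual Presto connector's types:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPacker {
    // Pack an arbitrary list of exact ranges into fixed-size groups
    // (10k per group here, mirroring the arbitrary number Adam chose).
    static <T> List<List<T>> pack(List<T> ranges, int perSplit) {
        List<List<T>> splits = new ArrayList<>();
        for (int i = 0; i < ranges.size(); i += perSplit) {
            int end = Math.min(i + perSplit, ranges.size());
            splits.add(new ArrayList<>(ranges.subList(i, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Integer> ranges = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) ranges.add(i);
        // 1M exact ranges at 10k per split -> 100 splits to schedule
        List<List<Integer>> splits = pack(ranges, 10_000);
        System.out.println(splits.size() + " splits");
    }
}
```

Swapping the per-split executor from one BatchScanner to a fixed pool of serial Scanners would then only change what runs inside each split, not this packing step.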

--Adam

On Mon, Sep 12, 2016 at 2:47 PM, Dan Blum <db...@bbn.com> wrote:

> I think the 450 ranges returned a total of about 7.5M entries, but the
> ranges were in fact quite small relative to the size of the table.
>
> -Original Message-
> From: Josh Elser [mailto:josh.el...@gmail.com]
> Sent: Monday, September 12, 2016 2:43 PM
> To: user@accumulo.apache.org
> Subject: Re: Accumulo Seek performance
>
> What does a "large scan" mean here, Dan?
>
> Sven's original problem statement was running many small/pointed Ranges
> (e.g. point lookups). My observation was that BatchScanners were slower
> than running each in a Scanner when using multiple BS's concurrently.
>
> Dan Blum wrote:
> > I tested a large scan on a 1.6.2 cluster with 11 tablet servers - using
> Scanners was much slower than using a BatchScanner with 11 threads, by
> about a 5:1 ratio. There were 450 ranges.
> >
> > -Original Message-
> > From: Josh Elser [mailto:josh.el...@gmail.com]
> > Sent: Monday, September 12, 2016 1:42 PM
> > To: user@accumulo.apache.org
> > Subject: Re: Accumulo Seek performance
> >
> > I had increased the readahead thread pool to 32 (from 16). I had also
> > increased the minimum thread pool size from 20 to 40. I had 10 tablets
> > with the data block cache turned on (probably only 256M tho).
> >
> > Each tablet had a single file (manually compacted). Did not observe
> > cache rates.
> >
> > I've been working through this with Keith on IRC this morning too. Found
> > that a single batchscanner (one partition) is faster than the Scanner.
> > Two partitions and things started to slow down.
> >
> > Two interesting points to still pursue, IMO:
> >
> > 1. I saw that the tserver-side logging for MultiScanSess was near
> > identical to the BatchScanner timings
> > 2. The minimum server threads did not seem to be taking effect. Despite
> > having the value set to 64, I only saw a few ClientPool threads in a
> > jstack after running the test.
> >
> > Adam Fuchs wrote:
> >> Sorry, Monday morning poor reading skills, I guess. :)
> >>
> >> So, 3000 ranges in 40 seconds with the BatchScanner. In my past
> >> experience HDFS seeks tend to take something like 10-100ms, and I would
> >> expect that time to dominate here. With 60 client threads your
> >> bottleneck should be the readahead pool, which I believe defaults to 16
> >> threads. If you get perfect index caching then you should be seeing
> >> something like 3000/16*50ms = 9,375ms. That's in the right ballpark, but
> >> it assumes no data cache hits. Do you have any idea of how many files
> >> you had per tablet after the ingest? Do you know what your cache hit
> >> rate was?
> >>
> >> Adam
> >>
> >>
> >> On Mon, Sep 12, 2016 at 9:14 AM, Josh Elser<josh.el...@gmail.com
> >> <mailto:josh.el...@gmail.com>>  wrote:
> >>
> >>  5 iterations, figured that would be apparent from the log messages
> :)
> >>
> >>  The code is already posted in my original message.
> >>
> >>  Adam Fuchs wrote:
> >>
> >>  Josh,
> >>
> >>  Two questions:
> >>
> >>  1. How many iterations did you do? I would like to see an
> absolute
> >>  number of lookups per second to compare against other
> observations.
> >>
> >>  2. Can you post your code somewhere so I can run it?
> >>
> >>  Thanks,
> >>  Adam
> >>
> >>
> >>  On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser
> >>  <josh.el...@gmail.com<mailto:josh.el...@gmail.com>
> >>  <mailto:josh.el...@gmail.com<mailto:josh.el...@gmail.com>>>
> wrote:
> >>
> >>   Sven, et al:
> >>
> >>   So, it wou

RE: Accumulo Seek performance

2016-09-12 Thread Dan Blum
I think the 450 ranges returned a total of about 7.5M entries, but the ranges 
were in fact quite small relative to the size of the table.

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com] 
Sent: Monday, September 12, 2016 2:43 PM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance

What does a "large scan" mean here, Dan?

Sven's original problem statement was running many small/pointed Ranges 
(e.g. point lookups). My observation was that BatchScanners were slower 
than running each in a Scanner when using multiple BS's concurrently.

Dan Blum wrote:
> I tested a large scan on a 1.6.2 cluster with 11 tablet servers - using 
> Scanners was much slower than using a BatchScanner with 11 threads, by about 
> a 5:1 ratio. There were 450 ranges.
>
> -Original Message-
> From: Josh Elser [mailto:josh.el...@gmail.com]
> Sent: Monday, September 12, 2016 1:42 PM
> To: user@accumulo.apache.org
> Subject: Re: Accumulo Seek performance
>
> I had increased the readahead thread pool to 32 (from 16). I had also
> increased the minimum thread pool size from 20 to 40. I had 10 tablets
> with the data block cache turned on (probably only 256M tho).
>
> Each tablet had a single file (manually compacted). Did not observe
> cache rates.
>
> I've been working through this with Keith on IRC this morning too. Found
> that a single batchscanner (one partition) is faster than the Scanner.
> Two partitions and things started to slow down.
>
> Two interesting points to still pursue, IMO:
>
> 1. I saw that the tserver-side logging for MultiScanSess was near
> identical to the BatchScanner timings
> 2. The minimum server threads did not seem to be taking effect. Despite
> having the value set to 64, I only saw a few ClientPool threads in a
> jstack after running the test.
>
> Adam Fuchs wrote:
>> Sorry, Monday morning poor reading skills, I guess. :)
>>
>> So, 3000 ranges in 40 seconds with the BatchScanner. In my past
>> experience HDFS seeks tend to take something like 10-100ms, and I would
>> expect that time to dominate here. With 60 client threads your
>> bottleneck should be the readahead pool, which I believe defaults to 16
>> threads. If you get perfect index caching then you should be seeing
>> something like 3000/16*50ms = 9,375ms. That's in the right ballpark, but
>> it assumes no data cache hits. Do you have any idea of how many files
>> you had per tablet after the ingest? Do you know what your cache hit
>> rate was?
>>
>> Adam
>>
>>
>> On Mon, Sep 12, 2016 at 9:14 AM, Josh Elser<josh.el...@gmail.com
>> <mailto:josh.el...@gmail.com>>  wrote:
>>
>>  5 iterations, figured that would be apparent from the log messages :)
>>
>>  The code is already posted in my original message.
>>
>>  Adam Fuchs wrote:
>>
>>  Josh,
>>
>>  Two questions:
>>
>>  1. How many iterations did you do? I would like to see an absolute
>>  number of lookups per second to compare against other observations.
>>
>>  2. Can you post your code somewhere so I can run it?
>>
>>  Thanks,
>>  Adam
>>
>>
>>  On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser
>>  <josh.el...@gmail.com<mailto:josh.el...@gmail.com>
>>  <mailto:josh.el...@gmail.com<mailto:josh.el...@gmail.com>>>  wrote:
>>
>>   Sven, et al:
>>
>>   So, it would appear that I have been able to reproduce this one
>>   (better late than never, I guess...). tl;dr Serially using
>>  Scanners
>>   to do point lookups instead of a BatchScanner is ~20x
>>  faster. This
>>   sounds like a pretty serious performance issue to me.
>>
>>   Here's a general outline for what I did.
>>
>>   * Accumulo 1.8.0
>>   * Created a table with 1M rows, each row with 10 columns
>>  using YCSB
>>   (workloada)
>>   * Split the table into 9 tablets
>>   * Computed the set of all rows in the table
>>
>>   For a number of iterations:
>>   * Shuffle this set of rows
>>   * Choose the first N rows
>>   * Construct an equivalent set of Ranges from the set of Rows,
>>   choosing a random column (0-9)
>>   * Partition the N rows into X collections
>>   * Submit X tasks to query one partition of the N rows (to a
>>  

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser

What does a "large scan" mean here, Dan?

Sven's original problem statement was running many small/pointed Ranges 
(e.g. point lookups). My observation was that BatchScanners were slower 
than running each in a Scanner when using multiple BS's concurrently.


Dan Blum wrote:

I tested a large scan on a 1.6.2 cluster with 11 tablet servers - using 
Scanners was much slower than using a BatchScanner with 11 threads, by about a 
5:1 ratio. There were 450 ranges.

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Monday, September 12, 2016 1:42 PM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance

I had increased the readahead thread pool to 32 (from 16). I had also
increased the minimum thread pool size from 20 to 40. I had 10 tablets
with the data block cache turned on (probably only 256M tho).

Each tablet had a single file (manually compacted). Did not observe
cache rates.

I've been working through this with Keith on IRC this morning too. Found
that a single batchscanner (one partition) is faster than the Scanner.
Two partitions and things started to slow down.

Two interesting points to still pursue, IMO:

1. I saw that the tserver-side logging for MultiScanSess was near
identical to the BatchScanner timings
2. The minimum server threads did not seem to be taking effect. Despite
having the value set to 64, I only saw a few ClientPool threads in a
jstack after running the test.

Adam Fuchs wrote:

Sorry, Monday morning poor reading skills, I guess. :)

So, 3000 ranges in 40 seconds with the BatchScanner. In my past
experience HDFS seeks tend to take something like 10-100ms, and I would
expect that time to dominate here. With 60 client threads your
bottleneck should be the readahead pool, which I believe defaults to 16
threads. If you get perfect index caching then you should be seeing
something like 3000/16*50ms = 9,375ms. That's in the right ballpark, but
it assumes no data cache hits. Do you have any idea of how many files
you had per tablet after the ingest? Do you know what your cache hit
rate was?

Adam


On Mon, Sep 12, 2016 at 9:14 AM, Josh Elser<josh.el...@gmail.com
<mailto:josh.el...@gmail.com>>  wrote:

 5 iterations, figured that would be apparent from the log messages :)

 The code is already posted in my original message.

 Adam Fuchs wrote:

 Josh,

 Two questions:

 1. How many iterations did you do? I would like to see an absolute
 number of lookups per second to compare against other observations.

 2. Can you post your code somewhere so I can run it?

 Thanks,
 Adam


 On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser
 <josh.el...@gmail.com<mailto:josh.el...@gmail.com>
 <mailto:josh.el...@gmail.com<mailto:josh.el...@gmail.com>>>  wrote:

  Sven, et al:

  So, it would appear that I have been able to reproduce this one
  (better late than never, I guess...). tl;dr Serially using
 Scanners
  to do point lookups instead of a BatchScanner is ~20x
 faster. This
  sounds like a pretty serious performance issue to me.

  Here's a general outline for what I did.

  * Accumulo 1.8.0
  * Created a table with 1M rows, each row with 10 columns
 using YCSB
  (workloada)
  * Split the table into 9 tablets
  * Computed the set of all rows in the table

  For a number of iterations:
  * Shuffle this set of rows
  * Choose the first N rows
  * Construct an equivalent set of Ranges from the set of Rows,
  choosing a random column (0-9)
  * Partition the N rows into X collections
  * Submit X tasks to query one partition of the N rows (to a
 thread
  pool with X fixed threads)

  I have two implementations of these tasks. One, where all
 ranges in
  a partition are executed via one BatchScanner. A second
 where each
  range is executed in serial using a Scanner. The numbers
 speak for
  themselves.

  ** BatchScanners **
  2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO :
 Shuffled
  all rows
  2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All
  ranges calculated: 3000 ranges found
  2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO :
  Executing 6 range partitions using a pool of 6 threads
  2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO :
 Queries
  executed in 40178 ms
  2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO :
  Executing 6 range partitions using a pool of 6 threads
  2016-09-10 

Re: Accumulo Seek performance

2016-09-12 Thread Keith Turner
Note: I was running a single tserver, datanode, and zookeeper on my workstation.

On Mon, Sep 12, 2016 at 2:02 PM, Keith Turner  wrote:
> Josh helped me get up and running w/ YCSB and I am seeing very
> different results.   I am going to make a pull req to Josh's GH repo
> to add a Readme w/ what I learned from Josh in IRC.
>
> The link below is the Accumulo config I used for running a local 1.8.0 
> instance.
>
> https://gist.github.com/keith-turner/4678a0aac2a2a0e240ea5d73285743ab
>
> I created splits user1~ user2~ user3~ user4~ user5~ user6~ user7~
> user8~ user9~ AND then compacted the table.
>
> Below is the performance I saw with a single batch scanner (configured
> 1 partition).  The batch scanner has 10 threads.
>
> 2016-09-12 12:36:41,079 [client.ClientConfiguration] WARN : Found no
> client.conf in default paths. Using default client configuration
> values.
> 2016-09-12 12:36:41,428 [joshelser.YcsbBatchScanner] INFO : Connected
> to Accumulo
> 2016-09-12 12:36:41,429 [joshelser.YcsbBatchScanner] INFO : Computing ranges
> 2016-09-12 12:36:48,059 [joshelser.YcsbBatchScanner] INFO : Calculated
> all rows: Found 100 rows
> 2016-09-12 12:36:48,096 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
> 2016-09-12 12:36:48,116 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-12 12:36:48,118 [joshelser.YcsbBatchScanner] INFO : Executing
> 1 range partitions using a pool of 1 threads
> 2016-09-12 12:36:49,372 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1252 ms
> 2016-09-12 12:36:49,372 [joshelser.YcsbBatchScanner] INFO : Executing
> 1 range partitions using a pool of 1 threads
> 2016-09-12 12:36:50,561 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1188 ms
> 2016-09-12 12:36:50,561 [joshelser.YcsbBatchScanner] INFO : Executing
> 1 range partitions using a pool of 1 threads
> 2016-09-12 12:36:51,741 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1179 ms
> 2016-09-12 12:36:51,741 [joshelser.YcsbBatchScanner] INFO : Executing
> 1 range partitions using a pool of 1 threads
> 2016-09-12 12:36:52,974 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1233 ms
> 2016-09-12 12:36:52,974 [joshelser.YcsbBatchScanner] INFO : Executing
> 1 range partitions using a pool of 1 threads
> 2016-09-12 12:36:54,146 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1171 ms
>
> Below is the performance I saw with 6 batch scanners. Each batch
> scanner has 10 threads.
>
> 2016-09-12 13:58:21,061 [client.ClientConfiguration] WARN : Found no
> client.conf in default paths. Using default client configuration
> values.
> 2016-09-12 13:58:21,380 [joshelser.YcsbBatchScanner] INFO : Connected
> to Accumulo
> 2016-09-12 13:58:21,381 [joshelser.YcsbBatchScanner] INFO : Computing ranges
> 2016-09-12 13:58:28,571 [joshelser.YcsbBatchScanner] INFO : Calculated
> all rows: Found 100 rows
> 2016-09-12 13:58:28,606 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
> 2016-09-12 13:58:28,632 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-12 13:58:28,634 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 13:58:30,273 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1637 ms
> 2016-09-12 13:58:30,273 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 13:58:31,883 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1609 ms
> 2016-09-12 13:58:31,883 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 13:58:33,422 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1539 ms
> 2016-09-12 13:58:33,422 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 13:58:34,994 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1571 ms
> 2016-09-12 13:58:34,994 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 13:58:36,512 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1517 ms
>
> Below is the performance I saw with 6 threads each using a scanner.
>
> 2016-09-12 14:01:14,972 [client.ClientConfiguration] WARN : Found no
> client.conf in default paths. Using default client configuration
> values.
> 2016-09-12 14:01:15,287 [joshelser.YcsbBatchScanner] INFO : Connected
> to Accumulo
> 2016-09-12 14:01:15,288 [joshelser.YcsbBatchScanner] INFO : Computing ranges
> 2016-09-12 14:01:22,309 [joshelser.YcsbBatchScanner] INFO : Calculated
> all rows: Found 100 rows
> 2016-09-12 14:01:22,352 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
> 2016-09-12 14:01:22,373 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-12 14:01:22,376 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 14:01:25,696 

Re: Accumulo Seek performance

2016-09-12 Thread Keith Turner
Josh helped me get up and running w/ YCSB and I am seeing very
different results.   I am going to make a pull req to Josh's GH repo
to add a Readme w/ what I learned from Josh in IRC.

The link below is the Accumulo config I used for running a local 1.8.0 instance.

https://gist.github.com/keith-turner/4678a0aac2a2a0e240ea5d73285743ab

I created splits user1~ user2~ user3~ user4~ user5~ user6~ user7~
user8~ user9~ AND then compacted the table.
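Generating those nine split points is straightforward; a sketch of that step is below. Applying them would go through the table operations API, roughly `conn.tableOperations().addSplits(table, splits)` followed by a full compaction — those calls require a live instance and are only noted here, not exercised.

```java
import java.util.TreeSet;

public class SplitPoints {
    // Build the sorted split points user1~ .. user9~ described above.
    // The trailing '~' sorts after the digit suffixes of YCSB row IDs,
    // so each tablet holds one userN* block of rows.
    static TreeSet<String> splitPoints() {
        TreeSet<String> splits = new TreeSet<>();
        for (int i = 1; i <= 9; i++) {
            splits.add("user" + i + "~");
        }
        return splits;
    }

    public static void main(String[] args) {
        // With a connector in hand these would be wrapped as Text values
        // and passed to tableOperations().addSplits(...), then the table
        // compacted so each tablet has a single file.
        System.out.println(splitPoints());
    }
}
```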

Below is the performance I saw with a single batch scanner (configured
1 partition).  The batch scanner has 10 threads.

2016-09-12 12:36:41,079 [client.ClientConfiguration] WARN : Found no
client.conf in default paths. Using default client configuration
values.
2016-09-12 12:36:41,428 [joshelser.YcsbBatchScanner] INFO : Connected
to Accumulo
2016-09-12 12:36:41,429 [joshelser.YcsbBatchScanner] INFO : Computing ranges
2016-09-12 12:36:48,059 [joshelser.YcsbBatchScanner] INFO : Calculated
all rows: Found 100 rows
2016-09-12 12:36:48,096 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
2016-09-12 12:36:48,116 [joshelser.YcsbBatchScanner] INFO : All ranges
calculated: 3000 ranges found
2016-09-12 12:36:48,118 [joshelser.YcsbBatchScanner] INFO : Executing
1 range partitions using a pool of 1 threads
2016-09-12 12:36:49,372 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1252 ms
2016-09-12 12:36:49,372 [joshelser.YcsbBatchScanner] INFO : Executing
1 range partitions using a pool of 1 threads
2016-09-12 12:36:50,561 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1188 ms
2016-09-12 12:36:50,561 [joshelser.YcsbBatchScanner] INFO : Executing
1 range partitions using a pool of 1 threads
2016-09-12 12:36:51,741 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1179 ms
2016-09-12 12:36:51,741 [joshelser.YcsbBatchScanner] INFO : Executing
1 range partitions using a pool of 1 threads
2016-09-12 12:36:52,974 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1233 ms
2016-09-12 12:36:52,974 [joshelser.YcsbBatchScanner] INFO : Executing
1 range partitions using a pool of 1 threads
2016-09-12 12:36:54,146 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1171 ms

Below is the performance I saw with 6 batch scanners. Each batch
scanner has 10 threads.

2016-09-12 13:58:21,061 [client.ClientConfiguration] WARN : Found no
client.conf in default paths. Using default client configuration
values.
2016-09-12 13:58:21,380 [joshelser.YcsbBatchScanner] INFO : Connected
to Accumulo
2016-09-12 13:58:21,381 [joshelser.YcsbBatchScanner] INFO : Computing ranges
2016-09-12 13:58:28,571 [joshelser.YcsbBatchScanner] INFO : Calculated
all rows: Found 100 rows
2016-09-12 13:58:28,606 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
2016-09-12 13:58:28,632 [joshelser.YcsbBatchScanner] INFO : All ranges
calculated: 3000 ranges found
2016-09-12 13:58:28,634 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 13:58:30,273 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1637 ms
2016-09-12 13:58:30,273 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 13:58:31,883 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1609 ms
2016-09-12 13:58:31,883 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 13:58:33,422 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1539 ms
2016-09-12 13:58:33,422 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 13:58:34,994 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1571 ms
2016-09-12 13:58:34,994 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 13:58:36,512 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1517 ms

Below is the performance I saw with 6 threads each using a scanner.

2016-09-12 14:01:14,972 [client.ClientConfiguration] WARN : Found no
client.conf in default paths. Using default client configuration
values.
2016-09-12 14:01:15,287 [joshelser.YcsbBatchScanner] INFO : Connected
to Accumulo
2016-09-12 14:01:15,288 [joshelser.YcsbBatchScanner] INFO : Computing ranges
2016-09-12 14:01:22,309 [joshelser.YcsbBatchScanner] INFO : Calculated
all rows: Found 100 rows
2016-09-12 14:01:22,352 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
2016-09-12 14:01:22,373 [joshelser.YcsbBatchScanner] INFO : All ranges
calculated: 3000 ranges found
2016-09-12 14:01:22,376 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 14:01:25,696 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 3318 ms
2016-09-12 14:01:25,696 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 14:01:29,001 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 3305 ms
2016-09-12 14:01:29,001 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions 

RE: Accumulo Seek performance

2016-09-12 Thread Dan Blum
I tested a large scan on a 1.6.2 cluster with 11 tablet servers - using 
Scanners was much slower than using a BatchScanner with 11 threads, by about a 
5:1 ratio. There were 450 ranges.

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com] 
Sent: Monday, September 12, 2016 1:42 PM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance

I had increased the readahead thread pool to 32 (from 16). I had also 
increased the minimum thread pool size from 20 to 40. I had 10 tablets 
with the data block cache turned on (probably only 256M tho).

Each tablet had a single file (manually compacted). Did not observe 
cache rates.

I've been working through this with Keith on IRC this morning too. Found 
that a single batchscanner (one partition) is faster than the Scanner. 
Two partitions and things started to slow down.

Two interesting points to still pursue, IMO:

1. I saw that the tserver-side logging for MultiScanSess was near 
identical to the BatchScanner timings
2. The minimum server threads did not seem to be taking effect. Despite 
having the value set to 64, I only saw a few ClientPool threads in a 
jstack after running the test.
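For reference, the knobs described above correspond to these tablet server properties (names as listed in the Accumulo property reference; the values are the ones mentioned in this thread, shown only as an illustration, not a recommendation):

```properties
# Readahead pool for scans, raised from the default of 16
tserver.readahead.concurrent.max=32
# Minimum client service (ClientPool) threads, default 20
tserver.server.threads.minimum=64
```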

Adam Fuchs wrote:
> Sorry, Monday morning poor reading skills, I guess. :)
>
> So, 3000 ranges in 40 seconds with the BatchScanner. In my past
> experience HDFS seeks tend to take something like 10-100ms, and I would
> expect that time to dominate here. With 60 client threads your
> bottleneck should be the readahead pool, which I believe defaults to 16
> threads. If you get perfect index caching then you should be seeing
> something like 3000/16*50ms = 9,375ms. That's in the right ballpark, but
> it assumes no data cache hits. Do you have any idea of how many files
> you had per tablet after the ingest? Do you know what your cache hit
> rate was?
>
> Adam
>
>
> On Mon, Sep 12, 2016 at 9:14 AM, Josh Elser <josh.el...@gmail.com> wrote:
>
> 5 iterations, figured that would be apparent from the log messages :)
>
> The code is already posted in my original message.
>
> Adam Fuchs wrote:
>
> Josh,
>
> Two questions:
>
> 1. How many iterations did you do? I would like to see an absolute
> number of lookups per second to compare against other observations.
>
> 2. Can you post your code somewhere so I can run it?
>
> Thanks,
> Adam
>
>
> On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser <josh.el...@gmail.com> wrote:
>
>  Sven, et al:
>
>  So, it would appear that I have been able to reproduce this one
>  (better late than never, I guess...). tl;dr Serially using
> Scanners
>  to do point lookups instead of a BatchScanner is ~20x
> faster. This
>  sounds like a pretty serious performance issue to me.
>
>  Here's a general outline for what I did.
>
>  * Accumulo 1.8.0
>  * Created a table with 1M rows, each row with 10 columns
> using YCSB
>  (workloada)
>  * Split the table into 9 tablets
>  * Computed the set of all rows in the table
>
>  For a number of iterations:
>  * Shuffle this set of rows
>  * Choose the first N rows
>  * Construct an equivalent set of Ranges from the set of Rows,
>  choosing a random column (0-9)
>  * Partition the N rows into X collections
>  * Submit X tasks to query one partition of the N rows (to a
> thread
>  pool with X fixed threads)
>
>  I have two implementations of these tasks. One, where all ranges in
>  a partition are executed via one BatchScanner. A second where each
>  range is executed in serial using a Scanner. The numbers speak for
>  themselves.
>
>  ** BatchScanners **
>  2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO :
> Shuffled
>  all rows
>  2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All
>  ranges calculated: 3000 ranges found
>  2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO :
>  Executing 6 range partitions using a pool of 6 threads
>  2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO :
> Queries
>  executed in 40178 ms
>  2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO :
>

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
I had increased the readahead thread pool to 32 (from 16). I had also 
increased the minimum thread pool size from 20 to 40. I had 10 tablets 
with the data block cache turned on (probably only 256M tho).


Each tablet had a single file (manually compacted). Did not observe 
cache rates.


I've been working through this with Keith on IRC this morning too. Found 
that a single batchscanner (one partition) is faster than the Scanner. 
Two partitions and things started to slow down.


Two interesting points to still pursue, IMO:

1. I saw that the tserver-side logging for MultiScanSess was near 
identical to the BatchScanner timings
2. The minimum server threads did not seem to be taking effect. Despite 
having the value set to 64, I only saw a few ClientPool threads in a 
jstack after running the test.



Re: Accumulo Seek performance

2016-09-12 Thread Adam Fuchs
Sorry, Monday morning poor reading skills, I guess. :)

So, 3000 ranges in 40 seconds with the BatchScanner. In my past experience
HDFS seeks tend to take something like 10-100ms, and I would expect that
time to dominate here. With 60 client threads your bottleneck should be the
readahead pool, which I believe defaults to 16 threads. If you get perfect
index caching then you should be seeing something like 3000/16*50ms =
9,375ms. That's in the right ballpark, but it assumes no data cache hits.
Do you have any idea of how many files you had per tablet after the ingest?
Do you know what your cache hit rate was?

Adam
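Adam's back-of-the-envelope estimate is easy to check; a quick sketch using the numbers from this thread (3000 ranges, the 16-thread readahead pool default, and an assumed 50 ms per seek, the midpoint of his 10-100 ms figure):

```java
public class SeekEstimate {
    public static void main(String[] args) {
        int ranges = 3000;         // point lookups issued by the test
        int readaheadThreads = 16; // default tserver readahead pool size
        double seekMs = 50.0;      // assumed per-seek HDFS cost

        // With perfect index caching and no data cache hits, each range
        // costs one seek, serialized across the readahead pool.
        double totalMs = ranges / (double) readaheadThreads * seekMs;
        System.out.println("estimated " + totalMs + " ms"); // estimated 9375.0 ms
    }
}
```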



Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser

5 iterations, figured that would be apparent from the log messages :)

The code is already posted in my original message.

Adam Fuchs wrote:

Josh,

Two questions:

1. How many iterations did you do? I would like to see an absolute
number of lookups per second to compare against other observations.

2. Can you post your code somewhere so I can run it?

Thanks,
Adam


On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser <josh.el...@gmail.com> wrote:

Sven, et al:

So, it would appear that I have been able to reproduce this one
(better late than never, I guess...). tl;dr Serially using Scanners
to do point lookups instead of a BatchScanner is ~20x faster. This
sounds like a pretty serious performance issue to me.

Here's a general outline for what I did.

* Accumulo 1.8.0
* Created a table with 1M rows, each row with 10 columns using YCSB
(workloada)
* Split the table into 9 tablets
* Computed the set of all rows in the table

For a number of iterations:
* Shuffle this set of rows
* Choose the first N rows
* Construct an equivalent set of Ranges from the set of Rows,
choosing a random column (0-9)
* Partition the N rows into X collections
* Submit X tasks to query one partition of the N rows (to a thread
pool with X fixed threads)

I have two implementations of these tasks. One, where all ranges in
a partition are executed via one BatchScanner. A second where each
range is executed in serial using a Scanner. The numbers speak for
themselves.
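The harness shape in the outline above can be sketched without a cluster; `lookup` below is a hypothetical stand-in for either task implementation (one BatchScanner per partition, or a Scanner per range), so only the shuffle/partition/submit skeleton is real:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HarnessSketch {
    // Stand-in for executing one partition's ranges against Accumulo.
    static int lookup(List<Integer> partition) {
        return partition.size();
    }

    public static void main(String[] args) throws Exception {
        int n = 3000, x = 6; // N ranges, X partitions/threads
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) rows.add(i);

        Collections.shuffle(rows);                 // shuffle all rows
        List<Integer> chosen = rows.subList(0, n); // choose the first N

        List<List<Integer>> partitions = new ArrayList<>();
        for (int p = 0; p < x; p++)                // partition into X collections
            partitions.add(chosen.subList(p * n / x, (p + 1) * n / x));

        ExecutorService pool = Executors.newFixedThreadPool(x);
        long start = System.nanoTime();
        List<Future<Integer>> results = new ArrayList<>();
        for (List<Integer> part : partitions)      // submit X tasks
            results.add(pool.submit(() -> lookup(part)));
        int total = 0;
        for (Future<Integer> f : results) total += f.get();
        pool.shutdown();
        System.out.println(total + " lookups in "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```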

** BatchScanners **
2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled
all rows
2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All
ranges calculated: 3000 ranges found
2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 40178 ms
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 42296 ms
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 46094 ms
2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 47704 ms
2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 49221 ms

** Scanners **
2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled
all rows
2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All
ranges calculated: 3000 ranges found
2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2833 ms
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2536 ms
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2150 ms
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2061 ms
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2140 ms

Query code is available
https://github.com/joshelser/accumulo-range-binning



Sven Hodapp wrote:

Hi Keith,

I've tried it with 1, 2, or 10 threads. Unfortunately there were
no notable differences.
Maybe it's a problem with the table structure? For example it
may happen that one row id (e.g. a sentence) has several
thousand column families. Can this affect the seek performance?

So for my initial example it has about 3000 row ids to seek,
which will return about 500k entries. If I filter for specific
column families (e.g. a document without annotations) it will
return about 5k entries, but the seek time will only be halved.

Re: Accumulo Seek performance

2016-09-12 Thread Adam Fuchs
Josh,

Two questions:

1. How many iterations did you do? I would like to see an absolute number
of lookups per second to compare against other observations.

2. Can you post your code somewhere so I can run it?

Thanks,
Adam



Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser

Keith Turner wrote:

On Mon, Sep 12, 2016 at 10:58 AM, Josh Elser  wrote:

>  Good call. I kind of forgot about BatchScanner threads and trying to factor
>  those in :). I guess doing one thread in the BatchScanners would be more
>  accurate.
>
>  Although, I only had one TServer, so I don't *think* there would be any
>  difference. I don't believe we have concurrent requests from one
>  BatchScanner to one TServer.


There are: if the batch scanner sees it has extra threads and there
are multiple tablets on the tserver, it will submit concurrent
requests to a single tserver.



Hrm, curious then. I don't think I was oversaturating the physical 
resources on my laptop, but who knows. I'll see if I can revisit this 
experiment tonight to see if it changes anything. It was very easy to 
get YCSB data up and ingested and then run this tool.


Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
I don't have enough context to say definitively, but I'd assume earlier 
versions too.


Dan Blum wrote:

Is this a problem specific to 1.8.0, or is it likely to affect earlier versions?

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Saturday, September 10, 2016 6:01 PM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance

Sven, et al:

So, it would appear that I have been able to reproduce this one (better
late than never, I guess...). tl;dr Serially using Scanners to do point
lookups instead of a BatchScanner is ~20x faster. This sounds like a
pretty serious performance issue to me.

Here's a general outline for what I did.

* Accumulo 1.8.0
* Created a table with 1M rows, each row with 10 columns using YCSB
(workloada)
* Split the table into 9 tablets
* Computed the set of all rows in the table

For a number of iterations:
* Shuffle this set of rows
* Choose the first N rows
* Construct an equivalent set of Ranges from the set of Rows, choosing a
random column (0-9)
* Partition the N rows into X collections
* Submit X tasks to query one partition of the N rows (to a thread pool
with X fixed threads)

I have two implementations of these tasks. One, where all ranges in a
partition are executed via one BatchScanner. A second where each range is
executed in serial using a Scanner. The numbers speak for themselves.

** BatchScanners **
2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all
rows
2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges
calculated: 3000 ranges found
2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6
range partitions using a pool of 6 threads
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 40178 ms
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6
range partitions using a pool of 6 threads
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 42296 ms
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6
range partitions using a pool of 6 threads
2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 46094 ms
2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6
range partitions using a pool of 6 threads
2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 47704 ms
2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6
range partitions using a pool of 6 threads
2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 49221 ms

** Scanners **
2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all
rows
2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges
calculated: 3000 ranges found
2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6
range partitions using a pool of 6 threads
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2833 ms
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6
range partitions using a pool of 6 threads
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2536 ms
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6
range partitions using a pool of 6 threads
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2150 ms
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6
range partitions using a pool of 6 threads
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2061 ms
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6
range partitions using a pool of 6 threads
2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2140 ms

Query code is available https://github.com/joshelser/accumulo-range-binning

Sven Hodapp wrote:

Hi Keith,

I've tried it with 1, 2, or 10 threads. Unfortunately there were no notable 
differences.
Maybe it's a problem with the table structure? For example it may happen that 
one row id (e.g. a sentence) has several thousand column families. Can this 
affect the seek performance?

So for my initial example it has about 3000 row ids to seek, which will return 
about 500k entries. If I filter for specific column families (e.g. a document 
without annotations) it will return about 5k entries, but the seek time will 
only be halved.
Are there too many column families to seek fast?

Thanks!

Regards,
Sven





RE: Accumulo Seek performance

2016-09-12 Thread Dan Blum
I am not sure - my recollection is that the 1.6.x code capped the number of 
threads requested at 1 per tablet (covered by the requested ranges), not 1 per 
tablet server.

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com] 
Sent: Monday, September 12, 2016 10:58 AM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance

Good call. I kind of forgot about BatchScanner threads and trying to 
factor those in :). I guess doing one thread in the BatchScanners would 
be more accurate.

Although, I only had one TServer, so I don't *think* there would be any 
difference. I don't believe we have concurrent requests from one 
BatchScanner to one TServer.

Dylan Hutchison wrote:
> Nice setup Josh.  Thank you for putting together the tests.  A few
> questions:
>
> The serial scanner implementation uses 6 threads: one for each thread in
> the thread pool.
> The batch scanner implementation uses 60 threads: 10 for each thread in
> the thread pool, since the BatchScanner was configured with 10 threads
> and there are 10 (9?) tablets.
>
> Isn't 60 threads of communication naturally inefficient?  I wonder if we
> would see the same performance if we set each BatchScanner to use 1 or 2
> threads.
>
> Maybe this would motivate a /MultiTableBatchScanner/, which maintains a
> fixed number of threads across any number of concurrent scans, possibly
> to the same table.
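Dylan's MultiTableBatchScanner suggestion quoted above amounts to one fixed thread budget shared by all concurrent scans; a minimal sketch with the per-range lookup stubbed out (`scanOnce` and `SharedPoolScans` are hypothetical names, not Accumulo APIs):

```java
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedPoolScans {
    final ExecutorService pool;

    SharedPoolScans(int threads) {
        // Fixed budget shared across every scan submitted to this instance.
        pool = Executors.newFixedThreadPool(threads);
    }

    // Hypothetical per-range lookup; a real version would seek Accumulo.
    static String scanOnce(String range) {
        return "result:" + range;
    }

    CompletionService<String> submit(List<String> ranges) {
        CompletionService<String> cs = new ExecutorCompletionService<>(pool);
        for (String r : ranges) cs.submit(() -> scanOnce(r));
        return cs;
    }

    public static void main(String[] args) throws Exception {
        SharedPoolScans scans = new SharedPoolScans(6);
        // Two concurrent "scans" share the same six threads.
        CompletionService<String> a = scans.submit(List.of("r1", "r2"));
        CompletionService<String> b = scans.submit(List.of("r3"));
        System.out.println(a.take().get());
        System.out.println(b.take().get());
        scans.pool.shutdown();
    }
}
```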

Re: Accumulo Seek performance

2016-09-12 Thread Keith Turner
On Mon, Sep 12, 2016 at 10:58 AM, Josh Elser  wrote:
> Good call. I kind of forgot about BatchScanner threads and trying to factor
> those in :). I guess doing one thread in the BatchScanners would be more
> accurate.
>
> Although, I only had one TServer, so I don't *think* there would be any
> difference. I don't believe we have concurrent requests from one
> BatchScanner to one TServer.

There are: if the batch scanner sees it has extra threads and there
are multiple tablets on the tserver, it will submit concurrent
requests to a single tserver.

>
> Dylan Hutchison wrote:
>>
>> Nice setup Josh.  Thank you for putting together the tests.  A few
>> questions:
>>
>> The serial scanner implementation uses 6 threads: one for each thread in
>> the thread pool.
>> The batch scanner implementation uses 60 threads: 10 for each thread in
>> the thread pool, since the BatchScanner was configured with 10 threads
>> and there are 10 (9?) tablets.
>>
>> Isn't 60 threads of communication naturally inefficient?  I wonder if we
>> would see the same performance if we set each BatchScanner to use 1 or 2
>> threads.
>>
>> Maybe this would motivate a /MultiTableBatchScanner/, which maintains a
>> fixed number of threads across any number of concurrent scans, possibly
>> to the same table.
>>
>>
>> On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser wrote:
>>
>> Sven, et al:
>>
>> So, it would appear that I have been able to reproduce this one
>> (better late than never, I guess...). tl;dr Serially using Scanners
>> to do point lookups instead of a BatchScanner is ~20x faster. This
>> sounds like a pretty serious performance issue to me.
>>
>> Here's a general outline for what I did.
>>
>> * Accumulo 1.8.0
>> * Created a table with 1M rows, each row with 10 columns using YCSB
>> (workloada)
>> * Split the table into 9 tablets
>> * Computed the set of all rows in the table
>>
>> For a number of iterations:
>> * Shuffle this set of rows
>> * Choose the first N rows
>> * Construct an equivalent set of Ranges from the set of Rows,
>> choosing a random column (0-9)
>> * Partition the N rows into X collections
>> * Submit X tasks to query one partition of the N rows (to a thread
>> pool with X fixed threads)
>>
>> I have two implementations of these tasks. One, where all ranges in
>> a partition are executed via one BatchScanner. A second where each
>> range is executed in serial using a Scanner. The numbers speak for
>> themselves.
>>
>> ** BatchScanners **
>> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled
>> all rows
>> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All
>> ranges calculated: 3000 ranges found
>> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO :
>> Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 40178 ms
>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO :
>> Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 42296 ms
>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO :
>> Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 46094 ms
>> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO :
>> Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 47704 ms
>> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO :
>> Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 49221 ms
>>
>> ** Scanners **
>> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled
>> all rows
>> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All
>> ranges calculated: 3000 ranges found
>> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO :
>> Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 2833 ms
>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO :
>> Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 2536 ms
>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO :
>> Executing 6 range partitions using a pool of 6 threads
>> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 2150 ms
>> 2016-09-10 17:57:31,425 

Re: Accumulo Seek performance

2016-09-12 Thread Josh Elser
Good call. I kind of forgot about BatchScanner threads and trying to 
factor those in :). I guess doing one thread in the BatchScanners would 
be more accurate.


Although, I only had one TServer, so I don't *think* there would be any 
difference. I don't believe we have concurrent requests from one 
BatchScanner to one TServer.


Dylan Hutchison wrote:

Nice setup Josh.  Thank you for putting together the tests.  A few
questions:

The serial scanner implementation uses 6 threads: one for each thread in
the thread pool.
The batch scanner implementation uses 60 threads: 10 for each thread in
the thread pool, since the BatchScanner was configured with 10 threads
and there are 10 (9?) tablets.

Isn't 60 threads of communication naturally inefficient?  I wonder if we
would see the same performance if we set each BatchScanner to use 1 or 2
threads.

Maybe this would motivate a /MultiTableBatchScanner/, which maintains a
fixed number of threads across any number of concurrent scans, possibly
to the same table.


On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser wrote:

Sven, et al:

So, it would appear that I have been able to reproduce this one
(better late than never, I guess...). tl;dr Serially using Scanners
to do point lookups instead of a BatchScanner is ~20x faster. This
sounds like a pretty serious performance issue to me.

Here's a general outline for what I did.

* Accumulo 1.8.0
* Created a table with 1M rows, each row with 10 columns using YCSB
(workloada)
* Split the table into 9 tablets
* Computed the set of all rows in the table

For a number of iterations:
* Shuffle this set of rows
* Choose the first N rows
* Construct an equivalent set of Ranges from the set of Rows,
choosing a random column (0-9)
* Partition the N rows into X collections
* Submit X tasks to query one partition of the N rows (to a thread
pool with X fixed threads)

I have two implementations of these tasks. One, where all ranges in
a partition are executed via one BatchScanner. A second where each
range is executed in serial using a Scanner. The numbers speak for
themselves.

** BatchScanners **
2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled
all rows
2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All
ranges calculated: 3000 ranges found
2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 40178 ms
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 42296 ms
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 46094 ms
2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 47704 ms
2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 49221 ms

** Scanners **
2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled
all rows
2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All
ranges calculated: 3000 ranges found
2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2833 ms
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2536 ms
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2150 ms
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2061 ms
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO :
Executing 6 range partitions using a pool of 6 threads
2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2140 ms

Query code is available
https://github.com/joshelser/accumulo-range-binning

RE: Accumulo Seek performance

2016-09-12 Thread Dan Blum
Is this a problem specific to 1.8.0, or is it likely to affect earlier versions?

-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com] 
Sent: Saturday, September 10, 2016 6:01 PM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance

Sven, et al:

So, it would appear that I have been able to reproduce this one (better 
late than never, I guess...). tl;dr Serially using Scanners to do point 
lookups instead of a BatchScanner is ~20x faster. This sounds like a 
pretty serious performance issue to me.

Here's a general outline for what I did.

* Accumulo 1.8.0
* Created a table with 1M rows, each row with 10 columns using YCSB 
(workloada)
* Split the table into 9 tablets
* Computed the set of all rows in the table

For a number of iterations:
* Shuffle this set of rows
* Choose the first N rows
* Construct an equivalent set of Ranges from the set of Rows, choosing a 
random column (0-9)
* Partition the N rows into X collections
* Submit X tasks to query one partition of the N rows (to a thread pool 
with X fixed threads)

I have two implementations of these tasks. One, where all ranges in a 
partition are executed via one BatchScanner. A second where each range is 
executed in serial using a Scanner. The numbers speak for themselves.
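The shuffle/partition/submit steps in the outline above can be sketched roughly as follows. All class and method names here are hypothetical illustrations; the real harness is in the github repo linked below.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of the benchmark's partition-and-submit step.
public class PartitionDemo {

    // Split the N rows into X roughly equal partitions (round-robin).
    static List<List<String>> partition(List<String> rows, int x) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < x; i++) parts.add(new ArrayList<>());
        for (int i = 0; i < rows.size(); i++) {
            parts.get(i % x).add(rows.get(i));
        }
        return parts;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 3000; i++) rows.add(String.format("row%04d", i));
        Collections.shuffle(rows);               // "Shuffle this set of rows"

        int x = 6;                               // X partitions, X threads
        List<List<String>> parts = partition(rows, x);

        ExecutorService pool = Executors.newFixedThreadPool(x);
        for (List<String> part : parts) {
            // In the real tasks, each partition is looked up either with one
            // BatchScanner, or with one Scanner per range in serial.
            pool.submit(() -> System.out.println("looked up " + part.size() + " rows"));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```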

** BatchScanners **
2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all 
rows
2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges 
calculated: 3000 ranges found
2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 40178 ms
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 42296 ms
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 46094 ms
2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 47704 ms
2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 49221 ms

** Scanners **
2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all 
rows
2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges 
calculated: 3000 ranges found
2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2833 ms
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2536 ms
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2150 ms
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2061 ms
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2140 ms

Query code is available https://github.com/joshelser/accumulo-range-binning

Sven Hodapp wrote:
> Hi Keith,
>
> I've tried it with 1, 2 or 10 threads. Unfortunately there were no amazing 
> differences.
> Maybe it's a problem with the table structure? For example it may happen that 
> one row id (e.g. a sentence) has several thousand column families. Can this 
> affect the seek performance?
>
> So for my initial example it has about 3000 row ids to seek, which will 
> return about 500k entries. If I filter for specific column families (e.g. a 
> document without annotations) it will return about 5k entries, but the seek 
> time will only be halved.
> Are there too many column families to seek fast?
>
> Thanks!
>
> Regards,
> Sven
>



Re: Accumulo Seek performance

2016-09-12 Thread Dylan Hutchison
Nice setup Josh.  Thank you for putting together the tests.  A few
questions:

The serial scanner implementation uses 6 threads: one for each thread in
the thread pool.
The batch scanner implementation uses 60 threads: 10 for each thread in the
thread pool, since the BatchScanner was configured with 10 threads and
there are 10 (9?) tablets.

Isn't 60 threads of communication naturally inefficient?  I wonder if we
would see the same performance if we set each BatchScanner to use 1 or 2
threads.

Maybe this would motivate a *MultiTableBatchScanner*, which maintains a
fixed number of threads across any number of concurrent scans, possibly to
the same table.
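A minimal sketch of that idea: one fixed-size pool shared across any number of concurrent scans, with per-tablet work submitted as tasks. Nothing here reflects Accumulo internals, and no such class exists in Accumulo 1.8; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical MultiTableBatchScanner-style wrapper: total concurrency is
// capped at the pool size no matter how many scans run at once.
public class SharedPoolScans {
    private final ExecutorService pool;

    SharedPoolScans(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Each "scan" submits its per-tablet work to the shared pool and then
    // collects results; threads are shared rather than 10-per-BatchScanner.
    List<String> scan(String table, List<String> tabletWork) {
        List<Future<String>> futures = new ArrayList<>();
        for (String work : tabletWork) {
            futures.add(pool.submit(() -> table + ":" + work)); // stand-in for an RPC
        }
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            try {
                results.add(f.get());
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return results;
    }

    void shutdown() {
        pool.shutdown();
    }
}
```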


On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser  wrote:

> Sven, et al:
>
> So, it would appear that I have been able to reproduce this one (better
> late than never, I guess...). tl;dr Serially using Scanners to do point
> lookups instead of a BatchScanner is ~20x faster. This sounds like a pretty
> serious performance issue to me.
>
> Here's a general outline for what I did.
>
> * Accumulo 1.8.0
> * Created a table with 1M rows, each row with 10 columns using YCSB
> (workloada)
> * Split the table into 9 tablets
> * Computed the set of all rows in the table
>
> For a number of iterations:
> * Shuffle this set of rows
> * Choose the first N rows
> * Construct an equivalent set of Ranges from the set of Rows, choosing a
> random column (0-9)
> * Partition the N rows into X collections
> * Submit X tasks to query one partition of the N rows (to a thread pool
> with X fixed threads)
>
> I have two implementations of these tasks. One, where all ranges in a
> partition are executed via one BatchScanner. A second where each range is
> executed in serial using a Scanner. The numbers speak for themselves.
>
> ** BatchScanners **
> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all
> rows
> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 40178 ms
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 42296 ms
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 46094 ms
> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 47704 ms
> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 49221 ms
>
> ** Scanners **
> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all
> rows
> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2833 ms
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2536 ms
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2150 ms
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2061 ms
> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2140 ms
>
> Query code is available https://github.com/joshelser/accumulo-range-binning
>
>
> Sven Hodapp wrote:
>
>> Hi Keith,
>>
>> I've tried it with 1, 2 or 10 threads. Unfortunately there were no
>> amazing differences.
>> Maybe it's a problem with the table structure? For example it may happen
>> that one row id (e.g. a sentence) has several thousand column families. Can
>> this affect the seek performance?
>>
>> So for my initial example it has about 3000 row ids to seek, which will
>> return about 500k entries. If I filter for specific column families (e.g. a
>> document without annotations) it will return about 5k entries, but the seek
>> time will only 

Re: Accumulo Seek performance

2016-09-10 Thread Josh Elser

Sven, et al:

So, it would appear that I have been able to reproduce this one (better 
late than never, I guess...). tl;dr Serially using Scanners to do point 
lookups instead of a BatchScanner is ~20x faster. This sounds like a 
pretty serious performance issue to me.


Here's a general outline for what I did.

* Accumulo 1.8.0
* Created a table with 1M rows, each row with 10 columns using YCSB 
(workloada)

* Split the table into 9 tablets
* Computed the set of all rows in the table

For a number of iterations:
* Shuffle this set of rows
* Choose the first N rows
* Construct an equivalent set of Ranges from the set of Rows, choosing a 
random column (0-9)

* Partition the N rows into X collections
* Submit X tasks to query one partition of the N rows (to a thread pool 
with X fixed threads)


I have two implementations of these tasks. One, where all ranges in a 
partition are executed via one BatchScanner. A second where each range is 
executed in serial using a Scanner. The numbers speak for themselves.


** BatchScanners **
2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all 
rows
2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges 
calculated: 3000 ranges found
2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 40178 ms
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 42296 ms
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 46094 ms
2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 47704 ms
2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 49221 ms


** Scanners **
2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all 
rows
2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges 
calculated: 3000 ranges found
2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2833 ms
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2536 ms
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2150 ms
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2061 ms
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2140 ms


Query code is available https://github.com/joshelser/accumulo-range-binning

Sven Hodapp wrote:

Hi Keith,

I've tried it with 1, 2 or 10 threads. Unfortunately there were no amazing 
differences.
Maybe it's a problem with the table structure? For example it may happen that 
one row id (e.g. a sentence) has several thousand column families. Can this 
affect the seek performance?

So for my initial example it has about 3000 row ids to seek, which will return 
about 500k entries. If I filter for specific column families (e.g. a document 
without annotations) it will return about 5k entries, but the seek time will 
only be halved.
Are there too many column families to seek fast?

Thanks!

Regards,
Sven



Re: Accumulo Seek performance

2016-08-31 Thread Dylan Hutchison
Hi Sven,
  Without locality groups, your filtered scan may be reading nearly the
entire table.  The process looks like this:

   1. For each tablet that has one of the 3000 row ids (assuming sufficient
   tablet servers),
  1. *Seek* to the first column family of the first row id out of the
  target row ids in the tablet.
  2. *Read* that row+cf prefix.
  3. Find the next cf (out of the 5k cf's in your filter).
 1. *Read* the next entry and see if it is in the cf.  If it is,
 then you are lucky and go back to step 2.  Repeat this process for 10
 entries (a heuristic number).
 2. If none of the next 10 entries match the cf (or the next row in
 your target ranges), then *seek* to the next target row+cf, as in
 step 1.
  4. Continue until all target row ids in the tablet are scanned.

In the worst case, if the 5k target cf's in your filter are uniformly
spread out among the 500k total cf's (and each row has all 500k cf's, which
is probably not the case in your document-sentence table), then Accumulo
performs 5k seeks per row id, or 5k * 3k rows = 15M seeks, to be divided
among your tablet servers (assuming no significant skew).  You can adjust
this for the actual distribution of column families in your table to get an
idea of how many seeks Accumulo performs.

(On the other hand in the best case, if the 5k target cf's are all clumped
together, then Accumulo need only seek 3k times, or less if some row ids
are consecutive.)
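The worst- and best-case seek counts above can be written down directly. The 5k and 3k figures come from Sven's numbers earlier in the thread; the class itself is only illustrative.

```java
// Back-of-the-envelope seek counts for the scan model described above.
public class SeekModel {

    // Worst case: target cf's uniformly spread out, so every target cf in
    // every target row costs a fresh seek.
    static long worstCaseSeeks(long targetCfsPerRow, long rows) {
        return targetCfsPerRow * rows;
    }

    // Best case: target cf's clumped together, so one seek per row id
    // suffices (fewer still if rows are consecutive, ignored here).
    static long bestCaseSeeks(long rows) {
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseSeeks(5_000, 3_000)); // 15000000
        System.out.println(bestCaseSeeks(3_000));         // 3000
    }
}
```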

Perhaps others could extend the model by estimating a "seconds/seek"
figure?  If we can estimate this, it would tell you whether your
BatchScanner times are in the right ballpark.  Or it might be sufficient to
compare the number of seeks.

Cheers, Dylan

On Wed, Aug 31, 2016 at 12:06 AM, Sven Hodapp <
sven.hod...@scai.fraunhofer.de> wrote:

> Hi Keith,
>
> I've tried it with 1, 2 or 10 threads. Unfortunately there were no
> amazing differences.
> Maybe it's a problem with the table structure? For example it may happen
> that one row id (e.g. a sentence) has several thousand column families. Can
> this affect the seek performance?
>
> So for my initial example it has about 3000 row ids to seek, which will
> return about 500k entries. If I filter for specific column families (e.g. a
> document without annotations) it will return about 5k entries, but the seek
> time will only be halved.
> Are there too many column families to seek fast?
>
> Thanks!
>
> Regards,
> Sven
>
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hod...@scai.fraunhofer.de
> www.scai.fraunhofer.de
>
> - Original Message -
> > From: "Keith Turner" <ke...@deenlo.com>
> > To: "user" <user@accumulo.apache.org>
> > Sent: Monday, August 29, 2016, 22:37:32
> > Subject: Re: Accumulo Seek performance
>
> > On Wed, Aug 24, 2016 at 9:22 AM, Sven Hodapp
> > <sven.hod...@scai.fraunhofer.de> wrote:
> >> Hi there,
> >>
> >> currently we're experimenting with a two node Accumulo cluster (two
> tablet
> >> servers) setup for document storage.
>> These documents are decomposed down to the sentence level.
> >>
> >> Now I'm using a BatchScanner to assemble the full document like this:
> >>
> >> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) //
> ARTIFACTS table
> >> currently hosts ~30GB data, ~200M entries on ~45 tablets
> >> bscan.setRanges(ranges)  // there are like 3000 Range.exact's in
> the ranges-list
> >>   for (entry <- bscan.asScala) yield {
> >> val key = entry.getKey()
> >> val value = entry.getValue()
> >> // etc.
> >>   }
> >>
> >> For larger full documents (e.g. 3000 exact ranges), this operation will
> take
> >> about 12 seconds.
> >> But shorter documents are assembled blazing fast...
> >>
> >> Is that too much for a BatchScanner / am I misusing the BatchScanner?
> >> Is that a normal time for such a (seek) operation?
> >> Can I do something to get a better seek performance?
> >
> > How many threads did you configure the batch scanner with and did you
> > try varying this?
> >
> >>
> >> Note: I have already enabled bloom filtering on that table.
> >>
> >> Thank you for any advice!
> >>
> >> Regards,
> >> Sven
> >>
> >> --
> >> Sven Hodapp, M.Sc.,
> >> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> >> Department of Bioinformatics
> >> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> >> sven.hod...@scai.fraunhofer.de
> > > www.scai.fraunhofer.de
>


Re: Accumulo Seek performance

2016-08-31 Thread Sven Hodapp
Hi Keith,

I've tried it with 1, 2 or 10 threads. Unfortunately there were no amazing 
differences.
Maybe it's a problem with the table structure? For example it may happen that 
one row id (e.g. a sentence) has several thousand column families. Can this 
affect the seek performance?

So for my initial example it has about 3000 row ids to seek, which will return 
about 500k entries. If I filter for specific column families (e.g. a document 
without annotations) it will return about 5k entries, but the seek time will 
only be halved.
Are there too many column families to seek fast?

Thanks!

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hod...@scai.fraunhofer.de
www.scai.fraunhofer.de

- Original Message -
> From: "Keith Turner" <ke...@deenlo.com>
> To: "user" <user@accumulo.apache.org>
> Sent: Monday, August 29, 2016, 22:37:32
> Subject: Re: Accumulo Seek performance

> On Wed, Aug 24, 2016 at 9:22 AM, Sven Hodapp
> <sven.hod...@scai.fraunhofer.de> wrote:
>> Hi there,
>>
>> currently we're experimenting with a two node Accumulo cluster (two tablet
>> servers) setup for document storage.
>> These documents are decomposed down to the sentence level.
>>
>> Now I'm using a BatchScanner to assemble the full document like this:
>>
>> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // 
>> ARTIFACTS table
>> currently hosts ~30GB data, ~200M entries on ~45 tablets
>> bscan.setRanges(ranges)  // there are like 3000 Range.exact's in the 
>> ranges-list
>>   for (entry <- bscan.asScala) yield {
>> val key = entry.getKey()
>> val value = entry.getValue()
>> // etc.
>>   }
>>
>> For larger full documents (e.g. 3000 exact ranges), this operation will take
>> about 12 seconds.
>> But shorter documents are assembled blazing fast...
>>
>> Is that too much for a BatchScanner / am I misusing the BatchScanner?
>> Is that a normal time for such a (seek) operation?
>> Can I do something to get a better seek performance?
> 
> How many threads did you configure the batch scanner with and did you
> try varying this?
> 
>>
>> Note: I have already enabled bloom filtering on that table.
>>
>> Thank you for any advice!
>>
>> Regards,
>> Sven
>>
>> --
>> Sven Hodapp, M.Sc.,
>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
>> Department of Bioinformatics
>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
>> sven.hod...@scai.fraunhofer.de
> > www.scai.fraunhofer.de


Re: Accumulo Seek performance

2016-08-29 Thread Keith Turner
On Wed, Aug 24, 2016 at 9:22 AM, Sven Hodapp
 wrote:
> Hi there,
>
> currently we're experimenting with a two node Accumulo cluster (two tablet 
> servers) setup for document storage.
> These documents are decomposed down to the sentence level.
>
> Now I'm using a BatchScanner to assemble the full document like this:
>
> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // 
> ARTIFACTS table currently hosts ~30GB data, ~200M entries on ~45 tablets
> bscan.setRanges(ranges)  // there are like 3000 Range.exact's in the 
> ranges-list
>   for (entry <- bscan.asScala) yield {
> val key = entry.getKey()
> val value = entry.getValue()
> // etc.
>   }
>
> For larger full documents (e.g. 3000 exact ranges), this operation will take 
> about 12 seconds.
> But shorter documents are assembled blazing fast...
>
> Is that too much for a BatchScanner / am I misusing the BatchScanner?
> Is that a normal time for such a (seek) operation?
> Can I do something to get a better seek performance?

How many threads did you configure the batch scanner with and did you
try varying this?

>
> Note: I have already enabled bloom filtering on that table.
>
> Thank you for any advice!
>
> Regards,
> Sven
>
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hod...@scai.fraunhofer.de
> www.scai.fraunhofer.de


Re: Accumulo Seek performance

2016-08-25 Thread Josh Elser

Sven,

Strange results. BatchScanners most definitely can be processed in 
parallel by the tabletservers.


There is a dynamically resizing threadpool in the TabletServers that 
responds to load on the system. As the pool remains full, it will grow; 
as it remains empty, it will shrink.


A few more questions: how many TabletServers do you have and did you run 
this benchmark multiple times in succession to see if the results 
changed? Also, have you tried increasing the number of threads per 
batchscanner to see if that makes a difference?


I might have to try to run a similar test later today. I am curious :)

Sven Hodapp wrote:

Hi,

I've changed the code a little bit, so that it uses a thread pool (via the 
Future):

 val ranges500 = ranges.asScala.grouped(500)  // this means 6 BatchScanners 
will be created

 for (ranges <- ranges500) {
   val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2)
   bscan.setRanges(ranges.asJava)
   Future {
 time("mult-scanner") {
   bscan.asScala.toList  // toList forces the iteration of the iterator
 }
   }
 }

Here are the results:

 background log: info: mult-scanner time: 4807.289358 ms
 background log: info: mult-scanner time: 4930.996522 ms
 background log: info: mult-scanner time: 9510.010808 ms
 background log: info: mult-scanner time: 11394.152391 ms
 background log: info: mult-scanner time: 13297.247295 ms
 background log: info: mult-scanner time: 14032.704837 ms

 background log: info: single-scanner time: 15322.624393 ms

Every Future completes independently, but in return every batch scanner 
iterator needs more time to complete. :(
Does this mean the batch scanners aren't really processed in parallel on the 
server side?
Should I reconfigure something? Maybe the tablet servers can't allocate 
enough threads or memory? (Each of the two nodes has 8 cores, 64 GB of memory, 
and storage with ~300 MB/s...)

Regards,
Sven



Re: Accumulo Seek performance

2016-08-25 Thread Sven Hodapp
Hi Dave,

toList will exhaust the iterator. But all 6 iterators will be concurrently 
exhausted within the Future object 
(http://docs.scala-lang.org/overviews/core/futures.html).

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hod...@scai.fraunhofer.de
www.scai.fraunhofer.de

- Original Message -
> From: dlmar...@comcast.net
> To: "user" <user@accumulo.apache.org>
> Sent: Thursday, 25 August 2016 16:22:35
> Subject: Re: Accumulo Seek performance

> But does toList exhaust the first iterator() before going to the next?
> 
> - Dave
> 
> 
> - Original Message -
> 
> From: "Sven Hodapp" <sven.hod...@scai.fraunhofer.de>
> To: "user" <user@accumulo.apache.org>
> Sent: Thursday, August 25, 2016 9:42:00 AM
> Subject: Re: Accumulo Seek performance
> 
> Hi dlmarion,
> 
> toList should also call iterator(), and that is done independently for each
> batch scanner iterator in the context of the Future.
> 
> Regards,
> Sven
> 
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hod...@scai.fraunhofer.de
> www.scai.fraunhofer.de
> 
> - Original Message -----
>> From: dlmar...@comcast.net
>> To: "user" <user@accumulo.apache.org>
>> Sent: Thursday, 25 August 2016 14:34:39
>> Subject: Re: Accumulo Seek performance
> 
>> Calling BatchScanner.iterator() is what starts the work on the server side. 
>> You
>> should do this first for all 6 batch scanners, then iterate over all of them 
>> in
>> parallel.
>> 
>> - Original Message -
>> 
>> From: "Sven Hodapp" <sven.hod...@scai.fraunhofer.de>
>> To: "user" <user@accumulo.apache.org>
>> Sent: Thursday, August 25, 2016 4:53:41 AM
>> Subject: Re: Accumulo Seek performance
>> 
>> Hi,
>> 
>> I've changed the code a little bit, so that it uses a thread pool (via the
>> Future):
>> 
>> val ranges500 = ranges.asScala.grouped(500) // this means 6 BatchScanners 
>> will
>> be created
>> 
>> for (ranges <- ranges500) {
>> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2)
>> bscan.setRanges(ranges.asJava)
>> Future {
>> time("mult-scanner") {
>> bscan.asScala.toList // toList forces the iteration of the iterator
>> }
>> }
>> }
>> 
>> Here are the results:
>> 
>> background log: info: mult-scanner time: 4807.289358 ms
>> background log: info: mult-scanner time: 4930.996522 ms
>> background log: info: mult-scanner time: 9510.010808 ms
>> background log: info: mult-scanner time: 11394.152391 ms
>> background log: info: mult-scanner time: 13297.247295 ms
>> background log: info: mult-scanner time: 14032.704837 ms
>> 
>> background log: info: single-scanner time: 15322.624393 ms
>> 
>> Every Future completes independently, but in return every batch scanner
>> iterator needs more time to complete. :(
>> Does this mean the batch scanners aren't really processed in parallel on the
>> server side?
>> Should I reconfigure something? Maybe the tablet servers can't allocate
>> enough threads or memory? (Each of the two nodes has 8 cores, 64 GB of memory,
>> and storage with ~300 MB/s...)
>> 
>> Regards,
>> Sven
>> 
>> --
>> Sven Hodapp, M.Sc.,
>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
>> Department of Bioinformatics
>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
>> sven.hod...@scai.fraunhofer.de
>> www.scai.fraunhofer.de
>> 
>> - Original Message -
>>> From: "Josh Elser" <josh.el...@gmail.com>
>>> To: "user" <user@accumulo.apache.org>
>>> Sent: Wednesday, 24 August 2016 18:36:42
>>> Subject: Re: Accumulo Seek performance
>> 
>>> Ahh duh. Bad advice from me in the first place :)
>>> 
>>> Throw 'em in a threadpool locally.
>>> 
>>> dlmar...@comcast.net wrote:
>>>> Doesn't this use the 6 batch scanners serially?
>>>> 
>>>> 
>>>> *From: *"Sven Hodapp" <sven.hod...@scai.fraunhofer.de>
>>>> *To: *"user&q

Re: Accumulo Seek performance

2016-08-25 Thread Sven Hodapp
Hi dlmarion,

toList should also call iterator(), and that is done independently for each 
batch scanner iterator in the context of the Future.

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hod...@scai.fraunhofer.de
www.scai.fraunhofer.de

- Original Message -
> From: dlmar...@comcast.net
> To: "user" <user@accumulo.apache.org>
> Sent: Thursday, 25 August 2016 14:34:39
> Subject: Re: Accumulo Seek performance

> Calling BatchScanner.iterator() is what starts the work on the server side. 
> You
> should do this first for all 6 batch scanners, then iterate over all of them 
> in
> parallel.
> 
> - Original Message -
> 
> From: "Sven Hodapp" <sven.hod...@scai.fraunhofer.de>
> To: "user" <user@accumulo.apache.org>
> Sent: Thursday, August 25, 2016 4:53:41 AM
> Subject: Re: Accumulo Seek performance
> 
> Hi,
> 
> I've changed the code a little bit, so that it uses a thread pool (via the
> Future):
> 
> val ranges500 = ranges.asScala.grouped(500) // this means 6 BatchScanners will
> be created
> 
> for (ranges <- ranges500) {
> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2)
> bscan.setRanges(ranges.asJava)
> Future {
> time("mult-scanner") {
> bscan.asScala.toList // toList forces the iteration of the iterator
> }
> }
> }
> 
> Here are the results:
> 
> background log: info: mult-scanner time: 4807.289358 ms
> background log: info: mult-scanner time: 4930.996522 ms
> background log: info: mult-scanner time: 9510.010808 ms
> background log: info: mult-scanner time: 11394.152391 ms
> background log: info: mult-scanner time: 13297.247295 ms
> background log: info: mult-scanner time: 14032.704837 ms
> 
> background log: info: single-scanner time: 15322.624393 ms
> 
> Every Future completes independently, but in return every batch scanner
> iterator needs more time to complete. :(
> Does this mean the batch scanners aren't really processed in parallel on the
> server side?
> Should I reconfigure something? Maybe the tablet servers can't allocate
> enough threads or memory? (Each of the two nodes has 8 cores, 64 GB of memory,
> and storage with ~300 MB/s...)
> 
> Regards,
> Sven
> 
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hod...@scai.fraunhofer.de
> www.scai.fraunhofer.de
> 
> - Original Message -
>> From: "Josh Elser" <josh.el...@gmail.com>
>> To: "user" <user@accumulo.apache.org>
>> Sent: Wednesday, 24 August 2016 18:36:42
>> Subject: Re: Accumulo Seek performance
> 
>> Ahh duh. Bad advice from me in the first place :)
>> 
>> Throw 'em in a threadpool locally.
>> 
>> dlmar...@comcast.net wrote:
>>> Doesn't this use the 6 batch scanners serially?
>>> 
>>> 
>>> *From: *"Sven Hodapp" <sven.hod...@scai.fraunhofer.de>
>>> *To: *"user" <user@accumulo.apache.org>
>>> *Sent: *Wednesday, August 24, 2016 11:56:14 AM
>>> *Subject: *Re: Accumulo Seek performance
>>> 
>>> Hi Josh,
>>> 
>>> thanks for your reply!
>>> 
>>> I've tested your suggestion with an implementation like this:
>>> 
>>> val ranges500 = ranges.asScala.grouped(500) // this means 6
>>> BatchScanners will be created
>>> 
>>> time("mult-scanner") {
>>> for (ranges <- ranges500) {
>>> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1)
>>> bscan.setRanges(ranges.asJava)
>>> for (entry <- bscan.asScala) yield {
>>> entry.getKey()
>>> }
>>> }
>>> }
>>> 
>>> And the result is a bit disappointing:
>>> 
>>> background log: info: mult-scanner time: 18064.969281 ms
>>> background log: info: single-scanner time: 6527.482383 ms
>>> 
>>> Am I doing something wrong here?
>>> 
>>> 
>>> Regards,
>>> Sven
>>> 
>>> --
>>> Sven Hodapp, M.Sc.,
>>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
>>> Department of Bioinformatics
>>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
>>> sven.hod...@scai.fraunhofer.de
>>> www.scai.fraunhof

Re: Accumulo Seek performance

2016-08-25 Thread dlmarion

Calling BatchScanner.iterator() is what starts the work on the server side. You 
should do this first for all 6 batch scanners, then iterate over all of them in 
parallel. 
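A minimal sketch of that start-then-drain pattern, with simulated scanners rather than real BatchScanners (the CompletableFuture-based `startScanner` helper and the range/chunk counts are invented for illustration): kick off every scanner's work first, and only then block on results, so no scanner's drain delays another scanner's start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class StartThenDrain {
    // Stand-in for a BatchScanner: "starting" it kicks off the (simulated)
    // server-side work asynchronously; joining it later drains the results.
    static CompletableFuture<List<String>> startScanner(int id, List<String> ranges) {
        return CompletableFuture.supplyAsync(() -> {
            List<String> out = new ArrayList<>();
            for (String r : ranges) out.add(id + ":" + r);  // pretend point lookup
            return out;
        });
    }

    public static int fetchAll(int totalRanges, int chunkSize) {
        List<String> ranges = new ArrayList<>();
        for (int i = 0; i < totalRanges; i++) ranges.add("row" + i);

        // 1) Start every scanner first, so all of them work concurrently
        //    (analogous to calling iterator() on all 6 batch scanners up front).
        List<CompletableFuture<List<String>>> running = new ArrayList<>();
        for (int i = 0; i < ranges.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, ranges.size());
            running.add(startScanner(i / chunkSize, ranges.subList(i, end)));
        }
        // 2) Only then drain them; a slow scanner no longer delays the start of the others.
        int n = 0;
        for (CompletableFuture<List<String>> f : running) n += f.join().size();
        return n;
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(3000, 500));  // 3000 results drained from 6 scanners
    }
}
```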

- Original Message -

From: "Sven Hodapp" <sven.hod...@scai.fraunhofer.de> 
To: "user" <user@accumulo.apache.org> 
Sent: Thursday, August 25, 2016 4:53:41 AM 
Subject: Re: Accumulo Seek performance 

Hi, 

I've changed the code a little bit, so that it uses a thread pool (via the 
Future): 

val ranges500 = ranges.asScala.grouped(500) // this means 6 BatchScanners will 
be created 

for (ranges <- ranges500) { 
val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2) 
bscan.setRanges(ranges.asJava) 
Future { 
time("mult-scanner") { 
bscan.asScala.toList // toList forces the iteration of the iterator 
} 
} 
} 

Here are the results: 

background log: info: mult-scanner time: 4807.289358 ms 
background log: info: mult-scanner time: 4930.996522 ms 
background log: info: mult-scanner time: 9510.010808 ms 
background log: info: mult-scanner time: 11394.152391 ms 
background log: info: mult-scanner time: 13297.247295 ms 
background log: info: mult-scanner time: 14032.704837 ms 

background log: info: single-scanner time: 15322.624393 ms 

Every Future completes independently, but in return every batch scanner 
iterator needs more time to complete. :(
Does this mean the batch scanners aren't really processed in parallel on the 
server side?
Should I reconfigure something? Maybe the tablet servers can't allocate 
enough threads or memory? (Each of the two nodes has 8 cores, 64 GB of memory, 
and storage with ~300 MB/s...)

Regards, 
Sven 

-- 
Sven Hodapp, M.Sc., 
Fraunhofer Institute for Algorithms and Scientific Computing SCAI, 
Department of Bioinformatics 
Schloss Birlinghoven, 53754 Sankt Augustin, Germany 
sven.hod...@scai.fraunhofer.de 
www.scai.fraunhofer.de 

- Original Message -
> From: "Josh Elser" <josh.el...@gmail.com>
> To: "user" <user@accumulo.apache.org>
> Sent: Wednesday, 24 August 2016 18:36:42
> Subject: Re: Accumulo Seek performance

> Ahh duh. Bad advice from me in the first place :) 
> 
> Throw 'em in a threadpool locally. 
> 
> dlmar...@comcast.net wrote: 
>> Doesn't this use the 6 batch scanners serially? 
>> 
>>  
>> *From: *"Sven Hodapp" <sven.hod...@scai.fraunhofer.de> 
>> *To: *"user" <user@accumulo.apache.org> 
>> *Sent: *Wednesday, August 24, 2016 11:56:14 AM 
>> *Subject: *Re: Accumulo Seek performance 
>> 
>> Hi Josh, 
>> 
>> thanks for your reply! 
>> 
>> I've tested your suggestion with an implementation like this:
>> 
>> val ranges500 = ranges.asScala.grouped(500) // this means 6 
>> BatchScanners will be created 
>> 
>> time("mult-scanner") { 
>> for (ranges <- ranges500) { 
>> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1) 
>> bscan.setRanges(ranges.asJava) 
>> for (entry <- bscan.asScala) yield { 
>> entry.getKey() 
>> } 
>> } 
>> } 
>> 
>> And the result is a bit disappointing: 
>> 
>> background log: info: mult-scanner time: 18064.969281 ms 
>> background log: info: single-scanner time: 6527.482383 ms 
>> 
>> Am I doing something wrong here?
>> 
>> 
>> Regards, 
>> Sven 
>> 
>> -- 
>> Sven Hodapp, M.Sc., 
>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI, 
>> Department of Bioinformatics 
>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany 
>> sven.hod...@scai.fraunhofer.de 
>> www.scai.fraunhofer.de 
>> 
>> - Original Message -
>> > From: "Josh Elser" <josh.el...@gmail.com>
>> > To: "user" <user@accumulo.apache.org>
>> > Sent: Wednesday, 24 August 2016 16:33:37
>> > Subject: Re: Accumulo Seek performance
>> 
>> > This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710 
>> > 
>> > I don't feel like 3000 ranges is too many, but this isn't quantitative. 
>> > 
>> > IIRC, the BatchScanner will take each Range you provide, bin each Range 
>> > to the TabletServer(s) currently hosting the corresponding data, clip 
>> > (truncate) each Range to match the Tablet boundaries, and then does an 
>> > RPC to each TabletServer with just the Ranges hosted there. 
>> > 
>> > Inside the TabletServer, it will then have many Ranges, binned by Tablet 
>> > (KeyExtent, to be precise). This will spawn a 
>> > org.apache.accumulo.tserver.s

Re: Accumulo Seek performance

2016-08-25 Thread Sven Hodapp
Hi,

I've changed the code a little bit, so that it uses a thread pool (via the 
Future):

val ranges500 = ranges.asScala.grouped(500)  // this means 6 BatchScanners 
will be created

for (ranges <- ranges500) {
  val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2)
  bscan.setRanges(ranges.asJava)
  Future {
time("mult-scanner") {
  bscan.asScala.toList  // toList forces the iteration of the iterator
}
  }
}

Here are the results:

background log: info: mult-scanner time: 4807.289358 ms
background log: info: mult-scanner time: 4930.996522 ms
background log: info: mult-scanner time: 9510.010808 ms
background log: info: mult-scanner time: 11394.152391 ms
background log: info: mult-scanner time: 13297.247295 ms
background log: info: mult-scanner time: 14032.704837 ms

background log: info: single-scanner time: 15322.624393 ms

Every Future completes independently, but in return every batch scanner 
iterator needs more time to complete. :(
Does this mean the batch scanners aren't really processed in parallel on the 
server side?
Should I reconfigure something? Maybe the tablet servers can't allocate 
enough threads or memory? (Each of the two nodes has 8 cores, 64 GB of memory, 
and storage with ~300 MB/s...)

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hod...@scai.fraunhofer.de
www.scai.fraunhofer.de

- Original Message -
> From: "Josh Elser" <josh.el...@gmail.com>
> To: "user" <user@accumulo.apache.org>
> Sent: Wednesday, 24 August 2016 18:36:42
> Subject: Re: Accumulo Seek performance

> Ahh duh. Bad advice from me in the first place :)
> 
> Throw 'em in a threadpool locally.
> 
> dlmar...@comcast.net wrote:
>> Doesn't this use the 6 batch scanners serially?
>>
>> 
>> *From: *"Sven Hodapp" <sven.hod...@scai.fraunhofer.de>
>> *To: *"user" <user@accumulo.apache.org>
>> *Sent: *Wednesday, August 24, 2016 11:56:14 AM
>> *Subject: *Re: Accumulo Seek performance
>>
>> Hi Josh,
>>
>> thanks for your reply!
>>
>> I've tested your suggestion with an implementation like this:
>>
>> val ranges500 = ranges.asScala.grouped(500) // this means 6
>> BatchScanners will be created
>>
>> time("mult-scanner") {
>> for (ranges <- ranges500) {
>> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1)
>> bscan.setRanges(ranges.asJava)
>> for (entry <- bscan.asScala) yield {
>> entry.getKey()
>> }
>> }
>> }
>>
>> And the result is a bit disappointing:
>>
>> background log: info: mult-scanner time: 18064.969281 ms
>> background log: info: single-scanner time: 6527.482383 ms
>>
>> Am I doing something wrong here?
>>
>>
>> Regards,
>> Sven
>>
>> --
>> Sven Hodapp, M.Sc.,
>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
>> Department of Bioinformatics
>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
>> sven.hod...@scai.fraunhofer.de
>> www.scai.fraunhofer.de
>>
>> - Original Message -
>>  > From: "Josh Elser" <josh.el...@gmail.com>
>>  > To: "user" <user@accumulo.apache.org>
>>  > Sent: Wednesday, 24 August 2016 16:33:37
>>  > Subject: Re: Accumulo Seek performance
>>
>>  > This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710
>>  >
>>  > I don't feel like 3000 ranges is too many, but this isn't quantitative.
>>  >
>>  > IIRC, the BatchScanner will take each Range you provide, bin each Range
>>  > to the TabletServer(s) currently hosting the corresponding data, clip
>>  > (truncate) each Range to match the Tablet boundaries, and then does an
>>  > RPC to each TabletServer with just the Ranges hosted there.
>>  >
>>  > Inside the TabletServer, it will then have many Ranges, binned by Tablet
>>  > (KeyExtent, to be precise). This will spawn an
>>  > org.apache.accumulo.tserver.scan.LookupTask, which will start collecting
>>  > results to send back to the client.
>>  >
>>  > The caveat here is that those ranges are processed serially on a
>>  > TabletServer. Maybe, you're swamping one TabletServer with lots of
>>  > Ranges that it could be processing in parallel.
>>  >
>>  > Could you experiment with using multiple BatchSc
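The staircase in the mult-scanner numbers above (4807 ms up to 14032 ms) is what you would expect when all futures start their timers together but contend for limited downstream capacity: each later finisher reports a larger elapsed time even if every request does the same amount of work. A toy model of that effect (the 6 clients, 2 "server slots", and 50 ms work figure are invented, not Accumulo measurements):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

public class Staircase {
    // 6 futures all start (and start their timers) at once, but contend for a
    // server with only 2 slots. Each future reports its own elapsed time.
    public static List<Long> run() throws Exception {
        ExecutorService clients = Executors.newFixedThreadPool(6);
        Semaphore serverSlots = new Semaphore(2);  // hypothetical server-side capacity
        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            futures.add(clients.submit(() -> {
                long t0 = System.nanoTime();       // like time("mult-scanner")
                serverSlots.acquire();
                try { Thread.sleep(50); }          // pretend 50 ms of scan work
                finally { serverSlots.release(); }
                return (System.nanoTime() - t0) / 1_000_000;
            }));
        }
        List<Long> elapsed = new ArrayList<>();
        for (Future<Long> f : futures) elapsed.add(f.get());
        clients.shutdown();
        return elapsed;
    }

    public static void main(String[] args) throws Exception {
        // Roughly [50, 50, 100, 100, 150, 150]: later finishers *report* more
        // time even though each did the same 50 ms of actual work.
        System.out.println(run());
    }
}
```

So the growing per-future times alone don't prove the server does zero parallel work; they are equally consistent with 6 scanners sharing a capped pool of scan threads.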

Re: Accumulo Seek performance

2016-08-24 Thread dlmarion
Doesn't this use the 6 batch scanners serially? 

- Original Message -

From: "Sven Hodapp" <sven.hod...@scai.fraunhofer.de> 
To: "user" <user@accumulo.apache.org> 
Sent: Wednesday, August 24, 2016 11:56:14 AM 
Subject: Re: Accumulo Seek performance 

Hi Josh, 

thanks for your reply! 

I've tested your suggestion with an implementation like this: 

val ranges500 = ranges.asScala.grouped(500) // this means 6 BatchScanners will 
be created 

time("mult-scanner") { 
for (ranges <- ranges500) { 
val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1) 
bscan.setRanges(ranges.asJava) 
for (entry <- bscan.asScala) yield { 
entry.getKey() 
} 
} 
} 

And the result is a bit disappointing: 

background log: info: mult-scanner time: 18064.969281 ms 
background log: info: single-scanner time: 6527.482383 ms 

Am I doing something wrong here? 


Regards, 
Sven 

-- 
Sven Hodapp, M.Sc., 
Fraunhofer Institute for Algorithms and Scientific Computing SCAI, 
Department of Bioinformatics 
Schloss Birlinghoven, 53754 Sankt Augustin, Germany 
sven.hod...@scai.fraunhofer.de 
www.scai.fraunhofer.de 

- Original Message -
> From: "Josh Elser" <josh.el...@gmail.com>
> To: "user" <user@accumulo.apache.org>
> Sent: Wednesday, 24 August 2016 16:33:37
> Subject: Re: Accumulo Seek performance

> This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710 
> 
> I don't feel like 3000 ranges is too many, but this isn't quantitative. 
> 
> IIRC, the BatchScanner will take each Range you provide, bin each Range 
> to the TabletServer(s) currently hosting the corresponding data, clip 
> (truncate) each Range to match the Tablet boundaries, and then does an 
> RPC to each TabletServer with just the Ranges hosted there. 
> 
> Inside the TabletServer, it will then have many Ranges, binned by Tablet 
> (KeyExtent, to be precise). This will spawn an 
> org.apache.accumulo.tserver.scan.LookupTask, which will start collecting 
> results to send back to the client. 
> 
> The caveat here is that those ranges are processed serially on a 
> TabletServer. Maybe, you're swamping one TabletServer with lots of 
> Ranges that it could be processing in parallel. 
> 
> Could you experiment with using multiple BatchScanners and something 
> like Guava's Iterables.concat to make it appear like one Iterator? 
> 
> I'm curious if we should put an optimization into the BatchScanner 
> itself to limit the number of ranges we send in one RPC to a 
> TabletServer (e.g. one BatchScanner might open multiple 
> MultiScanSessions to a TabletServer). 
> 
> Sven Hodapp wrote: 
>> Hi there, 
>> 
>> currently we're experimenting with a two-node Accumulo cluster (two tablet 
>> servers) set up for document storage. 
>> These documents are decomposed down to the sentence level. 
>> 
>> Now I'm using a BatchScanner to assemble the full document like this: 
>> 
>> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // ARTIFACTS 
>> table 
>> currently hosts ~30GB data, ~200M entries on ~45 tablets 
>> bscan.setRanges(ranges) // there are like 3000 Range.exact's in the 
>> ranges-list 
>> for (entry <- bscan.asScala) yield { 
>> val key = entry.getKey() 
>> val value = entry.getValue() 
>> // etc. 
>> } 
>> 
>> For larger full documents (e.g. 3000 exact ranges), this operation will take 
>> about 12 seconds. 
>> But shorter documents are assembled blazing fast... 
>> 
>> Is that too much for a BatchScanner, or am I misusing the BatchScanner? 
>> Is that a normal time for such a (seek) operation? 
>> Can I do something to get a better seek performance? 
>> 
>> Note: I have already enabled bloom filtering on that table. 
>> 
>> Thank you for any advice! 
>> 
>> Regards, 
>> Sven 



Re: Accumulo Seek performance

2016-08-24 Thread Sven Hodapp
Hi Josh,

thanks for your reply!

I've tested your suggestion with an implementation like this:

val ranges500 = ranges.asScala.grouped(500)  // this means 6 BatchScanners 
will be created

time("mult-scanner") {
  for (ranges <- ranges500) {
val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1)
bscan.setRanges(ranges.asJava)
for (entry <- bscan.asScala) yield {
  entry.getKey()
}
  }
}

And the result is a bit disappointing:

background log: info: mult-scanner time: 18064.969281 ms
background log: info: single-scanner time: 6527.482383 ms

Am I doing something wrong here?


Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hod...@scai.fraunhofer.de
www.scai.fraunhofer.de

- Original Message -
> From: "Josh Elser" <josh.el...@gmail.com>
> To: "user" <user@accumulo.apache.org>
> Sent: Wednesday, 24 August 2016 16:33:37
> Subject: Re: Accumulo Seek performance

> This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710
> 
> I don't feel like 3000 ranges is too many, but this isn't quantitative.
> 
> IIRC, the BatchScanner will take each Range you provide, bin each Range
> to the TabletServer(s) currently hosting the corresponding data, clip
> (truncate) each Range to match the Tablet boundaries, and then does an
> RPC to each TabletServer with just the Ranges hosted there.
> 
> Inside the TabletServer, it will then have many Ranges, binned by Tablet
> (KeyExtent, to be precise). This will spawn an
> org.apache.accumulo.tserver.scan.LookupTask, which will start collecting
> results to send back to the client.
> 
> The caveat here is that those ranges are processed serially on a
> TabletServer. Maybe, you're swamping one TabletServer with lots of
> Ranges that it could be processing in parallel.
> 
> Could you experiment with using multiple BatchScanners and something
> like Guava's Iterables.concat to make it appear like one Iterator?
> 
> I'm curious if we should put an optimization into the BatchScanner
> itself to limit the number of ranges we send in one RPC to a
> TabletServer (e.g. one BatchScanner might open multiple
> MultiScanSessions to a TabletServer).
> 
> Sven Hodapp wrote:
>> Hi there,
>>
>> currently we're experimenting with a two-node Accumulo cluster (two tablet
>> servers) set up for document storage.
>> These documents are decomposed down to the sentence level.
>>
>> Now I'm using a BatchScanner to assemble the full document like this:
>>
>>  val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // 
>> ARTIFACTS table
>>  currently hosts ~30GB data, ~200M entries on ~45 tablets
>>  bscan.setRanges(ranges)  // there are like 3000 Range.exact's in the 
>> ranges-list
>>    for (entry <- bscan.asScala) yield {
>>  val key = entry.getKey()
>>  val value = entry.getValue()
>>  // etc.
>>}
>>
>> For larger full documents (e.g. 3000 exact ranges), this operation will take
>> about 12 seconds.
>> But shorter documents are assembled blazing fast...
>>
>> Is that too much for a BatchScanner, or am I misusing the BatchScanner?
>> Is that a normal time for such a (seek) operation?
>> Can I do something to get a better seek performance?
>>
>> Note: I have already enabled bloom filtering on that table.
>>
>> Thank you for any advice!
>>
>> Regards,
>> Sven


Re: Accumulo Seek performance

2016-08-24 Thread Josh Elser

This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710

I don't feel like 3000 ranges is too many, but this isn't quantitative.

IIRC, the BatchScanner will take each Range you provide, bin each Range 
to the TabletServer(s) currently hosting the corresponding data, clip 
(truncate) each Range to match the Tablet boundaries, and then does an 
RPC to each TabletServer with just the Ranges hosted there.


Inside the TabletServer, it will then have many Ranges, binned by Tablet 
(KeyExtent, to be precise). This will spawn an 
org.apache.accumulo.tserver.scan.LookupTask, which will start collecting 
results to send back to the client.


The caveat here is that those ranges are processed serially on a 
TabletServer. Maybe, you're swamping one TabletServer with lots of 
Ranges that it could be processing in parallel.


Could you experiment with using multiple BatchScanners and something 
like Guava's Iterables.concat to make it appear like one Iterator?


I'm curious if we should put an optimization into the BatchScanner 
itself to limit the number of ranges we send in one RPC to a 
TabletServer (e.g. one BatchScanner might open multiple 
MultiScanSessions to a TabletServer).
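A sketch of the "multiple BatchScanners behind one Iterator" idea: Guava's Iterables.concat would do the merging, but a dependency-free stand-in is only a few lines (the `concat` helper and the tiny per-scanner lists below are illustrative, not Accumulo API).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ConcatScanners {
    // Dependency-free stand-in for Guava's Iterables.concat: expose several
    // per-scanner result iterables to the caller as one lazy iterator.
    public static <T> Iterator<T> concat(List<? extends Iterable<T>> parts) {
        Iterator<? extends Iterable<T>> outer = parts.iterator();
        return new Iterator<T>() {
            private Iterator<T> inner = Collections.emptyIterator();
            @Override public boolean hasNext() {
                while (!inner.hasNext() && outer.hasNext()) inner = outer.next().iterator();
                return inner.hasNext();
            }
            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return inner.next();
            }
        };
    }

    public static int countAll(List<? extends Iterable<String>> perScanner) {
        int n = 0;
        for (Iterator<String> it = concat(perScanner); it.hasNext(); it.next()) n++;
        return n;
    }

    public static void main(String[] args) {
        // Pretend each inner list is the output of one BatchScanner over ~500 of the ranges.
        List<List<String>> perScanner = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"), Arrays.asList("d", "e"));
        System.out.println(countAll(perScanner));  // 5
    }
}
```

The caller iterates one stream of entries, while each underlying scanner's ranges were shipped to the TabletServers separately.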


Sven Hodapp wrote:

Hi there,

currently we're experimenting with a two-node Accumulo cluster (two tablet 
servers) set up for document storage.
These documents are decomposed down to the sentence level.

Now I'm using a BatchScanner to assemble the full document like this:

 val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // ARTIFACTS 
table currently hosts ~30GB data, ~200M entries on ~45 tablets
 bscan.setRanges(ranges)  // there are like 3000 Range.exact's in the 
ranges-list
   for (entry <- bscan.asScala) yield {
 val key = entry.getKey()
 val value = entry.getValue()
 // etc.
   }

For larger full documents (e.g. 3000 exact ranges), this operation will take 
about 12 seconds.
But shorter documents are assembled blazing fast...

Is that too much for a BatchScanner, or am I misusing the BatchScanner?
Is that a normal time for such a (seek) operation?
Can I do something to get a better seek performance?

Note: I have already enabled bloom filtering on that table.

Thank you for any advice!

Regards,
Sven



Accumulo Seek performance

2016-08-24 Thread Sven Hodapp
Hi there,

currently we're experimenting with a two-node Accumulo cluster (two tablet 
servers) set up for document storage.
These documents are decomposed down to the sentence level.

Now I'm using a BatchScanner to assemble the full document like this:

val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // ARTIFACTS 
table currently hosts ~30GB data, ~200M entries on ~45 tablets 
bscan.setRanges(ranges)  // there are like 3000 Range.exact's in the 
ranges-list
  for (entry <- bscan.asScala) yield {
val key = entry.getKey()
val value = entry.getValue()
// etc.
  }

For larger full documents (e.g. 3000 exact ranges), this operation will take 
about 12 seconds.
But shorter documents are assembled blazing fast...

Is that too much for a BatchScanner, or am I misusing the BatchScanner?
Is that a normal time for such a (seek) operation?
Can I do something to get a better seek performance?

Note: I have already enabled bloom filtering on that table.

Thank you for any advice!

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hod...@scai.fraunhofer.de
www.scai.fraunhofer.de