Re: Accumulo Seek performance

Sven Hodapp Wed, 31 Aug 2016 00:06:54 -0700

Hi Keith,

I've tried it with 1, 2 or 10 threads. Unfortunately there were no significant 
differences.
Maybe it's a problem with the table structure? For example, a single row id 
(e.g. a sentence) may have several thousand column families. Can this affect 
the seek performance?


So my initial example has about 3000 row ids to seek, which return about 500k 
entries. If I filter for specific column families (e.g. a document without 
annotations), it returns about 5k entries, but the seek time is only halved.
Are there too many column families to seek quickly?
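
For reference, the column-family filter described above could be sketched roughly as follows. This is only an illustration, not the exact code in use: the helper name `scanFamilies` and its parameters are hypothetical, and it assumes the Accumulo 1.x client API with an already connected `Connector`.

```scala
import org.apache.accumulo.core.client.{BatchScanner, Connector}
import org.apache.accumulo.core.data.Range
import org.apache.accumulo.core.security.Authorizations
import org.apache.hadoop.io.Text
import scala.collection.JavaConverters._

// Hypothetical helper: read only the given column families for a set of row ids.
def scanFamilies(conn: Connector, table: String, auths: Authorizations,
                 rowIds: Seq[String], families: Seq[String],
                 threads: Int): Vector[(String, String)] = {
  val bscan: BatchScanner = conn.createBatchScanner(table, auths, threads)
  try {
    // One exact-row range per row id (about 3000 in the case above).
    bscan.setRanges(rowIds.map(id => Range.exact(id)).asJava)
    // Restrict the scan to the requested column families.
    families.foreach(cf => bscan.fetchColumnFamily(new Text(cf)))
    // Materialize (row, value) pairs before the scanner is closed.
    bscan.asScala.map(e => (e.getKey.getRow.toString, e.getValue.toString)).toVector
  } finally bscan.close()
}
```

Varying `threads` (1, 2, 10) in such a helper is how the thread counts mentioned above would be compared.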

Thanks!

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
[email protected]
www.scai.fraunhofer.de

----- Original Message -----
> From: "Keith Turner" <[email protected]>
> To: "user" <[email protected]>
> Sent: Monday, 29 August 2016 22:37:32
> Subject: Re: Accumulo Seek performance

> On Wed, Aug 24, 2016 at 9:22 AM, Sven Hodapp
> <[email protected]> wrote:
>> Hi there,
>>
>> currently we're experimenting with a two node Accumulo cluster (two tablet
>> servers) setup for document storage.
>> These documents are decomposed down to the sentence level.
>>
>> Now I'm using a BatchScanner to assemble the full document like this:
>>
>>     // ARTIFACTS table currently hosts ~30GB data, ~200M entries on ~45 tablets
>>     val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10)
>>     // there are like 3000 Range.exact's in the ranges-list
>>     bscan.setRanges(ranges)
>>     for (entry <- bscan.asScala) yield {
>>       val key = entry.getKey()
>>       val value = entry.getValue()
>>       // etc.
>>     }
>>
>> For larger full documents (e.g. 3000 exact ranges), this operation will take
>> about 12 seconds.
>> But shorter documents are assembled blazing fast...
>>
>> Is that too much for a BatchScanner, or am I misusing the BatchScanner?
>> Is that a normal time for such a (seek) operation?
>> Can I do something to get a better seek performance?
> 
> How many threads did you configure the batch scanner with and did you
> try varying this?
> 
>>
>> Note: I have already enabled bloom filtering on that table.
>>
>> Thank you for any advice!
>>
>> Regards,
>> Sven
>>
>> --
>> Sven Hodapp, M.Sc.,
>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
>> Department of Bioinformatics
>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
>> [email protected]
>> www.scai.fraunhofer.de
