Hi there,
currently we're experimenting with a two-node Accumulo cluster (two tablet
servers) as a setup for document storage.
The documents are decomposed down to the sentence level.
Now I'm using a BatchScanner to assemble the full document like this:
    import scala.collection.JavaConverters._ // for .asScala on the scanner

    // the ARTIFACTS table currently hosts ~30GB of data, ~200M entries on ~45 tablets
    val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10)
    // there are roughly 3000 Range.exact's in the ranges list
    bscan.setRanges(ranges)
    for (entry <- bscan.asScala) yield {
      val key = entry.getKey()
      val value = entry.getValue()
      // etc.
    }
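One thing I was wondering about myself: assuming the sentence row IDs within a document sort contiguously, could it help to collapse the ~3000 exact ranges into one covering range per document? A rough sketch of what I mean (the "docId/sentenceIndex" row-ID scheme below is just a simplified stand-in for my real keys):

```scala
// Sketch: instead of one Range.exact per sentence, build one covering
// (start, stop) span per document; each span could then be turned into a
// single Accumulo Range(start, stop). Row-ID scheme is hypothetical.
def spansPerDoc(rowIds: Seq[String]): Map[String, (String, String)] =
  rowIds
    .groupBy(_.takeWhile(_ != '/'))            // group sentence rows by docId prefix
    .map { case (doc, rows) => doc -> (rows.min, rows.max) } // lexicographic span
```

That would shrink the range list handed to setRanges() from thousands of point lookups to a handful of contiguous scans per document, if the keys really do sort that way.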
For larger full documents (e.g. 3000 exact ranges), this operation takes
about 12 seconds.
But shorter documents are assembled blazingly fast...
Is that too much for a BatchScanner, or am I misusing the BatchScanner?
Is that a normal time for such a (seek) operation?
Can I do something to get better seek performance?
Note: I have already enabled bloom filtering on that table.
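For reference, I enabled it via the Accumulo shell with something like this (table name as above):

```shell
config -t ARTIFACTS -s table.bloom.enabled=true
```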
Thank you for any advice!
Regards,
Sven
--
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
[email protected]
www.scai.fraunhofer.de