Mario,
Are you using a Scanner or a BatchScanner?
One thing we did in the past with a geohash-based schema was to prefix a
shard ID to the geohash, which lets you involve all the
tservers in the scan. You'd multiply your ranges by the number of
tservers you have, but if the client is not the bottleneck then it may
increase your throughput.
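The shard-prefix idea can be sketched without the Accumulo client library. This is a sketch under assumptions, not the schema described above: the row layout `shard_day_geohash`, hashing the geohash to pick a two-digit shard, fixed-width geohashes, and the names `ShardedGeoKeys`, `rowKey`, `shardedRanges` and `splitPoints` are all hypothetical. At ingest a row's shard is computed from its geohash. At query time the shard of a matching row is unknown, so each geohash interval is repeated once per shard. That repetition is the multiplication of ranges mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Library-free sketch of a shard-prefixed geohash key (hypothetical layout):
 * row = two-digit shard + "_" + day + "_" + fixed-width geohash.
 */
final class ShardedGeoKeys {

    /** Zero-padded so "02" sorts before "10"; assumes numShards <= 100. */
    static String shardPrefix(int shard) {
        return String.format(Locale.ROOT, "%02d", shard);
    }

    /** Deterministic shard choice at ingest time, spreading one day's rows over all shards. */
    static int shardOf(String geohash, int numShards) {
        return Math.floorMod(geohash.hashCode(), numShards);
    }

    /** Row key written at ingest time. */
    static String rowKey(String day, String geohash, int numShards) {
        return shardPrefix(shardOf(geohash, numShards)) + "_" + day + "_" + geohash;
    }

    /**
     * Query-time ranges: a matching row may sit in any shard, so each inclusive
     * {startGeohash, endGeohash} interval is repeated once per shard
     * (result size = intervals.length * numShards).
     */
    static List<String[]> shardedRanges(String day, String[][] intervals, int numShards) {
        List<String[]> ranges = new ArrayList<>();
        for (int shard = 0; shard < numShards; shard++) {
            String prefix = shardPrefix(shard) + "_" + day + "_";
            for (String[] interval : intervals) {
                ranges.add(new String[] {prefix + interval[0], prefix + interval[1]});
            }
        }
        return ranges;
    }

    /** Split points "01", "02", ... so that each shard can become its own tablet. */
    static List<String> splitPoints(int numShards) {
        List<String> splits = new ArrayList<>();
        for (int shard = 1; shard < numShards; shard++) {
            splits.add(shardPrefix(shard));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Two made-up geohash intervals describing one zone.
        String[][] zone = {{"u0mb0", "u0mfz"}, {"u0qb0", "u0qcz"}};
        for (String[] range : shardedRanges("20160410", zone, 3)) {
            System.out.println(range[0] + " .. " + range[1]);
        }
        System.out.println("split points: " + splitPoints(3));
    }
}
```

Each `{start, end}` pair corresponds to an Accumulo `Range`. The whole list is what you would hand to a `BatchScanner` via `setRanges`, which queries the tservers in parallel. A plain `Scanner`, by contrast, takes a single range at a time. The split points only help if the table is actually split on them, for example with `TableOperations.addSplits`. Even then, Accumulo's balancer decides which tservers host the resulting tablets.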
Andrew
On 04/10/2016 11:05 AM, Mario Pastorelli wrote:
Hi,
I'm currently having some scan speed issues with Accumulo and I would
like to understand why and how I can solve them. I have geographical
data and my row key is the day followed by the geohex, which is a
linearisation of lat and lon. The reason for this key is that I always
query the data for one day but for a set of geohexes which represent a
zone, so with this schema I can use a single scan to read all the
data for one day with few seeks. My problem is that the scan is
painfully slow: for instance, reading 5617019 rows takes around 17
seconds, with a scan speed of 13MB/s, less than 750k scan entries/s
and around 300 seeks. I enabled the tracer and this is what I got:
17325+0 Dice@srv1 Dice.query
  11+1 Dice@srv1 scan
    11+1 Dice@srv1 scan:location
  5+13 Dice@srv1 scan
    5+13 Dice@srv1 scan:location
  4+19 Dice@srv1 scan
    4+19 Dice@srv1 scan:location
  5+23 Dice@srv1 scan
    4+24 Dice@srv1 scan:location
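The day-first layout described above can be sketched as follows. This library-free sketch rests on assumptions not stated in the thread. The day is assumed to be a fixed-width `yyyyMMdd` string. Geohex cell IDs are assumed to be fixed-width strings whose lexicographic order matches the linearisation. A zone is assumed to be given as a list of contiguous cell intervals. The names `DayFirstKeys`, `rowKey`, `zoneRanges` and `withinDay` are hypothetical, and the cell IDs in `main` are made up. The sketch shows why a single-day query stays inside one contiguous slice of the table. Accumulo keeps rows sorted and assigns contiguous row ranges to tablets, so the query only reaches the few tablets, and hence tservers, that host that day.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a day-first key layout (hypothetical names):
 * row = fixed-width yyyyMMdd day + fixed-width geohex cell ID.
 */
final class DayFirstKeys {

    /** Row key used at ingest: the day comes first, so one day's rows are adjacent. */
    static String rowKey(String day, String cellId) {
        return day + cellId;
    }

    /** One inclusive {startRow, endRow} pair per contiguous run of cells in the zone. */
    static List<String[]> zoneRanges(String day, String[][] cellIntervals) {
        List<String[]> ranges = new ArrayList<>();
        for (String[] interval : cellIntervals) {
            ranges.add(new String[] {rowKey(day, interval[0]), rowKey(day, interval[1])});
        }
        return ranges;
    }

    /** True if every range starts and ends inside the (contiguous) block of rows for that day. */
    static boolean withinDay(String day, List<String[]> ranges) {
        for (String[] range : ranges) {
            if (!range[0].startsWith(day) || !range[1].startsWith(day)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[][] zone = {{"8a1f00", "8a1f3f"}, {"8a2c10", "8a2c1f"}};
        List<String[]> ranges = zoneRanges("20160410", zone);
        for (String[] r : ranges) {
            System.out.println(r[0] + " .. " + r[1]);
        }
        // Every 2016-04-10 row sorts before every 2016-04-11 row, so all of the
        // ranges above fall into one contiguous block of the table.
        System.out.println("within one day: " + withinDay("20160410", ranges));
    }
}
```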
I'm not sure how to speed up the scanning. I have the following questions:
- is this speed normal?
- can I involve more servers in the scan? Right now only two servers
host the ranges, but with a cluster of 15 machines it would be nice to
involve more of them. Is it possible?
Thanks,
Mario
--
Mario Pastorelli| TERALYTICS
*software engineer*
Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland