Hi Jeff,
Not a rookie question at all. This is an area in the API where we know
we could make the lifecycle more obvious. We have a ticket somewhere for it.
If you're using a single user/password to connect to Accumulo (not using
special accounts per your QSL client), there's no reason you can't reuse
Connectors. The number of Connectors you want to cache is likely
relative to the concurrent user load of your service.
The fun part here is that each Connector retains a reference to the
Instance which it uses internally. There are synchronized calls inside
each ZooKeeperInstance which may start to degrade when you get above
maybe 50 concurrent threads accessing it (ballpark guess).
You also do not want to create a new ZooKeeperInstance for every request,
as you're doing now. I believe it will cause you some issues in the Java
heap due to some nitty-gritty ZooKeeper details (ask if you're actually
curious).
In summary, definitely cache ZooKeeperInstances, but use some number
relative to the number of users. Connectors can be cached too, but they
share Instances under the hood. Running an HTTP benchmarking tool like
JMeter against various pool sizes should help you balance these numbers.
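To make the caching idea concrete, here is a minimal sketch of a
fixed-size pool that hands out its entries round-robin. RoundRobinPool
is a hypothetical helper, not something in the Accumulo API, and it
assumes Java 8 for the Supplier interface. It is kept generic so it
stands alone; in your service the factory would build a
ZooKeeperInstance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper (not part of Accumulo): creates a fixed number of
// objects up front and hands them out round-robin across callers.
final class RoundRobinPool<T> {
  private final List<T> items;
  private final AtomicInteger next = new AtomicInteger();

  RoundRobinPool(int size, Supplier<T> factory) {
    if (size < 1) {
      throw new IllegalArgumentException("size must be >= 1");
    }
    List<T> created = new ArrayList<>(size);
    for (int i = 0; i < size; i++) {
      created.add(factory.get());  // the factory runs exactly 'size' times
    }
    this.items = Collections.unmodifiableList(created);
  }

  // Safe to call from many threads; floorMod keeps the index
  // non-negative even after the counter overflows.
  T get() {
    int i = Math.floorMod(next.getAndIncrement(), items.size());
    return items.get(i);
  }

  int size() {
    return items.size();
  }
}
```

With the names from your snippet, you would build the pool once at
startup, e.g. new RoundRobinPool<>(4, () -> new
ZooKeeperInstance(accumuloInstanceName, zooServers)), and then call
pool.get().getConnector(username, passwordToken) in each request. The
pool size of 4 is only a placeholder; pick the real number from your
benchmarking.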
Hope this helps.
- Josh
On 5/19/14, 10:29 PM, Jeff Schwartz wrote:
Rookie Question... I've built a Query Service Layer (QSL) according to
the documentation from the Accumulo v1.6.0 User Manual. My question is
how often should I be getting a ZooKeeperInstance and Connector to
Accumulo. For example, here's some pseudo code for a typical service in
my QSL.
public void readTable(...) {
  Instance instance = new ZooKeeperInstance(accumuloInstanceName,
      zooServers);
  Connector connector = instance.getConnector(username, passwordToken);
  Scanner scanner = connector.getScanner(tableName, auths);
  scanner.setRange(range);  // set on the scanner instance
  for (Map.Entry<Key,Value> entry : scanner) {
    ...
  }
  scanner.close();
}
If I run these lines of code for every call in my RESTful service, then I
feel like that is generating a lot of extra connections to both
ZooKeeper and Accumulo. Additionally, I would assume that this will
have a negative impact on performance. Should I cache any Connectors or
ZooKeeperInstances?
Any suggestions or best practices would be greatly appreciated.
Thanks in advance.
Sincerely,
Jeff Schwartz