Hi Jeff,

Not a rookie question at all. This is an area in the API where we know we could make the lifecycle more obvious. We have a ticket somewhere for it.

If you're using a single user/password to connect to Accumulo (that is, not a separate account per QSL client), there's no reason you can't reuse Connectors. The number of Connectors you want to cache likely depends on the concurrent user load of your service.

The fun part here is that each Connector retains a reference to the Instance which it uses internally. There are synchronized calls inside each ZooKeeperInstance which may start to degrade when you get above maybe 50 concurrent threads accessing it (ballpark guess).

You also do not want to create a new ZooKeeperInstance for every request, as you're doing now. I believe it will cause you some issues with Java heap usage due to some nitty-gritty ZooKeeper details (ask if you're actually curious).

In summary, definitely cache ZooKeeperInstances, but use some number relative to the number of users. Connectors can be cached too, but they share Instances under the hood. Using an HTTP benchmarking tool like JMeter with various client pool sizes should help you balance out these numbers.
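
To make the caching idea concrete, here is a rough sketch of a small fixed-size cache that hands out its entries round-robin. RoundRobinCache is a hypothetical helper written for this email, not part of the Accumulo API, and the right pool size is something you would have to find by benchmarking. It creates every entry up front, so no request ever constructs a new object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not an Accumulo class): a fixed-size cache that
 * creates all of its entries once and hands them out round-robin.
 */
final class RoundRobinCache<T> {
    private final List<T> items;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinCache(int size, Supplier<T> factory) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        // Create every entry eagerly so that concurrent callers never race
        // to build (and then throw away) an extra object.
        List<T> created = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            created.add(factory.get());
        }
        this.items = Collections.unmodifiableList(created);
    }

    /** Returns the next cached entry; safe to call from many threads. */
    T get() {
        // floorMod keeps the index non-negative after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), items.size());
        return items.get(index);
    }
}
```

In your service you might build one of these at startup with a factory along the lines of () -> new ZooKeeperInstance(accumuloInstanceName, zooServers), and a second one for Connectors. Note that getConnector throws checked exceptions, so that factory would need to catch and wrap them.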

Hope this helps.

- Josh

On 5/19/14, 10:29 PM, Jeff Schwartz wrote:
Rookie Question...  I've built a Query Service Layer (QSL) according to
the documentation from the Accumulo v1.6.0 User Manual.  My question is
how often should I be getting a ZooKeeper Instance and Connector to
Accumulo.  For example, here's some pseudo code for a typical service in
my QSL.

public void readTable(...) {
     Instance instance = new ZooKeeperInstance(accumuloInstanceName, zooServers);
     Connector connector = instance.getConnector(username, passwordToken);
     // Connector exposes createScanner(table, authorizations)
     Scanner scanner = connector.createScanner(tableName, auths);
     // setRange is an instance method, so call it on the scanner variable
     scanner.setRange(range);
     for (Map.Entry<Key,Value> entry : scanner) {
       ...
     }
     scanner.close();
}

If I do these lines of code for every call in my RESTful service, then I
feel like that is generating a lot of extra connections to both
ZooKeeper and Accumulo.  Additionally, I would assume that this will
have a negative impact on performance.  Should I cache any Connectors or
ZooKeeper instances?

Any suggestions or best practices would be greatly appreciated.

Thanks in advance.

Sincerely,
Jeff Schwartz
