The tserver.readahead.concurrent.max property provides an upper bound on the number of scans that will start "reading ahead". Read-ahead is a performance tweak that tries to smooth the I/O cost of reading from files. However, each read-ahead thread increases the amount of heap used, because the data it reads is held in memory. By capping the number of concurrent read-aheads, this parameter lets you bound the memory that read-ahead consumes across *all* scan tasks (from a Scanner, BatchScanner or even major compactions) on a tablet server.

The tserver.scan.files.open.max property gives you control over the upper bound on the number of files that a tablet server (across all tablets hosted by that tablet server) can open for scanning. Again, because holding these files open consumes memory, this parameter is meant to let you place an upper bound on the memory consumed by open files.

Now, the number of threads that a BatchScanner uses is what's primarily going to control your "server side parallelism". When you provide a value of N for the BatchScanner "threads", you will get up to N "scan tasks" running concurrently against your Accumulo instance. The two previously described properties only act to limit the amount of resources that your single BatchScanner (considered alongside all other active BatchScanners) can consume.

In situations with multiple clients reading from an Accumulo instance, you may run into cases where a scan task (one thread from your BatchScanner) is blocked until the tablet server finishes a previous read and thus frees the resources (open files or read-ahead threads) needed to satisfy your scan request.
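
To make the relationship between the client-side thread count and the server-side caps concrete, here is a rough toy model in plain Java. It is not Accumulo code and does not reflect how the tablet server is actually implemented: the class ScanCapSimulation, the method peakConcurrency and the use of a Semaphore are all hypothetical stand-ins, with the semaphore playing the role of a cap such as tserver.readahead.concurrent.max and the thread pool playing the role of the BatchScanner threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: this is NOT Accumulo code. A semaphore stands in for a
// server-side cap such as tserver.readahead.concurrent.max.
class ScanCapSimulation {

    // Runs `clientThreads` simulated scan tasks against a "server" that admits
    // at most `serverCap` of them at once. Returns the highest number of tasks
    // observed running at the same time.
    static int peakConcurrency(int clientThreads, int serverCap) {
        Semaphore serverSlots = new Semaphore(serverCap);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService client = Executors.newFixedThreadPool(clientThreads);

        for (int i = 0; i < clientThreads; i++) {
            client.submit(() -> {
                try {
                    serverSlots.acquire();      // blocks while the server is at its cap
                    try {
                        int now = running.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(50);       // pretend to read some data
                    } finally {
                        running.decrementAndGet();
                        serverSlots.release();  // free the slot for a waiting task
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        client.shutdown();
        try {
            client.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // Few client threads, generous cap: the client thread count is the limit.
        System.out.println("4 threads, cap 64: peak " + peakConcurrency(4, 64));
        // Many client threads, tight cap: the extra tasks wait for a free slot.
        System.out.println("16 threads, cap 4: peak " + peakConcurrency(16, 4));
    }
}
```

In this model the peak can never exceed the smaller of the two numbers. That matches what you observed: with the caps at 64 and 100, a small BatchScanner thread count is the binding limit, so raising it is what increases the parallelism you see.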

Hope that helps.

On 2/7/14, 3:19 PM, Anthony F wrote:
How do the config variables tserver.readahead.concurrent.max and
tserver.scan.files.open.max interact with BatchScanner threads requested
from the Connector?  I have tserver.readahead.concurrent.max set to 64
and tserver.scan.files.open.max set to 100.  However, unless I bump up
the number of BatchScanner threads, I don't see much tserver side
parallelism.  If I bump up the number of BatchScanner threads, then I
can see multiple scans per tserver.  What governs the number of tserver
side threads used to execute a scan and what prevents too many threads
from spinning up to service multiple concurrent scans from independent
clients?
