Hi again,

I'm curious how Accumulo handles multiple threads in a BatchScanner, and what
that means for memory use on a node.

Let's say I start a BatchScanner with 20 threads and scan the entire row range
of a table with 80 tablets spread across 4 nodes. Will the BatchScanner start
all 20 threads if it can, or will it limit the thread count to the number of
nodes? Should I instead match the number of threads to the number of cores
that will be processing the data?

My understanding is that when a thread is started for a tablet, the entire
tablet is loaded into memory and then iterated through. Is that typically how
it works, or can multiple threads work on the same tablet? If it is, do I need
to worry about running out of memory if, say, one node tries to load 10 tablets
into memory but doesn't have the 20 GB of RAM left to hold them?
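For a rough sense of scale, here is a back-of-envelope sketch of the numbers
above. It models only the hypothesis in the question (every tablet being
scanned is loaded into RAM in full), which is not confirmed Accumulo behavior.
The 2 GB per tablet figure is inferred from "10 tablets" needing "20 GB", the
even spread of tablets across nodes is an assumption, the 16 GB of free RAM is
a hypothetical value, and the class and method names are illustrative only.

```java
/**
 * Back-of-envelope memory estimate under the question's hypothesis that each
 * tablet being scanned is loaded into RAM in full. This models the hypothesis,
 * not confirmed Accumulo behavior. All names here are hypothetical.
 */
public class TabletMemoryEstimate {

    /** Tablets hosted per node, assuming tablets are spread evenly. */
    static int tabletsPerNode(int totalTablets, int nodes) {
        return totalTablets / nodes;
    }

    /** RAM (GB) one node would need if every tablet in flight were fully loaded. */
    static double requiredGb(int tabletsInFlight, double gbPerTablet) {
        return tabletsInFlight * gbPerTablet;
    }

    /** True if the estimated footprint fits in the node's free RAM. */
    static boolean fits(int tabletsInFlight, double gbPerTablet, double freeGb) {
        return requiredGb(tabletsInFlight, gbPerTablet) <= freeGb;
    }

    public static void main(String[] args) {
        int perNode = tabletsPerNode(80, 4);   // 80 tablets over 4 nodes
        double gbPerTablet = 20.0 / 10;         // inferred: 10 tablets need 20 GB
        double hypotheticalFreeGb = 16.0;       // hypothetical free RAM on one node
        System.out.println("Tablets per node: " + perNode);
        System.out.println("GB for 10 tablets: " + requiredGb(10, gbPerTablet));
        System.out.println("Fits in " + hypotheticalFreeGb + " GB free? "
                + fits(10, gbPerTablet, hypotheticalFreeGb));
    }
}
```

Running main prints 20 tablets per node, 20.0 GB for 10 tablets, and false
for the hypothetical 16 GB case.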

Sorry for the vague questions, but I'm trying to understand how the process
works under the covers so I can diagnose some performance issues I've been
having.

Thanks,
David