On Thu, Mar 21, 2013 at 4:02 PM, Slater, David M. <[email protected]> wrote:
> Thanks Keith, that was very helpful.
>
> As for your comment "Multiple threads can scan a tablet concurrently", is
> there any way to force a BatchScanner to run at most one thread on a tablet,
> or to have it give the entire tablet range [a, c) to an iterator instead of
> breaking it up into [a, b) and [b, c) for different iterators on the same
> tablet?

A batch scanner will not use more than one thread to scan an individual
tablet. I was just responding to your question asking whether multiple
threads can scan a tablet. If there are multiple scanners and batch
scanners, then you could have multiple threads scanning a tablet.

>
> If it is not designed to operate that way, are there methods in
> TabletServerBatchReader that would make sense to extend in order to add that
> functionality?
>
> Best regards,
> David
>
> -----Original Message-----
> From: Keith Turner [mailto:[email protected]]
> Sent: Friday, March 15, 2013 3:24 PM
> To: [email protected]
> Subject: Re: Batchscanner and Tablet Memory
>
> On Fri, Mar 15, 2013 at 3:08 PM, Slater, David M.
> <[email protected]> wrote:
>> Hi again,
>>
>> I am curious as to how Accumulo handles multiple threads in a
>> Batchscanner, and what its ramifications are for memory use on a node.
>>
>> Let's say I start a Batchscanner with 20 threads, and scan across the
>> entire range of rows in a table of 80 tablets, spread across 4 nodes.
>> Will the Batchscanner try to spin off 20 threads if possible, or will
>> it try to match it to the number of nodes? Should I try to match the
>> number of threads with the number of cores that will be working on the data?
>
> When the batch scanner has more threads than nodes, it will run
> multiple scans on each node. It will only do this for nodes where it
> has multiple tablets to scan. So in your example I think it may run
> 20/4=5 scans on each node. Each scan would access 80/20=4 tablets.
>
>> When a thread is spun off, my thinking is that the tablet that the
>> thread is spun off on will move the entire tablet to memory, and then
>> the tablet will be iterated through. Is this how it typically happens
>> (or are there possibly multiple threads on the same tablet)?
>> If so, do I have to worry about memory issues if, say, one of the nodes
>> tries to move 10 tablets into memory, but doesn't have 20 GB of RAM left
>> to store them?
>
> Entire tablets are not loaded into memory when you scan a tablet.
> Tablets are composed of rfiles. RFiles are composed of blocks of key values.
> So only a few of these key/value blocks from rfiles are loaded at any given
> time. It is possible that these RFile blocks may be cached in the tablet
> server process depending on your configuration.
>
> Multiple threads can scan a tablet concurrently.
>
>> Sorry for the vagueness of the questions, but I'm trying to understand
>> how the general process works under the covers, in order to diagnose
>> some performance issues I have been having.
>>
>> Thanks,
>> David
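
For reference, the back-of-the-envelope estimate quoted above (20/4=5 scans per node, 80/20=4 tablets per scan) can be sketched as plain arithmetic. This is only an illustration of that estimate, assuming tablets and threads are spread evenly; it is not Accumulo's actual scheduling logic, and the class and method names are hypothetical.

```java
public class BatchScanEstimate {

    // Hypothetical helper: estimated concurrent scans on one node, assuming
    // threads are divided evenly across nodes. A node never gets more scans
    // than it has tablets to scan, and at least one scan if it has any work.
    static int scansPerNode(int threads, int nodes, int tabletsOnNode) {
        int evenShare = threads / nodes;
        return Math.max(1, Math.min(evenShare, tabletsOnNode));
    }

    // Hypothetical helper: estimated tablets handled by each scan, assuming
    // tablets are divided evenly across threads (rounded up).
    static int tabletsPerScan(int tablets, int threads) {
        return (tablets + threads - 1) / threads;
    }

    public static void main(String[] args) {
        int threads = 20;
        int nodes = 4;
        int tablets = 80;
        int tabletsOnNode = tablets / nodes; // even spread assumed

        System.out.println("scans per node:   " + scansPerNode(threads, nodes, tabletsOnNode));
        System.out.println("tablets per scan: " + tabletsPerScan(tablets, threads));
    }
}
```

Under these assumptions, a node holding only a couple of tablets would be capped at that many scans, which matches the remark that multiple scans are only run on nodes with multiple tablets to scan.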
