On Wed, May 16, 2012 at 12:15 AM, Sukant Hajra <[email protected]> wrote:
> Hi,
>
> There's a couple of sanity checks I wanted to run by the list:
>
> 1. I see in the documentation that mutations may be partially read unless
> using IsolatedScanners, which is a way to have atomicity for applications.
> Is there any other mechanism for atomic operations to know about?

For the batch scanner, take a look at the WholeRowIterator and the batch
scanner javadocs.

>
> 2. I'm assuming that a flushed write to a row is not guaranteed to be
> sensed by a subsequent read (no immediate consistency). Is this correct?

After a call to flush() on a BatchWriter returns, any mutations written
before the call to flush() should be immediately visible.

>
> 3. When using a BatchWriter does the order in which mutations are added
> make any reliable assertion on the order that these mutations are sensed by
> subsequent reads? Given two mutations A and B, I'd like to assert that any
> node sensing B will also sense A.

No, the order does not matter. The batch writer has multiple background
threads writing mutations to different tablet servers, so the mutations
become visible at different times irrespective of the order in which you
add them. For the A and B case, you could write both mutations and then
call flush(). After the flush returns, both will be visible. However,
during the flush operation one may be visible and the other not.

>
> 4. I'm going to have a long standing thread doing batch writing. Is it
> reasonable/safe to give this thread an open BatchWriter (making sure to
> close the writer when shutting down the thread)? Or might this cause a
> memory leak?

When you close a BatchWriter, it flushes any data it has in memory and
shuts down its thread pool.

>
> 5. I'm assuming that BatchWriter is minimally blocking. Is there any merit
> to or precedent of load balancing across multiple writers? Or would that
> be redundant to optimizations already built into BatchWriter?

It's safe for multiple threads to use one BatchWriter. This may be more
efficient, up to the point where there are so many threads that they cause
lock contention. The nice thing about having multiple threads share one
batch writer is that the background threads sending data to tablet servers
will presumably have larger batches. This should result in fewer network
round trips.
It also allows large batches for the write ahead log on the server side.
Write ahead log batching should be less of a concern in 1.5 w/ group
commit.

>
> Thanks a lot for helping me better understand Accumulo. Feel free to point me
> to documentation I might have missed.
>
> -Sukant
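
To make the A and B case from question 3 concrete, here is a rough sketch of
the "add both, then flush" pattern. It is a toy model written against the
Java standard library only, not the Accumulo API: ToyBatchWriter, its
addMutation(row, value) signature, the three-thread pool, and the map that
stands in for the tablet servers are all made up for illustration. The only
behaviour it tries to mirror is what is described above: background threads
apply mutations in no particular order, and flush() returns once everything
added before it has been applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a batch writer. NOT the Accumulo API.
class ToyBatchWriter {
    private final Map<String, String> table; // stands in for the tablet servers
    private final ExecutorService senders = Executors.newFixedThreadPool(3);
    private final List<Future<?>> pending = new ArrayList<>();

    ToyBatchWriter(Map<String, String> table) {
        this.table = table;
    }

    // Queue a write; some background thread applies it at a later time.
    synchronized void addMutation(String row, String value) {
        pending.add(senders.submit(() -> { table.put(row, value); }));
    }

    // Block until every mutation added before this call has been applied.
    void flush() throws Exception {
        List<Future<?>> toWait;
        synchronized (this) {
            toWait = new ArrayList<>(pending);
            pending.clear();
        }
        for (Future<?> f : toWait) {
            f.get();
        }
    }

    // Flush whatever is still queued, then shut down the thread pool.
    void close() throws Exception {
        flush();
        senders.shutdown();
    }
}

class FlushDemo {
    static boolean bothVisibleAfterFlush() throws Exception {
        Map<String, String> table = new ConcurrentHashMap<>();
        ToyBatchWriter writer = new ToyBatchWriter(table);
        writer.addMutation("A", "1");
        writer.addMutation("B", "2");
        // At this point a reader might see A, B, both, or neither.
        writer.flush();
        // After flush() returns, both writes have been applied.
        boolean both = table.containsKey("A") && table.containsKey("B");
        writer.close();
        return both;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(bothVisibleAfterFlush());
    }
}
```

A reader that needs "B implies A" should therefore only start looking after
the writer's flush() has returned; the order of the addMutation calls by
itself gives no such guarantee.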
