Hi guys,

We've been slowly inching towards using iterators more effectively. The typical use case of indexed docs fits one of our needs, and we wrote a prototype for it.

We've recently realized that iterators are not just read-only, and that we can get more data-local functionality by taking advantage of their ability to mutate data as well. We've only begun to think about how this may assist us. A /lot/ of our critical data accesses are slightly complex, but local to one row. We have billions of entities in our system, so a simple bijection of entities to rows works out really well for us with respect to iterators.

Up to this point, we've had a planned architecture that uses Kestrel for a WALog and a messaging system like Akka for pipelining work. Akka would help us manage flowing work from the user to the log, and from the log to orchestrations of Accumulo intra-row reads and writes. The log just helps us get faster response times without sacrificing too much reliability.

Recently someone asked why we use our own WALog when Accumulo has one natively in HDFS. My response has been that Accumulo's WALog works at a lower level of granularity: individual mutations. We want reliable orchestrations of mutations. Our orchestrations are idempotent, but we want something along the lines of at-least-once delivery for the entire orchestration. If an iterator goes down mid-processing, I fear Accumulo's native WALog is insufficient for us to claim we have a reliable enough system.

I could definitely go through the source code to validate this opinion, but I thought I'd bounce this reasoning off the list first. Also, I'm sure we're not the only people using Accumulo in this way. Please feel free to advise us if anyone's got other ideas for an architecture or feels we're thinking about the problem backwards.

Thanks for your input,
Sukant
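
To make the orchestration-level at-least-once idea above a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `InMemoryLog` is a hypothetical stand-in for the external log (it is not Kestrel's API), a plain dict stands in for one Accumulo row, and `apply_orchestration` and `drain` are made-up names. It only shows the intended property: a log entry is acknowledged after the whole orchestration completes, so a crash mid-orchestration leads to a full replay, which is safe because the puts are idempotent.

```python
from collections import OrderedDict


class InMemoryLog:
    """Hypothetical stand-in for the external write-ahead log (not Kestrel's API)."""

    def __init__(self):
        self._pending = OrderedDict()  # entry id -> orchestration, kept until acked
        self._next_id = 0

    def append(self, orchestration):
        entry_id = self._next_id
        self._next_id += 1
        self._pending[entry_id] = orchestration
        return entry_id

    def unacked(self):
        # Return a snapshot so callers can ack entries while iterating.
        return list(self._pending.items())

    def ack(self, entry_id):
        del self._pending[entry_id]


def apply_orchestration(row, orchestration, fail_after=None):
    """Apply idempotent (column, value) puts to one row; optionally crash midway."""
    for i, (column, value) in enumerate(orchestration):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated crash mid-orchestration")
        row[column] = value  # writing a fixed value twice leaves the same state


def drain(log, row, fail_after=None):
    """Deliver every unacked entry; ack only after the whole orchestration succeeds."""
    for entry_id, orchestration in log.unacked():
        apply_orchestration(row, orchestration, fail_after)
        log.ack(entry_id)


log = InMemoryLog()
row = {}  # stand-in for a single row
log.append([("name", "alice"), ("status", "active")])

try:
    drain(log, row, fail_after=1)  # crashes after the first put
except RuntimeError:
    pass

partial = dict(row)  # state after the crash; the log entry is still unacked
drain(log, row)      # redelivery replays the whole orchestration
```

After the simulated crash the row holds only part of the orchestration, and the second `drain` call brings it to the complete state and clears the log. Whether a real deployment gets the same guarantee depends on how the log and the row writes behave under failure, which this sketch does not model.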
