sanity checking application WALogs make sense

Sukant Hajra Fri, 14 Sep 2012 22:44:52 -0700

Hi guys,

We've been slowly inching towards using iterators more effectively.  The
typical use case of indexed docs fits one of our needs, and we wrote a
prototype for it.


We've recently realized that iterators are not just read-only, and that we can
get more data-local functionality by taking advantage of their ability to
mutate data as well.  We've only begun to think about how this may assist us.
A /lot/ of our critical data accesses are slightly complex, but local to one
row.  We have billions of entities in our system, so a simple bijection of
entities to rows works out really well for us with respect to iterators.

Up to this point, we've had a planned architecture that uses Kestrel for a
WALog and a messaging system like Akka for pipelining work.  Akka would help us
manage flowing work from the user to the log and from the log to
orchestrations of Accumulo intra-row reads and writes.  The log just helps us
get some faster response time without sacrificing too much reliability.

Recently someone asked why we use our own WALog when Accumulo has one natively
in HDFS.  My response has been that Accumulo's WALog works at a lower level of
granularity, that of individual mutations.  We want reliable orchestrations of
mutations.  Our orchestrations are idempotent, but we want something along the
lines of at-least-once delivery for the entire orchestration.  If an iterator
goes down mid-processing, I fear Accumulo's native WALog is insufficient to
claim we have a reliable enough system.
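
To make the distinction concrete, here is a rough sketch of what I mean by
at-least-once delivery of a whole orchestration.  Nothing in it is Kestrel,
Akka, or Accumulo API: AppLog, run_orchestration, and drain are made-up names,
the "table" is a plain dict standing in for rows, and the crash is simulated.
The point is only that a log entry is confirmed after the entire orchestration
succeeds, and that replaying it is safe because every step is idempotent.

```python
from collections import deque


class AppLog:
    """Hypothetical in-memory stand-in for an application-level WALog."""

    def __init__(self):
        self._pending = deque()
        self._in_flight = {}
        self._next_id = 0

    def append(self, orchestration):
        self._pending.append((self._next_id, orchestration))
        self._next_id += 1

    def open_read(self):
        """Hand out the next entry, keeping it until it is confirmed."""
        if not self._pending:
            return None
        entry = self._pending.popleft()
        self._in_flight[entry[0]] = entry
        return entry

    def ack(self, entry_id):
        """Confirm the whole orchestration finished; drop the entry."""
        del self._in_flight[entry_id]

    def abort(self, entry_id):
        """Put an unconfirmed entry back at the front for redelivery."""
        self._pending.appendleft(self._in_flight.pop(entry_id))


def run_orchestration(row, steps, fail_after=None):
    """Apply idempotent steps to one row; optionally crash partway through."""
    for i, (column, value) in enumerate(steps):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated crash mid-orchestration")
        row[column] = value  # plain overwrite, so a replay is harmless


def drain(log, table, crash_first_attempt=False):
    """Process entries until the log is empty; ack only on full success."""
    attempts = 0
    crashed = False
    while True:
        entry = log.open_read()
        if entry is None:
            return attempts
        entry_id, (row_id, steps) = entry
        attempts += 1
        try:
            fail_after = 1 if (crash_first_attempt and not crashed) else None
            run_orchestration(table.setdefault(row_id, {}), steps, fail_after)
            log.ack(entry_id)
        except RuntimeError:
            crashed = True
            log.abort(entry_id)
```

With one two-step orchestration in the log and a simulated crash on the first
attempt, the row is left half-written, the entry is redelivered, and the second
attempt completes it.  A log that only tracks individual mutations has no
record that the second step was still owed.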

I could definitely go through source code to validate this opinion, but I
thought I'd bounce this reasoning off the list first.

Also, I'm sure we're not the only people using Accumulo in this way.  Please
feel free to advise us if anyone's got other ideas for an architecture or feels
we're thinking about the problem backwards.

Thanks for your input,
Sukant
