Thanks for taking the time to write this up, Dylan.
I'm a little worried about the RemoteWriteIterator. Using a BatchWriter
implies that you'll need some sort of resource management - both
ensuring that the BatchWriter is close()'ed whenever a
compaction/procedure ends and handling rejected mutations. Have you put
any thought into how you would address these?
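To make the concern concrete, here is a rough sketch of the two obligations,
not a proposal for the actual iterator. The Writer interface and
RejectedException below are hypothetical stand-ins for Accumulo's BatchWriter
and MutationsRejectedException, simplified so the example compiles on its
own; the real classes have different signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoteWriteSketch {
    /** Hypothetical stand-in for Accumulo's MutationsRejectedException. */
    public static class RejectedException extends Exception {
        public RejectedException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for the parts of BatchWriter that matter here. */
    public interface Writer {
        void add(String mutation) throws RejectedException;
        void close() throws RejectedException;
    }

    /**
     * Writes every entry, always closes the writer, and returns the entries
     * that were rejected so the caller can decide what to do with them.
     */
    public static List<String> writeAll(Writer writer, List<String> entries) {
        List<String> rejected = new ArrayList<>();
        try {
            for (String entry : entries) {
                try {
                    writer.add(entry);
                } catch (RejectedException e) {
                    rejected.add(entry); // remember it instead of dropping it
                }
            }
        } finally {
            try {
                writer.close(); // runs even if the loop above throws
            } catch (RejectedException e) {
                // Assumption: closing flushes buffered writes and can reject too.
                rejected.add("<rejected on close: " + e.getMessage() + ">");
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        Writer demo = new Writer() {
            public void add(String m) throws RejectedException {
                if (m.isEmpty()) throw new RejectedException("empty mutation");
            }
            public void close() { /* nothing buffered in this demo */ }
        };
        System.out.println(writeAll(demo, Arrays.asList("a", "", "b")));
    }
}
```

The open question is who plays the role of writeAll's caller inside a
compaction: something has to guarantee the close() and decide what happens to
the rejected entries.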
I'm not familiar enough with the internals anymore, but I remember that
I had some pains trying to write to another table during compactions
when I was working on replication. I think as long as it's not triggered
off of the metadata table, it wouldn't have any deadlock issues.
Architecturally, it's a little worrisome, because it feels a bit like
using a wrench as a hammer -- iterators are great for performing some
passing computation, but not really for doing arbitrary
reads/writes. It gets back to the Accumulo/HBase comparisons where people
try to compare Iterators and Coprocessors. They can sometimes do the
same thing, but they're definitely different features.
Anyways, I need to stew on it some more and give it a few more reads.
Thanks again for sharing!
Dylan Hutchison wrote:
Hello all,
As promised
<https://mail-archives.apache.org/mod_mbox/accumulo-user/201502.mbox/%3CCAPx%3DJkakO3ice7vbH%2BeUo%2B6AP1JPebVbTDu%2Bg71KV8SvQ4J9WA%40mail.gmail.com%3E>,
here is a design doc open for comments on implementing server-side
computation in Accumulo.
https://github.com/Accla/accumulo_stored_procedure_design
Would love to hear your opinion, especially if the proposed design
pattern matches one of /your use cases/.
Regards,
Dylan Hutchison