I tried to design the replication implementation to be relatively flexible in what the
act of "replication" actually looks like. In short, you write an implementation
that is invoked with some context about new data that has arrived in the system:
https://github.com/apache/accumulo/blob/3107627b778e3093d95777a4313277305cd0aaa2/core/src/main/java/org/apache/accumulo/core/client/replication/ReplicaSystem.java
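The idea of a "replica" that isn't really a peer cluster can be sketched in plain Java. This is a minimal, self-contained sketch under stated assumptions: `SimpleReplicaSystem`, `Progress` and `NotifyingReplicaSystem` are hypothetical names that only mimic the general shape of the linked `ReplicaSystem` (given a WAL file and how far it has already been processed, handle the new part and report updated progress). They are not Accumulo's actual classes or method signatures, and WAL entries are simplified to strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ReplicaSystemSketch {

    // Hypothetical progress record: which WAL file, and how many of its entries
    // have already been handled. Not Accumulo's real Status type.
    record Progress(String walFile, int entriesProcessed) {}

    // Hypothetical, simplified contract modeled loosely on the linked ReplicaSystem:
    // handle whatever is new in the file and return the updated progress.
    interface SimpleReplicaSystem {
        Progress replicate(Progress current, List<String> entriesInFile);
    }

    // A "replica" that is not a cluster: it forwards each new entry to a listener
    // (e.g. something that notifies a UI) instead of writing to a peer instance.
    static final class NotifyingReplicaSystem implements SimpleReplicaSystem {
        private final Consumer<String> listener;

        NotifyingReplicaSystem(Consumer<String> listener) {
            this.listener = listener;
        }

        @Override
        public Progress replicate(Progress current, List<String> entriesInFile) {
            // Skip entries already handled on a previous pass over this file.
            for (int i = current.entriesProcessed(); i < entriesInFile.size(); i++) {
                listener.accept(entriesInFile.get(i));
            }
            return new Progress(current.walFile(),
                    Math.max(current.entriesProcessed(), entriesInFile.size()));
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        SimpleReplicaSystem system = new NotifyingReplicaSystem(seen::add);

        Progress p = new Progress("wal-0001", 0);
        p = system.replicate(p, List.of("row1", "row2"));
        // The file has grown; only the new entry is forwarded on this pass.
        p = system.replicate(p, List.of("row1", "row2", "row3"));

        System.out.println(seen + " processed=" + p.entriesProcessed());
        // prints: [row1, row2, row3] processed=3
    }
}
```

Tracking progress per file is what lets repeated invocations forward only the entries that arrived since the last pass, rather than re-notifying on everything in the file.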
Be aware, though, that this isn't a substitute for a trigger, and it may not actually meet
your "real-time" needs. By default, it would be on the order of minutes before your
implementation is invoked. You could tune some configuration parameters down to
tens of seconds, but you would incur more load from repeatedly scanning the Accumulo
replication table.
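A rough way to reason about that latency/load trade-off, assuming (as a simplification, not Accumulo's actual scheduling) a single loop that scans the replication table once per interval: the worst-case wait for new work is about one interval, while the number of scans grows inversely with it. The interval values below are illustrative, not Accumulo defaults, and no real configuration property names are implied.

```java
public class PollingTradeoff {

    // Simplified model (an assumption): one loop scans the replication table every
    // intervalSeconds, so new work waits up to roughly one interval to be noticed.
    static long scansPerHour(long intervalSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return 3600 / intervalSeconds;
    }

    public static void main(String[] args) {
        // Illustrative intervals: "order of minutes" down to "tens of seconds".
        long[] intervalsSeconds = {300, 60, 30, 10};
        for (long s : intervalsSeconds) {
            System.out.printf("interval=%ds: worst-case wait ~%ds, scans/hour=%d%n",
                    s, s, scansPerHour(s));
        }
    }
}
```

In this model, going from 60 seconds to 10 seconds means six times as many scans of the replication table per hour.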
If you just want notification that *any* data was written to a table, I think
you could do this pretty easily. Inspecting the newly arrived data to make a
data-aware notification would be more difficult, but likely still
feasible.
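The difference between the two kinds of notification can be sketched as follows, assuming a hypothetical, simplified `Write` record in place of Accumulo's real key/value or mutation classes. The "any data" case only needs to know that a batch touched the table; the data-aware case has to examine each entry and apply some predicate (here, a hypothetical column-family match).

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class NotificationFilters {

    // Hypothetical, simplified view of one written entry; not Accumulo's Key/Value/Mutation.
    record Write(String table, String row, String columnFamily, String value) {}

    // "Any data" notification: only needs to know that the batch touched the table.
    static Optional<String> anyWriteNotification(String table, List<Write> batch) {
        boolean touched = batch.stream().anyMatch(w -> w.table().equals(table));
        return touched ? Optional.of("new data in " + table) : Optional.empty();
    }

    // Data-aware notification: every entry must be inspected against a predicate.
    static List<String> dataAwareNotifications(List<Write> batch, Predicate<Write> interesting) {
        return batch.stream()
                .filter(interesting)
                .map(w -> "row " + w.row() + " changed in " + w.columnFamily())
                .toList();
    }

    public static void main(String[] args) {
        List<Write> batch = List.of(
                new Write("events", "r1", "meta", "a"),
                new Write("events", "r2", "geo", "b"));

        System.out.println(anyWriteNotification("events", batch).orElse("nothing"));
        // Hypothetical rule: only changes to the "geo" column family matter to the UI.
        System.out.println(dataAwareNotifications(batch, w -> "geo".equals(w.columnFamily())));
    }
}
```

The extra cost of the data-aware case is that it must touch every entry rather than stopping at the first sign of a write.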
D P wrote:
The Lily Indexer/SEP is really interesting. Thanks for both of your posts.
On Wed, Oct 22, 2014 at 2:07 PM, Sean Busbey wrote:
The way this gets done in HBase, e.g. for the HBase Lily
Indexer[1], is to add a replication consumer that isn't an actual
cluster. IMHO, you'd be better off taking that kind of approach
rather than trying to consume the WALs off of HDFS. I haven't
attempted to use our replication interface for this yet, but in
principle it should work.
Note that either of these approaches is going to be very fragile
across Accumulo versions, because they rely on interfaces that
aren't intended for public consumption.
[1]: http://ngdata.github.io/hbase-indexer/
On Wed, Oct 22, 2014 at 12:59 PM, D P wrote:
I am working with Accumulo and looking for the best means of
knowing when something has been updated/inserted into my
Accumulo instance. For instance, every time data is inserted,
how can I know externally? If the write-ahead log file stores
this, is it best to just read the WAL on HDFS with a Storm
spout to know when something has been inserted into a table?
I am planning to do some real-time visualization with
Accumulo, and when data is inserted I want to be able to
notify my UI.
Thanks!
--
Sean