I tried to design the replication implementation to be relatively flexible in what the 
act of "replication" actually looks like. In short, you write an implementation 
that gets run with some context information about new data in the system.

https://github.com/apache/accumulo/blob/3107627b778e3093d95777a4313277305cd0aaa2/core/src/main/java/org/apache/accumulo/core/client/replication/ReplicaSystem.java

Be aware, though, that this isn't a substitute for a trigger, and it may not actually 
meet your needs for "realtime". By default, it would be on the order of minutes before 
your implementation is triggered. You could tweak some configuration parameters down to 
tens of seconds, but you would incur more load from repeatedly scanning the Accumulo 
replication table.

If you just want notification of *any* data being written to a table, I think 
you could do this pretty easily. Inspecting the new data that has arrived and 
making some data-aware notification would be more difficult, but likely still 
feasible.
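
As a rough sketch of the "any data" case, here is what such a notifier could look 
like. Everything in it is an assumption made for illustration: NewDataContext, 
ReplicationHook and TableWriteNotifier are hypothetical stand-ins, not Accumulo's 
actual types, and the real ReplicaSystem interface linked above has a different shape. 
It only shows the idea of being handed some context about new data and turning that 
into a notification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class NotifyOnWriteSketch {

  // Hypothetical stand-in for the context handed to a replication
  // implementation. These fields are assumptions, not Accumulo's API.
  static final class NewDataContext {
    final String tableName;
    final String source; // e.g. the file that holds the new data

    NewDataContext(String tableName, String source) {
      this.tableName = tableName;
      this.source = source;
    }
  }

  // Hypothetical hook, loosely modeled on the idea behind ReplicaSystem.
  interface ReplicationHook {
    void onNewData(NewDataContext ctx);
  }

  // Fires the listener whenever new data is reported for the watched table.
  // It never looks at the data itself, only at the context.
  static final class TableWriteNotifier implements ReplicationHook {
    private final String watchedTable;
    private final Consumer<String> listener;

    TableWriteNotifier(String watchedTable, Consumer<String> listener) {
      this.watchedTable = watchedTable;
      this.listener = listener;
    }

    @Override
    public void onNewData(NewDataContext ctx) {
      if (watchedTable.equals(ctx.tableName)) {
        listener.accept("new data in " + ctx.tableName + " (" + ctx.source + ")");
      }
    }
  }

  public static void main(String[] args) {
    List<String> events = new ArrayList<>();
    ReplicationHook hook = new TableWriteNotifier("metrics", events::add);

    // Simulate the system reporting new data for two tables.
    hook.onNewData(new NewDataContext("metrics", "wal-0001"));
    hook.onNewData(new NewDataContext("other", "wal-0002"));

    // Only the watched table produces a notification.
    System.out.println(events);
  }
}
```

A data-aware version would have to open and read whatever the context points at 
before deciding whether to notify, which is where the extra difficulty comes in.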

D P wrote:
The lily indexer/SEP is really interesting.  Thanks for both of your posts

On Wed, Oct 22, 2014 at 2:07 PM, Sean Busbey wrote:

    the way this gets done in HBase, i.e. for the HBase Lily
    Indexer[1], is to add a replication consumer that isn't an actual
    cluster. IMHO, you'd be better off taking that kind of approach
    rather than trying to consume the WALs off of HDFS. I haven't
    attempted to use our replication interface for this yet, but in
    principle it should work.

    Note that either of these approaches is going to be very fragile
    across Accumulo versions because they aren't interfaces intended
    for consumption.

    [1]: http://ngdata.github.io/hbase-indexer/

    On Wed, Oct 22, 2014 at 12:59 PM, D P wrote:

        I am working with Accumulo and looking for the best means of
        knowing when something has been updated/inserted into my
        Accumulo instance.  For instance, every time data is inserted,
        how can I know externally?  If the write-ahead log file stores
        this, is it best to just read the HDFS WAL log with a storm
        spout to know when something has been inserted into a table?

        I am planning to do some real-time visualization with
        accumulo, but when data is inserted I want to be able to
        notify my UI.

        Thanks!




-- Sean

