Thanks, Josh.  I think the main pain-point is that replication doesn't
occur until the WAL is closed.  We've made some aggressive configuration
changes to Accumulo, shortening the WAL rollover interval and increasing
the minor compaction frequency, to force replication to happen faster.  It
is down to around 20 minutes on our production clusters, but we are about
at our limit: Accumulo is spending a lot more time on bookkeeping tasks,
and it is starting to affect our query performance.

My initial thoughts are to increase the replication parallelism and to start
replicating the WAL before it is closed (I see a few JIRAs already open
that mention these things), but I haven't done enough digging in the
codebase to see what is really available.

Are you free sometime soon to meet up for a bit and talk replication?
I'll buy lunch!

Cheers,
--Adam

On Wed, Feb 15, 2017 at 2:52 PM, Josh Elser <josh.el...@gmail.com> wrote:

> Hi Adam,
>
> I'm not presently working on anything (too many irons in other fires), but
> I'd be happy to help work through a design doc for improvements.
>
> Do you have a list of pain-points which are the primary causes of latency?
> That would help in identifying the changes to make and how best to
> implement them.
>
> - Josh
>
>
> Adam J. Shook wrote:
>
>> I'm currently scoping what it would take to improve the latency of the
>> replication feature in Accumulo.  Is any work currently being done to
>> improve replication latency?  If so, would there be some interest in
>> collaborating on that effort?
>>
>> If nothing is currently being planned, I'd be interested in design ideas
>> and pointers from the community for improvements to the existing
>> implementation.  We're looking to get replication latency under five
>> minutes and are willing to put in the effort to implement the
>> improvements.
>>
>> Thank you for your time!
>>
>> Cheers,
>> --Adam
>>
>
