Re: Replication Latency

Josh Elser Wed, 11 Jan 2017 17:03:12 -0800

Did you look at the accumulo-gc log to actually correlate how often the class I sent is being executed?


Noe Detore wrote:

To be fair, after writing the post I grepped the logs and found that my
WALs were rolling over on size before the max.age time threshold was
hit. That is why reducing tserver.walog.max.age did not improve
latency.
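A quick way to check which limit fires first: a WAL rolls when either its size passes the size limit or its age passes max.age, so at a steady ingest rate the size limit wins whenever the log fills faster than max.age. The sketch below is a back-of-the-envelope model only; the class and method names are hypothetical, and the 1 GiB size limit, 10 minute age and 6 MiB/s ingest rate are assumed example values, not measurements from this cluster.

```java
import java.time.Duration;

/**
 * Hypothetical estimate of which WAL roll condition is hit first, assuming a
 * constant ingest rate per tserver (real workloads are rarely this steady).
 */
public class RollTriggerEstimate {

    /** Returns "size" if the size limit is reached before maxAge elapses, otherwise "age". */
    static String firstTrigger(long maxSizeBytes, Duration maxAge, long bytesPerSecond) {
        // Seconds needed to fill the WAL at the assumed constant rate
        double secondsToFill = (double) maxSizeBytes / bytesPerSecond;
        return secondsToFill < maxAge.getSeconds() ? "size" : "age";
    }

    public static void main(String[] args) {
        long maxSize = 1L << 30;                  // assumed size limit: 1 GiB
        Duration maxAge = Duration.ofMinutes(10); // assumed tserver.walog.max.age
        long rate = 6L * 1024 * 1024;             // assumed ingest: 6 MiB/s
        // 1 GiB / 6 MiB/s is about 171 s, so the size limit fires first
        System.out.println(firstTrigger(maxSize, maxAge, rate));
    }
}
```

With those assumed numbers the log fills in under three minutes, so lowering max.age below that point is the only way the age setting would change anything.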


There is still an unexplained delay, which I need to figure out, between
when the tserver stops writing to a WAL and when that WAL actually gets
replicated. For example, my WALs appear to be done being written to
after 3m (a new WAL is created on the tserver), but replication takes
about 12 to 15 min to complete. Even though the WAL is no longer written
to after 3m, I do not see it ready for replication (closed: true) until
about 13m.
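One way to reason about that gap, assuming the WAL is only marked closed during a GC cycle: the WAL first has to stop being referenced, and then it waits for the next GC cycle to start. The sketch below models only the second part. The first-cycle delay and cycle period are assumed values (in Accumulo these are presumably governed by gc.cycle.start and gc.cycle.delay; check the docs for your version), and the class name is hypothetical. If the observed gap is much longer than one GC period, something other than the GC schedule, such as the WAL still being referenced, is likely contributing.

```java
/**
 * Hypothetical model: a WAL is marked closed at the first GC cycle that starts
 * at or after the moment it is no longer referenced. All times are in seconds.
 */
public class CloseDelayModel {

    /** Start time of the first GC cycle at or after {@code unreferencedAt}. */
    static long firstCycleAfter(long unreferencedAt, long gcStart, long gcPeriod) {
        if (unreferencedAt <= gcStart) {
            return gcStart;
        }
        // Ceiling division: whole periods needed to reach or pass unreferencedAt
        long periods = (unreferencedAt - gcStart + gcPeriod - 1) / gcPeriod;
        return gcStart + periods * gcPeriod;
    }

    public static void main(String[] args) {
        long rolledAt = 180;       // WAL stops receiving writes at about 3 min
        long unreferencedAt = 180; // optimistic assumption: unreferenced immediately
        long gcStart = 30;         // assumed delay before the first GC cycle
        long gcPeriod = 300;       // assumed 5 min between GC cycles
        long closedAt = firstCycleAfter(unreferencedAt, gcStart, gcPeriod);
        System.out.println("closed at ~" + closedAt + " s, "
                + (closedAt - rolledAt) + " s after rollover");
    }
}
```

Under these assumptions the WAL would be marked closed roughly 2.5 minutes after rollover, well short of the 10 minutes observed.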


On Wed, Jan 11, 2017 at 5:44 PM, Josh Elser <josh.el...@gmail.com
<mailto:josh.el...@gmail.com>> wrote:

    See org.apache.accumulo.gc.replication.CloseWriteAheadLogReferences
    for where WALs are currently marked as "closed".

    I don't recall the details, but I think there was some issue with
    trying to close them in TabletServerLogger.

    Yes to your last question: if it were done in TabletServerLogger, the
    WAL would be closed more quickly than it is by the GC. The issue is
    whether or not it's actually safe to mark them as closed there. I
    just don't remember the internal WAL lifecycle well enough.
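As a rough mental model of what a GC-side pass like CloseWriteAheadLogReferences does, assuming it works by set difference: collect the WALs that tablets still reference, then flag every replication status record whose WAL is not in that set as closed. The sketch below uses plain collections in place of the metadata and replication tables; the class and method names are hypothetical and none of them are Accumulo APIs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Simplified, hypothetical model of a GC pass that marks unreferenced WALs as
 * closed. closedByWal stands in for replication status records (WAL -> closed flag).
 */
public class CloseUnreferencedWals {

    /** Sets closed=true for every open WAL not in referencedWals; returns how many changed. */
    static int markClosed(Map<String, Boolean> closedByWal, Set<String> referencedWals) {
        int updated = 0;
        for (Map.Entry<String, Boolean> e : closedByWal.entrySet()) {
            if (!e.getValue() && !referencedWals.contains(e.getKey())) {
                e.setValue(true); // analogous to writing "closed: true" into the status record
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Boolean> status = new HashMap<>();
        status.put("wal-1", false); // rolled over, no longer referenced by any tablet
        status.put("wal-2", false); // current WAL, still referenced
        int n = markClosed(status, Collections.singleton("wal-2"));
        System.out.println(n + " " + status.get("wal-1") + " " + status.get("wal-2"));
    }
}
```

The point of the model is that closure latency depends on both how often this pass runs and how long a rolled-over WAL stays in the referenced set.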


    Noe Detore wrote:

        Hello,

        I am trying to influence replication latency with
        tserver.walog.max.age, but I notice no difference when setting
        the value low. Looking at the code of
        org.apache.accumulo.tserver.log.TabletServerLogger:

        protected void closeForReplication(Collection<CommitSession> sessions) {
          // TODO We can close the WAL here for replication purposes
        }

        closeForReplication (with its TODO) is called from:
        testLockAndRun(logSetLock, new TestCallWithWriteLock() {
          @Override
          boolean test() {
            return (logSizeEstimate.get() > maxSize)
                || ((System.currentTimeMillis() - createTime) > maxAge);
          }

          @Override
          void withWriteLock() throws IOException {
            close();
            closeForReplication(sessions);
          }
        });
        return seq;
      }

        I am still trying to understand what is happening here, but
        could this TODO be the reason replication status records are
        not being updated with 'closed: true' sooner?

        Thank you
        Noe
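Reading the excerpt above, testLockAndRun appears to use a check-then-recheck pattern: evaluate test() under the read lock and, only if it is true, take the write lock, evaluate test() again, and run withWriteLock(). That would put closeForReplication at the one point where the tserver knows, under the write lock, that the old WAL is finished. The sketch below reproduces that pattern with plain java.util.concurrent. The interface and method names mirror the excerpt, but the body is my reconstruction under that assumption, not Accumulo's code.

```java
import java.io.IOException;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical reconstruction of a check-then-recheck read/write lock helper. */
public class TestLockAndRunSketch {

    interface TestCallWithWriteLock {
        boolean test();
        void withWriteLock() throws IOException;
    }

    static void testLockAndRun(ReadWriteLock rwlock, TestCallWithWriteLock code) throws IOException {
        rwlock.readLock().lock();
        try {
            if (code.test()) {
                // ReentrantReadWriteLock cannot upgrade read -> write, so release first
                rwlock.readLock().unlock();
                rwlock.writeLock().lock();
                try {
                    // Re-check: another thread may have rolled the log while no lock was held
                    if (code.test()) {
                        code.withWriteLock();
                    }
                } finally {
                    // Downgrade: take the read lock back before releasing the write lock,
                    // so the outer finally still has a read lock to release
                    rwlock.readLock().lock();
                    rwlock.writeLock().unlock();
                }
            }
        } finally {
            rwlock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws IOException {
        ReadWriteLock lock = new ReentrantReadWriteLock();
        final int[] rolls = {0};
        final long[] size = {2048};
        final long maxSize = 1024;
        TestCallWithWriteLock roll = new TestCallWithWriteLock() {
            @Override public boolean test() { return size[0] > maxSize; }
            // Stands in for close() + closeForReplication(sessions) + starting a new log
            @Override public void withWriteLock() { rolls[0]++; size[0] = 0; }
        };
        testLockAndRun(lock, roll);
        testLockAndRun(lock, roll); // condition is now false, so nothing happens
        System.out.println("rolls = " + rolls[0]);
    }
}
```

Running main prints `rolls = 1`: the second call sees the reset size and skips the roll.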
