To be fair, after writing the post I grepped the logs and found my WALs
rolling over on size before the max.age threshold was hit. That is why I
did not see any improvement in latency from reducing max.age.

There is still an unexplained delay, which I need to figure out, between
when the tserver stops writing to a WAL and when that WAL actually gets
replicated. For example, my WALs appear to be done being written to (a new
WAL is created on the tserver) in 3m, but replication is taking about 12 to
15 min to complete. Even though the WAL is not being written to after 3m, I
am not seeing it ready for replication (closed: true) until after 13m.
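
A minimal sketch of why lowering max.age made no difference here, assuming
the roll decision is just the size-or-age test quoted below from
TabletServerLogger. The class name, method names, fill rate and size limit
are hypothetical, chosen so that a WAL fills in about 3 minutes; this is not
Accumulo code.

```java
public class WalRollSketch {

    // Same shape as the quoted test(): roll when either limit is exceeded.
    public static boolean shouldRoll(long logSizeEstimate, long maxSize,
                                     long ageMillis, long maxAgeMillis) {
        return logSizeEstimate > maxSize || ageMillis > maxAgeMillis;
    }

    // Approximate time until the WAL rolls at a constant write rate:
    // whichever limit is reached first wins.
    public static long millisUntilRoll(long maxSize, long bytesPerSecond,
                                       long maxAgeMillis) {
        long millisToFill = maxSize * 1000L / bytesPerSecond;
        return Math.min(millisToFill, maxAgeMillis);
    }

    public static void main(String[] args) {
        long maxSize = 1_080_000_000L;   // hypothetical size limit in bytes
        long bytesPerSecond = 6_000_000L; // hypothetical ingest rate

        // Try a max age of 10 minutes, 5 minutes and 1 minute.
        long[] maxAges = {600_000L, 300_000L, 60_000L};
        for (long maxAge : maxAges) {
            long roll = millisUntilRoll(maxSize, bytesPerSecond, maxAge);
            System.out.println("maxAge=" + maxAge + "ms -> rolls after "
                    + roll + "ms");
        }
    }
}
```

With these assumed numbers, a max age of 10 or 5 minutes gives the same
3 minute roll, because the size limit is reached first. Only a max age below
the fill time changes when the WAL rolls, and none of this affects when the
GC later marks the WAL as closed.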


On Wed, Jan 11, 2017 at 5:44 PM, Josh Elser <josh.el...@gmail.com> wrote:

> See org.apache.accumulo.gc.replication.CloseWriteAheadLogReferences for
> where WALs are currently marked as "closed".
>
> I don't recall the details, but I think there was some issue with trying
> to close them in TabletServerLogger.
>
> Yes to your last question: if it were done in TabletServerLogger, it would
> be closed more quickly than done by the GC. The issue is whether or not
> it's actually safe to mark them as closed there. I just don't remember the
> internal WAL lifecycle well enough.
>
>
> Noe Detore wrote:
>
>> Hello,
>>
>> I am trying to influence replication latency with tserver.walog.max.age,
>> but I am noticing no difference when setting the value low. Looking in
>> the code of org.apache.accumulo.tserver.log.TabletServerLogger:
>>
>> protected void closeForReplication(Collection<CommitSession> sessions) {
>>     // TODO We can close the WAL here for replication purposes
>>   }
>>
>> This TODO is called by:
>> testLockAndRun(logSetLock, new TestCallWithWriteLock() {
>>       @Override
>>       boolean test() {
>>         return (logSizeEstimate.get() > maxSize) ||
>> ((System.currentTimeMillis() - createTime) > maxAge);
>>       }
>>
>>       @Override
>>       void withWriteLock() throws IOException {
>>         close();
>>         closeForReplication(sessions);
>>       }
>>     });
>>     return seq;
>>   }
>>
>> I am still trying to understand what is happening here, but could this
>> TODO be the reason replication status records are not being updated with
>> 'closed: true' sooner?
>>
>> Thank you
>> Noe
>>
>
