The code referenced in the PR works to detect and move a WAL, replacing it
with an empty one, but it isn't fully wrapped up or merged.  Priorities
shifted and this got pushed back, though I do plan to address the code
review comments Soon™.
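
For anyone trying to work out what "detect" means here, below is a minimal
sketch of the ordering rule behind the "COMPACTION_FINISH (without preceding
COMPACTION_START)" error discussed further down.  It is a simplified,
hypothetical model, not Accumulo's actual recovery code and not the code in
the PR: the class `WalOrderCheck` and its methods are made up for
illustration, and the event names only mirror the ones LogReader prints.
Real recovery tracks events per tablet, because one WAL interleaves entries
for many tablets; this sketch assumes a single tablet's events in order.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of WAL compaction-event ordering.
// NOT Accumulo's recovery implementation; assumes one tablet's events, in order.
class WalOrderCheck {

  // Stand-in for the event types seen in LogReader output.
  enum Event { OPEN, DEFINE_TABLET, MUTATION, MANY_MUTATIONS, COMPACTION_START, COMPACTION_FINISH }

  // Throws IllegalStateException when a COMPACTION_FINISH has no pending COMPACTION_START.
  static void check(List<Event> events) {
    boolean startPending = false;
    for (int i = 0; i < events.size(); i++) {
      Event e = events.get(i);
      if (e == Event.COMPACTION_START) {
        startPending = true; // a compaction has begun and awaits its FINISH
      } else if (e == Event.COMPACTION_FINISH) {
        if (!startPending) {
          throw new IllegalStateException(
              "COMPACTION_FINISH (without preceding COMPACTION_START) at event " + i);
        }
        startPending = false; // START/FINISH pair is complete
      }
    }
  }

  // Convenience wrapper: true if the sequence passes the simplified check.
  static boolean isValid(List<Event> events) {
    try {
      check(events);
      return true;
    } catch (IllegalStateException ex) {
      return false;
    }
  }

  public static void main(String[] args) {
    // The shape reported in this thread: OPEN immediately followed by COMPACTION_FINISH.
    List<Event> corrupt = Arrays.asList(Event.OPEN, Event.COMPACTION_FINISH);
    List<Event> healthy = Arrays.asList(Event.OPEN, Event.DEFINE_TABLET, Event.MUTATION,
        Event.COMPACTION_START, Event.COMPACTION_FINISH);
    System.out.println(isValid(corrupt)); // false
    System.out.println(isValid(healthy)); // true
  }
}
```

This model only covers the FINISH-without-START case; it does not flag two
back-to-back COMPACTION_START entries, which is the other sequence raised
further down the thread.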

I'd suggest upgrading to 1.9.2 once you resolve the issue.  We've been
running it for a while and have not had any WAL-related errors.

--Adam

On Tue, Aug 21, 2018 at 6:58 PM Ed Coleman <d...@etcoleman.com> wrote:

> There has been work done in https://github.com/apache/accumulo/pull/574.
> I'm not certain of the state of the code, but the description may provide
> you with things that you could look at manually.
>
>
> -----Original Message-----
> From: tech.s...@gmail.com [mailto:tech.s...@gmail.com]
> Sent: Tuesday, August 21, 2018 5:45 PM
> To: user@accumulo.apache.org
> Subject: Re: Corrupt WAL
>
> Was there any success with this workaround strategy?  I am also
> experiencing this issue.
>
> On 2018/06/13 16:30:22, "Adam J. Shook" <adamjsh...@gmail.com> wrote:
> > Sorry, I had the error backwards.  There is an OPEN for the WAL and
> > then immediately a COMPACTION_FINISH entry.  This would cause the error.
> >
> > On Wed, Jun 13, 2018 at 11:34 AM, Adam J. Shook <adamjsh...@gmail.com>
> > wrote:
> >
> > > Looking at the log, I see that the last two entries are a
> > > COMPACTION_START of one RFile immediately followed by a
> > > COMPACTION_START of a separate RFile, which (I believe) would lead to
> > > the error.  Would this necessarily be an issue if the compactions are
> > > for separate RFiles?
> > >
> > > This is a dev cluster and I don't necessarily care about it, but is
> > > there a (good) means to do WAL surgery?  I imagine I can just chop
> > > off bytes until the log is parseable and no longer contains the info
> > > about the compactions.
> > >
> > > On Tue, Jun 12, 2018 at 2:32 PM, Keith Turner <ke...@deenlo.com>
> > > wrote:
> > >
> > >> On Tue, Jun 12, 2018 at 12:10 PM, Adam J. Shook
> > >> <adamjsh...@gmail.com>
> > >> wrote:
> > >> > Yes, that is the error.  I'll inspect the logs and report back.
> > >>
> > >> OK.  The LogReader command has a mechanism to filter which tablet
> > >> is displayed.  If the walog has a lot of data in it, you may need
> > >> to use this.
> > >>
> > >> Also, be aware that only 5 mutations are shown for each "many
> > >> mutations" object in the walog.  The -m option changes this.  You
> > >> may want to see more when deciding whether the info in the log is
> > >> important.
> > >>
> > >>
> > >> >
> > >> > On Tue, Jun 12, 2018 at 10:14 AM, Keith Turner <ke...@deenlo.com>
> > >> > wrote:
> > >> >>
> > >> >> Is the message you are seeing "COMPACTION_FINISH (without
> > >> >> preceding COMPACTION_START)"?  That message indicates that the
> > >> >> WALs are incomplete, probably as a result of the NN problems.
> > >> >> You could do the following:
> > >> >>
> > >> >> 1) Run the following command to see what's in the log.  You need
> > >> >> to see what is there for the root tablet.
> > >> >>
> > >> >>    accumulo org.apache.accumulo.tserver.logger.LogReader
> > >> >>
> > >> >> 2) Replace the log file with an empty file after seeing if there
> > >> >> is anything important in it.
> > >> >>
> > >> >> I think the list of WALs for the root tablet is stored in ZK at
> > >> >> /accumulo/<id>/walogs
> > >> >>
> > >> >> On Mon, Jun 11, 2018 at 5:26 PM, Adam J. Shook
> > >> >> <adamjsh...@gmail.com>
> > >> >> wrote:
> > >> >> > Hey all,
> > >> >> >
> > >> >> > The root tablet on one of our dev systems isn't loading due to
> > >> >> > an illegal state exception: COMPACTION_FINISH preceding
> > >> >> > COMPACTION_START.  What'd be the best way to mitigate this
> > >> >> > issue?  This was likely caused by both of our NameNodes
> > >> >> > failing.
> > >> >> >
> > >> >> > Thank you,
> > >> >> > --Adam
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>
>
