Re: HBase master unable to recover with error "Cannot seek after EOF"

Claude M Fri, 07 Jan 2022 09:38:05 -0800

Has HBase 2.4 been tested to be fully functional w/ Hadoop 2.10.0?  I don't
see it in the compatibility chart.


On Fri, Jan 7, 2022 at 12:37 AM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:

> You can try to upgrade to 2.4.x, it should be rolling upgradable.
>
> On Tue, Jan 4, 2022 at 23:24, Claude M <claudemur...@gmail.com> wrote:
> >
> > I don't want to rebuild HBase.  According to the attached HBase/Hadoop
> compatibility chart, the latest version of HBase that has been verified w/
> Hadoop is 2.3.x.
> > The fix was put into branch 2.3 on 11/21 but there is not going to be a
> 2.3.8 release since it is mentioned that branch 2.3 is EOL.  Is there not
> another way around this?
> >
> > On Fri, Dec 24, 2021 at 12:53 AM 张铎(Duo Zhang) <palomino...@gmail.com>
> wrote:
> >>
> >> Ah, thanks Yulin Niu for the pointer. HBASE-26053 should be the problem.
> >>
> >> On Sun, Dec 19, 2021 at 10:41, Yulin Niu <yulin.niu.2...@gmail.com> wrote:
> >> >
> >> > https://issues.apache.org/jira/browse/HBASE-25053
> >> > It seems to be the bug described in this issue. You can try
> >> > cherry-picking this patch, Claude M
> >> >
> >> > On Sun, Dec 19, 2021 at 02:17, Viraj Jasani <vjas...@apache.org> wrote:
> >> >
> >> > > > Your fix is a bit dangerous since you may lose some ongoing
> procedures,
> >> > > but
> >> > > > if you did not experience any inconsistency on your cluster, for
> example,
> >> > > > some regions are not online, then it is OK.
> >> > >
> >> > > Duo, out of curiosity, even if some regions are offline and/or some
> servers
> >> > > go offline, wouldn't master failover re-trigger SCPs and TRSPs to
> bring all
> >> > > regions ONLINE?
> >> > > I have played around with removal of MasterProcWAL on hbase1 only
> (WAL proc
> >> > > store) and have seen new SCPs getting triggered, i.e. AM does bring
> all
> >> > > regions ONLINE eventually.
> >> > >
> >> > >
> >> > > On Thu, Dec 16, 2021 at 9:57 PM 张铎(Duo Zhang) <
> palomino...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > I guess this should be a bug. For the master local region we do
> not
> >> > > handle
> >> > > > broken WAL files which do not even have a valid header.
> >> > > >
> >> > > > Will take a look at the code tomorrow to confirm whether this is
> the
> >> > > case.
> >> > > >
> >> > > > Your fix is a bit dangerous since you may lose some ongoing
> procedures,
> >> > > but
> >> > > > if you did not experience any inconsistency on your cluster, for
> example,
> >> > > > some regions are not online, then it is OK.
> >> > > >
> >> > > > Thanks for reporting.
> >> > > >
> >> > > > On Thu, Dec 16, 2021 at 03:37, Claude M <claudemur...@gmail.com> wrote:
> >> > > >
> >> > > > > Hello,
> >> > > > >
> >> > > > > I have the following installed:
> >> > > > >
> >> > > > >    - Hadoop 3.2.2
> >> > > > >    - HBase 2.3.5
> >> > > > >
> >> > > > >
> >> > > > > When all the datanodes in Hadoop are stopped but the HBase
> cluster is
> >> > > > > still running, the HBase master crashes w/ the attached
> exception and
> >> > > is
> >> > > > > not recoverable.
> >> > > > >
> >> > > > > If I delete the contents under the following directories in
> hdfs, the
> >> > > > > master will then recover:
> >> > > > >
> >> > > > >    - /hbase/MasterData/WALs/
> >> > > > >    - /hbase/MasterData/data/master/store/*/recovered.wals/
> >> > > > >
> >> > > > > Is this an appropriate way to resolve the issue?  If not, what
> should
> >> > > be
> >> > > > > done?
> >> > > > >
> >> > > > >
> >> > > > > Thanks
> >> > > > >
> >> > > >
> >> > >
>
