> Your fix is a bit dangerous since you may lose some ongoing procedures, but
> if you did not experience any inconsistency on your cluster, for example,
> some regions are not online, then it is OK.

Duo, out of curiosity, even if some regions are offline and/or some servers
go offline, wouldn't master failover re-trigger SCPs and TRSPs to bring all
regions ONLINE?
I have played around with removal of MasterProcWAL on hbase1 only (WAL proc
store) and have seen new SCPs getting triggered, i.e. the AM does bring all
regions ONLINE eventually.


On Thu, Dec 16, 2021 at 9:57 PM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:

> I guess this should be a bug. For the master local region we do not handle
> broken WAL files which do not even have a valid header.
>
> Will take a look at the code tomorrow to confirm whether this is the case.
>
> Your fix is a bit dangerous since you may lose some ongoing procedures, but
> if you did not experience any inconsistency on your cluster, for example,
> some regions are not online, then it is OK.
>
> Thanks for reporting.
>
> Claude M <claudemur...@gmail.com> wrote on Thu, Dec 16, 2021 at 03:37:
>
> > Hello,
> >
> > I have the following installed:
> >
> >    - Hadoop 3.2.2
> >    - HBase 2.3.5
> >
> >
> > When all the datanodes in Hadoop are stopped but the HBase cluster is
> > still running, the HBase master crashes w/ the attached exception and is
> > not recoverable.
> >
> > If I delete the contents under the following directories in hdfs, the
> > master will then recover:
> >
> >    - /hbase/MasterData/WALs/
> >    - /hbase/MasterData/data/master/store/*/recovered.wals/
> >
> > Is this an appropriate way to resolve the issue?  If not, what should be
> > done?
> >
> >
> > Thanks
> >
>
