Re: RS crash upon replication

Ted Yu Wed, 22 May 2013 13:50:01 -0700

What does this command show you?

get /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1


Cheers

On Wed, May 22, 2013 at 1:46 PM, [email protected] <
[email protected]> wrote:

> ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379
> [1]
> [zk: va-p-zookeeper-01-c:2181(CONNECTED) 2] ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
> []
>
> I'm on hbase-0.94.2-cdh4.2.1
>
> Thanks
>
>
> On Wed, May 22, 2013 at 11:40 PM, Varun Sharma <[email protected]>
> wrote:
>
> > Also, what version of HBase are you running?
> >
> >
> > On Wed, May 22, 2013 at 1:38 PM, Varun Sharma <[email protected]> wrote:
> >
> > > Basically,
> > >
> > > You had va-p-hbase-02 crash. That caused all the replication-related
> > > data in ZooKeeper to be moved to va-p-hbase-01, which took over
> > > replicating 02's logs. Each region server also maintains an in-memory
> > > copy of what's in ZK. It seems that when you start up 01, it tries to
> > > replicate the 02 logs underneath but fails because that data is not
> > > in ZK. This is somewhat weird...
> > >
> > > Can you open the ZooKeeper shell and run
> > >
> > > ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379
> > >
> > > and share the output?
> > >
> > >
> > > On Wed, May 22, 2013 at 1:27 PM, [email protected] <
> > > [email protected]> wrote:
> > >
> > >> Hi,
> > >>
> > >> This is bad ... and it has happened twice: I had taken my
> > >> replication-slave cluster offline. I performed quite a massive Merge
> > >> operation on it, and after a couple of hours it finished and I brought
> > >> the cluster back online. At the same time, the replication-master RS
> > >> machines crashed (see first crash http://pastebin.com/1msNZ2tH) with
> > >> the first exception being:
> > >>
> > >> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/va-p-hbase-01-c,60020,1369233253404/1-va-p-hbase-01-c,60020,1369042378287-va-p-hbase-02-c,60020,1369042377731/va-p-hbase-01-c%2C60020%2C1369042378287.1369220050719
> > >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> > >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > >>         at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
> > >>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:354)
> > >>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:846)
> > >>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:898)
> > >>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:892)
> > >>         at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:558)
> > >>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
> > >>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:638)
> > >>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:387)
> > >>
> > >> Before restarting the crashed RSs, I issued a 'stop_replication'
> > >> command and then fired up the RSs again. They started OK, but once I
> > >> ran 'start_replication' they crashed again. The second crash log
> > >> http://pastebin.com/8Nb5epJJ has the same initial exception
> > >> (org.apache.zookeeper.KeeperException$NoNodeException:
> > >> KeeperErrorCode = NoNode). I've started the crashed region servers
> > >> again without replication and currently all is well, but I need to
> > >> start replication asap.
> > >>
> > >> Does anyone have an idea what's going on and how I can solve it?
> > >>
> > >> Thanks,
> > >> Amit
> > >>
> > >
> > >
> >
>
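
For anyone comparing the znode in the exception against what `ls` shows in
the ZooKeeper shell, it can help to split the path into its parts. The
sketch below assumes the layout visible in the quoted exception: the owning
region server, then a queue znode made of the peer id followed by the chain
of dead servers whose queues were taken over, then the URL-encoded HLog
name. The function name `parse_queue_path` and the returned dict keys are
made up for illustration and are not part of HBase.

```python
import re
from urllib.parse import unquote

# A server name looks like "host,port,startcode". Hostnames may contain
# hyphens, so we match on the commas instead of splitting on "-".
SERVER_RE = re.compile(r"([^,]+),(\d+),(\d+)(?:-|$)")


def parse_queue_path(path, base="/hbase/replication/rs/"):
    """Split a replication queue znode path into its components.

    Assumed layout (taken from the exception quoted in this thread):
    <base><owner RS>/<peer id>[-<dead RS>...]/<URL-encoded HLog name>
    """
    if not path.startswith(base):
        raise ValueError("not a replication queue path: %r" % path)
    parts = path[len(base):].split("/")
    owner = parts[0]
    queue = parts[1] if len(parts) > 1 else None
    hlog = unquote(parts[2]) if len(parts) > 2 else None

    peer_id, dead_servers = None, []
    if queue is not None:
        # Everything before the first "-" is the peer id; the rest is the
        # chain of servers this queue was recovered from.
        peer_id, _, rest = queue.partition("-")
        dead_servers = [",".join(m.groups()) for m in SERVER_RE.finditer(rest)]

    return {
        "owner": owner,
        "peer_id": peer_id,
        "dead_servers": dead_servers,
        "hlog": hlog,
    }


if __name__ == "__main__":
    example = (
        "/hbase/replication/rs/va-p-hbase-01-c,60020,1369233253404/"
        "1-va-p-hbase-01-c,60020,1369042378287-"
        "va-p-hbase-02-c,60020,1369042377731/"
        "va-p-hbase-01-c%2C60020%2C1369042378287.1369220050719"
    )
    for key, value in parse_queue_path(example).items():
        print(key, "=", value)
```

Read this way, the failing znode is a recovered queue for peer 1 whose owner
in the path carries start code 1369233253404, while the `ls` output above is
for the server started with code 1369249873379 and shows only the plain
queue `1`.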
