SyncTable is meant to be run manually, and only for the specific time
window in which you suspect inconsistencies between the two clusters.
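
For reference, a rough two-step sketch (table name, paths, timestamps, and
the ZooKeeper quorum below are placeholders):

  # Step 1, on the active cluster: hash the table for the suspect window
  # (--starttime/--endtime are epoch millis).
  hbase org.apache.hadoop.hbase.mapreduce.HashTable \
    --starttime=1639180800000 --endtime=1639267200000 \
    my_table /hashes/my_table

  # Step 2, on the passive cluster: compare against those hashes and repair
  # the target table; --sourcezkcluster points at the active cluster's
  # ZooKeeper quorum.
  hbase org.apache.hadoop.hbase.mapreduce.SyncTable \
    --sourcezkcluster=zk1,zk2,zk3:2181:/hbase \
    hdfs://active-cluster/hashes/my_table my_table my_table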

As soon as you disable serial replication, it should resume replicating from
the point where it was stuck. You can build dashboards from the JMX metrics
generated by the HMaster to spot these situations, and set up alerts as well.
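
For example, the per-source replication gauges are exposed on each region
server's /jmx endpoint (default info port 16030; the host name below is a
placeholder, and exact metric names can vary by version):

  curl 'http://rs-host:16030/jmx?qry=Hadoop:service=HBase,name=RegionServer,sub=Replication'

  # Gauges worth graphing and alerting on:
  #   source.sizeOfLogQueue     - WALs queued per peer; steady growth means stuck
  #   source.ageOfLastShippedOp - replication lag of the peer, in ms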



On Sun, Dec 12, 2021, 3:33 PM Hamado Dene <hamadod...@yahoo.com.invalid>
wrote:

> Ok, perfect. How often should this sync run? I guess in this case you have
> to automate it somehow, correct?
> Since I will have to disable serial mode, do I first have to align the tables
> manually, or will the regionservers start replicating from where they were
> blocked the moment I disable serial mode?
>
>
>     On Sunday, December 12, 2021 at 10:55:05 CET, Mallikarjun <mallik.v.ar...@gmail.com> wrote:
>
>  https://hbase.apache.org/book.html#hashtable.synctable
>
> To copy the differences between the tables for a specific time period.
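>
> For a safe first pass, SyncTable has a dry-run mode that only reports
> divergent ranges without writing to the target, e.g. (hash dir and table
> names below are placeholders):
>
>   hbase org.apache.hadoop.hbase.mapreduce.SyncTable --dryrun=true \
>     hdfs://active-cluster/hashes/my_table my_table my_table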
>
> On Sun, Dec 12, 2021, 3:12 PM Hamado Dene <hamadod...@yahoo.com.invalid>
> wrote:
>
> > Interesting, thank you very much for the info. I'll try to disable serial
> > replication. As for the "sync table utility", what do you mean? I am new to
> > HBase and not yet familiar with all the HBase tools.
> >
> >
> >
> >    On Sunday, December 12, 2021 at 10:15:01 CET, Mallikarjun <mallik.v.ar...@gmail.com> wrote:
> >
> > We have faced issues with serial replication when a region server in
> > either cluster suffers a hardware failure, typically a memory failure from
> > my understanding. I could not spend enough time reproducing it reliably to
> > identify the root cause, so I don't know what causes it.
> >
> > The issue could be that your serial replication has deadlocked among the
> > region servers: they cannot make progress because an older sequence ID has
> > not been replicated yet, and that older sequence ID is itself not at the
> > front of the line, so it can never be replicated.
> >
> > Quick fix: temporarily disable serial replication so that out-of-order
> > shipping is allowed and replication unblocks. This can result in some
> > inconsistencies between the clusters, which can be fixed with the SyncTable
> > utility since your setup is active-passive.
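> >
> > For example, from the hbase shell (peer id taken from your list_peers
> > output; set_peer_serial is available in HBase 2.1+):
> >
> >   set_peer_serial 'replicav1', false
> >   # ... let replication drain and run SyncTable, then re-enable:
> >   set_peer_serial 'replicav1', true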
> >
> > Another fix: delete the replication barriers for each region in
> > hbase:meta. Same consequence as above.
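> >
> > You can inspect the barriers from the hbase shell before deleting
> > anything, e.g.:
> >
> >   scan 'hbase:meta', {COLUMNS => 'rep_barrier'}
> >
> > (The exact cell layout varies by version, and editing hbase:meta by hand
> > is risky, so snapshot/back it up first.)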
> >
> >
> > On Sun, Dec 12, 2021, 2:24 PM Hamado Dene <hamadod...@yahoo.com.invalid>
> > wrote:
> >
> > > I'm using HBase 2.2.6 with Hadoop 2.8.5. Yes, my serial replication is
> > > enabled. This is my peer configuration:
> > >
> > >
> > >   Peer Id:       replicav1
> > >   Cluster Key:   acv-db10-hn,acv-db11-hn,acv-db12-hn:2181:/hbase
> > >   Endpoint:      (empty)
> > >   State:         ENABLED
> > >   IsSerial:      true
> > >   Bandwidth:     UNLIMITED
> > >   ReplicateAll:  true
> > >   (Namespaces, Exclude Namespaces, Table Cfs, Exclude Table Cfs: empty)
> > >
> > >
> > >    On Sunday, December 12, 2021 at 09:39:44 CET, Mallikarjun <mallik.v.ar...@gmail.com> wrote:
> > >
> > > Which version of HBase are you using? Is serial replication enabled?
> > >
> > > ---
> > > Mallikarjun
> > >
> > >
> > > On Sun, Dec 12, 2021 at 1:54 PM Hamado Dene <hamadod...@yahoo.com.invalid>
> > > wrote:
> > >
> > > > Hi HBase community,
> > > >
> > > > On our production installation we have two HBase clusters in two
> > > > different datacenters. The primary datacenter replicates the data to
> > > > the secondary datacenter. When we create the tables, we first create
> > > > them on the secondary datacenter and then on the primary, and then we
> > > > set the replication scope to 1 on the primary. The peer pointing to the
> > > > ZK quorum of the secondary cluster is configured on the primary.
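> > > >
> > > > (For context, that setup corresponds roughly to shell commands like
> > > > the following; table name, column family, and quorum are illustrative:)
> > > >
> > > >   # on the secondary cluster
> > > >   create 'my_table', 'cf'
> > > >   # on the primary cluster
> > > >   create 'my_table', {NAME => 'cf', REPLICATION_SCOPE => 1}
> > > >   add_peer 'replicav1', CLUSTER_KEY => 'zk1,zk2,zk3:2181:/hbase', SERIAL => true
> > > >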
> > > > Initially, replication worked fine and data was replicated. We have
> > > > recently noticed that some tables are empty in the secondary
> > > > datacenter, so most likely the data is no longer being replicated. I'm
> > > > seeing lines like this in the logs:
> > > >
> > > >
> > > > Recovered source for cluster/machine(s) replicav1: Total replicated edits: 0, current progress:walGroup [db11%2C16020%2C1637849866921]: currently replicating from: hdfs://rozzanohadoopcluster/hbase/oldWALs/db11-hd%2C16020%2C1637849866921.1637849874263 at position: -1
> > > > Recovered source for cluster/machine(s) replicav1: Total replicated edits: 0, current progress:walGroup [db09%2C16020%2C1637589840862]: currently replicating from: hdfs://rozzanohadoopcluster/hbase/oldWALs/db09-hd%2C16020%2C1637589840862.1637589846870 at position: -1
> > > > Recovered source for cluster/machine(s) replicav1: Total replicated edits: 0, current progress:walGroup [db13%2C16020%2C1635424806449]: currently replicating from: hdfs://rozzanohadoopcluster/hbase/oldWALs/db13%2C16020%2C1635424806449.1635424812985 at position: -1
> > > >
> > > >
> > > >
> > > > 2021-12-12 09:13:47,148 INFO  [rzv-db09-hd:16020Replication Statistics #0] regionserver.Replication: Normal source for cluster replicav1: Total replicated edits: 0, current progress:walGroup [db09%2C16020%2C1638791923537]: currently replicating from: hdfs://rozzanohadoopcluster/hbase/WALs/rzv-db09-hd.rozzano.diennea.lan,16020,1638791923537/rzv-db09-hd.rozzano.diennea.lan%2C16020%2C1638791923537.1638791930213 at position: -1
> > > > Recovered source for cluster/machine(s) replicav1: Total replicated edits: 0, current progress:walGroup [db09%2C16020%2C1634401671527]: currently replicating from: hdfs://rozzanohadoopcluster/hbase/oldWALs/rzv-db09-hd.rozzano.diennea.lan%2C16020%2C1634401671527.1634401679218 at position: -1
> > > > Recovered source for cluster/machine(s) replicav1: Total replicated edits: 0, current progress:walGroup [db10%2C16020%2C1637585899997]: currently replicating from: hdfs://rozzanohadoopcluster/hbase/oldWALs/rzv-db10-hd.rozzano.diennea.lan%2C16020%2C1637585899997.1637585906625 at position: -1
> > > >
> > > >
> > > >
> > > > 2021-12-12 08:24:58,561 WARN  [regionserver/rzv-db12-hd:16020.logRoller] regionserver.ReplicationSource: WAL group db12%2C16020%2C1638790692057 queue size: 187 exceeds value of replication.source.log.queue.warn: 2
> > > >
> > > > Do you have any info on what could be the problem?
> > > >
> > > > Thanks
> > >
> >
>
