Re: Forcibly merging regions

Ted Yu Fri, 14 Nov 2014 11:37:08 -0800

This means that yesterday's compaction was not a major compaction.

When references get in the way of merging regions, you know that it is time
for a major compaction.
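
A minimal sketch of that workflow, in case it helps: request a major
compaction, poll a bounded number of times until neither region holds
reference files, and only then issue the merge. RegionOps, FakeOps and
compactThenMerge are hypothetical names made up for illustration; they are
not HBase classes. On a real cluster I would expect the three operations to
be backed by HBaseAdmin#majorCompact, HRegionFileSystem#hasReferences and
HBaseAdmin#mergeRegions, but that mapping is an assumption and is not wired
up here.

```java
import java.util.concurrent.TimeUnit;

/** Sketch: major-compact first, wait for reference files to clear, then merge. */
final class MergeAfterCompaction {

    /** Hypothetical facade over cluster operations; NOT an HBase interface. */
    interface RegionOps {
        // Assumed to be backed by HBaseAdmin#majorCompact on a real cluster.
        void majorCompact(String table) throws Exception;
        // Assumed to be backed by HRegionFileSystem#hasReferences.
        boolean hasReferences(String encodedRegion) throws Exception;
        // Assumed to be backed by HBaseAdmin#mergeRegions.
        void mergeRegions(String regionA, String regionB) throws Exception;
    }

    /** Returns true if the merge was issued, false if references never cleared. */
    static boolean compactThenMerge(RegionOps ops, String table, String regionA,
                                    String regionB, int maxPolls, long pollMillis)
            throws Exception {
        ops.majorCompact(table);
        for (int i = 0; i < maxPolls; i++) {
            if (!ops.hasReferences(regionA) && !ops.hasReferences(regionB)) {
                ops.mergeRegions(regionA, regionB);
                return true;
            }
            TimeUnit.MILLISECONDS.sleep(pollMillis);
        }
        return false; // give up instead of polling forever
    }

    /** In-memory stand-in: references disappear after a fixed number of checks. */
    static final class FakeOps implements RegionOps {
        int checksUntilClean;
        boolean compactRequested;
        boolean merged;

        FakeOps(int checksUntilClean) { this.checksUntilClean = checksUntilClean; }

        public void majorCompact(String table) { compactRequested = true; }

        public boolean hasReferences(String region) {
            // Without a compaction request the references never go away.
            return compactRequested ? checksUntilClean-- > 0 : true;
        }

        public void mergeRegions(String a, String b) { merged = true; }
    }

    public static void main(String[] args) throws Exception {
        FakeOps ops = new FakeOps(2);
        boolean issued = compactThenMerge(ops, "TABLE_NAME",
                "7373f75181c71eb5061a6673cee15931",
                "43f4d3685d113d3ae18eea9f189de096", 10, 1L);
        System.out.println("merge issued: " + issued);
    }
}
```

The bound on the number of polls is the point of the sketch: when the
references never clear, the caller gets false back and can log or alert
rather than hang.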


Cheers

On Fri, Nov 14, 2014 at 11:31 AM, Shahab Yunus <[email protected]>
wrote:

> After a major compaction, the references were freed for the above-mentioned
> regions, and then the merge_region command succeeded and they got merged.
> Hmmm.
>
> Regards,
> Shahab
>
> On Fri, Nov 14, 2014 at 2:08 PM, Shahab Yunus <[email protected]>
> wrote:
>
> > Digging deeper into the code, I came across this (this is from
> > CatalogJanitor#cleanMergeRegion):
> >
> >
> > ...
> >
> > ...
> >
> > HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionA);
> >
> > HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionB);
> >
> > MetaEditor.deleteMergeQualifiers(server.getCatalogTracker(), mergedRegion);
> >
> > return true;
> >
> >
> > Do you think it is OK, if we face this issue, to forcibly archive and
> > clean the regions?
> >
> > Regards,
> > Shahab
> >
> > On Fri, Nov 14, 2014 at 1:10 PM, Shahab Yunus <[email protected]>
> > wrote:
> >
> >> Yesterday, I believe.
> >>
> >> Regards,
> >> Shahab
> >>
> >> On Fri, Nov 14, 2014 at 1:07 PM, Ted Yu <[email protected]> wrote:
> >>
> >>> Shahab:
> >>> When was the last time compaction was run on this table ?
> >>>
> >>> Cheers
> >>>
> >>> On Fri, Nov 14, 2014 at 9:58 AM, Shahab Yunus <[email protected]>
> >>> wrote:
> >>>
> >>> > I see. Thanks.
> >>> >
> >>> > And if the region indeed has references, can we somehow forcibly
> >>> > remove them? Is this even possible (if not advisable)? Basically, what
> >>> > I am trying to ask is: let us say we do hit this scenario and we know
> >>> > it is OK to go ahead and merge. What steps can we follow after
> >>> > detecting such unwanted references?
> >>> >
> >>> > Regards,
> >>> > Shahab
> >>> >
> >>> > On Fri, Nov 14, 2014 at 12:50 PM, Ted Yu <[email protected]> wrote:
> >>> >
> >>> > > For automated detection of such a scenario, you can reference the code in
> >>> > > CatalogJanitor#cleanMergeRegion():
> >>> > >
> >>> > >       regionFs = HRegionFileSystem.openRegionFromFileSystem(
> >>> > >           this.services.getConfiguration(), fs, tabledir, mergedRegion, true);
> >>> > >
> >>> > > ...
> >>> > >
> >>> > > Then regionFs.hasReferences(htd) would tell you whether the underlying
> >>> > > region has reference files.
> >>> > > Cheers
> >>> > >
> >>> > > On Fri, Nov 14, 2014 at 9:39 AM, Shahab Yunus <[email protected]> wrote:
> >>> > >
> >>> > > > No. Not that I can recall but I can check.
> >>> > > >
> >>> > > > From a resolution perspective, is there any way we can resolve this? More
> >>> > > > importantly, is there any way we can automate the resolution if we run
> >>> > > > into such issues in the future? 'Cleaning the qualifier', that is.
> >>> > > >
> >>> > > > Regards,
> >>> > > > Shahab
> >>> > > >
> >>> > > > On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <[email protected]> wrote:
> >>> > > >
> >>> > > > > One possibility is that region 7373f75181c71eb5061a6673cee15931 was
> >>> > > > > involved in some HBase snapshot.
> >>> > > > >
> >>> > > > > Was the underlying table snapshotted in the recent past?
> >>> > > > >
> >>> > > > > Cheers
> >>> > > > >
> >>> > > > > On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <[email protected]> wrote:
> >>> > > > >
> >>> > > > > > Thanks again.
> >>> > > > > >
> >>> > > > > > But I have been polling for a while and it still doesn't merge. I mean,
> >>> > > > > > this particular region example that I sent you, I have been trying to
> >>> > > > > > merge it since yesterday. I ran the polling-based code all night and had
> >>> > > > > > to kill it. Then in the morning, I tried manual merging through the hbase
> >>> > > > > > shell and it still doesn't merge. Note that the current polling logic
> >>> > > > > > does not try to call merge again. It just checks the region size.
> >>> > > > > >
> >>> > > > > > So how do we clean it then? Or actually make it merge? Plus, is this
> >>> > > > > > something expected (a region keeping a reference)? How can we avoid it?
> >>> > > > > >
> >>> > > > > > Note that this is not limited to this table only. We are seeing this in
> >>> > > > > > other regions of other tables as well. Are we merging too fast?
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > Regards,
> >>> > > > > > Shahab
> >>> > > > > >
> >>> > > > > > On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <[email protected]> wrote:
> >>> > > > > >
> >>> > > > > > > Polling as you described is fine.
> >>> > > > > > >
> >>> > > > > > > catalogJanitor.cleanMergeQualifier() is called by
> >>> > > > > > > DispatchMergingRegionHandler.
> >>> > > > > > >
> >>> > > > > > > If the clean was successful, you would see the following:
> >>> > > > > > >
> >>> > > > > > >       LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
> >>> > > > > > >           + regionB.getRegionNameAsString()
> >>> > > > > > >           + " from fs because merged region no longer holds references");
> >>> > > > > > >
> >>> > > > > > > Assuming the log below did not appear in your master log:
> >>> > > > > > >
> >>> > > > > > >       LOG.error("Merged region " + region.getRegionNameAsString()
> >>> > > > > > >           + " has only one merge qualifier in META.");
> >>> > > > > > >
> >>> > > > > > > It would be the case that 7373f75181c71eb5061a6673cee15931 still had
> >>> > > > > > > a reference file.
> >>> > > > > > >
> >>> > > > > > > Cheers
> >>> > > > > > >
> >>> > > > > > > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <[email protected]> wrote:
> >>> > > > > > >
> >>> > > > > > > > Hi Ted.
> >>> > > > > > > >
> >>> > > > > > > > The log bit is below at the end of the email. It is for the merge
> >>> > > > > > > > command that I gave just now through the hbase shell. forcible was
> >>> > > > > > > > false, but it behaves similarly if forcible is true too. This is from
> >>> > > > > > > > the master log. Indeed the region merging was skipped! What does this
> >>> > > > > > > > mean? Data seems to be intact for this table.
> >>> > > > > > > >
> >>> > > > > > > > Just to give you some background: this table was first merged by the
> >>> > > > > > > > automated Java application. What we are doing is merging tables
> >>> > > > > > > > programmatically. As the HBaseAdmin.mergeRegions call is async, we
> >>> > > > > > > > poll for the number of regions getting lowered after this merge call.
> >>> > > > > > > > The application hangs and continues polling forever as the previous
> >>> > > > > > > > merge didn't happen.
> >>> > > > > > > >
> >>> > > > > > > > In this poll loop, we do get the number of regions by a fresh call to
> >>> > > > > > > > HBaseAdmin.getTableRegions(tableName).size().
> >>> > > > > > > >
> >>> > > > > > > > What are these merge qualifiers, and what are we doing wrong or what
> >>> > > > > > > > should we do?
> >>> > > > > > > >
> >>> > > > > > > > In the polling loop, can we somehow retry the merge? But how can we
> >>> > > > > > > > know that we need to call merge again, as it works for some regions?
> >>> > > > > > > > Is the table meta corrupted for some reason by the above logic?
> >>> > > > > > > >
> >>> > > > > > > > Thanks a lot.
> >>> > > > > > > >
> >>> > > > > > > > ------------------------------------------------------------------------
> >>> > > > > > > >
> >>> > > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236b closed
> >>> > > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> >>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> >>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> >>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
> >>> > > > > > > > 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> >>> > > > > > > > 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236c, negotiated timeout = 60000
> >>> > > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236c closed
> >>> > > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> >>> > > > > > > > 2014-11-14 11:25:30,713 INFO org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip merging regions TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931., TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096., because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
> >>> > > > > > > > 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> >>> > > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> >>> > > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
> >>> > > > > > > > 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> >>> > > > > > > > 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236e, negotiated timeout = 60000
> >>> > > > > > > > 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236e closed
> >>> > > > > > > > 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> >>> > > > > > > >
> >>> > > > > > > > ------------------------------------------------------------------------
> >>> > > > > > > >
> >>> > > > > > > > Regards,
> >>> > > > > > > > Shahab
> >>> > > > > > > >
> >>> > > > > > > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <[email protected]> wrote:
> >>> > > > > > > >
> >>> > > > > > > > > Looking at DispatchMergingRegionHandler, it does some checks before
> >>> > > > > > > > > initiating the merge.
> >>> > > > > > > > > e.g.:
> >>> > > > > > > > >
> >>> > > > > > > > >       LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
> >>> > > > > > > > >           + ", " + region_b.getRegionNameAsString() + ", because region "
> >>> > > > > > > > >           + (regionAHasMergeQualifier ? region_a.getEncodedName() : region_b
> >>> > > > > > > > >               .getEncodedName()) + " has merge qualifier");
> >>> > > > > > > > >
> >>> > > > > > > > > Can you take a look at the master log around the time the merge request
> >>> > > > > > > > > was issued to see if you can get some clue?
> >>> > > > > > > > >
> >>> > > > > > > > > Cheers
> >>> > > > > > > > >
> >>> > > > > > > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <[email protected]> wrote:
> >>> > > > > > > > >
> >>> > > > > > > > > > The documentation of the online merge tool (merge_region) states that
> >>> > > > > > > > > > if we forcibly merge regions (by setting the 3rd attribute to true),
> >>> > > > > > > > > > it can create overlapping regions. If this happens, will it render
> >>> > > > > > > > > > the region or table unusable, or is it just a performance hit? I
> >>> > > > > > > > > > mean, how big of a deal is it?
> >>> > > > > > > > > >
> >>> > > > > > > > > > Actually, we are merging regions using the programmatic API and
> >>> > > > > > > > > > setting this flag ('forcible') to false. But for some tables (we
> >>> > > > > > > > > > haven't figured out a pattern yet; data is still accessible), the
> >>> > > > > > > > > > merge of regions does not happen at all. Afterwards we tried with
> >>> > > > > > > > > > this flag = true, and it still doesn't merge them.
> >>> > > > > > > > > >
> >>> > > > > > > > > > CDH 5.1.0
> >>> > > > > > > > > > (HBase is 0.98.1-cdh5.1.0)
> >>> > > > > > > > > >
> >>> > > > > > > > > > Regards,
> >>> > > > > > > > > > Shahab
> >>> > > > > > > > > >
