After a major compaction, the references for the above-mentioned regions were freed; the merge_region command then succeeded and the regions got merged. Hmmm.
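For reference, the fix that emerged here (don't poll the region count forever after an async merge; time out, run a recovery step such as a major compaction, and retry) can be sketched as a small helper. This is a minimal sketch, not the actual application code: the HBase calls are abstracted behind suppliers/runnables so the logic runs without a cluster, and the names `awaitMerge`, `regionCount`, and `recover` are illustrative, not real HBase APIs.

```java
// Sketch of the polling fix discussed in this thread: after an async
// HBaseAdmin.mergeRegions() call, poll the region count with a bound
// instead of forever, and run a recovery action (e.g. trigger a major
// compaction and re-issue the merge) if the count doesn't drop.
// All names here are illustrative; the HBase calls are abstracted out.
import java.util.function.IntSupplier;

public final class MergeWatcher {

    /**
     * Polls regionCount until it drops below expectedBefore or maxPolls
     * is reached. Returns true if the merge was observed. If half the
     * polls elapse with no change, runs the recovery action once
     * (e.g. admin.majorCompact(table) followed by retrying the merge).
     */
    public static boolean awaitMerge(IntSupplier regionCount,
                                     int expectedBefore,
                                     int maxPolls,
                                     Runnable recover) {
        boolean recovered = false;
        for (int i = 0; i < maxPolls; i++) {
            if (regionCount.getAsInt() < expectedBefore) {
                return true; // merge completed: region count dropped
            }
            if (!recovered && i >= maxPolls / 2) {
                recover.run(); // e.g. major_compact 'TABLE_NAME', then retry
                recovered = true;
            }
        }
        return false; // give up instead of hanging forever
    }

    public static void main(String[] args) {
        // Simulated cluster: the region count drops from 10 to 9 on the
        // sixth poll, i.e. the merge eventually completes.
        int[] polls = {0};
        boolean merged = awaitMerge(
            () -> polls[0]++ < 5 ? 10 : 9,
            10, 20,
            () -> System.out.println("recovery: major compaction + retry merge"));
        System.out.println("merged=" + merged); // prints merged=true
    }
}
```

The key difference from the polling loop described later in the thread is the bound on iterations and the single recovery attempt, so a region stuck with a merge qualifier can't hang the application all night.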
Regards,
Shahab

On Fri, Nov 14, 2014 at 2:08 PM, Shahab Yunus <[email protected]> wrote:

> Digging deeper into the code, I came across this (this is from
> CatalogJanitor#cleanMergeRegion):
>
>     ...
>     HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionA);
>     HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionB);
>     MetaEditor.deleteMergeQualifiers(server.getCatalogTracker(), mergedRegion);
>     return true;
>
> Do you think it is OK, if we face this issue, to forcibly archive and
> clean the regions?
>
> Regards,
> Shahab
>
> On Fri, Nov 14, 2014 at 1:10 PM, Shahab Yunus <[email protected]> wrote:
>
>> Yesterday, I believe.
>>
>> Regards,
>> Shahab
>>
>> On Fri, Nov 14, 2014 at 1:07 PM, Ted Yu <[email protected]> wrote:
>>
>>> Shahab:
>>> When was the last time compaction was run on this table?
>>>
>>> Cheers
>>>
>>> On Fri, Nov 14, 2014 at 9:58 AM, Shahab Yunus <[email protected]> wrote:
>>>
>>>> I see. Thanks.
>>>>
>>>> And if the region indeed has references, can we somehow forcibly
>>>> remove them? Is this even possible (if not advisable)? Basically,
>>>> what I am asking is: suppose we do hit this scenario and we know it
>>>> is OK to go ahead and merge. What steps can we follow after
>>>> detecting such unwanted references?
>>>>
>>>> Regards,
>>>> Shahab
>>>>
>>>> On Fri, Nov 14, 2014 at 12:50 PM, Ted Yu <[email protected]> wrote:
>>>>
>>>>> For automated detection of such a scenario, you can reference the
>>>>> code in CatalogJanitor#cleanMergeRegion():
>>>>>
>>>>>     regionFs = HRegionFileSystem.openRegionFromFileSystem(
>>>>>         this.services.getConfiguration(), fs, tabledir, mergedRegion, true);
>>>>>     ...
>>>>>
>>>>> Then regionFs.hasReferences(htd) would tell you whether the
>>>>> underlying region has reference files.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Fri, Nov 14, 2014 at 9:39 AM, Shahab Yunus <[email protected]> wrote:
>>>>>
>>>>>> No.
>>>>>> Not that I can recall, but I can check.
>>>>>>
>>>>>> From a resolution perspective, is there any way we can resolve
>>>>>> this? More importantly, is there any way we can automate the
>>>>>> resolution if we run into such issues in the future? 'Cleaning the
>>>>>> qualifier', that is.
>>>>>>
>>>>>> Regards,
>>>>>> Shahab
>>>>>>
>>>>>> On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <[email protected]> wrote:
>>>>>>
>>>>>>> One possibility was that region 7373f75181c71eb5061a6673cee15931
>>>>>>> was involved in some HBase snapshot.
>>>>>>>
>>>>>>> Was the underlying table snapshotted in the recent past?
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks again.
>>>>>>>>
>>>>>>>> But I have been polling for a while and it still doesn't merge.
>>>>>>>> I mean this particular region example that I sent you: I have
>>>>>>>> been trying to merge it since yesterday. I ran the polling-based
>>>>>>>> code all night and had to kill it. Then in the morning, I tried
>>>>>>>> manual merging through the hbase shell and it still doesn't
>>>>>>>> merge. Note that the current polling logic does not try to call
>>>>>>>> merge again; it just checks the region size.
>>>>>>>>
>>>>>>>> So how do we clean it then? Or actually make it merge? Also, is
>>>>>>>> this something expected (a region keeping a reference)? How can
>>>>>>>> we avoid it?
>>>>>>>>
>>>>>>>> Note that this is not limited to this table only. We are seeing
>>>>>>>> this in other regions of other tables as well. Are we merging
>>>>>>>> too fast?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Shahab
>>>>>>>>
>>>>>>>> On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Polling as you described is fine.
>>>>>>>>>
>>>>>>>>> catalogJanitor.cleanMergeQualifier() is called by
>>>>>>>>> DispatchMergingRegionHandler.
>>>>>>>>>
>>>>>>>>> If the clean was successful, you would see the following:
>>>>>>>>>
>>>>>>>>>     LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
>>>>>>>>>         + regionB.getRegionNameAsString()
>>>>>>>>>         + " from fs because merged region no longer holds references");
>>>>>>>>>
>>>>>>>>> Assuming the following log was not in your master log:
>>>>>>>>>
>>>>>>>>>     LOG.error("Merged region " + region.getRegionNameAsString()
>>>>>>>>>         + " has only one merge qualifier in META.");
>>>>>>>>>
>>>>>>>>> it would be the case that 7373f75181c71eb5061a6673cee15931 still
>>>>>>>>> had a reference file.
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Ted.
>>>>>>>>>>
>>>>>>>>>> The log bit is below at the end of the email. This is the merge
>>>>>>>>>> command that I issued just now through the hbase shell.
>>>>>>>>>> forcible was false, but it behaves similarly if forcible is
>>>>>>>>>> true too. This is from the master log. Indeed, the region
>>>>>>>>>> merging was skipped! What does this mean? Data seems to be
>>>>>>>>>> intact for this table.
>>>>>>>>>>
>>>>>>>>>> Just to give you some background: this table was first merged
>>>>>>>>>> by the automated Java application. What we are doing is merging
>>>>>>>>>> regions programmatically. As the HBaseAdmin.mergeRegions call
>>>>>>>>>> is async, we poll for the number of regions getting lowered
>>>>>>>>>> after this merge call. The application hangs and continues
>>>>>>>>>> polling forever, as the previous merge didn't happen.
>>>>>>>>>>
>>>>>>>>>> In this poll loop, we get the number of regions by a fresh call
>>>>>>>>>> to HBaseAdmin.getTableRegions(tableName).size().
>>>>>>>>>>
>>>>>>>>>> What are these merge qualifiers, and what are we doing wrong,
>>>>>>>>>> or what should we do?
>>>>>>>>>>
>>>>>>>>>> In the polling loop, can we somehow retry the merge? But how
>>>>>>>>>> can we know that we need to call merge again, as it works for
>>>>>>>>>> some regions? Is the table meta corrupted for some reason by
>>>>>>>>>> the above logic?
>>>>>>>>>>
>>>>>>>>>> Thanks a lot.
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236b closed
>>>>>>>>>> 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>>>>>>>>>> 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
>>>>>>>>>> 2014-11-14 11:25:02,645 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
>>>>>>>>>> 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
>>>>>>>>>> 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
>>>>>>>>>> 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236c, negotiated timeout = 60000
>>>>>>>>>> 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236c closed
>>>>>>>>>> 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>>>>>>>>>> 2014-11-14 11:25:30,713 INFO org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip merging regions TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931., TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096., because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
>>>>>>>>>> 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
>>>>>>>>>> 2014-11-14 11:25:41,384 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
>>>>>>>>>> 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
>>>>>>>>>> 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
>>>>>>>>>> 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236e, negotiated timeout = 60000
>>>>>>>>>> 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236e closed
>>>>>>>>>> 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Shahab
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Looking at DispatchMergingRegionHandler, it does some checks
>>>>>>>>>>> before initiating the merge, e.g.:
>>>>>>>>>>>
>>>>>>>>>>>     LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
>>>>>>>>>>>         + ", " + region_b.getRegionNameAsString() + ", because region "
>>>>>>>>>>>         + (regionAHasMergeQualifier ? region_a.getEncodedName()
>>>>>>>>>>>             : region_b.getEncodedName()) + " has merge qualifier");
>>>>>>>>>>>
>>>>>>>>>>> Can you take a look at the master log around the time the
>>>>>>>>>>> merge request was issued to see if you can get some clue?
>>>>>>>>>>>
>>>>>>>>>>> Cheers
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The documentation of the online merge tool (merge_region)
>>>>>>>>>>>> states that if we forcibly merge regions (by setting the 3rd
>>>>>>>>>>>> argument to true), it can create overlapping regions. If this
>>>>>>>>>>>> happens, will it render the region or table unusable, or is
>>>>>>>>>>>> it just a performance hit? I mean, how big of a deal is it?
>>>>>>>>>>>>
>>>>>>>>>>>> Actually, we are merging regions using the programmatic API
>>>>>>>>>>>> for this and setting this flag ('forcible') to false. But for
>>>>>>>>>>>> some tables (we haven't figured out a pattern yet; data is
>>>>>>>>>>>> still accessible), the merge of regions does not happen at
>>>>>>>>>>>> all. Afterwards we tried with this flag = true, and it still
>>>>>>>>>>>> doesn't merge them.
>>>>>>>>>>>>
>>>>>>>>>>>> CDH 5.1.0
>>>>>>>>>>>> (HBase is 0.98.1-cdh5.1.0)
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Shahab
