Yesterday, I believe.

Regards,
Shahab
On Fri, Nov 14, 2014 at 1:07 PM, Ted Yu <[email protected]> wrote:

> Shahab:
> When was the last time compaction was run on this table ?
>
> Cheers
>
> On Fri, Nov 14, 2014 at 9:58 AM, Shahab Yunus <[email protected]> wrote:
>
> > I see. Thanks.
> >
> > And if the region indeed has references, then can we somehow forcibly
> > remove them? Is this even possible (if not advisable)? Basically what I am
> > trying to ask is that let us say we do hit this scenario and we know it is
> > OK to go ahead and merge. What steps can we follow after detection of such
> > unwanted references.
> >
> > Regards,
> > Shahab
> >
> > On Fri, Nov 14, 2014 at 12:50 PM, Ted Yu <[email protected]> wrote:
> >
> > > For automated detection of such scenario, you can reference the code in
> > > CatalogJanitor#cleanMergeRegion():
> > >
> > >     regionFs = HRegionFileSystem.openRegionFromFileSystem(
> > >         this.services.getConfiguration(), fs, tabledir, mergedRegion, true);
> > >
> > >     ...
> > >
> > > Then regionFs.hasReferences(htd) would tell you whether the underlying
> > > region has reference files.
> > > Cheers
> > >
> > > On Fri, Nov 14, 2014 at 9:39 AM, Shahab Yunus <[email protected]> wrote:
> > >
> > > > No. Not that I can recall but I can check.
> > > >
> > > > From resolution perspective, is there any way we can resolve this. More
> > > > importantly, anyway we can automate the resolution, if we run into such
> > > > issues in future? 'Cleaning the qualifier', that is.
> > > >
> > > > Regards,
> > > > Shahab
> > > >
> > > > On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <[email protected]> wrote:
> > > >
> > > > > One possibility was that region 7373f75181c71eb5061a6673cee15931 was
> > > > > involved in some hbase snapshot.
> > > > >
> > > > > Was the underlying table being snapshotted in recent past ?
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <[email protected]> wrote:
> > > > >
> > > > > > Thanks again.
> > > > > >
> > > > > > But I have been polling for a while and it still doesn't merge. I mean
> > > > > > this particular region example that I sent you, I have been trying to
> > > > > > merge it since yesterday. I ran the polling-based code all night and I
> > > > > > had to kill it. Then in the morning, I tried manual merging through hbase
> > > > > > shell and it still doesn't merge. Note that the current polling logic
> > > > > > does not try to call merge again. It just checks the region size.
> > > > > >
> > > > > > So how to clean it then? Or actually make it merge? Plus is this something
> > > > > > expected (a region keeping a reference)? How can we avoid it?
> > > > > >
> > > > > > Note that this is not limited to this table only. We are seeing this in
> > > > > > other regions of other tables as well. Are we merging too fast?
> > > > > >
> > > > > > Regards,
> > > > > > Shahab
> > > > > >
> > > > > > On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <[email protected]> wrote:
> > > > > >
> > > > > > > Polling as you described is fine.
> > > > > > >
> > > > > > > catalogJanitor.cleanMergeQualifier() is called by
> > > > > > > DispatchMergingRegionHandler.
> > > > > > >
> > > > > > > If clean was successful, you would see the following:
> > > > > > >
> > > > > > >     LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
> > > > > > >         + regionB.getRegionNameAsString()
> > > > > > >         + " from fs because merged region no longer holds references");
> > > > > > >
> > > > > > > Assuming there was no log below in your master log:
> > > > > > >
> > > > > > >     LOG.error("Merged region " + region.getRegionNameAsString()
> > > > > > >         + " has only one merge qualifier in META.");
> > > > > > >
> > > > > > > It would be the case that 7373f75181c71eb5061a6673cee15931 still had
> > > > > > > reference file.
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Ted.
> > > > > > > >
> > > > > > > > The log bit is below at the end of the email. This is the command to
> > > > > > > > merge that I gave just now through hbase shell. forcible was false but it
> > > > > > > > behaves similarly if forcible is true too. This is from master log.
> > > > > > > > Indeed the region merging was skipped! What does this mean? Data seems to
> > > > > > > > be intact for this table.
> > > > > > > >
> > > > > > > > Just to give you a background. This table was first merged by the
> > > > > > > > automated java application. What we are doing is that we are merging
> > > > > > > > tables programmatically. As the HBaseAdmin.mergeRegions call is async, we
> > > > > > > > poll for the number of regions getting lowered after this merge call.
> > > > > > > > The application hangs and continues polling for ever as the previous
> > > > > > > > merge didn't happen.
> > > > > > > >
> > > > > > > > In this poll loop, we do get the number of regions by a fresh call to
> > > > > > > > HBaseAdmin.getTableRegions(tableName).getSize().
> > > > > > > >
> > > > > > > > What are these merge qualifiers and what are we doing wrong or should do?
> > > > > > > >
> > > > > > > > In the polling loop we can somehow retry merge again? But how can we
> > > > > > > > know, that we need to call merge again as it works for some regions. Is
> > > > > > > > the table meta corrupted for some reason by the above logic?
> > > > > > > >
> > > > > > > > Thanks a lot.
> > > > > > > >
> > > > > > > > ------------------------------------------------------------------------
> > > > > > > >
> > > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236b closed
> > > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
> > > > > > > > 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> > > > > > > > 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236c, negotiated timeout = 60000
> > > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236c closed
> > > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > > > > > > 2014-11-14 11:25:30,713 INFO org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip merging regions TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931., TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096., because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
> > > > > > > > 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> > > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
> > > > > > > > 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> > > > > > > > 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236e, negotiated timeout = 60000
> > > > > > > > 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236e closed
> > > > > > > > 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > > > > > >
> > > > > > > > ------------------------------------------------------------------------------------------------------------------------------------
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Shahab
> > > > > > > >
> > > > > > > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Looking at DispatchMergingRegionHandler, it does some check before
> > > > > > > > > initiating the merge.
> > > > > > > > > e.g.:
> > > > > > > > >
> > > > > > > > >     LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
> > > > > > > > >         + ", " + region_b.getRegionNameAsString() + ", because region "
> > > > > > > > >         + (regionAHasMergeQualifier ? region_a.getEncodedName()
> > > > > > > > >             : region_b.getEncodedName()) + " has merge qualifier");
> > > > > > > > >
> > > > > > > > > Can you take a look at master log around the time merge request was
> > > > > > > > > issued to see if you can get some clue ?
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > The documentation of online merge tool (merge_region) states that if
> > > > > > > > > > we forcibly merge regions (by setting the 3rd attribute as true) then
> > > > > > > > > > it can create overlapping regions. if this happens then will this
> > > > > > > > > > render the region or table unusable or it is just a performance hit?
> > > > > > > > > > I mean how bigger of a deal it is?
> > > > > > > > > >
> > > > > > > > > > Actually, we are merging regions using the programmatic API for this
> > > > > > > > > > and setting this flag ('forcible') as false. But for some tables (we
> > > > > > > > > > haven't figured out a pattern yet, data is still accessible), merge
> > > > > > > > > > of regions do not happen at all. Afterwards we tried with this flag =
> > > > > > > > > > true, and it still doesn't merge them.
> > > > > > > > > >
> > > > > > > > > > CDH 5.1.0
> > > > > > > > > > (Hbase is 0.98.1-cdh5.1.0)
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Shahab
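On the polling loop described in the thread (it kept polling forever because the merge had been skipped): below is a minimal sketch of a bounded poll that gives up after a fixed number of attempts, so the caller can decide whether to check the master log or re-issue the merge. The class `MergePoller`, the method `waitForFewerRegions`, and all of its parameters are hypothetical names made up for illustration and are not part of HBase. The `IntSupplier` is assumed to wrap a fresh region-count lookup on each attempt, such as the `HBaseAdmin.getTableRegions(tableName)` call mentioned above.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper, not part of HBase: bounded wait for an async merge. */
final class MergePoller {

    /**
     * Polls until the region count drops below countBeforeMerge or the
     * attempts run out. Returns true if the drop was observed, false otherwise.
     */
    static boolean waitForFewerRegions(IntSupplier regionCount,
                                       int countBeforeMerge,
                                       int maxAttempts,
                                       long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // regionCount is assumed to perform a fresh lookup on every call.
            if (regionCount.getAsInt() < countBeforeMerge) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // Attempts exhausted: the merge may have been skipped by the master.
        return false;
    }
}
```

A `false` return does not say why the merge did not happen. In the case discussed here, the master log line "Skip merging regions ... has merge qualifier" was the actual signal, so a timeout is best treated as a prompt to inspect that log rather than as proof of failure.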
