[jira] [Commented] (HBASE-4007) distributed log splitting can get indefinitely stuck
[ https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101014#comment-13101014 ] Hudson commented on HBASE-4007: --- Integrated in HBase-TRUNK #2192 (See [https://builds.apache.org/job/HBase-TRUNK/2192/]) HBASE-4007 distributed log splitting can get indefinitely stuck stack : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java distributed log splitting can get indefinitely stuck Key: HBASE-4007 URL: https://issues.apache.org/jira/browse/HBASE-4007 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani Priority: Critical Fix For: 0.92.0 Attachments: 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch After the configured number of retries SplitLogManager is not going to resubmit log-split tasks. In this situation even if the splitLogWorker that owns the task dies the task will not get resubmitted. When a regionserver goes away then all the split-log tasks that it owned should be resubmitted by the SplitLogMaster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
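The behaviour asked for in the HBASE-4007 description can be sketched roughly as below. This is only an illustration: SplitTaskTracker and all of its methods are hypothetical names, not the real SplitLogManager API, and the real manager tracks tasks in ZooKeeper rather than in a map. The point it shows is that a timeout-driven resubmit is limited by the retry budget, while the death of the owning worker releases its tasks regardless of that budget.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; not the actual SplitLogManager code. */
class SplitTaskTracker {
  private static final class Task {
    String owner;   // worker currently holding the task, or null if unowned
    int resubmits;  // timeout-driven resubmissions so far
  }

  private final Map<String, Task> tasks = new HashMap<>();
  private final int maxResubmits;

  SplitTaskTracker(int maxResubmits) {
    this.maxResubmits = maxResubmits;
  }

  /** Records that a worker picked up the task; the resubmit count is kept. */
  void assign(String path, String worker) {
    tasks.computeIfAbsent(path, p -> new Task()).owner = worker;
  }

  /** Timeout-driven resubmission: refused once the retry budget is spent. */
  boolean resubmitOnTimeout(String path) {
    Task t = tasks.get(path);
    if (t == null || t.resubmits >= maxResubmits) {
      return false;
    }
    t.resubmits++;
    t.owner = null;
    return true;
  }

  /** Worker death: every task it owned is released, whatever its retry count. */
  List<String> handleDeadWorker(String worker) {
    List<String> released = new ArrayList<>();
    for (Map.Entry<String, Task> e : tasks.entrySet()) {
      if (worker.equals(e.getValue().owner)) {
        e.getValue().owner = null;
        released.add(e.getKey());
      }
    }
    return released;
  }

  String ownerOf(String path) {
    Task t = tasks.get(path);
    return t == null ? null : t.owner;
  }
}
```

Keeping the two paths separate means the retry limit still protects against a task that keeps timing out, without leaving a task stranded on a worker that no longer exists.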
[jira] [Commented] (HBASE-4350) Fix a Bloom filter bug introduced by HFile v2 and TestMultiColumnScanner that caught it
[ https://issues.apache.org/jira/browse/HBASE-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101015#comment-13101015 ] Hudson commented on HBASE-4350: --- Integrated in HBase-TRUNK #2192 (See [https://builds.apache.org/job/HBase-TRUNK/2192/]) HBASE-4350 Fix a Bloom filter bug introduced by HFile v2 and TestMultiColumnScanner that caught it stack : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java Fix a Bloom filter bug introduced by HFile v2 and TestMultiColumnScanner that caught it --- Key: HBASE-4350 URL: https://issues.apache.org/jira/browse/HBASE-4350 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.92.0 Attachments: 0001-TestMultiColumnScanner-and-Bloom-filter-fix.patch Nicolas pointed out to me that the new unit test TestMultiColumnScanner that I wrote for the multi-column scanner Bloom filter optimization (which we will soon release) did not pass on the open-source trunk, and it bisected down to the HFile v2 commit. I debugged the unit test and found that there was a serious bug in HFile v2 Bloom filter lookup not caught by any of the existing unit tests: Bloom filters were used for non-Get Scans, which did not have minimum/maximum row set correctly, and some scan results were not returned. This diff is the unit test that helped catch the problem and a one-line fix for the bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
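The kind of guard described in HBASE-4350 can be illustrated with a small sketch. It is not the actual one-line fix from the patch: BloomGuard and shouldSeek are hypothetical names, and a plain Set stands in for the Bloom filter (so it has no false positives). It only shows the idea that a row-level filter may exclude a file for a single-row Get, but must not be consulted for a range scan.

```java
import java.util.Set;

/** Hypothetical sketch only; a Set stands in for a row Bloom filter. */
class BloomGuard {
  /**
   * Returns true if the store file has to be read for this scan.
   * For a range scan the filter is skipped, because checking a single
   * row key says nothing about the other rows in the range.
   */
  static boolean shouldSeek(boolean isGetScan, String row, Set<String> rowsPossiblyPresent) {
    if (!isGetScan) {
      return true; // never let a row filter exclude a file for a non-Get scan
    }
    return rowsPossiblyPresent.contains(row);
  }
}
```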
[jira] [Updated] (HBASE-4359) Show dead RegionServer names in the HMaster info page
[ https://issues.apache.org/jira/browse/HBASE-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HBASE-4359: --- Attachment: HBase Master UI - Dead Servers.png HBASE-4359.r1.diff Jamon+tests patch that adds in the improvement. Please review! I ran the updated master status servlet test. {code} - mvn -Dtest=TestMasterStatusServlet test --- T E S T S --- Running org.apache.hadoop.hbase.master.TestMasterStatusServlet Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.93 sec Results : Tests run: 5, Failures: 0, Errors: 0, Skipped: 0 {code} I also ran the build manually via bin/start-hbase.sh with proper config and HDFS running. The screenshot shows the implementation. Thanks in advance for reviews! Show dead RegionServer names in the HMaster info page - Key: HBASE-4359 URL: https://issues.apache.org/jira/browse/HBASE-4359 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.4 Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 0.94.0 Attachments: HBASE-4359.r1.diff, HBase Master UI - Dead Servers.png Unlike other components of the cluster, like NameNode and JobTracker pages, the HMaster's info page does not show any data on dead region servers. While an RS is stateless being a good reason not to count dead nodes, I think having a list of dead nodes helps in cases where an administrator would want to find out which nodes are missing out on RS action (hey, everyone likes consistently spiking graphs! ;)). Following HBASE-3580, I think it makes sense to have a list of already maintained dead nodes show up in the info UI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4359) Show dead RegionServer names in the HMaster info page
[ https://issues.apache.org/jira/browse/HBASE-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HBASE-4359: --- Status: Patch Available (was: Open) Show dead RegionServer names in the HMaster info page - Key: HBASE-4359 URL: https://issues.apache.org/jira/browse/HBASE-4359 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.4 Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 0.94.0 Attachments: HBASE-4359.r1.diff, HBase Master UI - Dead Servers.png Unlike other components of the cluster, like NameNode and JobTracker pages, the HMaster's info page does not show any data on dead region servers. While an RS is stateless being a good reason not to count dead nodes, I think having a list of dead nodes helps in cases where an administrator would want to find out which nodes are missing out on RS action (hey, everyone likes consistently spiking graphs! ;)). Following HBASE-3580, I think it makes sense to have a list of already maintained dead nodes show up in the info UI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4359) Show dead RegionServer names in the HMaster info page
[ https://issues.apache.org/jira/browse/HBASE-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HBASE-4359: --- Attachment: HBase Master UI - Dead Servers (Yes, still dead).png HBASE-4359.r2.diff Newer patch after a chat with Todd. Made it more consistent with the online listing. In brightest day, in darkest night, no test case shall escape my sight: {code} --- T E S T S --- --- T E S T S --- Running org.apache.hadoop.hbase.master.TestMasterStatusServlet Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.028 sec Results : Tests run: 5, Failures: 0, Errors: 0, Skipped: 0 {code} Also upped a new UI image after manual testing. Show dead RegionServer names in the HMaster info page - Key: HBASE-4359 URL: https://issues.apache.org/jira/browse/HBASE-4359 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.4 Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 0.94.0 Attachments: HBASE-4359.r1.diff, HBASE-4359.r2.diff, HBase Master UI - Dead Servers (Yes, still dead).png, HBase Master UI - Dead Servers.png Unlike other components of the cluster, like NameNode and JobTracker pages, the HMaster's info page does not show any data on dead region servers. While an RS is stateless being a good reason not to count dead nodes, I think having a list of dead nodes helps in cases where an administrator would want to find out which nodes are missing out on RS action (hey, everyone likes consistently spiking graphs! ;)). Following HBASE-3580, I think it makes sense to have a list of already maintained dead nodes show up in the info UI. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4360) Maintain information on the time a RS went dead
Maintain information on the time a RS went dead --- Key: HBASE-4360 URL: https://issues.apache.org/jira/browse/HBASE-4360 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.94.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.94.0 Something that would be generally helpful is to maintain DeadServer info with the timestamp at which the server was last determined to be dead. This makes it easier to hunt through the logs, and I don't think it is too expensive to maintain (one additional update per dead determination). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
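A minimal sketch of what HBASE-4360 proposes could look like the following. DeadServerLog and its methods are hypothetical names chosen for illustration and do not reflect the existing DeadServer class; the sketch only shows that the bookkeeping amounts to one map update per dead determination.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only; not the existing DeadServer class. */
class DeadServerLog {
  // server name -> time (ms since epoch) it was last determined to be dead
  private final Map<String, Long> timeOfDeath = new LinkedHashMap<>();

  /** One additional update per dead determination; a later call overwrites. */
  synchronized void markDead(String serverName, long nowMillis) {
    timeOfDeath.put(serverName, nowMillis);
  }

  /** Returns the last recorded time of death, or null if never marked dead. */
  synchronized Long diedAt(String serverName) {
    return timeOfDeath.get(serverName);
  }
}
```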
[jira] [Commented] (HBASE-4313) Refactor TestHBaseFsck to make adding individual hbck tests easier
[ https://issues.apache.org/jira/browse/HBASE-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101104#comment-13101104 ] Hudson commented on HBASE-4313: --- Integrated in HBase-TRUNK #2193 (See [https://builds.apache.org/job/HBase-TRUNK/2193/]) HBASE-4313 Refactor TestHBaseFsck to make adding individual hbck tests easier stack : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java Refactor TestHBaseFsck to make adding individual hbck tests easier -- Key: HBASE-4313 URL: https://issues.apache.org/jira/browse/HBASE-4313 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.90.5 Attachments: 0001-HBASE-4313-Refactor-TestHBaseFsck-to-make-adding-hbc.patch, 0001-HBASE-4313-Refactor-TestHBaseFsck-to-make-adding-hbc.patch, hbase-4313-trunk.patch The current TestHBaseFsck has one test case that tests multiple things in the same table. This refactor essentially preserves what is tested but isolates each error type so that there is no bleed over in error from table to table. This will also enable the writing of other simple to read tests for other hbck detectable errors. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4350) Fix a Bloom filter bug introduced by HFile v2 and TestMultiColumnScanner that caught it
[ https://issues.apache.org/jira/browse/HBASE-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101105#comment-13101105 ] Hudson commented on HBASE-4350: --- Integrated in HBase-TRUNK #2193 (See [https://builds.apache.org/job/HBase-TRUNK/2193/]) HBASE-4350 Fix a Bloom filter bug introduced by HFile v2 and TestMultiColumnScanner that caught it stack : Files : * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java Fix a Bloom filter bug introduced by HFile v2 and TestMultiColumnScanner that caught it --- Key: HBASE-4350 URL: https://issues.apache.org/jira/browse/HBASE-4350 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.92.0 Attachments: 0001-TestMultiColumnScanner-and-Bloom-filter-fix.patch Nicolas pointed out to me that the new unit test TestMultiColumnScanner that I wrote for the multi-column scanner Bloom filter optimization (which we will soon release) did not pass on the open-source trunk, and it bisected down to the HFile v2 commit. I debugged the unit test and found that there was a serious bug in HFile v2 Bloom filter lookup not caught by any of the existing unit tests: Bloom filters were used for non-Get Scans, which did not have minimum/maximum row set correctly, and some scan results were not returned. This diff is the unit test that helped catch the problem and a one-line fix for the bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4301) META migration from 0.90 to trunk fails
[ https://issues.apache.org/jira/browse/HBASE-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101106#comment-13101106 ] Hudson commented on HBASE-4301: --- Integrated in HBase-TRUNK #2193 (See [https://builds.apache.org/job/HBase-TRUNK/2193/]) HBASE-4301 META migration from 0.90 to trunk fails (Subbu Iyer) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java META migration from 0.90 to trunk fails --- Key: HBASE-4301 URL: https://issues.apache.org/jira/browse/HBASE-4301 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Subbu M Iyer Priority: Blocker Fix For: 0.92.0 Attachments: 4301-1-Fixed_Root_migration_to_newer_HRI_format_.patch, 4301-2-Fixed_Root_migration_to_newer_HRI_format_.patch, 4301-Fixed_Root_migration_to_newer_HRI_format_.patch, 4301-v3.txt, 4301-v4.txt, 4301-v7.txt, 4301.txt, 4301_v2.txt, logs.tar.gz, master-log.txt, meta_migrate, meta_trunk, root_migrate, root_trunk I started a trunk cluster as an upgrade from 0.90.4ish, and now I can't scan my .META. table, etc, and other operations fail. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4340) Hbase can't balance.
[ https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101180#comment-13101180 ] gaojinchao commented on HBASE-4340: --- Yes, all test cases have passed. Hbase can't balance. Key: HBASE-4340 URL: https://issues.apache.org/jira/browse/HBASE-4340 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: gaojinchao Assignee: gaojinchao Fix For: 0.90.5 Attachments: HBASE-4340_branch90.patch
Version: 0.90.4
Cluster: 40 boxes
As the logs below show, the balancer could not run because a dead RS was still being processed. I dug deeper and found two issues:
1. ShutdownHandler does not clear numProcessing when it runs into certain exceptions. It seems that, whatever the exception, we should either clear the flag or close the master.
2. The reported dead regionserver(s): [158-1-130-12,20020,1314971097929] is inaccurate. The dead server should be 158-1-130-10,20020,1315068597979.
//master logs:
2011-09-05 00:28:00,487 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 00:33:00,489 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 00:38:00,493 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 00:43:00,495 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 00:48:00,499 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 00:53:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 00:58:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 01:03:00,502 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 01:08:00,506 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 01:13:00,508 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 01:18:00,512 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 01:23:00,514 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 01:28:00,518 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 01:33:00,520 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 01:38:00,524 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 01:43:00,526 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 01:48:00,530 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 01:53:00,532 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 01:58:00,536 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 02:03:00,537 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 02:08:00,538 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 02:13:00,539 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929]
2011-09-05 02:18:00,543 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running
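The first issue noted in HBASE-4340 (a processing counter that is never cleared when an exception interrupts the handler) is commonly avoided with try/finally. The sketch below is an assumption about the general shape of such a fix, not the content of the attached patch; DeadServerProcessing, process and balancerMayRun are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch only; not the actual ShutdownHandler or DeadServer code. */
class DeadServerProcessing {
  private final AtomicInteger numProcessing = new AtomicInteger();

  /** Runs the shutdown work and always clears the in-progress count. */
  void process(Runnable work) {
    numProcessing.incrementAndGet();
    try {
      work.run();
    } finally {
      // Runs on normal return and on any exception, so the balancer
      // is not blocked forever by a failed handler.
      numProcessing.decrementAndGet();
    }
  }

  /** The balancer is held back only while some dead server is being processed. */
  boolean balancerMayRun() {
    return numProcessing.get() == 0;
  }
}
```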
[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101200#comment-13101200 ] ramkrishna.s.vasudevan commented on HBASE-4153: ---
After HBASE-4015, my previous observations change as follows. Please note that the fix in this JIRA is: once we get RegionAlreadyInTransition, I will not move the in-memory state to OFFLINE.

- Open Open
Here the first open region is in progress.
a) Before the transition OFFLINE-OPENING or OPENING-OPENED
The second open region call will set the data to OFFLINE, so there will be a version mismatch when the first RS tries to transition to OPENING, and hence the first open region call will fail. The second open region call will get RegionAlreadyInTransition, and it is then up to the TimeoutMonitor to open the region, since it finds the RIT in PENDING_OPEN.
b) After the transition to OPENED
By not moving the in-memory state to OFFLINE on RegionAlreadyInTransition, once the callback for the OPENED node reaches the Master we can delete the in-memory state (this is already happening) of PENDING_OPEN left by the second open region call. If we leave the in-memory state in OFFLINE, as per current behaviour:
{code}
if (regionState == null ||
    (!regionState.isPendingOpen() && !regionState.isOpening())) {
  LOG.warn("Received OPENED for region " + prettyPrintedRegionName +
      " from server " + data.getOrigin() + " but region was in " +
      " the state " + regionState + " and not " +
      "in expected PENDING_OPEN or OPENING states");
  return;
}
{code}
This is the major problem I see.

- Close Open
As per my previous analysis:
a) Before the transition from CLOSING to CLOSED
When an open call arrives while the close region is in progress,
{code}
try {
  if (ZKAssign.transitionNodeClosed(server.getZooKeeper(), regionInfo,
      server.getServerName(), expectedVersion) == FAILED) {
    LOG.warn("Completed the CLOSE of a region but when transitioning from " +
        " CLOSING to CLOSED got a version mismatch, someone else clashed " +
        "so now unassigning");
    region.close();
    return;
  }
{code}
the region will be closed on the RS side, but the RIT in the master will be in PENDING_OPEN due to RegionAlreadyInTransition, and again the TimeoutMonitor will take care of opening the region.
b) After setting the node to the CLOSED state
Here the assign call will happen once again as part of CloseRegionProcessing, and if a parallel new open region call arrives, it goes back to the Open Open case described previously.

Please note that in all cases assign() and unassign() were invoked in parallel, manually, through admin. I am not sure whether you are planning to handle this scenario in a totally different way; from the analysis above we can infer that things largely depend on the TimeoutMonitor for the second operation to succeed.

Handle RegionAlreadyInTransitionException in AssignmentManager -- Key: HBASE-4153 URL: https://issues.apache.org/jira/browse/HBASE-4153 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0
Comment from Stack over in HBASE-3741:
{quote} Question: Looking at this patch again, if we throw a RegionAlreadyInTransitionException, won't we just assign the region elsewhere though RegionAlreadyInTransitionException in at least one case here is saying that the region is already open on this regionserver? {quote}
Indeed looking at the code it's going to be handled the same way other exceptions are. Need to add special cases for assign and unassign.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
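The effect of the OPENED check discussed in the HBASE-4153 comment above can be seen in a tiny stand-alone model. OpenedCallbackCheck and its State enum are hypothetical simplifications of the master's in-memory region state, introduced only to show why an in-memory state of OFFLINE causes the OPENED callback to be ignored, while PENDING_OPEN or OPENING lets it through.

```java
/** Hypothetical model only; not the AssignmentManager's RegionState class. */
class OpenedCallbackCheck {
  enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN }

  /**
   * Mirrors the condition quoted in the comment: an OPENED event is
   * accepted only if the region is currently PENDING_OPEN or OPENING.
   * A null state (region unknown to the master) is rejected as well.
   */
  static boolean acceptsOpened(State inMemory) {
    return inMemory == State.PENDING_OPEN || inMemory == State.OPENING;
  }
}
```

Under this model, leaving the state at PENDING_OPEN after RegionAlreadyInTransition keeps the later OPENED callback acceptable, which is the behaviour the comment argues for.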
[jira] [Commented] (HBASE-4212) TestMasterFailover fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101213#comment-13101213 ] gaojinchao commented on HBASE-4212: ---
@Stack, thanks for your review. In our environment this test often fails, so we skip it (in our setup all test cases run automatically every day).

The steps for opening the root region:
step A: Master tells the region server to open the root region.
step B: Region server opens the root region and sets the zk node (rootServerZNodezk). Once this is finished, the CatalogTracker can work.
step C: Region server updates the zk node (assignmentZNode) to tell the master that root has opened (in some cases this may fail, but we have already announced that root can be used).
step D: Master deletes the zk node (assignmentZNode) and adds the root region to the online set.

In my case the master skipped step D because it was delayed, and instead forced the root region online in processFailover. So the zk node could not be deleted and the failover test failed.

finishInitialization code:
// Make sure root and meta assigned before proceeding.
assignRootAndMeta();
// Is this fresh start with no regions assigned or are we a master joining
// an already-running cluster? If regionsCount == 0, then for sure a
// fresh start. TOOD: Be fancier. If regionsCount == 2, perhaps the
// 2 are .META. and -ROOT- and we should fall into the fresh startup
// branch below. For now, do processFailover.
if (regionCount == 0) {
  LOG.info("Master startup proceeding: cluster startup");
  this.assignmentManager.cleanoutUnassigned();
  this.assignmentManager.assignAllUserRegions();
} else {
  LOG.info("Master startup proceeding: master failover");
  this.assignmentManager.processFailover();
}

processFailover code:
HServerInfo hsi = this.serverManager.getHServerInfo(this.catalogTracker.getMetaLocation());
regionOnline(HRegionInfo.FIRST_META_REGIONINFO, hsi);
hsi = this.serverManager.getHServerInfo(this.catalogTracker.getRootLocation());
regionOnline(HRegionInfo.ROOT_REGIONINFO, hsi);

TestMasterFailover fails occasionally - Key: HBASE-4212 URL: https://issues.apache.org/jira/browse/HBASE-4212 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: gaojinchao Assignee: gaojinchao Fix For: 0.90.5 Attachments: HBASE-4212_TrunkV1.patch, HBASE-4212_branch90V1.patch
This seems to be a bug. The root region in RIT can't be moved. The failover process forces root online but does not clean up the zk node, so the test waits forever.
void processFailover() throws KeeperException, IOException, InterruptedException {
  // we enforce on-line root.
  HServerInfo hsi = this.serverManager.getHServerInfo(this.catalogTracker.getMetaLocation());
  regionOnline(HRegionInfo.FIRST_META_REGIONINFO, hsi);
  hsi = this.serverManager.getHServerInfo(this.catalogTracker.getRootLocation());
  regionOnline(HRegionInfo.ROOT_REGIONINFO, hsi);
It seems that we should wait for the root assignment to finish, as is done for the meta region:
int assignRootAndMeta() throws InterruptedException, IOException, KeeperException {
  int assigned = 0;
  long timeout = this.conf.getLong("hbase.catalog.verification.timeout", 1000);
  // Work on ROOT region. Is it in zk in transition?
  boolean rit = this.assignmentManager.
processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO); if (!catalogTracker.verifyRootRegionLocation(timeout)) { this.assignmentManager.assignRoot(); this.catalogTracker.waitForRoot(); //we need add this code and guarantee that the transition has completed this.assignmentManager.waitForAssignment(HRegionInfo.ROOT_REGIONINFO); assigned++; } logs: 2011-08-16 07:45:40,715 DEBUG [RegionServer:0;C4S2.site,47710,1313495126115-EventThread] zookeeper.ZooKeeperWatcher(252): regionserver:47710-0x131d2690f780004 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/70236052 2011-08-16 07:45:40,715 DEBUG [RS_OPEN_ROOT-C4S2.site,47710,1313495126115-0] zookeeper.ZKAssign(712): regionserver:47710-0x131d2690f780004 Successfully transitioned node 70236052 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2011-08-16 07:45:40,715 DEBUG [Thread-760-EventThread] zookeeper.ZooKeeperWatcher(252): master:60701-0x131d2690f780009 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/70236052 2011-08-16 07:45:40,716 INFO [PostOpenDeployTasks:70236052] catalog.RootLocationEditor(62): Setting ROOT region location in ZooKeeper as C4S2.site:47710 2011-08-16 07:45:40,716 DEBUG [Thread-760-EventThread] zookeeper.ZKUtil(1109): master:60701-0x131d2690f780009 Retrieved
[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101231#comment-13101231 ] Ted Yu commented on HBASE-4153: --- So we should handle (Open Open) case b. Thanks for the analysis Ramkrishna. Handle RegionAlreadyInTransitionException in AssignmentManager -- Key: HBASE-4153 URL: https://issues.apache.org/jira/browse/HBASE-4153 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Comment from Stack over in HBASE-3741: {quote} Question: Looking at this patch again, if we throw a RegionAlreadyInTransitionException, won't we just assign the region elsewhere though RegionAlreadyInTransitionException in at least one case here is saying that the region is already open on this regionserver? {quote} Indeed looking at the code it's going to be handled the same way other exceptions are. Need to add special cases for assign and unassign. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4357) Region in transition - in closing state
[ https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101273#comment-13101273 ] Ming Ma commented on HBASE-4357: Stack, it is the trunk. I don't know the root cause yet. Region in transition - in closing state --- Key: HBASE-4357 URL: https://issues.apache.org/jira/browse/HBASE-4357 Project: HBase Issue Type: Bug Reporter: Ming Ma
Got the following during testing:
1. On a given machine, kill the RS process (by process id). Then kill the HMaster process (by process id).
2. Start the RS first via bin/hbase-daemon.sh --config ./conf start regionserver. Then start the HMaster via bin/hbase-daemon.sh --config ./conf start master.
One region of a table stayed in the CLOSING state. According to zookeeper:
794a6ff17a4de0dd0a19b984ba18eea9 miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9. state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), server=sea-esxi-0,6,1315428682281
According to the .META. table, the region has been reassigned from sea-esxi-0 to sea-esxi-4:
miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9. sea-esxi-4:60030 H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2195) Support cyclic replication
[ https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101379#comment-13101379 ] Lars Hofhansl commented on HBASE-2195: -- Should CopyTable be made aware of Master - Master scenarios as well? Otherwise everything that CopyTable copies from MasterI to MasterII is replicated back to MasterI once. At the least, this should perhaps be added to the documentation (i.e. set up Master - Master replication after CopyTable has finished). Support cyclic replication -- Key: HBASE-2195 URL: https://issues.apache.org/jira/browse/HBASE-2195 Project: HBase Issue Type: Sub-task Components: replication Reporter: Jean-Daniel Cryans Assignee: Lars Hofhansl Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 2195-v5.txt, 2195-v6.txt, 2195.txt We need to support cyclic replication by using the cluster id of each HlogKey and stop replicating when it goes back to the original cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-2195) Support cyclic replication
[ https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-2195: - Affects Version/s: 0.92.0 Fix Version/s: 0.92.0 Support cyclic replication -- Key: HBASE-2195 URL: https://issues.apache.org/jira/browse/HBASE-2195 Project: HBase Issue Type: Sub-task Components: replication Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 2195-v5.txt, 2195-v6.txt, 2195.txt We need to support cyclic replication by using the cluster id of each HlogKey and stop replicating when it goes back to the original cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
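The rule stated in the HBASE-2195 description (tag each edit with the id of the cluster where it originated, and stop replicating when it would go back there) can be sketched as follows. ReplicationFilter, Edit and shouldShip are hypothetical names for illustration; the attached patches may structure this differently.

```java
import java.util.UUID;

/** Hypothetical sketch only; not the actual replication source code. */
class ReplicationFilter {
  /** An edit that remembers the cluster where it was first written. */
  static final class Edit {
    final UUID originClusterId;
    final String payload;

    Edit(UUID originClusterId, String payload) {
      this.originClusterId = originClusterId;
      this.payload = payload;
    }
  }

  /**
   * An edit is shipped to a peer unless the peer is the cluster the edit
   * originally came from, which is what breaks the replication cycle.
   */
  static boolean shouldShip(Edit edit, UUID peerClusterId) {
    return !edit.originClusterId.equals(peerClusterId);
  }
}
```

Because the origin id travels with the edit, a replicated edit keeps its original id when it is applied on the peer, so the peer will decline to send it back.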
[jira] [Commented] (HBASE-4354) track region history
[ https://issues.apache.org/jira/browse/HBASE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101412#comment-13101412 ]

Andrew Purtell commented on HBASE-4354:
---

bq. There may have been deadlocks too around updating history while trying to do edits in .META. but my memory may not be serving me right here

Yes.

bq. The natural place to do this stuff would be in a table inside hbase I'd think.

The mistake we made last time IMHO was making region historian updating synchronous with the transitions. If we instead log the transitions to a table in a background thread (executor?) with best effort, the result could be viable.

track region history
Key: HBASE-4354
URL: https://issues.apache.org/jira/browse/HBASE-4354
Project: HBase
Issue Type: New Feature
Components: master, metrics, regionserver
Reporter: Ming Ma
Assignee: Ming Ma

For debugging and analysis purposes it will be useful to understand a region's lifecycle: how it is created (from which parent region, for example), how it is split, assigned, etc. Some of this information is in the logs, the hbase .META. table, zookeeper, and metrics. Certain history data is lost; for example, the states will be removed from zookeeper /hbase/unassigned once the region is assigned; also, the .META. table has a max version of 10 and thus only tracks the last 10 RS assignments of a given region. It would be nice to put this in a central place. It can provide:

1. How applications use hbase. For example, an application might create a large number of regions in a short period of time and drop the table later.
2. How HBase internally manages regions, such as how regions are split, assigned, turned offline, etc.

Things to track:

1. How the region is created, and its parent region in the case of a split.
2. The region transition process, such as region state changes and region server changes.

One idea is to put such transition history data in zookeeper. One issue is that it could blow up zookeeper memory if we have a large number of regions and the cluster runs for a long time. I would like to get your feedback on different approaches to address the issue. One assumption is that region assignment doesn't happen with high frequency, and thus the overhead introduced won't have much impact on system performance.

Approach 1: Zookeeper knows the history of how /hbase/unassigned is modified; if we can get zookeeper's logs (Bookkeeper?) somehow, we know the history of region transitions.

Approach 2:

1. HBase logs extra region transition data to zookeeper. It could be one zookeeper node per transaction.
2. Have a separate thread on the Master to move data from zookeeper and append it to HDFS. That will keep the zookeeper size in check.
3. Have some tool or web UI to show the history of a given region by looking at zookeeper and HDFS.
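The "background thread, best effort" suggestion in the comment above could look something like the sketch below. It is an assumption-laden illustration, not HBase code: RegionHistorySketch, recordTransition and drainAndStop are hypothetical names, an in-memory list stands in for the history table, and the queue size of 1024 is arbitrary. The point is that the caller never blocks on the history write and records are silently dropped when the queue is full.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class RegionHistorySketch {
    // Stands in for a history table; a real implementation would write elsewhere.
    private final List<String> sink = new CopyOnWriteArrayList<String>();

    // One background thread, bounded queue; DiscardPolicy drops work instead of
    // blocking or throwing when the queue is full (best effort).
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<Runnable>(1024),
        new ThreadPoolExecutor.DiscardPolicy());

    /** Called from the transition path; returns immediately. */
    void recordTransition(String region, String fromState, String toState) {
        final String line = region + ": " + fromState + " -> " + toState;
        executor.execute(() -> sink.add(line));
    }

    /** Stops the background thread and returns whatever was recorded. */
    List<String> drainAndStop() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sink;
    }
}
```

Because the transition path only enqueues a task, a slow or failing history write cannot stall or deadlock region assignment, which is the problem attributed to the earlier synchronous historian.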
[jira] [Commented] (HBASE-4358) Batch Table Alter Operations
[ https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101457#comment-13101457 ]

jirapos...@reviews.apache.org commented on HBASE-4358:
--

bq. On 2011-09-09 02:13:22, Ted Yu wrote:
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java, line 63
bq. https://reviews.apache.org/r/1768/diff/1/?file=38944#file38944line63
bq.
bq. addFamily() can perform overwrite.
bq. Better add more javadoc.

I'm not clear on exactly what you mean here. If it's that addFamily() will replace the old family descriptor with the new one, it seems like that would be expected behavior for a modify family handler.

- Riley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1826
---

On 2011-09-09 18:39:05, Riley Patterson wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1768/
bq. ---
bq.
bq. (Updated 2011-09-09 18:39:05)
bq.
bq. Review request for hbase.
bq.
bq. Summary
bq. ---
bq.
bq. Currently, the RPC provides no way of asking for several table alterations at once, and the master has no way of batch handling alter requests. Thus, when the user requests several changes at the same time (e.g. add these I columns, delete these J columns, and modify these K columns), each region is brought down (I+J+K) times so that it can reflect the new schema. Additionally, multiple writes are made to META, and multiple RPC calls must be made.
bq.
bq. This patch provides batching for these operations, both at the RPC level and within the Master's TableEventHandlers. This involves a bit of reorganization in the TableEventHandler class hierarchy, and a new TableEventHandler, TableMultiFamilyHandler. The net effect ends up being the difference seen here:
bq.
bq. Before patch:
bq. hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq. Updating all regions with the new schema...
bq. 1/1 regions updated.
bq. Done.
bq. Updating all regions with the new schema...
bq. 1/1 regions updated.
bq. Done.
bq. 0 row(s) in 2.6450 seconds
bq.
bq. After patch:
bq. hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq. Updating all regions with the new schema...
bq. 1/1 regions updated.
bq. Done.
bq. 0 row(s) in 1.1930 seconds
bq.
bq. Regions are only brought down once, and the duration is cut to about 1/N.
bq.
bq. This addresses bug HBASE-4358.
bq. https://issues.apache.org/jira/browse/HBASE-4358
bq.
bq. Diffs
bq. -
bq.
bq. /src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableFamilyHandler.java PRE-CREATION
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableMultiFamilyHandler.java PRE-CREATION
bq. /src/main/ruby/hbase/admin.rb 1166933
bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java 1166933
bq.
bq. Diff: https://reviews.apache.org/r/1768/diff
bq.
bq. Testing
bq. ---
bq.
bq. Sanity checked functionality in pseudo-distributed mode (tried several permutations of different alterations, all completed successfully and with only one round of region restarts). Ran all unit tests successfully.
bq.
bq. Thanks,
bq.
bq. Riley
bq.

Batch Table Alter Operations
Key: HBASE-4358
URL: https://issues.apache.org/jira/browse/HBASE-4358
Project: HBase
Issue Type: Improvement
Components: ipc, master, shell
Affects Versions: 0.92.0
Reporter: Riley Patterson
Assignee: Riley Patterson
[jira] [Commented] (HBASE-4358) Batch Table Alter Operations
[ https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101456#comment-13101456 ]

jirapos...@reviews.apache.org commented on HBASE-4358:
--

bq. On 2011-09-09 03:40:32, Lars Hofhansl wrote:
bq. /src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 949
bq. https://reviews.apache.org/r/1768/diff/1/?file=38934#file38934line949
bq.
bq. Coprocessors can modify these lists, right?
bq. The list returned by Arrays.asList(...) is fixed-length, i.e. the coprocessor can neither add nor remove entries.
bq.
bq. If that's OK you can consider Collections.singletonList(column) instead.
bq.
bq. If it's not OK - which I think is the case - this probably needs to be new ArrayList(Collections.singleton(column))
bq.

Makes sense. Good catch.

- Riley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1827
---

Batch Table Alter Operations
Key: HBASE-4358
URL: https://issues.apache.org/jira/browse/HBASE-4358
Project: HBase
Issue Type: Improvement
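The difference between the three list constructions mentioned in the review comment above can be checked with a small Java sketch. The class and method names (ListMutabilitySketch, canAdd, and so on) are made up for illustration; only Arrays.asList, Collections.singletonList, Collections.singleton and ArrayList are real JDK calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ListMutabilitySketch {
    /** Fixed-size view: set() works, add()/remove() throw UnsupportedOperationException. */
    static List<String> fixedSize(String column) {
        return Arrays.asList(column);
    }

    /** Immutable single-element list: no modification of any kind. */
    static List<String> immutable(String column) {
        return Collections.singletonList(column);
    }

    /** Ordinary growable list that a callee is free to add to or remove from. */
    static List<String> growable(String column) {
        return new ArrayList<String>(Collections.singleton(column));
    }

    /** Returns true if the list accepts add(), false if it rejects it. */
    static boolean canAdd(List<String> list) {
        try {
            list.add("extra");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }
}
```

Only the growable variant accepts add(), which is why it is the one to use if a coprocessor hook is meant to be able to change the list it receives.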
[jira] [Commented] (HBASE-4358) Batch Table Alter Operations
[ https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101462#comment-13101462 ]

jirapos...@reviews.apache.org commented on HBASE-4358:
--

bq. On 2011-09-09 03:40:33, Michael Stack wrote:
bq. Looks good to me. All the table mod tests still pass though they go via a different path now?

All unit tests pass.

- Riley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1828
---

Batch Table Alter Operations
Key: HBASE-4358
URL: https://issues.apache.org/jira/browse/HBASE-4358
Project: HBase
Issue Type: Improvement
Components: ipc, master, shell
Affects Versions: 0.92.0
Reporter: Riley Patterson
Assignee: Riley Patterson
Priority: Minor
Attachments: HBASE-4358.patch

Currently, the RPC provides no way of asking for several table alterations at once, and the master has no way of batch handling alter requests. Thus, when the user requests several changes at the same time (i.e. add these I columns, delete these J columns, and
[jira] [Commented] (HBASE-4358) Batch Table Alter Operations
[ https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101461#comment-13101461 ]

jirapos...@reviews.apache.org commented on HBASE-4358:
--

bq. On 2011-09-09 04:38:24, Ted Yu wrote:
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableMultiFamilyHandler.java, line 75
bq. https://reviews.apache.org/r/1768/diff/1/?file=38945#file38945line75
bq.
bq. IOE may pop up from any operation.
bq. I think we should document that we adopt fail fast strategy.
bq.
bq. Personally I think we should catch and store one InvalidFamilyOperationException, if any pops up.
bq. After completing all operations, we throw the stored InvalidFamilyOperationException.
bq.

How about we just allocate a new HTableDescriptor object that we pass to updateTableDescriptor? Then we can document that if there is even a single exception, no changes were made. The updates to the FS and region restarts don't occur until after the potential IOExceptions due to updateTableDescriptor.

- Riley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1830
---

Batch Table
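The "work on a fresh copy, publish only on success" idea from Riley's reply above can be illustrated with the following Java sketch. It does not use HTableDescriptor or any HBase API: a plain set of family names stands in for the descriptor, and AtomicAlterSketch, applyAll and the "add:name" / "delete:name" operation strings are hypothetical conventions chosen just for this example.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class AtomicAlterSketch {
    /**
     * Applies every operation to a private copy of the current families.
     * If any operation is invalid, an exception is thrown and the caller's
     * set is left untouched; the new set is returned only if all succeed.
     */
    static Set<String> applyAll(Set<String> current, List<String> ops) {
        Set<String> working = new LinkedHashSet<String>(current); // private copy
        for (String op : ops) {
            String[] parts = op.split(":", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("malformed op: " + op);
            }
            String kind = parts[0];
            String family = parts[1];
            if (kind.equals("add")) {
                if (!working.add(family)) {
                    throw new IllegalArgumentException("family already exists: " + family);
                }
            } else if (kind.equals("delete")) {
                if (!working.remove(family)) {
                    throw new IllegalArgumentException("no such family: " + family);
                }
            } else {
                throw new IllegalArgumentException("unknown op: " + op);
            }
        }
        return working; // caller swaps this in only after everything validated
    }
}
```

Since nothing outside the copy is modified until applyAll returns, a failure partway through a batch leaves the original schema exactly as it was, which matches the "if there is even a single exception, no changes were made" contract proposed in the reply.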
[jira] [Commented] (HBASE-4358) Batch Table Alter Operations
[ https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101490#comment-13101490 ]

jirapos...@reviews.apache.org commented on HBASE-4358:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1841
---

Can you perform testing on a small, real cluster?

/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java
https://reviews.apache.org/r/1768/#comment4212

Where does this method call end up in this patch?

/src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java
https://reviews.apache.org/r/1768/#comment4213

I meant we should document that addFamily() performs modification here. This is minor.

- Ted

Batch Table Alter Operations
Key: HBASE-4358
URL: https://issues.apache.org/jira/browse/HBASE-4358
Project: HBase
Issue Type: Improvement
Components: ipc, master, shell
Affects Versions: 0.92.0
Reporter: Riley Patterson
Assignee: Riley Patterson
Priority: Minor
[jira] [Commented] (HBASE-4358) Batch Table Alter Operations
[ https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101503#comment-13101503 ]

jirapos...@reviews.apache.org commented on HBASE-4358:
--

bq. On 2011-09-09 19:00:11, Ted Yu wrote:
bq. Can you perform testing on a small, real cluster ?

Will do with this next revision.

bq. On 2011-09-09 19:00:11, Ted Yu wrote:
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java, line 64
bq. https://reviews.apache.org/r/1768/diff/1/?file=38941#file38941line64
bq.
bq. Where does this method call end up in this patch ?

It is not in the patch - its functionality is redundant with TableFamilyHandler.handleTableOperation(). Both the MasterFileSystem's services and the services passed to the handlers are simply a reference to the master itself, and both run getTableDescriptors().add() and getTableDescriptors().get() on the reference.

- Riley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1841
---

Batch Table Alter Operations
Key: HBASE-4358
URL:
[jira] [Commented] (HBASE-4358) Batch Table Alter Operations
[ https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101515#comment-13101515 ]

jirapos...@reviews.apache.org commented on HBASE-4358:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1843
---

/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java
https://reviews.apache.org/r/1768/#comment4215

After HBASE-451, changes to table descriptor have to be persisted to HDFS. I browsed handleTableOperation() methods in this patch and didn't find that.

- Ted

Batch Table Alter Operations
Key: HBASE-4358
URL: https://issues.apache.org/jira/browse/HBASE-4358
Project: HBase
Issue Type: Improvement
Components: ipc, master, shell
Affects Versions: 0.92.0
Reporter: Riley Patterson
Assignee: Riley Patterson
Priority: Minor
Attachments: HBASE-4358.patch

Currently, the RPC provides no way of asking for several table alterations at once, and the master has no way of batch handling alter requests.
[jira] [Updated] (HBASE-4194) RegionSplitter: Split on under-loaded region servers first
[ https://issues.apache.org/jira/browse/HBASE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4194: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) This was committed by Ted a while back. Resolving. RegionSplitter: Split on under-loaded region servers first -- Key: HBASE-4194 URL: https://issues.apache.org/jira/browse/HBASE-4194 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial Fix For: 0.92.0 Attachments: HBASE-4194.patch When running RegionSplitter, our app devs noticed that they were getting a lot of NSREs. This is caused by 2 factors: 1. the split itself will cause an NSRE 2. any load balancing will cause one. The former cannot be helped. We can more tightly control load balancing though. Instead of doing a name-sorted round-robin split across RS in the tier, we could sort the RS's by region count. That way, we only split an RS with 10 regions after there are no more RS with 9 regions. This will prevent the load balancing slop from kicking in and will fix the problem where restarting RegionSplitter always starts splitting at RS #1. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
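The ordering proposed above (always split on the least-loaded region server next) can be sketched as follows. This is a minimal illustration, not RegionSplitter's actual code: the class `SplitOrderSketch`, the method `splitOrder`, and the tie-break by server name are all assumptions made for the example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper, not part of HBase: picks split targets by region count. */
final class SplitOrderSketch {

    /**
     * Returns the servers that would receive the next {@code splits} splits,
     * always choosing the server that currently hosts the fewest regions.
     * Ties are broken by server name (an assumption) to keep the result deterministic.
     */
    static List<String> splitOrder(Map<String, Integer> regionCounts, int splits) {
        Comparator<Map.Entry<String, Integer>> byLoad =
            Comparator.comparing((Map.Entry<String, Integer> e) -> e.getValue())
                      .thenComparing(e -> e.getKey());
        PriorityQueue<Map.Entry<String, Integer>> queue = new PriorityQueue<>(byLoad);
        for (Map.Entry<String, Integer> e : regionCounts.entrySet()) {
            // Copy each entry so the caller's map is never modified.
            queue.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue()));
        }
        List<String> order = new ArrayList<>();
        for (int i = 0; i < splits && !queue.isEmpty(); i++) {
            Map.Entry<String, Integer> least = queue.poll();
            order.add(least.getKey());
            // A split replaces one region with two daughters: net gain of one region.
            queue.add(new AbstractMap.SimpleEntry<>(least.getKey(), least.getValue() + 1));
        }
        return order;
    }
}
```

With counts of 10, 9 and 9 regions, both 9-region servers are split before the 10-region server is touched, which is the behaviour the description asks for.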
[jira] [Updated] (HBASE-4243) HADOOP_HOME should be auto-detected
[ https://issues.apache.org/jira/browse/HBASE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4243: - Fix Version/s: (was: 0.92.0) HADOOP_HOME should be auto-detected --- Key: HBASE-4243 URL: https://issues.apache.org/jira/browse/HBASE-4243 Project: HBase Issue Type: Improvement Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Priority: Minor Attachments: HBASE-4243.patch.txt Now that HBASE-3465 has been integrated, perhaps we should try to auto-detect the HADOOP_HOME setting if it is not given explicitly. Something along the lines of:
{noformat}
# check for hadoop in the path
HADOOP_IN_PATH=`which hadoop 2>/dev/null`
if [ -f "${HADOOP_IN_PATH}" ]; then
  HADOOP_DIR=`dirname "$HADOOP_IN_PATH"`/..
fi
# HADOOP_HOME env variable overrides hadoop in the path
HADOOP_HOME=${HADOOP_HOME:-$HADOOP_DIR}
if [ "$HADOOP_HOME" == "" ]; then
  echo "Cannot find hadoop installation: \$HADOOP_HOME must be set or hadoop must be in the path";
  exit 4;
fi
{noformat}
Thoughts? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut
[ https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101523#comment-13101523 ] stack commented on HBASE-4347: -- Mind running the tests, Lars? I can fix the license on commit. Remove duplicated code from Put, Delete, Get, Scan, MultiPut Key: HBASE-4347 URL: https://issues.apache.org/jira/browse/HBASE-4347 Project: HBase Issue Type: Improvement Components: util Affects Versions: 0.92.0 Reporter: Lars Hofhansl Priority: Minor Fix For: 0.92.0 Attachments: 4347-v2.txt, 4347.txt This came from discussion with Stack w.r.t. HBASE-2195. There is currently a lot of duplicated code, especially between Put and Delete, and also between all Operations. For example, Put, Delete, Get and Scan all implement attributes with exactly the same code. Put and Delete also share the familyMap, Row, Rowlock, Timestamp, etc. One way to remove the duplication is to introduce OperationWithAttributes, which extends Operation, and have Put/Delete/Get/Scan extend that rather than Operation. In addition, Put and Delete could extend Mutation (which itself would extend OperationWithAttributes). If a static inheritance hierarchy is not desired here, we can use delegation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
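The hierarchy proposed in the description can be sketched roughly as below. The class names Operation, OperationWithAttributes, Mutation, Put, Delete, Get and Scan come from the issue text; everything else (the wrapper class `OperationHierarchySketch`, field names, method signatures and bodies) is a simplified assumption for illustration and is not taken from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-ins only; the real HBase client classes carry far more state. */
final class OperationHierarchySketch {

    static abstract class Operation { }

    /** Attribute handling written once instead of once per operation class. */
    static abstract class OperationWithAttributes extends Operation {
        private final Map<String, byte[]> attributes = new HashMap<>();

        void setAttribute(String name, byte[] value) {
            if (value == null) {
                attributes.remove(name);   // assumption: a null value clears the attribute
            } else {
                attributes.put(name, value);
            }
        }

        byte[] getAttribute(String name) {
            return attributes.get(name);
        }
    }

    /** State shared by Put and Delete (row and timestamp in this sketch). */
    static abstract class Mutation extends OperationWithAttributes {
        final byte[] row;
        long timestamp = Long.MAX_VALUE;   // assumption: stands in for "latest"

        Mutation(byte[] row) {
            this.row = row;
        }
    }

    static final class Put extends Mutation {
        Put(byte[] row) { super(row); }
    }

    static final class Delete extends Mutation {
        Delete(byte[] row) { super(row); }
    }

    static final class Get extends OperationWithAttributes { }

    static final class Scan extends OperationWithAttributes { }
}
```

The delegation alternative mentioned at the end of the description would instead give each operation a private attribute-holder field and forward calls to it, at the cost of a few forwarding methods per class.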
[jira] [Resolved] (HBASE-4301) META migration from 0.90 to trunk fails
[ https://issues.apache.org/jira/browse/HBASE-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-4301. -- Resolution: Fixed Hadoop Flags: [Reviewed] Committed to TRUNK yesterday by Ted. Thanks for the patch Subbu (and Ted) and to Sebastian for debugging help. META migration from 0.90 to trunk fails --- Key: HBASE-4301 URL: https://issues.apache.org/jira/browse/HBASE-4301 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Subbu M Iyer Priority: Blocker Fix For: 0.92.0 Attachments: 4301-1-Fixed_Root_migration_to_newer_HRI_format_.patch, 4301-2-Fixed_Root_migration_to_newer_HRI_format_.patch, 4301-Fixed_Root_migration_to_newer_HRI_format_.patch, 4301-v3.txt, 4301-v4.txt, 4301-v7.txt, 4301.txt, 4301_v2.txt, logs.tar.gz, master-log.txt, meta_migrate, meta_trunk, root_migrate, root_trunk I started a trunk cluster as an upgrade from 0.90.4ish, and now I can't scan my .META. table, etc, and other operations fail. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut
[ https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101538#comment-13101538 ] Lars Hofhansl commented on HBASE-4347: -- This change is causing a *significant* slowdown in some of the tests. I must have missed something... Will report back when I have found the problem. Remove duplicated code from Put, Delete, Get, Scan, MultiPut Key: HBASE-4347 URL: https://issues.apache.org/jira/browse/HBASE-4347 Project: HBase Issue Type: Improvement Components: util Affects Versions: 0.92.0 Reporter: Lars Hofhansl Priority: Minor Fix For: 0.92.0 Attachments: 4347-v2.txt, 4347.txt This came from discussion with Stack w.r.t. HBASE-2195. There is currently a lot of duplicated code, especially between Put and Delete, and also between all Operations. For example, Put, Delete, Get and Scan all implement attributes with exactly the same code. Put and Delete also share the familyMap, Row, Rowlock, Timestamp, etc. One way to remove the duplication is to introduce OperationWithAttributes, which extends Operation, and have Put/Delete/Get/Scan extend that rather than Operation. In addition, Put and Delete could extend Mutation (which itself would extend OperationWithAttributes). If a static inheritance hierarchy is not desired here, we can use delegation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4243) HADOOP_HOME should be auto-detected
[ https://issues.apache.org/jira/browse/HBASE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101540#comment-13101540 ] Roman Shaposhnik commented on HBASE-4243: - Sorry about that. I tend to assume Linux these days. What's the level of UNIX API I can count on? POSIX? Or even less than that? Please let me know and I'll update the patch. HADOOP_HOME should be auto-detected --- Key: HBASE-4243 URL: https://issues.apache.org/jira/browse/HBASE-4243 Project: HBase Issue Type: Improvement Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Priority: Minor Attachments: HBASE-4243.patch.txt Now that HBASE-3465 has been integrated, perhaps we should try to auto-detect the HADOOP_HOME setting if it is not given explicitly. Something along the lines of:
{noformat}
# check for hadoop in the path
HADOOP_IN_PATH=`which hadoop 2>/dev/null`
if [ -f "${HADOOP_IN_PATH}" ]; then
  HADOOP_DIR=`dirname "$HADOOP_IN_PATH"`/..
fi
# HADOOP_HOME env variable overrides hadoop in the path
HADOOP_HOME=${HADOOP_HOME:-$HADOOP_DIR}
if [ "$HADOOP_HOME" == "" ]; then
  echo "Cannot find hadoop installation: \$HADOOP_HOME must be set or hadoop must be in the path";
  exit 4;
fi
{noformat}
Thoughts? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4358) Batch Table Alter Operations
[ https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101551#comment-13101551 ] jirapos...@reviews.apache.org commented on HBASE-4358: -- bq. On 2011-09-09 19:48:45, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java, line 64 bq. https://reviews.apache.org/r/1768/diff/1/?file=38941#file38941line64 bq. bq. After HBASE-451, changes to table descriptor have to be persisted to HDFS. bq. bq. I browsed handleTableOperation() methods in this patch and didn't find that. I'm not familiar enough with how exactly table descriptors are persisted to be able to tell you for certain that this approach correctly ensures persistence. But I can confidently tell you that this diff does everything that the current trunk does with regards to updating table descriptors. If you look at the actual implementation of MasterFileSystem.{add,modify,Delete}Column(), it gets a table descriptor from master services, modifies the table descriptor appropriately, then adds it back to master services' table descriptors. Between handleTableOperation() and updateTableDescriptor(), this patch follows the exact same procedure, and has the same instance of MasterServices. This separation is for the purpose of enabling batching to happen in a way that doesn't leave the system in an intermediate state in the case of a thrown exception. From basic testing, I see that the table descriptor changes are actually being written to my fs. If this procedure is not enough to actually ensure that this happens, we should file a separate JIRA to look into it. - Riley --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1768/#review1843 ---
[jira] [Commented] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut
[ https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101557#comment-13101557 ] Lars Hofhansl commented on HBASE-4347: -- Found the problem. I had accidentally written readFields(in) instead of readAttributes(in) in Scan.readFields(in), so it would just wait forever for the stream... Running tests now. Remove duplicated code from Put, Delete, Get, Scan, MultiPut Key: HBASE-4347 URL: https://issues.apache.org/jira/browse/HBASE-4347 Project: HBase Issue Type: Improvement Components: util Affects Versions: 0.92.0 Reporter: Lars Hofhansl Priority: Minor Fix For: 0.92.0 Attachments: 4347-v2.txt, 4347.txt This came from discussion with Stack w.r.t. HBASE-2195. There is currently a lot of duplicated code, especially between Put and Delete, and also between all Operations. For example, Put, Delete, Get and Scan all implement attributes with exactly the same code. Put and Delete also share the familyMap, Row, Rowlock, Timestamp, etc. One way to remove the duplication is to introduce OperationWithAttributes, which extends Operation, and have Put/Delete/Get/Scan extend that rather than Operation. In addition, Put and Delete could extend Mutation (which itself would extend OperationWithAttributes). If a static inheritance hierarchy is not desired here, we can use delegation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4357) Region in transition - in closing state
[ https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101574#comment-13101574 ] Ming Ma commented on HBASE-4357: Here is the issue. It has nothing to do with the master restart. CloseRegionHandler.getCurrentVersion failed, so the regionserver could not close the region properly. One reason it could not get data from zookeeper could be that there were lots of regions in transition.
11/09/07 17:21:48 WARN handler.CloseRegionHandler: Error getting node's version in CLOSING state, aborting close of miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
Possible fixes:
1. Perhaps CloseRegionHandler.getCurrentVersion should retry on calls to ZKAssign.getVersion?
2. The Timeout Monitor doesn't do anything for a region that stays in CLOSING state for a long time. Perhaps it could try to repair it, for example by reissuing a close-region request to the RS?
Region in transition - in closing state --- Key: HBASE-4357 URL: https://issues.apache.org/jira/browse/HBASE-4357 Project: HBase Issue Type: Bug Reporter: Ming Ma Got the following during testing:
1. On a given machine, kill the RS process, then kill the HMaster process.
2. Start the RS first via bin/hbase-daemon.sh --config ./conf start regionserver, then start HMaster via bin/hbase-daemon.sh --config ./conf start master.
One region of a table stayed in closing state. According to zookeeper: 794a6ff17a4de0dd0a19b984ba18eea9 miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9. state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), server=sea-esxi-0,6,1315428682281
According to the .META. table, the region has been reassigned from sea-esxi-0 to sea-esxi-4: miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9. sea-esxi-4:60030 H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
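Possible fix 1 above (retrying the version lookup) could take roughly the following shape. This is a generic sketch under stated assumptions: `RetrySketch`, `callWithRetries`, `maxAttempts` and `pauseMillis` are hypothetical names, and the code deliberately does not call ZKAssign.getVersion, whose exact signature is not shown in this thread.

```java
import java.util.concurrent.Callable;

/** Hypothetical bounded-retry helper; not an existing HBase utility. */
final class RetrySketch {

    /**
     * Invokes {@code call} up to {@code maxAttempts} times, sleeping
     * {@code pauseMillis} between failed attempts. Returns the first
     * successful result or rethrows the last failure.
     */
    static <T> T callWithRetries(Callable<T> call, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (InterruptedException e) {
                // Never retry through an interrupt; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }
}
```

A caller such as CloseRegionHandler.getCurrentVersion would wrap its ZooKeeper read in the Callable. Whether retrying is safe depends on which failures are transient, so a real fix would probably retry only on connection-loss style errors rather than on every exception.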
[jira] [Updated] (HBASE-4358) Batch Table Alter Operations
[ https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riley Patterson updated HBASE-4358: --- Attachment: HBASE-4358-v2.patch Addressed comments made on the review board. Cleaned up whitespace. Batch Table Alter Operations Key: HBASE-4358 URL: https://issues.apache.org/jira/browse/HBASE-4358 Project: HBase Issue Type: Improvement Components: ipc, master, shell Affects Versions: 0.92.0 Reporter: Riley Patterson Assignee: Riley Patterson Priority: Minor Attachments: HBASE-4358-v2.patch, HBASE-4358.patch
Currently, the RPC provides no way of asking for several table alterations at once, and the master has no way of batch handling alter requests. Thus, when the user requests several changes at the same time (i.e. add these I columns, delete these J columns, and modify these K columns), each region is brought down (I+J+K) times so that it can reflect the new schema. Additionally, multiple writes are made to META, and multiple RPC calls must be made.
This patch provides batching for these operations, both at the RPC level and within the Master's TableEventHandlers. This involves a bit of reorganization in the TableEventHandler class hierarchy, and a new TableEventHandler, TableMultiFamilyHandler. The net effect ends up being the difference seen here:
Before patch:
hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.6450 seconds
After patch:
hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1930 seconds
Regions are only brought down once, and the duration is cut to roughly 1/N of the original for N alterations.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4358) Batch Table Alter Operations
[ https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101579#comment-13101579 ] jirapos...@reviews.apache.org commented on HBASE-4358: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1768/ --- (Updated 2011-09-09 21:23:14.655517) Review request for hbase.
Changes --- Addressed various comments made here.
Summary --- Currently, the RPC provides no way of asking for several table alterations at once, and the master has no way of batch handling alter requests. Thus, when the user requests several changes at the same time (i.e. add these I columns, delete these J columns, and modify these K columns), each region is brought down (I+J+K) times so that it can reflect the new schema. Additionally, multiple writes are made to META, and multiple RPC calls must be made.
This patch provides batching for these operations, both at the RPC level and within the Master's TableEventHandlers. This involves a bit of reorganization in the TableEventHandler class hierarchy, and a new TableEventHandler, TableMultiFamilyHandler. The net effect ends up being the difference seen here:
Before patch:
hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.6450 seconds
After patch:
hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1930 seconds
Regions are only brought down once, and the duration is cut to roughly 1/N of the original for N alterations.
This addresses bug HBASE-4358. https://issues.apache.org/jira/browse/HBASE-4358
Diffs (updated) -
/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 1166933
/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1166933
/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 1166933
/src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java 1166933
/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 1166933
/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1166933
/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1166933
/src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 1166933
/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java 1166933
/src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java 1166933
/src/main/java/org/apache/hadoop/hbase/master/handler/TableFamilyHandler.java PRE-CREATION
/src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java 1166933
/src/main/java/org/apache/hadoop/hbase/master/handler/TableMultiFamilyHandler.java PRE-CREATION
/src/main/ruby/hbase/admin.rb 1166933
/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java 1166933
Diff: https://reviews.apache.org/r/1768/diff
Testing --- Sanity checked functionality in pseudo-distributed mode (tried several permutations of different alterations; all completed successfully and with only one round of region restarts). Ran all unit tests successfully.
Thanks, Riley
Batch Table Alter Operations Key: HBASE-4358 URL: https://issues.apache.org/jira/browse/HBASE-4358 Project: HBase Issue Type: Improvement Components: ipc, master, shell Affects Versions: 0.92.0 Reporter: Riley Patterson Assignee: Riley Patterson Priority: Minor Attachments: HBASE-4358-v2.patch, HBASE-4358.patch Currently, the RPC provides no way of asking for several table alterations at once, and the master has no way of batch handling alter requests. Thus, when the user requests several changes at the same time (i.e. add these I columns, delete these J columns, and modify these K columns), each region is brought down (I+J+K) times so that it can reflect the new schema. Additionally, multiple writes are made to META, and multiple RPC calls must be made. This patch provides batching for these operations, both at the RPC level and within the Master's TableEventHandlers. This involves a bit of reorganization in the TableEventHandler class hierarchy, and a new TableEventHandler, TableMultiFamilyHandler. The net effect ends up being the difference seen here: Before patch: hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'} Updating all regions with the new schema... 1/1 regions
[jira] [Commented] (HBASE-4358) Batch Table Alter Operations
[ https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101592#comment-13101592 ] jirapos...@reviews.apache.org commented on HBASE-4358: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1768/#review1846 --- /src/main/java/org/apache/hadoop/hbase/master/HMaster.java https://reviews.apache.org/r/1768/#comment4218 Missed another Arrays.asList(...) :) /src/main/java/org/apache/hadoop/hbase/master/handler/TableFamilyHandler.java https://reviews.apache.org/r/1768/#comment4219 Trailing whitespace :( - Lars
Batch Table Alter Operations Key: HBASE-4358 URL: https://issues.apache.org/jira/browse/HBASE-4358 Project: HBase Issue Type: Improvement Components: ipc, master, shell Affects Versions: 0.92.0 Reporter: Riley Patterson Assignee: Riley Patterson Priority: Minor Attachments: HBASE-4358-v2.patch, HBASE-4358.patch Currently, the RPC provides no way of
[jira] [Created] (HBASE-4361) Certain filter expressions fail in the shell
Certain filter expressions fail in the shell Key: HBASE-4361 URL: https://issues.apache.org/jira/browse/HBASE-4361 Project: HBase Issue Type: Bug Components: filters, shell Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 Running the following in the shell hangs and then fails: {noformat} scan 't1', { FILTER = SingleColumnValueFilter(, '1', 'f1', 'col_a') } {noformat} The error seems to be: org.jruby.exceptions.RaiseException: (NoMethodError) undefined method `write' for true:TrueClass -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4361) Certain filter expressions fail in the shell
[ https://issues.apache.org/jira/browse/HBASE-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101604#comment-13101604 ] Todd Lipcon commented on HBASE-4361: After hacking HBase to show a full stack trace:
{noformat}
org.jruby.exceptions.RaiseException: (NoMethodError) undefined method `write' for true:TrueClass
org.jruby.exceptions.RaiseException: (NoMethodError) undefined method `write' for true:TrueClass
  at Hbase::Table.scan(/home/todd/git/hbase/bin/../bin/../src/main/ruby/hbase/table.rb:255)
  at Shell::Commands::Scan.command(/home/todd/git/hbase/bin/../bin/../src/main/ruby/shell/commands/scan.rb:61)
  at Shell::Commands::Scan.command_safe(/home/todd/git/hbase/bin/../bin/../src/main/ruby/shell/commands.rb:31)
  at Shell::Commands::Command.translate_hbase_exceptions(/home/todd/git/hbase/bin/../bin/../src/main/ruby/shell/commands.rb:70)
  at Shell::Commands::Command.command_safe(/home/todd/git/hbase/bin/../bin/../src/main/ruby/shell/commands.rb:31)
  at Shell::Shell.command(/home/todd/git/hbase/bin/../bin/../src/main/ruby/shell.rb:106)
{noformat}
Certain filter expressions fail in the shell Key: HBASE-4361 URL: https://issues.apache.org/jira/browse/HBASE-4361 Project: HBase Issue Type: Bug Components: filters, shell Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 Running the following in the shell hangs and then fails: {noformat} scan 't1', { FILTER = SingleColumnValueFilter(, '1', 'f1', 'col_a') } {noformat} The error seems to be: org.jruby.exceptions.RaiseException: (NoMethodError) undefined method `write' for true:TrueClass -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4362) SITE: Center logo
[ https://issues.apache.org/jira/browse/HBASE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4362: - Attachment: site.txt SITE: Center logo - Key: HBASE-4362 URL: https://issues.apache.org/jira/browse/HBASE-4362 Project: HBase Issue Type: Task Reporter: stack Attachments: site.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4354) track region history
[ https://issues.apache.org/jira/browse/HBASE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101611#comment-13101611 ] Ming Ma commented on HBASE-4354: Thanks, Stack, Andy. Writing the data to a RegionHistory table in HBase sounds like a good idea. The key point is to make it async, as Andy said, or otherwise to handle the situation when RegionHistory isn't available:
1. Tracking the regions of RegionHistory itself. While the regions of RegionHistory are being moved around, writes to RegionHistory won't work.
2. Tracking the regions of -ROOT- and .META. Ideally we would like to track all regions, including those of -ROOT- and .META. During cluster startup, RegionHistory only becomes available after -ROOT- and .META.
So to make it work:
1. Make the logging async.
2. If we want to keep every entry even in the case of errors such as master failover, make the logging reliable. For example, persist the data to zookeeper or HDFS as a buffer while RegionHistory isn't available.
We could also log it to another HBase cluster, but that would create operational overhead unless it can be combined with other metrics and logging scenarios (like OpenTSDB).
track region history Key: HBASE-4354 URL: https://issues.apache.org/jira/browse/HBASE-4354 Project: HBase Issue Type: New Feature Components: master, metrics, regionserver Reporter: Ming Ma Assignee: Ming Ma
For debugging and analysis purposes it will be useful to understand a region's lifecycle: how it is created (from which parent region, for example), how it is split, assigned, etc. Some of this info is in the logs, the hbase .META. table, zookeeper, and metrics. Certain history data is lost; for example, the states are removed from zookeeper /hbase/unassigned once the region is assigned; also, the .META. table has a max version of 10 and thus only tracks the last 10 RS assignments of a given region. It would be nice to put it all in a central place. It can show:
1. How applications use hbase. For example, an application might create a large number of regions in a short period of time and drop the table later.
2. How HBase internally manages regions, such as how regions are split, assigned, turned offline, etc.
Things to track:
1. How the region is created; the parent region in the case of a split.
2. The region transition process, such as region state changes and region server changes.
One idea is to put such transition history data into zookeeper. One issue is that it could blow up zookeeper memory if we have a large number of regions and the cluster runs for a long time. I would like to get your feedback on different approaches to address the issue. One assumption is that region assignment doesn't happen with high frequency, and thus the overhead introduced won't have much impact on system performance.
Approach 1: Zookeeper knows the history of how /hbase/unassigned is modified; if we can get zookeeper's logs (Bookkeeper?) somehow, we know the history of region transitions.
Approach 2:
1. HBase logs extra region transition data to zookeeper. It could be one zookeeper node per transaction.
2. Have a separate thread on the Master move data from zookeeper and append it to HDFS. That will keep the zookeeper size in check.
3. Have some tool or web UI show the history of a given region by looking at zookeeper and HDFS.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4361) Certain filter expressions fail in the shell
[ https://issues.apache.org/jira/browse/HBASE-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101613#comment-13101613 ] Todd Lipcon commented on HBASE-4361: Several problems here: 1) I was using double-quotes twice, so it was passing true as the filter value. JRuby and its lovely lack of type checking then passed that through to the point where it tried to write true to the wire as a Writable, and failed. 2) The documentation for SingleColumnValueFilter has the incorrect order of arguments. 3) The errors given back by the filter parsing code are inscrutable. Certain filter expressions fail in the shell Key: HBASE-4361 URL: https://issues.apache.org/jira/browse/HBASE-4361 Project: HBase Issue Type: Bug Components: filters, shell Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 Running the following in the shell hangs and then fails: {noformat} scan 't1', { FILTER = SingleColumnValueFilter(, '1', 'f1', 'col_a') } {noformat} The error seems to be: org.jruby.exceptions.RaiseException: (NoMethodError) undefined method `write' for true:TrueClass -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4363) [replication] ReplicationSource won't close if failing to contact the sink
[replication] ReplicationSource won't close if failing to contact the sink -- Key: HBASE-4363 URL: https://issues.apache.org/jira/browse/HBASE-4363 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Fix For: 0.90.5 When trying to close a source, it will hang if it's already in shipEdits() and has issues reaching the sink. The reason is that in that method the while loop only checks if the RS is going down but not if the source was asked to shutdown. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
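The loop condition described in this report can be sketched as follows. This is a minimal, self-contained model and not the actual ReplicationSource code: the class name `ShipLoopSketch`, the `Sink` interface, the two flags, and the attempt cap are all assumptions made for illustration. The point is only that the retry loop has to check both "server is stopping" and "this source was asked to terminate".

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model of a ship-edits retry loop; every name here is hypothetical. */
final class ShipLoopSketch {

    /** Stand-in for the remote sink; returns true when the edits were accepted. */
    interface Sink {
        boolean replicate();
    }

    private final AtomicBoolean serverStopping = new AtomicBoolean(false);
    private final AtomicBoolean sourceRunning = new AtomicBoolean(true);

    void stopServer() {
        serverStopping.set(true);
    }

    void terminateSource() {
        sourceRunning.set(false);
    }

    /** Active only while the server is up AND the source has not been terminated. */
    private boolean isActive() {
        return !serverStopping.get() && sourceRunning.get();
    }

    /**
     * Retries until the sink accepts the edits or the loop is no longer active.
     * The attempt cap exists only so this demo always terminates.
     * Returns the number of attempts made.
     */
    int shipEdits(Sink sink, int maxAttemptsForDemo) {
        int attempts = 0;
        while (isActive() && attempts < maxAttemptsForDemo) {
            attempts++;
            if (sink.replicate()) {
                break;
            }
            // A real implementation would sleep or back off here before retrying.
        }
        return attempts;
    }
}
```

With a sink that never succeeds, calling `terminateSource()` from another thread (or from inside the sink callback, as a test might) makes the next loop check fail, so the source can close instead of hanging.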
[jira] [Commented] (HBASE-4361) Certain filter expressions fail in the shell
[ https://issues.apache.org/jira/browse/HBASE-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101618#comment-13101618 ] Todd Lipcon commented on HBASE-4361: For reference, the correct way to specify this is: scan 't1', { FILTER = SingleColumnValueFilter('f1', 'col_a', , 'binary:1') } But I had to read the code for 30 minutes to figure it out. We need lots of docs updates on the filter language. Certain filter expressions fail in the shell Key: HBASE-4361 URL: https://issues.apache.org/jira/browse/HBASE-4361 Project: HBase Issue Type: Bug Components: filters, shell Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 Running the following in the shell hangs and then fails: {noformat} scan 't1', { FILTER = SingleColumnValueFilter(, '1', 'f1', 'col_a') } {noformat} The error seems to be: org.jruby.exceptions.RaiseException: (NoMethodError) undefined method `write' for true:TrueClass -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters
[ https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101621#comment-13101621 ] Jean-Daniel Cryans commented on HBASE-3130: --- Looks like it got even worse recently: we hit a situation where the SessionExpired was treated as if it were the RS's own, and the RS FATAL'ed. [replication] ReplicationSource can't recover from session expired on remote clusters - Key: HBASE-3130 URL: https://issues.apache.org/jira/browse/HBASE-3130 Project: HBase Issue Type: Bug Components: replication Reporter: Jean-Daniel Cryans Currently ReplicationSource cannot recover when its zookeeper connection to its remote cluster expires. HLogs are still being tracked, but a cluster restart (or a rolling restart) is required to continue replication. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-4362) SITE: Center logo
[ https://issues.apache.org/jira/browse/HBASE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-4362. -- Resolution: Fixed Assignee: stack Committed. Updated site. SITE: Center logo - Key: HBASE-4362 URL: https://issues.apache.org/jira/browse/HBASE-4362 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Attachments: site.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4364) Column family pruning incorrectly prunes CFs referred to by filters
[ https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HBASE-4364: --- Affects Version/s: 0.92.0 0.90.4 Column family pruning incorrectly prunes CFs referred to by filters --- Key: HBASE-4364 URL: https://issues.apache.org/jira/browse/HBASE-4364 Project: HBase Issue Type: Bug Affects Versions: 0.90.4, 0.92.0 Reporter: Todd Lipcon Priority: Critical For a scan, if you select some set of columns using addColumns(), and then apply a SingleColumnValueFilter that restricts the results based on some other columns which aren't selected, and those non-selected columns are part of a separate column family, then those filter conditions are ignored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4364) Column family pruning incorrectly prunes CFs referred to by filters
[ https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101629#comment-13101629 ] Todd Lipcon commented on HBASE-4364: Example shell code to reproduce this:
{noformat}
create 't1', 'f1', 'f2'
put 't1', 'r1', 'f1:word', 'hello'
put 't1', 'r1', 'f2:word', 'bonjour'
put 't1', 'r2', 'f1:word', 'goodbye'
put 't1', 'r2', 'f2:word', 'au revoir'
# scan whole table, has 2 rows, each with 2 cols
scan 't1'
# scan selecting only one column - returns 2 distinct rows
scan 't1', { COLUMNS => ['f1:word'] }
# scan with a predicate of the french word > 'b', returns 1 row
scan 't1', { FILTER => "SingleColumnValueFilter('f2', 'word', >, 'binary:b')" }
# scan with a predicate of the french word > 'b', selecting only the english word
scan 't1', { COLUMNS => ['f1:word'], FILTER => "SingleColumnValueFilter('f2', 'word', >, 'binary:b')" }
{noformat}
The incorrect result is as follows:
{noformat}
hbase(main):008:0> scan 't1'
ROW                   COLUMN+CELL
 r1                   column=f1:word, timestamp=1315608975212, value=hello
 r1                   column=f2:word, timestamp=1315608975238, value=bonjour
 r2                   column=f1:word, timestamp=1315608975258, value=goodbye
 r2                   column=f2:word, timestamp=1315608975286, value=au revoir
2 row(s) in 0.0270 seconds

hbase(main):009:0> scan 't1', { COLUMNS => ['f1:word'] }
ROW                   COLUMN+CELL
 r1                   column=f1:word, timestamp=1315608975212, value=hello
 r2                   column=f1:word, timestamp=1315608975258, value=goodbye
2 row(s) in 0.0140 seconds

hbase(main):010:0> scan 't1', { FILTER => "SingleColumnValueFilter('f2', 'word', >, 'binary:b')" }
ROW                   COLUMN+CELL
 r1                   column=f1:word, timestamp=1315608975212, value=hello
 r1                   column=f2:word, timestamp=1315608975238, value=bonjour
1 row(s) in 0.0250 seconds

hbase(main):011:0> scan 't1', { COLUMNS => ['f1:word'], FILTER => "SingleColumnValueFilter('f2', 'word', >, 'binary:b')" }
ROW                   COLUMN+CELL
 r1                   column=f1:word, timestamp=1315608975212, value=hello
 r2                   column=f1:word, timestamp=1315608975258, value=goodbye
2 row(s) in 0.0270 seconds

SHOULD NOT HAVE RETURNED ANY VALUE FOR r2!
{noformat}
Column family pruning incorrectly prunes CFs referred to by filters --- Key: HBASE-4364 URL: https://issues.apache.org/jira/browse/HBASE-4364 Project: HBase Issue Type: Bug Affects Versions: 0.90.4, 0.92.0 Reporter: Todd Lipcon Priority: Critical For a scan, if you select some set of columns using addColumns(), and then apply a SingleColumnValueFilter that restricts the results based on some other columns which aren't selected, and those non-selected columns are part of a separate column family, then those filter conditions are ignored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
[ https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101630#comment-13101630 ] jirapos...@reviews.apache.org commented on HBASE-4270: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1784/ --- Review request for hbase. Summary --- Todd wrote the patch for this issue. Whats posted here is his patch plus a unit test. The diff is pretty big because I refactored the TestOpenRegionHandler so I could share bits of it creating this new TestCloseRegionHandler; the bulk of the patch is making shared mock server and shared mock regionserverservice files. This addresses bug hbase-4270. https://issues.apache.org/jira/browse/hbase-4270 Diffs - src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java b684af2 src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockRegionServerServices.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockServer.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestOpenRegionHandler.java ab12968 Diff: https://reviews.apache.org/r/1784/diff Testing --- I ran the new TestCloseRegionHandler test. Thanks, Michael IOE ignored during flush-on-close causes dataloss - Key: HBASE-4270 URL: https://issues.apache.org/jira/browse/HBASE-4270 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.4, 0.92.0 Reporter: Todd Lipcon Priority: Blocker Fix For: 0.92.0 Attachments: 0001-HBASE-4270.-Abort-and-rethrow-errors-on-close-failur.patch If the RS experiences an exception during the flush of a region while closing it, it currently catches the exception, logs a warning, and keeps going. 
If the exception was a DroppedSnapshotException, this means that it will silently drop any data that was in memstore when the region was closed. Instead, the RS should do a hard abort so that its logs will be replayed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
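The "abort and rethrow" behavior named in the attached patch title can be sketched roughly as below. This is an illustrative model only, not the CloseRegionHandler code: the `Region` and `Server` interfaces, the class name `CloseRegionSketch`, and the choice of `RuntimeException` for the rethrow are assumptions.

```java
import java.io.IOException;

/** Toy model of closing a region where a failed flush must not be swallowed. */
final class CloseRegionSketch {

    /** Hypothetical stand-in for a region whose close includes a memstore flush. */
    interface Region {
        void flushAndClose() throws IOException;
    }

    /** Hypothetical stand-in for the hosting server's abort hook. */
    interface Server {
        void abort(String why, Throwable cause);
    }

    /**
     * Returns true when the region closed cleanly. On an I/O failure the server is
     * aborted (so its logs get replayed elsewhere) and the error is rethrown,
     * instead of logging a warning and carrying on.
     */
    static boolean closeRegion(Region region, Server server) {
        try {
            region.flushAndClose();
            return true;
        } catch (IOException e) {
            server.abort("Unrecoverable exception while closing region", e);
            throw new RuntimeException(e);
        }
    }
}
```

The design choice being illustrated is that the catch block never returns normally: a caller cannot mistake a dropped snapshot for a successful close.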
[jira] [Commented] (HBASE-4364) Column family pruning incorrectly prunes CFs referred to by filters
[ https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101632#comment-13101632 ] Todd Lipcon commented on HBASE-4364: Actually, it turns out this isn't due to column family pruning - the same behavior occurs even with just one column family:
{noformat}
create 't2', 'f'
put 't2', 'r1', 'f:e_word', 'hello'
put 't2', 'r1', 'f:f_word', 'bonjour'
put 't2', 'r2', 'f:e_word', 'goodbye'
put 't2', 'r2', 'f:f_word', 'au revoir'
scan 't2'
# scan selecting only one column - returns 2 distinct rows
scan 't2', { COLUMNS => ['f:e_word'] }
# scan with a predicate of the french word > 'b', returns 1 row
scan 't2', { FILTER => "SingleColumnValueFilter('f', 'f_word', >, 'binary:b')" }
# scan with a predicate of the french word > 'b', selecting only the english word
scan 't2', { COLUMNS => ['f:e_word'], FILTER => "SingleColumnValueFilter('f', 'e_word', >, 'binary:b')" }
{noformat}
Column family pruning incorrectly prunes CFs referred to by filters --- Key: HBASE-4364 URL: https://issues.apache.org/jira/browse/HBASE-4364 Project: HBase Issue Type: Bug Affects Versions: 0.90.4, 0.92.0 Reporter: Todd Lipcon Priority: Critical For a scan, if you select some set of columns using addColumns(), and then apply a SingleColumnValueFilter that restricts the results based on some other columns which aren't selected, and those non-selected columns are part of a separate column family, then those filter conditions are ignored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4364) Filters applied to rows not in the selected column list are ignored
[ https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HBASE-4364: --- Summary: Filters applied to rows not in the selected column list are ignored (was: Column family pruning incorrectly prunes CFs referred to by filters) Filters applied to rows not in the selected column list are ignored --- Key: HBASE-4364 URL: https://issues.apache.org/jira/browse/HBASE-4364 Project: HBase Issue Type: Bug Affects Versions: 0.90.4, 0.92.0 Reporter: Todd Lipcon Priority: Critical For a scan, if you select some set of columns using addColumns(), and then apply a SingleColumnValueFilter that restricts the results based on some other columns which aren't selected, and those non-selected columns are part of a separate column family, then those filter conditions are ignored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4270) IOE ignored during flush-on-close causes dataloss
[ https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4270: - Status: Patch Available (was: Open) Marking patch available. IOE ignored during flush-on-close causes dataloss - Key: HBASE-4270 URL: https://issues.apache.org/jira/browse/HBASE-4270 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.4, 0.92.0 Reporter: Todd Lipcon Priority: Blocker Fix For: 0.92.0 Attachments: 0001-HBASE-4270.-Abort-and-rethrow-errors-on-close-failur.patch If the RS experiences an exception during the flush of a region while closing it, it currently catches the exception, logs a warning, and keeps going. If the exception was a DroppedSnapshotException, this means that it will silently drop any data that was in memstore when the region was closed. Instead, the RS should do a hard abort so that its logs will be replayed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (HBASE-4364) Column family pruning incorrectly prunes CFs referred to by filters
[ https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101632#comment-13101632 ] Todd Lipcon edited comment on HBASE-4364 at 9/9/11 11:08 PM: - Actually, it turns out this isn't due to column family pruning - the same behavior occurs even with just one column family:
{noformat}
create 't2', 'f'
put 't2', 'r1', 'f:e_word', 'hello'
put 't2', 'r1', 'f:f_word', 'bonjour'
put 't2', 'r2', 'f:e_word', 'goodbye'
put 't2', 'r2', 'f:f_word', 'au revoir'
scan 't2'
# scan selecting only one column - returns 2 distinct rows
scan 't2', { COLUMNS => ['f:e_word'] }
# scan with a predicate of the french word > 'b', returns 1 row
scan 't2', { FILTER => "SingleColumnValueFilter('f', 'f_word', >, 'binary:b')" }
# scan with a predicate of the french word > 'b', selecting only the english word
scan 't2', { COLUMNS => ['f:e_word'], FILTER => "SingleColumnValueFilter('f', 'f_word', >, 'binary:b')" }
{noformat}
was (Author: tlipcon): Actually, it turns out this isn't due to column family pruning - the same behavior occurs even with just one column family:
{noformat}
create 't2', 'f'
put 't2', 'r1', 'f:e_word', 'hello'
put 't2', 'r1', 'f:f_word', 'bonjour'
put 't2', 'r2', 'f:e_word', 'goodbye'
put 't2', 'r2', 'f:f_word', 'au revoir'
scan 't2'
# scan selecting only one column - returns 2 distinct rows
scan 't2', { COLUMNS => ['f:e_word'] }
# scan with a predicate of the french word > 'b', returns 1 row
scan 't2', { FILTER => "SingleColumnValueFilter('f', 'f_word', >, 'binary:b')" }
# scan with a predicate of the french word > 'b', selecting only the english word
scan 't2', { COLUMNS => ['f:e_word'], FILTER => "SingleColumnValueFilter('f', 'e_word', >, 'binary:b')" }
{noformat}
Column family pruning incorrectly prunes CFs referred to by filters --- Key: HBASE-4364 URL: https://issues.apache.org/jira/browse/HBASE-4364 Project: HBase Issue Type: Bug Affects Versions: 0.90.4, 0.92.0 Reporter: Todd Lipcon Priority: Critical For a scan, if you select some set of columns using addColumns(), and then apply a SingleColumnValueFilter that restricts the results based on some other columns which aren't selected, and those non-selected columns are part of a separate column family, then those filter conditions are ignored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4361) Certain filter expressions fail in the shell
[ https://issues.apache.org/jira/browse/HBASE-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HBASE-4361: --- Attachment: small-improvements.txt here are a few improvements. Still need to fix the docs, etc, to be correct. Ideally IMO the filter parsing would be done by javacc or antlr so we'd have a real grammar. Certain filter expressions fail in the shell Key: HBASE-4361 URL: https://issues.apache.org/jira/browse/HBASE-4361 Project: HBase Issue Type: Bug Components: filters, shell Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 Attachments: small-improvements.txt Running the following in the shell hangs and then fails: {noformat} scan 't1', { FILTER = SingleColumnValueFilter(, '1', 'f1', 'col_a') } {noformat} The error seems to be: org.jruby.exceptions.RaiseException: (NoMethodError) undefined method `write' for true:TrueClass -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4364) Filters applied to columns not in the selected column list are ignored
[ https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HBASE-4364: --- Component/s: filters Description: For a scan, if you select some set of columns using addColumns(), and then apply a SingleColumnValueFilter that restricts the results based on some other columns which aren't selected, then those filter conditions are ignored. (was: For a scan, if you select some set of columns using addColumns(), and then apply a SingleColumnValueFilter that restricts the results based on some other columns which aren't selected, and those non-selected columns are part of a separate column family, then those filter conditions are ignored.) Summary: Filters applied to columns not in the selected column list are ignored (was: Filters applied to rows not in the selected column list are ignored) Updated description to reflect the above: this is a general issue, not related to CFs. Filters applied to columns not in the selected column list are ignored -- Key: HBASE-4364 URL: https://issues.apache.org/jira/browse/HBASE-4364 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.90.4, 0.92.0 Reporter: Todd Lipcon Priority: Critical For a scan, if you select some set of columns using addColumns(), and then apply a SingleColumnValueFilter that restricts the results based on some other columns which aren't selected, then those filter conditions are ignored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4243) HADOOP_HOME should be auto-detected
[ https://issues.apache.org/jira/browse/HBASE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101637#comment-13101637 ] stack commented on HBASE-4243: -- np. Linux is usually a safe bet but then you also have to make sure it works on the desktop machine a bunch of us whiny engineers use. I saw this while hunting around for a shell-portable readlink: http://stackoverflow.com/questions/1055671/how-can-i-get-the-behavior-of-gnus-readlink-f-on-a-mac Maybe it'll help? HADOOP_HOME should be auto-detected --- Key: HBASE-4243 URL: https://issues.apache.org/jira/browse/HBASE-4243 Project: HBase Issue Type: Improvement Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Priority: Minor Attachments: HBASE-4243.patch.txt Now that HBASE-3465 has been integrated, perhaps we should try to auto-detect the HADOOP_HOME setting if it is not given explicitly. Something along the lines of:
{noformat}
# check for hadoop in the path
HADOOP_IN_PATH=`which hadoop 2>/dev/null`
# quote the variable: an unquoted empty value would make [ -f ] evaluate to true
if [ -f "${HADOOP_IN_PATH}" ]; then
  HADOOP_DIR=`dirname "$HADOOP_IN_PATH"`/..
fi
# HADOOP_HOME env variable overrides hadoop in the path
HADOOP_HOME=${HADOOP_HOME:-$HADOOP_DIR}
if [ -z "$HADOOP_HOME" ]; then
  echo "Cannot find hadoop installation: \$HADOOP_HOME must be set or hadoop must be in the path"
  exit 4
fi
{noformat}
Thoughts? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4358) Batch Table Alter Operations
[ https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101640#comment-13101640 ] jirapos...@reviews.apache.org commented on HBASE-4358: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1768/#review1847 --- /src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java https://reviews.apache.org/r/1768/#comment4220 Riley: Thanks for the detailed explanation. I should have applied your patch locally and performed the drill down. There are 3 Arrays.asList() calls in patch v2. If you don't have time, I can change them before committing. Thanks for the nice work. - Ted On 2011-09-09 21:23:14, Riley Patterson wrote:
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1768/
bq. ---
bq.
bq. (Updated 2011-09-09 21:23:14)
bq.
bq. Review request for hbase.
bq.
bq. Summary
bq. ---
bq.
bq. Currently, the RPC provides no way of asking for several table alterations at once, and the master has no way of batch handling alter requests. Thus, when the user requests several changes at the same time (i.e. add these I columns, delete these J columns, and modify these K columns), each region is brought down (I+J+K) times so that it can reflect the new schema. Additionally, multiple writes are made to META, and multiple RPC calls must be made.
bq.
bq. This patch provides batching for these operations, both at the RPC level and within the Master's TableEventHandlers. This involves a bit of reorganization in the TableEventHandler class hierarchy, and a new TableEventHandler, TableMultiFamilyHandler. The net effect ends up being the difference seen here:
bq.
bq. Before patch:
bq. hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq. Updating all regions with the new schema...
bq. 1/1 regions updated.
bq. Done.
bq. Updating all regions with the new schema...
bq. 1/1 regions updated.
bq. Done.
bq. 0 row(s) in 2.6450 seconds
bq.
bq. After patch:
bq. hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq. Updating all regions with the new schema...
bq. 1/1 regions updated.
bq. Done.
bq. 0 row(s) in 1.1930 seconds
bq.
bq. Regions are only brought down once, and the duration is cut to roughly 1/N.
bq.
bq. This addresses bug HBASE-4358.
bq. https://issues.apache.org/jira/browse/HBASE-4358
bq.
bq. Diffs
bq. -
bq.
bq. /src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableFamilyHandler.java PRE-CREATION
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java 1166933
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableMultiFamilyHandler.java PRE-CREATION
bq. /src/main/ruby/hbase/admin.rb 1166933
bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java 1166933
bq.
bq. Diff: https://reviews.apache.org/r/1768/diff
bq.
bq. Testing
bq. ---
bq.
bq. Sanity checked functionality in pseudo-distributed mode (tried several permutations of different alterations, all completed successfully and with only one round of region restarts). Ran all unit tests successfully.
bq.
bq. Thanks,
bq.
bq. Riley
Batch Table Alter Operations Key: HBASE-4358 URL: https://issues.apache.org/jira/browse/HBASE-4358 Project: HBase Issue Type: Improvement Components: ipc, master, shell Affects Versions: 0.92.0 Reporter: Riley Patterson Assignee: Riley Patterson
[jira] [Commented] (HBASE-4364) Filters applied to columns not in the selected column list are ignored
[ https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101641#comment-13101641 ] Todd Lipcon commented on HBASE-4364: Apparently this is actually known behavior according to SingleColumnValueFilter. From the JavaDoc: {noformat} When using this filter on a {@link Scan} with specified inputs, the column to be tested should also be added as input (otherwise the filter will regard the column as missing). {noformat} IMO, it's a bug, though, not a feature! Filters with requirements like this should automatically push their column requirements through to the ExplicitColumnTracker. Filters applied to columns not in the selected column list are ignored -- Key: HBASE-4364 URL: https://issues.apache.org/jira/browse/HBASE-4364 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.90.4, 0.92.0 Reporter: Todd Lipcon Priority: Critical For a scan, if you select some set of columns using addColumns(), and then apply a SingleColumnValueFilter that restricts the results based on some other columns which aren't selected, then those filter conditions are ignored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
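The interaction described here can be illustrated with a small self-contained model. None of this is HBase code: `ColumnSelectionSketch` and its methods are hypothetical, rows are plain maps, and the "missing column lets the row through" rule is an assumption modelled on the JavaDoc quoted above. The `pushFilterColumn` flag stands in for the suggested fix of pushing the filter's column requirement into the set of tracked columns.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of a scan with an explicit column list plus a single-column value filter. */
final class ColumnSelectionSketch {

    /** Keeps only the selected columns, the way an explicit column list narrows a row. */
    static Map<String, String> project(Map<String, String> row, Set<String> selected) {
        Map<String, String> visible = new HashMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (selected.contains(cell.getKey())) {
                visible.put(cell.getKey(), cell.getValue());
            }
        }
        return visible;
    }

    /** "value > threshold" test; a column the filter cannot see lets the row through. */
    static boolean passes(Map<String, String> visible, String column, String threshold) {
        String value = visible.get(column);
        if (value == null) {
            return true; // column regarded as missing, so the predicate is never evaluated
        }
        return value.compareTo(threshold) > 0;
    }

    /** Returns whether the row is emitted; pushFilterColumn models the proposed fix. */
    static boolean rowEmitted(Map<String, String> row, Set<String> selected,
                              String filterColumn, String threshold,
                              boolean pushFilterColumn) {
        Set<String> effective = new HashSet<>(selected);
        if (pushFilterColumn) {
            effective.add(filterColumn);
        }
        return passes(project(row, effective), filterColumn, threshold);
    }
}
```

In this model, a row whose `f:f_word` is `au revoir` is emitted when only `f:e_word` is selected, because the filter never sees the column it tests; once the filter column is added to the effective selection, the same row is rejected.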
[jira] [Commented] (HBASE-2195) Support cyclic replication
[ https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101647#comment-13101647 ] Jean-Daniel Cryans commented on HBASE-2195: --- I guess we could add some wits... maybe even verify beforehand the definition of each table and see if there's a problem. Support cyclic replication -- Key: HBASE-2195 URL: https://issues.apache.org/jira/browse/HBASE-2195 Project: HBase Issue Type: Sub-task Components: replication Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 2195-v5.txt, 2195-v6.txt, 2195.txt We need to support cyclic replication by using the cluster id of each HlogKey and stop replicating when it goes back to the original cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
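The cycle-breaking rule in the issue description can be sketched as below. This is only an illustration under assumptions: the `Edit` class, the use of plain strings as cluster ids, and the method name `shouldReplicate` are all hypothetical and are not taken from the attached patches.

```java
/** Toy model: each edit carries the id of the cluster where it was first written. */
final class CyclicReplicationSketch {

    /** Hypothetical stand-in for a log entry whose key records the origin cluster. */
    static final class Edit {
        final String originClusterId;
        final String payload;

        Edit(String originClusterId, String payload) {
            this.originClusterId = originClusterId;
            this.payload = payload;
        }
    }

    /**
     * An edit is shipped to a peer unless that peer is the cluster it came from,
     * which is what stops an A -> B -> A loop.
     */
    static boolean shouldReplicate(Edit edit, String peerClusterId) {
        return !edit.originClusterId.equals(peerClusterId);
    }
}
```

For two clusters A and B replicating to each other, an edit written on A is shipped from A to B, but when B considers shipping it to its peer A the check fails and the edit stops there.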
[jira] [Updated] (HBASE-4358) Batch Table Alter Operations
[ https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4358: -- Attachment: 4358-v3.txt Patch version 3 removes Arrays.asList() calls. Batch Table Alter Operations Key: HBASE-4358 URL: https://issues.apache.org/jira/browse/HBASE-4358 Project: HBase Issue Type: Improvement Components: ipc, master, shell Affects Versions: 0.92.0 Reporter: Riley Patterson Assignee: Riley Patterson Priority: Minor Attachments: 4358-v3.txt, HBASE-4358-v2.patch, HBASE-4358.patch Currently, the RPC provides no way of asking for several table alterations at once, and the master has no way of batch handling alter requests. Thus, when the user requests several changes at the same time (i.e. add these I columns, delete these J columns, and modify these K columns), each region is brought down (I+J+K) times so that it can reflect the new schema. Additionally, multiple writes are made to META, and multiple RPC calls must be made. This patch provides batching for these operations, both at the RPC level and within the Master's TableEventHandlers. This involves a bit of reorganization in the TableEventHandler class hierarchy, and a new TableEventHandler, TableMultiFamilyHandler. The net effect ends up being the difference seen here:
Before patch:
hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.6450 seconds
After patch:
hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1930 seconds
Regions are only brought down once, and the duration is cut to roughly 1/N. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4354) track region history
[ https://issues.apache.org/jira/browse/HBASE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101659#comment-13101659 ] Ming Ma commented on HBASE-4354: Thanks, Todd. Yes, an interface is a good way to abstract the various implementations. I was about to open a separate jira, "dynamic metrics logging", for a more general structured data logging infrastructure, something useful for collecting hbase/mapreduce/hdfs dynamic metrics which aren't predefined and could change over time. It seems like region transaction history could be an application for that system. track region history Key: HBASE-4354 URL: https://issues.apache.org/jira/browse/HBASE-4354 Project: HBase Issue Type: New Feature Components: master, metrics, regionserver Reporter: Ming Ma Assignee: Ming Ma For debugging and analysis purposes it will be useful to understand a region's lifecycle: how it is created (from which parent region, for example), how it is split, assigned, etc. Some of this info is in the logs, the hbase .META. table, zookeeper, and metrics. Certain history data is lost; for example, the states will be removed from zookeeper /hbase/unassigned once the region is assigned; also the .META. table has a max version of 10 and thus only tracks the last 10 RS assignments of a given region. It would be nice to put it in a central place. It can provide: 1. How applications use hbase. For example, an application might create a large number of regions in a short period of time and drop the table later. 2. How HBase internally manages regions, such as how regions are split, assigned, turned offline, etc. Things to track: 1. How a region is created, including the parent region in the case of a split. 2. The region transition process, such as region state changes and region server changes. One idea is to put such transition history data in zookeeper. One issue is that it could blow up zookeeper memory if we have a large number of regions and the cluster runs for a long time. I would like to get your feedback on different approaches to address the issue. 
One assumption is that region assignment doesn't happen with high frequency and thus the overhead introduced won't have much impact on system performance. Approach 1: Zookeeper knows the history of how /hbase/unassigned is modified; if we can get zookeeper's logs (Bookkeeper?) somehow, we know the history of region transitions. Approach 2: 1. HBase logs extra region transition data to zookeeper. It could be one zookeeper node per transaction. 2. Have a separate thread on the Master move data out of zookeeper and append it to HDFS. That will keep the zookeeper size in check. 3. Have some tool or web UI show the history of a given region by looking at zookeeper and HDFS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
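Approach 2 in the HBASE-4354 description above could be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not HBase code: RegionTransition, TransitionBuffer and HistoryArchiver are hypothetical names, the queue stands in for the per-transition zookeeper nodes, and the in-memory list stands in for an append-only HDFS file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** One region state change. All fields are illustrative assumptions. */
final class RegionTransition {
    final String region;
    final String state;
    final String server;
    final long timestamp;

    RegionTransition(String region, String state, String server, long timestamp) {
        this.region = region;
        this.state = state;
        this.server = server;
        this.timestamp = timestamp;
    }

    /** Tab-separated form used for the archived history. */
    String toLogLine() {
        return timestamp + "\t" + region + "\t" + state + "\t" + server;
    }
}

/** Stand-in for the zookeeper nodes that would hold not-yet-archived transitions. */
final class TransitionBuffer {
    private final Queue<RegionTransition> pending = new ConcurrentLinkedQueue<>();

    void record(RegionTransition t) {
        pending.add(t);
    }

    RegionTransition poll() {
        return pending.poll();
    }

    int size() {
        return pending.size();
    }
}

/** Stand-in for the Master thread that moves transitions to long-term storage. */
final class HistoryArchiver {
    private final TransitionBuffer buffer;
    private final List<String> archive = new ArrayList<>(); // assumption: replaces an HDFS file

    HistoryArchiver(TransitionBuffer buffer) {
        this.buffer = buffer;
    }

    /** Moves everything currently buffered into the archive and returns the count moved. */
    synchronized int drain() {
        int moved = 0;
        RegionTransition t;
        while ((t = buffer.poll()) != null) {
            archive.add(t.toLogLine());
            moved++;
        }
        return moved;
    }

    /** Returns the archived lines for one region, in the order they were archived. */
    synchronized List<String> historyOf(String region) {
        List<String> result = new ArrayList<>();
        for (String line : archive) {
            if (line.split("\t")[1].equals(region)) {
                result.add(line);
            }
        }
        return result;
    }
}
```

Calling drain() periodically keeps the buffer small, which is the point of step 2. A real history tool would also have to merge in transitions that have not been drained yet, which this sketch leaves out.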
[jira] [Resolved] (HBASE-4331) Bypassing default actions in prePut fails sometimes with HTable client
[ https://issues.apache.org/jira/browse/HBASE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling resolved HBASE-4331. -- Resolution: Fixed Hadoop Flags: [Reviewed] Committed to trunk. Thanks for the patch Lars. Bypassing default actions in prePut fails sometimes with HTable client -- Key: HBASE-4331 URL: https://issues.apache.org/jira/browse/HBASE-4331 Project: HBase Issue Type: Bug Components: coprocessors Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4331-v2.txt, 4331-v3.txt, 4331-v4.txt, 4331.txt While testing some other scenario I found calling CoprocessorEnvironment.bypass() fails if all trailing puts in a batch are bypassed that way. By extension a single bypassed put will also fail. The problem is that the puts are removed from the batch in a way that does not align them with the result-status, and in addition the result is never marked as success. A possible fix is to just mark bypassed puts as SUCCESS and filter them in the following logic. (I also contemplated a new BYPASSED OperationStatusCode, but that turned out to be not necessary). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
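The fix idea described for HBASE-4331 above (keep bypassed puts in the batch, mark them SUCCESS, and skip them in the later logic) can be illustrated with a small stand-alone sketch. It is not the committed patch: PutStatus, BypassBatchSketch and applyBatch are hypothetical names, and puts are plain strings here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical per-operation result code for this sketch. */
enum PutStatus { NOT_RUN, SUCCESS, FAILURE }

final class BypassBatchSketch {
    /** Puts that were actually applied (not bypassed). */
    final List<String> applied = new ArrayList<>();

    /**
     * Returns one status per input put, at the same index as the put.
     * Bypassed puts stay in the batch so indices never shift.
     */
    PutStatus[] applyBatch(List<String> puts, Predicate<String> bypassed) {
        PutStatus[] status = new PutStatus[puts.size()];
        Arrays.fill(status, PutStatus.NOT_RUN);

        // Pre-hook phase: a bypassed put is marked SUCCESS instead of being removed.
        for (int i = 0; i < puts.size(); i++) {
            if (bypassed.test(puts.get(i))) {
                status[i] = PutStatus.SUCCESS;
            }
        }

        // Apply phase: filter out anything that already has a final status.
        for (int i = 0; i < puts.size(); i++) {
            if (status[i] != PutStatus.NOT_RUN) {
                continue;
            }
            applied.add(puts.get(i));
            status[i] = PutStatus.SUCCESS;
        }
        return status;
    }
}
```

Because nothing is removed from the batch, a run of trailing bypassed puts (or a single bypassed put) still gets a status entry at the right index.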
[jira] [Created] (HBASE-4365) Add a decent heuristic for region size
Add a decent heuristic for region size -- Key: HBASE-4365 URL: https://issues.apache.org/jira/browse/HBASE-4365 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0 Reporter: Todd Lipcon A few of us were brainstorming this morning about what the default region size should be. There were a few general points made: - in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently - with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+) - for small tables you may want a small region size just so you can distribute load better across a cluster - for big tables, multi-GB is probably best -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4209) The HBase hbase-daemon.sh SIGKILLs master when stopping it
[ https://issues.apache.org/jira/browse/HBASE-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101674#comment-13101674 ] Roman Shaposhnik commented on HBASE-4209: - stack, I'm sorry for putting this on the back burner, but at least I now have a better understanding of what's going on. Basically I got confused by a situation where suppressHdfsShutdownHook would be called multiple times on the same filesystem object. The first call would succeed, but all the other ones would fail. This is, obviously, just a problem with my patch, not the minihdfs cluster. I'll cook up an alternative and, once I have run the tests, will attach an updated version. P.S. Thanks for the encouragement! The HBase hbase-daemon.sh SIGKILLs master when stopping it -- Key: HBASE-4209 URL: https://issues.apache.org/jira/browse/HBASE-4209 Project: HBase Issue Type: Bug Components: master Reporter: Roman Shaposhnik There's a bit of code in hbase-daemon.sh that causes the HBase master to be SIGKILLed when it is stopped, rather than trying SIGTERM (as it does for other daemons). When HBase is executed in standalone mode (and the only daemon you need to run is the master), that causes newly created tables to go missing, as unflushed data is thrown out. If there is no good reason to kill the master with SIGKILL, perhaps we can take that special case out and rely on SIGTERM. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4366) dynamic metrics logging
dynamic metrics logging --- Key: HBASE-4366 URL: https://issues.apache.org/jira/browse/HBASE-4366 Project: HBase Issue Type: New Feature Components: metrics Reporter: Ming Ma Assignee: Ming Ma First, if there is an existing solution for this, I will close this jira. Also I realize we already have various overlapping solutions; creating another solution isn't necessarily the best approach. However, I couldn't find anything that can meet the need, so I am opening this jira for discussion. We have some scenarios in hbase/mapreduce/hdfs that require logging a large number of dynamic metrics. They can be used for troubleshooting, better measurement of the system, and scorecards. For example: 1. HBase. Get metrics such as requests per sec that are specific to a table or column family. 2. Mapreduce job history analysis. We would like to find out all the job ids that are submitted, completed, etc. in a specific time window. For troubleshooting, what people usually do today is: 1) use current machine-level metrics to find out which machine has the issue; 2) go to that machine and analyze the local log. The characteristics of this kind of metrics: 1. It isn't something that can be predefined. The key, such as table name or job id, is dynamic. 2. The number of such metrics could be much larger than what the current metrics framework can handle. 3. We don't have a scenario that requires near-real-time query support; e.g., the delay from the time a metric is generated to the time it is available to query can be around an hour. 4. How data is consumed is highly application specific. Some ideas: 1. Provide some interface for any application to log data. 2. The metrics can be written to log files. The log files or log entries will be loaded to HBase or HDFS asynchronously. That could go to a separate cluster. 3. To consume such data, an application could run a map reduce job on the log files for aggregation, or do random reads directly from HBase. Comments? -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
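The three ideas at the end of the HBASE-4366 description above might look roughly like this in miniature. Everything here is an assumption made for illustration: DynamicMetricsLogger, LineMetricsLogger and MetricsAggregator are hypothetical names, the in-memory list replaces a log file, and the aggregator replaces a map reduce job. Keys are assumed not to contain tab characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Idea 1: a small interface any application could call. Not an existing HBase API. */
interface DynamicMetricsLogger {
    void log(String key, long value, long timestampMillis);
}

/** Idea 2: metrics become plain log lines that can be shipped elsewhere later. */
final class LineMetricsLogger implements DynamicMetricsLogger {
    private final List<String> lines = new ArrayList<>(); // assumption: replaces a log file

    @Override
    public synchronized void log(String key, long value, long timestampMillis) {
        // The key is free-form (table name, job id, ...), so nothing is predefined.
        lines.add(timestampMillis + "\t" + key + "\t" + value);
    }

    /** Returns a copy of the lines logged so far. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(lines);
    }
}

/** Idea 3: offline aggregation over the logged lines. */
final class MetricsAggregator {
    /** Sums the logged values per key. */
    static Map<String, Long> sumByKey(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            totals.merge(fields[1], Long.parseLong(fields[2]), Long::sum);
        }
        return totals;
    }
}
```

Since the description says near-real-time queries are not required, the sketch deliberately separates logging from aggregation; the aggregation step could run an hour later on a different cluster.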
[jira] [Updated] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut
[ https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4347: - Attachment: 4347-v3.txt New patch. Passes all tests. (That is, the few tests that fail locally fail with or without the patch.) Remove duplicated code from Put, Delete, Get, Scan, MultiPut Key: HBASE-4347 URL: https://issues.apache.org/jira/browse/HBASE-4347 Project: HBase Issue Type: Improvement Components: util Affects Versions: 0.92.0 Reporter: Lars Hofhansl Priority: Minor Fix For: 0.92.0 Attachments: 4347-v2.txt, 4347-v3.txt, 4347.txt This came from discussion with Stack w.r.t. HBASE-2195. There is currently a lot of duplicated code, especially between Put and Delete, and also between all Operations. For example, all of Put/Delete/Get/Scan have attributes with exactly the same code in all classes. Put and Delete also have the familyMap, Row, Rowlock, Timestamp, etc. One way to do this is to introduce OperationWithAttributes, which extends Operation, and have Put/Delete/Get/Scan extend that rather than Operation. In addition, Put and Delete could extend Mutation (which itself would extend OperationWithAttributes). If a static inheritance hierarchy is not desired here, we can use delegation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
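The hierarchy proposed in the HBASE-4347 description above can be sketched as follows. The class names come from the description; the method names, fields and bodies are assumptions made for illustration and are not taken from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

/** Common root; left empty in this sketch. */
abstract class Operation {
}

/** Holds the attribute code that Put/Delete/Get/Scan would otherwise each duplicate. */
abstract class OperationWithAttributes extends Operation {
    private final Map<String, byte[]> attributes = new HashMap<>();

    void setAttribute(String name, byte[] value) {
        attributes.put(name, value);
    }

    byte[] getAttribute(String name) {
        return attributes.get(name);
    }
}

/** State shared by Put and Delete (only row and timestamp here, for brevity). */
abstract class Mutation extends OperationWithAttributes {
    protected final byte[] row;
    protected long timestamp = Long.MAX_VALUE;

    Mutation(byte[] row) {
        this.row = row;
    }

    byte[] getRow() {
        return row;
    }

    long getTimeStamp() {
        return timestamp;
    }
}

final class Put extends Mutation {
    Put(byte[] row) {
        super(row);
    }
}

final class Delete extends Mutation {
    Delete(byte[] row) {
        super(row);
    }
}

/** Reads get attributes but none of the mutation state. */
final class Get extends OperationWithAttributes {
    private final byte[] row;

    Get(byte[] row) {
        this.row = row;
    }

    byte[] getRow() {
        return row;
    }
}

final class Scan extends OperationWithAttributes {
}
```

With delegation instead of inheritance, each class would hold a private attribute-map helper and forward setAttribute/getAttribute to it, at the cost of repeating the forwarding methods.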
[jira] [Commented] (HBASE-4362) SITE: Center logo
[ https://issues.apache.org/jira/browse/HBASE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101728#comment-13101728 ] Hudson commented on HBASE-4362: --- Integrated in HBase-TRUNK #2195 (See [https://builds.apache.org/job/HBase-TRUNK/2195/]) HBASE-4362 Center logo stack : Files : * /hbase/trunk/src/site/resources/css/site.css * /hbase/trunk/src/site/site.vm SITE: Center logo - Key: HBASE-4362 URL: https://issues.apache.org/jira/browse/HBASE-4362 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Attachments: site.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut
[ https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl reassigned HBASE-4347: Assignee: Lars Hofhansl Remove duplicated code from Put, Delete, Get, Scan, MultiPut Key: HBASE-4347 URL: https://issues.apache.org/jira/browse/HBASE-4347 Project: HBase Issue Type: Improvement Components: util Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.92.0 Attachments: 4347-v2.txt, 4347-v3.txt, 4347.txt This came from discussion with Stack w.r.t. HBASE-2195. There is currently a lot of duplicated code, especially between Put and Delete, and also between all Operations. For example, all of Put/Delete/Get/Scan have attributes with exactly the same code in all classes. Put and Delete also have the familyMap, Row, Rowlock, Timestamp, etc. One way to do this is to introduce OperationWithAttributes, which extends Operation, and have Put/Delete/Get/Scan extend that rather than Operation. In addition, Put and Delete could extend Mutation (which itself would extend OperationWithAttributes). If a static inheritance hierarchy is not desired here, we can use delegation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4340) Hbase can't balance.
[ https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101815#comment-13101815 ] Ted Yu commented on HBASE-4340: --- Can you prepare a patch for TRUNK as well? I think the 0.90 branch and TRUNK should be kept in sync. Hbase can't balance. Key: HBASE-4340 URL: https://issues.apache.org/jira/browse/HBASE-4340 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: gaojinchao Assignee: gaojinchao Fix For: 0.90.5 Attachments: HBASE-4340_branch90.patch Version: 0.90.4 Cluster : 40 boxes As the logs below show, the balancer could not run because a dead RS was still being processed. I dug deeper and found two issues: 1. The shutdown handler didn't clear numProcessing when it hit some exceptions. Whatever the exception, it seems we should either clear the flag or close the master. 2. dead regionserver(s): [158-1-130-12,20020,1314971097929] is inaccurate. The dead server should be 158-1-130-10,20020,1315068597979 //master logs: 2011-09-05 00:28:00,487 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:33:00,489 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:38:00,493 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:43:00,495 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:48:00,499 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:53:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 
00:58:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:03:00,502 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:08:00,506 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:13:00,508 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:18:00,512 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:23:00,514 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:28:00,518 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:33:00,520 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:38:00,524 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:43:00,526 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:48:00,530 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:53:00,532 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 
2011-09-05 01:58:00,536 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 02:03:00,537 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 02:08:00,538 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 02:13:00,539 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 02:18:00,543 DEBUG
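The first issue in the HBASE-4340 description above (numProcessing is not cleared when the shutdown handler hits an exception, so the balancer stays blocked) is the kind of problem a try/finally guards against. A minimal sketch, assuming a simple counter; DeadServerTrackerSketch and its methods are hypothetical and this is not the attached patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

final class DeadServerTrackerSketch {
    /** Number of dead servers whose recovery is currently in progress. */
    private final AtomicInteger numProcessing = new AtomicInteger();

    /** Runs one server's recovery; the counter is restored even if recovery throws. */
    void process(Runnable recovery) {
        numProcessing.incrementAndGet();
        try {
            recovery.run();
        } finally {
            // Without this finally block, an exception would leave the counter
            // above zero forever and the balancer would never run again.
            numProcessing.decrementAndGet();
        }
    }

    /** Mirrors the "Not running balancer because processing dead regionserver(s)" check. */
    boolean balancerBlocked() {
        return numProcessing.get() > 0;
    }
}
```

The exception still propagates to the caller, so the alternative mentioned in the description (closing the master) remains possible at a higher level.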
[jira] [Updated] (HBASE-4340) Hbase can't balance if ServerShutdownHandler encountered exception
[ https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4340: -- Comment: was deleted (was: Can you prepare a patch for TRUNK as well? I think the 0.90 branch and TRUNK should be kept in sync.) Hbase can't balance if ServerShutdownHandler encountered exception -- Key: HBASE-4340 URL: https://issues.apache.org/jira/browse/HBASE-4340 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: gaojinchao Assignee: gaojinchao Fix For: 0.90.5 Attachments: HBASE-4340_branch90.patch Version: 0.90.4 Cluster : 40 boxes As the logs below show, the balancer could not run because a dead RS was still being processed. I dug deeper and found two issues: 1. The shutdown handler didn't clear numProcessing when it hit some exceptions. Whatever the exception, it seems we should either clear the flag or close the master. 2. dead regionserver(s): [158-1-130-12,20020,1314971097929] is inaccurate. The dead server should be 158-1-130-10,20020,1315068597979 //master logs: 2011-09-05 00:28:00,487 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:33:00,489 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:38:00,493 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:43:00,495 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:48:00,499 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:53:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): 
[158-1-130-12,20020,1314971097929] 2011-09-05 00:58:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:03:00,502 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:08:00,506 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:13:00,508 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:18:00,512 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:23:00,514 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:28:00,518 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:33:00,520 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:38:00,524 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:43:00,526 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:48:00,530 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:53:00,532 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead 
regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:58:00,536 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 02:03:00,537 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 02:08:00,538 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 02:13:00,539 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s):
[jira] [Updated] (HBASE-4330) Fix races in slab cache
[ https://issues.apache.org/jira/browse/HBASE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Pi updated HBASE-4330: - Attachment: hbase-4330v6.txt Fixed race condition leading to the test failure. Fix races in slab cache --- Key: HBASE-4330 URL: https://issues.apache.org/jira/browse/HBASE-4330 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Li Pi Fix For: 0.92.0 Attachments: hbase-4330.txt, hbase-4330.txt, hbase-4330v3.txt, hbase-4330v4.txt, hbase-4330v5.txt, hbase-4330v6.txt A few races are still lingering in the slab cache. Here are some tests and proposed fixes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4340) Hbase can't balance if ServerShutdownHandler encountered exception
[ https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4340: -- Summary: Hbase can't balance if ServerShutdownHandler encountered exception (was: Hbase can't balance.) Hbase can't balance if ServerShutdownHandler encountered exception -- Key: HBASE-4340 URL: https://issues.apache.org/jira/browse/HBASE-4340 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: gaojinchao Assignee: gaojinchao Fix For: 0.90.5 Attachments: HBASE-4340_branch90.patch Version: 0.90.4 Cluster : 40 boxes As the logs below show, the balancer could not run because a dead RS was still being processed. I dug deeper and found two issues: 1. The shutdown handler didn't clear numProcessing when it hit some exceptions. Whatever the exception, it seems we should either clear the flag or close the master. 2. dead regionserver(s): [158-1-130-12,20020,1314971097929] is inaccurate. The dead server should be 158-1-130-10,20020,1315068597979 //master logs: 2011-09-05 00:28:00,487 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:33:00,489 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:38:00,493 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:43:00,495 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:48:00,499 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:53:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 
00:58:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:03:00,502 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:08:00,506 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:13:00,508 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:18:00,512 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:23:00,514 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:28:00,518 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:33:00,520 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:38:00,524 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:43:00,526 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:48:00,530 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:53:00,532 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 
2011-09-05 01:58:00,536 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 02:03:00,537 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 02:08:00,538 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 02:13:00,539 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s):
[jira] [Updated] (HBASE-4340) Hbase can't balance if ServerShutdownHandler encountered exception
[ https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4340: -- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Hbase can't balance if ServerShutdownHandler encountered exception -- Key: HBASE-4340 URL: https://issues.apache.org/jira/browse/HBASE-4340 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: gaojinchao Assignee: gaojinchao Fix For: 0.90.5 Attachments: HBASE-4340_branch90.patch Version: 0.90.4 Cluster : 40 boxes As the logs below show, the balancer could not run because a dead RS was still being processed. I dug deeper and found two issues: 1. The shutdown handler didn't clear numProcessing when it hit some exceptions. Whatever the exception, it seems we should either clear the flag or close the master. 2. dead regionserver(s): [158-1-130-12,20020,1314971097929] is inaccurate. The dead server should be 158-1-130-10,20020,1315068597979 //master logs: 2011-09-05 00:28:00,487 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:33:00,489 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:38:00,493 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:43:00,495 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:48:00,499 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:53:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 00:58:00,501 DEBUG 
org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:03:00,502 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:08:00,506 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:13:00,508 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:18:00,512 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:23:00,514 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:28:00,518 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:33:00,520 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:38:00,524 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:43:00,526 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:48:00,530 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:53:00,532 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 01:58:00,536 
DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 02:03:00,537 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 02:08:00,538 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [158-1-130-12,20020,1314971097929] 2011-09-05 02:13:00,539 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s):
[jira] [Commented] (HBASE-4330) Fix races in slab cache
[ https://issues.apache.org/jira/browse/HBASE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101974#comment-13101974 ] Ted Yu commented on HBASE-4330: --- I don't see much difference for patch v6 on my MacBook: {code} testCacheMultiThreadedEviction(org.apache.hadoop.hbase.io.hfile.slab.TestSlabCache) Time elapsed: 23.649 sec ERROR! java.lang.RuntimeException: Deferred at org.apache.hadoop.hbase.MultithreadedTestUtil$TestContext.checkException(MultithreadedTestUtil.java:76) at org.apache.hadoop.hbase.MultithreadedTestUtil$TestContext.stop(MultithreadedTestUtil.java:97) at org.apache.hadoop.hbase.io.hfile.CacheTestUtils.hammerEviction(CacheTestUtils.java:211) at org.apache.hadoop.hbase.io.hfile.slab.TestSlabCache.testCacheMultiThreadedEviction(TestSlabCache.java:87) ... Caused by: java.lang.RuntimeException: already cached key_8_9 at org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache.cacheBlock(SingleSizeCache.java:132) at org.apache.hadoop.hbase.io.hfile.slab.SlabCache.cacheBlock(SlabCache.java:207) at org.apache.hadoop.hbase.io.hfile.CacheTestUtils$3.doAnAction(CacheTestUtils.java:197) at org.apache.hadoop.hbase.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:139) at org.apache.hadoop.hbase.MultithreadedTestUtil$TestThread.run(MultithreadedTestUtil.java:115) {code} Fix races in slab cache --- Key: HBASE-4330 URL: https://issues.apache.org/jira/browse/HBASE-4330 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Li Pi Fix For: 0.92.0 Attachments: hbase-4330.txt, hbase-4330.txt, hbase-4330v3.txt, hbase-4330v4.txt, hbase-4330v5.txt, hbase-4330v6.txt A few races are still lingering in the slab cache. Here are some tests and proposed fixes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
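The trace above fails with "already cached key_8_9" under concurrent cache and evict calls. One common way such a failure arises is a non-atomic check-then-insert; whether that is the cause here is not established by the trace alone. As a generic illustration only (AtomicCacheSketch is hypothetical and is not the SingleSizeCache or SlabCache code), an insert that is atomic per key lets the caller distinguish "I inserted it" from "someone else got there first" without throwing:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class AtomicCacheSketch {
    private final ConcurrentMap<String, byte[]> blocks = new ConcurrentHashMap<>();

    /**
     * Inserts the block only if the key is absent, as a single atomic step.
     * Returns true if this call inserted it, false if it was already cached.
     */
    boolean cacheBlock(String key, byte[] block) {
        return blocks.putIfAbsent(key, block) == null;
    }

    /** Returns the cached block, or null if the key is not cached. */
    byte[] getBlock(String key) {
        return blocks.get(key);
    }

    /** Returns true if this call removed the block. */
    boolean evictBlock(String key) {
        return blocks.remove(key) != null;
    }
}
```

With a separate containsKey check followed by put, two threads can both pass the check before either inserts; putIfAbsent closes that window for a single map, though a cache layered over several structures would need its own coordination.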