[jira] Created: (HBASE-3148) Duplicate check table name in HBaseAdmin's createTable method
Duplicate check table name in HBaseAdmin's createTable method - Key: HBASE-3148 URL: https://issues.apache.org/jira/browse/HBASE-3148 Project: HBase Issue Type: Improvement Components: client Reporter: Jeff Zhang -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3148) Duplicate check table name in HBaseAdmin's createTable method
[ https://issues.apache.org/jira/browse/HBASE-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924456#action_12924456 ] Jeff Zhang commented on HBASE-3148: --- While reading the HBase code, I found that HBaseAdmin's createTable method checks the table name twice. Line 282 in createTable does one check, and line 332 in createTableAsync, which createTable calls, does another. I believe one of the checks can be removed.
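The shape of the duplicate check can be sketched as follows. This is a minimal illustration only, not the real HBaseAdmin code: the class `DuplicateCheckSketch`, the `checkTableName` rule, and the counter are all hypothetical names introduced here to show why one of the two checks is redundant.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the pattern described above; not the real HBaseAdmin. */
final class DuplicateCheckSketch {
    /** Counts how many times the table name was validated. */
    static final AtomicInteger checks = new AtomicInteger();

    // Assumed validation rule, for illustration only.
    static void checkTableName(String name) {
        checks.incrementAndGet();
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Table name must not be empty");
        }
    }

    // Asynchronous entry point: validates, then would submit the request.
    static void createTableAsync(String name) {
        checkTableName(name);
        // ... sending the create request to the master is omitted ...
    }

    // Shape described in the report: validates here and again in createTableAsync.
    static void createTableWithDuplicateCheck(String name) {
        checkTableName(name);
        createTableAsync(name);
    }

    // Proposed shape: rely on the single check inside createTableAsync.
    static void createTable(String name) {
        createTableAsync(name);
    }

    // Resets the counter, runs the action, and reports how many checks ran.
    static int countChecks(Runnable action) {
        checks.set(0);
        action.run();
        return checks.get();
    }

    public static void main(String[] args) {
        System.out.println(countChecks(() -> createTableWithDuplicateCheck("t1")));
        System.out.println(countChecks(() -> createTable("t1")));
    }
}
```

Because every caller of the synchronous method passes through the asynchronous one, dropping the outer check should not change which names are rejected.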
[jira] Commented: (HBASE-3147) Regions stuck in transition after rolling restart, perpetual timeout handling but nothing happens
[ https://issues.apache.org/jira/browse/HBASE-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924602#action_12924602 ] stack commented on HBASE-3147: -- I got this when I tried running the patch:
{code}
java.lang.IllegalAccessError: tried to access method org.apache.hadoop.hbase.zookeeper.ZKAssign.getNodeName(Lorg/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher;Ljava/lang/String;)Ljava/lang/String; from class org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor
    at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1457)
    at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
2010-10-25 16:07:44,354 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: sv2borg180:6.timeoutMonitor exiting
{code}
Let me try to fix it.
Regions stuck in transition after rolling restart, perpetual timeout handling but nothing happens - Key: HBASE-3147 URL: https://issues.apache.org/jira/browse/HBASE-3147 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.90.0
The rolling restart script is great for bringing on the weird stuff. On my little loaded cluster, if I run it, it horks the cluster and it doesn't recover. I notice two issues that need fixing:
1. We'll miss noticing that a server was carrying .META. and it never gets assigned -- the shutdown handlers get stuck in perpetual wait on a .META. assign that will never happen.
2. Perpetual cycling of this sequence per region not successfully assigned:
{code}
2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b. state=PENDING_OPEN, ts=1287869814294
2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN or OPENING for too long, reassigning region=usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b.
2010-10-23 21:37:57,404 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2bd57d1475046a Attempting to transition node 7f2d92497d2d03917afd574ea2aca55b from RS_ZK_REGION_OPENING to M_ZK_REGION_OFFLINE
2010-10-23 21:37:57,404 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2bd57d1475046a Attempt to transition the unassigned node for 7f2d92497d2d03917afd574ea2aca55b from RS_ZK_REGION_OPENING to M_ZK_REGION_OFFLINE failed, the node existed but was in the state M_ZK_REGION_OFFLINE
2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region transitioned OPENING to OFFLINE so skipping timeout, region=usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b.
,,,
{code}
The timeout period elapses again and then the same sequence repeats. This is what I've been working on.
[jira] Commented: (HBASE-3147) Regions stuck in transition after rolling restart, perpetual timeout handling but nothing happens
[ https://issues.apache.org/jira/browse/HBASE-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924614#action_12924614 ] Jonathan Gray commented on HBASE-3147: -- Hmm... you should have: public static String getNodeName(ZooKeeperWatcher zkw, String regionName) { as part of the diff up on RB
[jira] Commented: (HBASE-2001) Coprocessors: Colocate user code with regions
[ https://issues.apache.org/jira/browse/HBASE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924653#action_12924653 ] HBase Review Board commented on HBASE-2001: --- Message from: Andrew Purtell apurt...@apache.org
bq. On 2010-10-25 06:49:15, Himanshu Vashishtha wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/CoprocessorHost.java, line 343
bq. http://review.cloudera.org/r/876/diff/7/?file=14190#file14190line343
bq.
bq. What is its purpose here? I couldn't see it being used as of now. Is it for some future functionality.
The access controller coprocessor (HBASE-3025) needs a CatalogTracker. Other future functionality is also considered.
- Andrew
--- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/876/#review1646 ---
Coprocessors: Colocate user code with regions - Key: HBASE-2001 URL: https://issues.apache.org/jira/browse/HBASE-2001 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Mingjie Lai Fix For: 0.92.0 Attachments: asm-transformations.pdf, HBASE-2001-RegionObserver-2.patch, HBASE-2001-RegionObserver.patch, HBASE-2001.patch.gz, packge-info.html, packge-info.html
Support user code that runs next to each region in a table. As regions split and move, coprocessor code should automatically move also. Use a classloader which looks on HDFS. Associate a list of classes to load with each table. Put this in HRI so it inherits from the table but can be changed on a per-region basis (so those region-specific changes can then be inherited by daughters). Not completely arbitrary code; it should require implementation of an interface with callbacks for:
* Open
* Close
* Split
* Compact
* (Multi)get and scanner next()
* (Multi)put
* (Multi)delete
Add a method to HRegionInterface for invoking coprocessor methods and retrieving results. Add methods in o.a.h.h.regionserver or a subpackage which implement convenience functions for coprocessor methods and consistent/controlled access to internals: store access, threading, persistent and ephemeral state, scratch storage, etc.
GitHub: http://github.com/mlai/hbase/tree/0.90_coprocessor
[jira] Commented: (HBASE-2462) Review compaction heuristic and move compaction code out so standalone and independently testable
[ https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924664#action_12924664 ] Nicolas Spiegelberg commented on HBASE-2462: So, we've been talking about a new compaction algorithm internally and wanted to get external feedback as well... The existing store file selection algorithm seems to not utilize enough context. We start at the oldest and compact everything else when it's no longer 2x the next oldest. It seems like we want to approach from the opposite direction:
1. Start at the newest file.
2. Unconditionally compact as long as the StoreFiles are less than a certain size (thinking hbase.regionserver.hlog.blocksize).
3. After that metric has been met, if next oldest file < sum(all newer files) * R, we include it in the compaction. R = 2.
4. If files-to-compact < max(HColumnDescriptor.maxVersions(), 3), skip the compaction.
This algorithm can serve a very generic workload. Axiom: It's worth compacting if sum(files) >= 150% * max(files). Maybe make this adjustable. The main point is that the ratio between file[i] and file[i+1] is less useful than sum(files) and max(files).
A. With files[i] < files[i+1] * 2, our worst case ends up with a decreasing triangle of 2x.
B. With files[i] < sum(files[0..i-1]) * 2, we are dealing with the derivative. Our worst case ends up with a decreasing triangle of 4x.
With a 4x ratio and a 64 MB hlog blocksize, we could support up to a 21.4GB Store while using less than 8 files: 3 minimal threshold files + 5 worst case files that would be roughly 64MB, 256MB, 1GB, 4GB, 16GB == 21.3GB. Assuming that the average user has a 1-2 GB store, the number of HFiles should never get above 6.
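The four selection steps above can be sketched roughly as follows. This is an illustrative reading of the proposal, not HBase code: the class and method names are hypothetical, the comparison operators are assumed to be "less than" (they appear to have been lost in the archived text), and sizes are plain numbers in MB. A second helper adds up the worst-case 4x series quoted in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the proposed newest-first file selection. */
final class CompactionSelectionSketch {

    /**
     * Walks files from newest to oldest. Files under minSize are always taken;
     * larger files are taken only while size < ratio * sum(all newer files).
     * Returns an empty list if fewer than minFiles were selected.
     */
    static List<Long> select(List<Long> sizesNewestFirst, long minSize,
                             double ratio, int minFiles) {
        List<Long> selected = new ArrayList<>();
        long sumNewer = 0;
        for (long size : sizesNewestFirst) {
            boolean small = size < minSize;                 // step 2
            boolean withinRatio = size < sumNewer * ratio;  // step 3
            if (!small && !withinRatio) {
                break; // this file and everything older is left alone
            }
            selected.add(size);
            sumNewer += size;
        }
        if (selected.size() < minFiles) {                   // step 4
            return Collections.emptyList();
        }
        return selected;
    }

    /** Sum of a geometric series of file sizes: base, base*factor, ... */
    static long worstCaseTotalMb(long baseMb, long factor, int count) {
        long total = 0;
        long size = baseMb;
        for (int i = 0; i < count; i++) {
            total += size;
            size *= factor;
        }
        return total;
    }

    public static void main(String[] args) {
        // Sizes in MB, newest first; 64 MB threshold, R = 2, at least 3 files.
        System.out.println(select(Arrays.asList(10L, 20L, 30L, 100L, 250L, 5000L), 64, 2.0, 3));
        System.out.println(select(Arrays.asList(10L, 5000L), 64, 2.0, 3));
        // 64MB + 256MB + 1GB + 4GB + 16GB, expressed in GB.
        System.out.println(worstCaseTotalMb(64, 4, 5) / 1024.0);
    }
}
```

With these inputs the first call selects the five newest files and leaves the 5000 MB file out, the second call selects nothing because only one file qualifies, and the series sums to 21824 MB, which is the roughly 21.3GB figure given above.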
Review compaction heuristic and move compaction code out so standalone and independently testable - Key: HBASE-2462 URL: https://issues.apache.org/jira/browse/HBASE-2462 Project: HBase Issue Type: Improvement Reporter: stack Assignee: Jonathan Gray Priority: Critical Anything that improves our i/o profile makes hbase run smoother. Over in HBASE-2457, good work has already been done describing the tension between minimizing compactions and minimizing the count of store files. This issue is about following on from what has been done in 2457 but also breaking the hard-to-read compaction code out of Store.java into a standalone class that can be more easily tested (and easily analyzed for its performance characteristics). If possible, in the refactor, we'd allow specification of alternate merge sort implementations.
[jira] Created: (HBASE-3149) Make flush decisions per column family
Make flush decisions per column family -- Key: HBASE-3149 URL: https://issues.apache.org/jira/browse/HBASE-3149 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan Today, the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3149) Make flush decisions per column family
[ https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924705#action_12924705 ] Jean-Daniel Cryans commented on HBASE-3149: --- I have been thinking about this one for some time... I think it makes sense in loads of ways, since a common problem with multi-CF tables is that during the initial import the user ends up with thousands of small store files because some family grows faster and triggers the flushes, which in turn generates incredible compaction churn. On the other hand, it means that we almost consider a family as a region, e.g. one region with 3 CFs can have up to 3x64MB in the memstores.
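The trade-off discussed above can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the region server's actual flush logic: the class `FlushDecisionSketch`, its method names, and the "flush at or above the threshold" rule are hypothetical, and the 64 MB threshold is taken from the 3x64MB example in the comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of aggregate versus per-column-family flush decisions. */
final class FlushDecisionSketch {

    /** Aggregate policy: once the region total reaches the threshold, flush every family. */
    static List<String> aggregateFlush(Map<String, Long> memstoreMb, long thresholdMb) {
        long total = 0;
        for (long size : memstoreMb.values()) {
            total += size;
        }
        return total >= thresholdMb ? new ArrayList<>(memstoreMb.keySet()) : new ArrayList<>();
    }

    /** Per-family policy: flush only the families that reached the threshold themselves. */
    static List<String> perFamilyFlush(Map<String, Long> memstoreMb, long thresholdMb) {
        List<String> toFlush = new ArrayList<>();
        for (Map.Entry<String, Long> e : memstoreMb.entrySet()) {
            if (e.getValue() >= thresholdMb) {
                toFlush.add(e.getKey());
            }
        }
        return toFlush;
    }

    /** Upper bound on memstore usage per region under the per-family policy. */
    static long worstCaseMemstoreMb(int families, long thresholdMb) {
        return families * thresholdMb;
    }

    public static void main(String[] args) {
        Map<String, Long> memstoreMb = new LinkedHashMap<>();
        memstoreMb.put("big", 63L);
        memstoreMb.put("small", 1L);
        System.out.println(aggregateFlush(memstoreMb, 64)); // both families, "small" at only 1 MB
        System.out.println(perFamilyFlush(memstoreMb, 64)); // nothing yet
        System.out.println(worstCaseMemstoreMb(3, 64));     // 3 families x 64 MB
    }
}
```

The aggregate policy writes a 1 MB store file for the small family every time the large one fills up, while the per-family policy avoids that at the cost of letting a region hold up to one threshold's worth of data per family.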
[jira] Commented: (HBASE-2462) Review compaction heuristic and move compaction code out so standalone and independently testable
[ https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924717#action_12924717 ] stack commented on HBASE-2462: -- hbase.regionserver.hlog.blocksize == fs default block size. Better to use the fs default block size rather than an hlog setting. What's the rationale of rule 4? Do you rather mean the compaction threshold here? Sorry, what's max(files)? The largest file? And is sum(files) all files or just some subset (you keep adding to the subset till you are at 150% of the biggest)? So, you think this algo will make for fewer compactions yet keep the count of files low?
[jira] Created: (HBASE-3150) Allow some column to not write WALs
Allow some column to not write WALs --- Key: HBASE-3150 URL: https://issues.apache.org/jira/browse/HBASE-3150 Project: HBase Issue Type: Improvement Reporter: Karthik Ranganathan Priority: Minor We have this unique requirement where some column families hold data that is indexed from other existing column families. The index data is very large, and we end up writing these inserts into the WAL and then into the store files. In addition to taking more iops, this also slows down splitting files for recovery, etc. Creating this task to have an option to suppress WAL logging on a per CF basis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-2946) Increment multiple columns in a row at once
[ https://issues.apache.org/jira/browse/HBASE-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924728#action_12924728 ] HBase Review Board commented on HBASE-2946: --- Message from: Jonathan Gray jg...@apache.org
bq. On 2010-10-24 21:41:48, khemani wrote:
bq. trunk/src/main/java/org/apache/hadoop/hbase/client/Increment.java, line 45
bq. http://review.cloudera.org/r/1088/diff/2/?file=15904#file15904line45
bq.
bq. setWriteToWal() is missing?
Yup, will add.
bq. On 2010-10-24 21:41:48, khemani wrote:
bq. trunk/src/main/java/org/apache/hadoop/hbase/client/Increment.java, lines 46-47
bq. http://review.cloudera.org/r/1088/diff/2/?file=15904#file15904line46
bq.
bq. why a navigable map? why not just a map?
You can do things like tailMap() with it.
bq. On 2010-10-24 21:41:48, khemani wrote:
bq. trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 3012
bq. http://review.cloudera.org/r/1088/diff/2/?file=15907#file15907line3012
bq.
bq. I am not sure how it is ensured that the order of iteration over the columns in family.getValue.entrySet() is same as the order of results returned?
bq.
bq. Also, if get finds multiple matches then will it return all of them? If yes then this will not work.
familyMap and the map of columns to amounts are both TreeMaps ordered with Bytes.BYTES_COMPARATOR. Results are also guaranteed to be in order. And our Get has maxVersions=1, so we will not get multiple matches per column.
Increment multiple columns in a row at once --- Key: HBASE-2946 URL: https://issues.apache.org/jira/browse/HBASE-2946 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Jonathan Gray Assignee: Jonathan Gray Currently there is no way to do multiple increments to a single row in one RPC. This jira is about adding an HTable and HRegionInterface method to increment multiple columns within a single row at once.
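The ordering argument in the reply above can be sketched with plain JDK collections. This is an illustration only: `Arrays.compareUnsigned` (Java 9+) stands in for Bytes.BYTES_COMPARATOR on the assumption that the HBase comparator orders byte arrays lexicographically as unsigned bytes, and the class and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch of a qualifier-to-amount map kept in sorted byte order. */
final class IncrementOrderingSketch {

    // Stand-in for Bytes.BYTES_COMPARATOR (assumed unsigned lexicographic order).
    static final Comparator<byte[]> UNSIGNED_LEX = Arrays::compareUnsigned;

    static NavigableMap<byte[], Long> newAmounts() {
        return new TreeMap<>(UNSIGNED_LEX);
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns the qualifiers in the map's iteration order, decoded for display. */
    static List<String> qualifiers(Map<byte[], Long> amounts) {
        List<String> out = new ArrayList<>();
        for (byte[] qualifier : amounts.keySet()) {
            out.add(new String(qualifier, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<byte[], Long> amounts = newAmounts();
        // Inserted out of order on purpose.
        amounts.put(bytes("c"), 3L);
        amounts.put(bytes("a"), 1L);
        amounts.put(bytes("b"), 2L);
        // Iteration follows comparator order, not insertion order.
        System.out.println(qualifiers(amounts));
        // A NavigableMap also supports range views such as tailMap().
        System.out.println(qualifiers(amounts.tailMap(bytes("b"), true)));
    }
}
```

Because the request-side map and the returned results are sorted by the same comparator, and at most one version per column comes back, the two sequences can be walked in step.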
[jira] Commented: (HBASE-2753) Remove sorted() methods from Result now that Gets are Scans
[ https://issues.apache.org/jira/browse/HBASE-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924759#action_12924759 ] HBase Review Board commented on HBASE-2753: --- Message from: Ryan Rawson ryano...@gmail.com --- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1092/ --- Review request for hbase. Summary --- the last hurrah, removing the sorting from Result. sorted() is already @deprecated This addresses bug HBASE-2753. http://issues.apache.org/jira/browse/HBASE-2753 Diffs - trunk/src/main/java/org/apache/hadoop/hbase/client/Result.java 1026537 Diff: http://review.cloudera.org/r/1092/diff Testing --- Thanks, Ryan Remove sorted() methods from Result now that Gets are Scans --- Key: HBASE-2753 URL: https://issues.apache.org/jira/browse/HBASE-2753 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.90.0 Reporter: Jonathan Gray Assignee: ryan rawson Fix For: 0.90.0 With the old Get codepath, we used to sometimes get results sent to the client that weren't fully sorted. Now that Gets are Scans, results should always be sorted. Confirm that we always get back sorted results and if so drop the Result.sorted() method and update javadoc accordingly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-2753) Remove sorted() methods from Result now that Gets are Scans
[ https://issues.apache.org/jira/browse/HBASE-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924762#action_12924762 ] HBase Review Board commented on HBASE-2753: --- Message from: Jonathan Gray jg...@apache.org --- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1092/#review1661 --- Ship it! looks good to me - Jonathan
[jira] Resolved: (HBASE-2753) Remove sorted() methods from Result now that Gets are Scans
[ https://issues.apache.org/jira/browse/HBASE-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson resolved HBASE-2753. Resolution: Fixed. This is committed; we no longer sort in Result.
[jira] Updated: (HBASE-2645) HLog writer can do 1-2 sync operations after lease has been recovered for split process.
[ https://issues.apache.org/jira/browse/HBASE-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-2645: --- Fix Version/s: (was: 0.90.0) 0.92.0 Moving fix version to 0.92.
HLog writer can do 1-2 sync operations after lease has been recovered for split process. Key: HBASE-2645 URL: https://issues.apache.org/jira/browse/HBASE-2645 Project: HBase Issue Type: Bug Components: io Affects Versions: 0.92.0 Reporter: Cosmin Lehene Assignee: Todd Lipcon Priority: Blocker Fix For: 0.92.0
TestHLogSplit.testLogCannotBeWrittenOnceParsed is failing. This test starts a thread that writes one edit to the log, syncs and counts. During this, an HLog.splitLog operation is started. splitLog recovers the log lease before reading the log, so that the original regionserver cannot wake up and write after the split process has started. The test compares the number of edits reported by the split process and by the writer thread. The writer thread (called zombie in the test) should report <= the number reported by splitLog (sync() might raise after the last edit gets written, and that edit won't get counted by the zombie thread). However, it appears that the zombie counts 1-2 more edits, so it looks like it can sync without a lease. This might be an hdfs-0.20 related issue.
[jira] Commented: (HBASE-3147) Regions stuck in transition after rolling restart, perpetual timeout handling but nothing happens
[ https://issues.apache.org/jira/browse/HBASE-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924782#action_12924782 ] HBase Review Board commented on HBASE-3147: --- Message from: st...@duboce.net
--- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1087/ --- (Updated 2010-10-25 16:29:36.379908)
Review request for hbase and stack.
Changes --- Added metaservershutdownhandler and rootservershutdownhandler
Summary (updated) --- Adds new handling of the timeouts for PENDING_OPEN and PENDING_CLOSE in-memory master RIT states. Adds some new broken RIT states into TestMasterFailover. Some of these broken states don't seem possible to me, but as long as we aren't breaking the existing behaviors and tests I think it's okay if we handle odd cases that can be mocked. Who knows what will happen in the real world. The reason TestMasterFailover didn't/doesn't really test for the issue in HBASE-3147 is that this new broken condition happens when an RS dies / goes offline rather than on a master failover concurrent w/ RS failure. v4 of the patch adds to Jon's fixes. It adds a shutdown server handler for root and another for meta so the processing of servers hosting meta/root does not get frozen out. I've seen this in my testing.
This addresses bug HBASE-3147. http://issues.apache.org/jira/browse/HBASE-3147
Diffs (updated) -
trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 1027291
trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 1027291
trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 1027291
trunk/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java 1027291
trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1027291
trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1027291
trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1027291
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/RootServerShutdownHandler.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 1027292
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1027291
trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java 1027291
Diff: http://review.cloudera.org/r/1087/diff
Testing --- TestMasterFailover passes.
Thanks, Jonathan
[jira] Commented: (HBASE-3147) Regions stuck in transition after rolling restart, perpetual timeout handling but nothing happens
[ https://issues.apache.org/jira/browse/HBASE-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924784#action_12924784 ] HBase Review Board commented on HBASE-3147: --- Message from: Jonathan Gray jg...@apache.org --- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1087/#review1662 --- Ship it! Looks good. Not sure if I can +1 my patch but I think we should commit :) trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java http://review.cloudera.org/r/1087/#comment5542 Should we remove this code from inside of ServerShutdownHandler now? Not a big deal but being done twice. - Jonathan
[jira] Updated: (HBASE-3147) Regions stuck in transition after rolling restart, perpetual timeout handling but nothing happens
[ https://issues.apache.org/jira/browse/HBASE-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3147:
    Attachment: HBASE-3147-v6.patch

Here is what I'll commit. As Jon suggests, it removes the check for root or meta carrying inside the shutdown handler, since we're now doing the check on the outside. This patch also includes a missing hookup that testing found.

There is still work to do on this issue. What seems to be happening is that a watcher is not being triggered; I need to figure out how that is happening. I'll see a regionserver with all of its opener handlers stuck waiting on notification that meta has been deployed. Other servers will have gotten their watcher triggered, but not one or two in the cluster. The master is then stuck timing out this regionserver's allocations and then reassigning: it calls open over the rpc, which adds the region to the queue, but since all openers are stuck waiting on meta, the queues don't get processed.

Regions stuck in transition after rolling restart, perpetual timeout handling but nothing happens

Key: HBASE-3147
URL: https://issues.apache.org/jira/browse/HBASE-3147
Project: HBase
Issue Type: Bug
Reporter: stack
Fix For: 0.90.0
Attachments: HBASE-3147-v6.patch

The rolling restart script is great for bringing on the weird stuff. On my little loaded cluster, if I run it, it horks the cluster and it doesn't recover. I notice two issues that need fixing:
1. We'll miss noticing that a server was carrying .META. and it never gets assigned -- the shutdown handlers get stuck in perpetual wait on a .META. assign that will never happen.
2. Perpetual cycling of this sequence per region not successfully assigned:
{code}
2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b. state=PENDING_OPEN, ts=1287869814294
2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN or OPENING for too long, reassigning region=usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b.
2010-10-23 21:37:57,404 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2bd57d1475046a Attempting to transition node 7f2d92497d2d03917afd574ea2aca55b from RS_ZK_REGION_OPENING to M_ZK_REGION_OFFLINE
2010-10-23 21:37:57,404 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2bd57d1475046a Attempt to transition the unassigned node for 7f2d92497d2d03917afd574ea2aca55b from RS_ZK_REGION_OPENING to M_ZK_REGION_OFFLINE failed, the node existed but was in the state M_ZK_REGION_OFFLINE
2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region transitioned OPENING to OFFLINE so skipping timeout, region=usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b.
,,,
{code}
The timeout period elapses again and then the same sequence repeats. This is what I've been working on.
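To illustrate why the sequence in the log above can repeat forever, here is a minimal sketch of a compare-and-set style transition. It is an in-memory stand-in, not the ZKAssign API: the class `RegionNodeSketch` and its methods are hypothetical, and only the state names are taken from the log. The transition expects RS_ZK_REGION_OPENING, but the node is already M_ZK_REGION_OFFLINE, so every timeout attempt fails and leaves the node unchanged.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory stand-in for an unassigned-region node; not the ZKAssign API. */
class RegionNodeSketch {
    /** State names copied from the log; the enum itself is an assumption for illustration. */
    enum State { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING, RS_ZK_REGION_OPENED }

    private final AtomicReference<State> state;

    RegionNodeSketch(State initial) {
        this.state = new AtomicReference<>(initial);
    }

    /** Moves to target only if the node is currently in the expected state. */
    boolean transition(State expected, State target) {
        return state.compareAndSet(expected, target);
    }

    State current() {
        return state.get();
    }

    public static void main(String[] args) {
        // The node is already OFFLINE, as in the WARN line of the log.
        RegionNodeSketch node = new RegionNodeSketch(State.M_ZK_REGION_OFFLINE);
        for (int timeout = 1; timeout <= 3; timeout++) {
            // Each timeout retries the same OPENING -> OFFLINE transition.
            boolean moved = node.transition(State.RS_ZK_REGION_OPENING, State.M_ZK_REGION_OFFLINE);
            System.out.println("timeout " + timeout + ": transitioned=" + moved
                + ", state=" + node.current());
        }
    }
}
```

Nothing in the loop changes the node, so unless something else (here, the stuck opener handlers) makes progress, each timeout period sees the same state.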
[jira] Commented: (HBASE-2462) Review compaction heuristic and move compaction code out so standalone and independently testable
[ https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924792#action_12924792 ]

Nicolas Spiegelberg commented on HBASE-2462:

@stack:
1. FS default blocksize is the default for a non-custom hlog.blocksize, but they are not necessarily 1-1. The idea is that new HFiles created should always be <= hlog.blocksize, so we unconditionally compact HFiles that have not already been compacted at least once.
2. The idea behind step #4 is that compaction becomes extremely useful when you can use it to dedupe. We should definitely use the compactionThreshold metric here instead of a hard-coded 3. However, I don't think this should be an absolute number of StoreFiles, but rather the number of relatively small StoreFiles. If you have huge region sizes (e.g. large object store), then you don't mind having 6 storefiles and really just want to compact when it will save a decent amount of space.
3. This algorithm will perform roughly the same for compacting small/new files; however, it will be more aggressive about including older files in the compaction because it can more quickly detect when it's advantageous to compact. Because of the 4x (vs. 2x) multiplier, it's 2x more scalable and should result in 1/2 the number of large StoreFiles for large regions. For DEFAULT_MAX_FILE_SIZE == 256MB, you should never have more than 5 StoreFiles before triggering a split.

Review compaction heuristic and move compaction code out so standalone and independently testable

Key: HBASE-2462
URL: https://issues.apache.org/jira/browse/HBASE-2462
Project: HBase
Issue Type: Improvement
Reporter: stack
Assignee: Jonathan Gray
Priority: Critical

Anything that improves our i/o profile makes hbase run smoother. Over in HBASE-2457, good work has been done already describing the tension between minimizing compactions versus minimizing count of store files. This issue is about following on from what has been done in 2457 but also about breaking the hard-to-read compaction code out of Store.java into a standalone class that can be more easily tested (and easily analyzed for its performance characteristics). If possible, in the refactor, we'd allow specification of alternate merge sort implementations.
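The full algorithm under discussion is not reproduced in this message, so the following is only a sketch of one plausible reading of the "multiplier" and compactionThreshold points above. The class `CompactionSelectionSketch`, its `select` method, and the exact inclusion rule (an older file joins the compaction when it is no larger than the multiplier times the total size of the newer files already selected) are assumptions for illustration, not the code in Store.java.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical selection rule for illustration; not HBase's Store.java logic. */
class CompactionSelectionSketch {

    /**
     * @param sizes store file sizes, ordered oldest first
     * @param multiplier assumed ratio: how much larger an older file may be than
     *        the sum of the newer files already selected
     * @param compactionThreshold minimum number of selected files worth compacting
     * @return indices of selected files (oldest first), or an empty list
     */
    static List<Integer> select(long[] sizes, double multiplier, int compactionThreshold) {
        List<Integer> picked = new ArrayList<>();
        long newerTotal = 0;
        // Walk from the newest file toward the oldest.
        for (int i = sizes.length - 1; i >= 0; i--) {
            boolean relativelySmall = picked.isEmpty() || sizes[i] <= multiplier * newerTotal;
            if (!relativelySmall) {
                break; // stop at the first file that is too large, keeping the run contiguous
            }
            picked.add(i);
            newerTotal += sizes[i];
        }
        if (picked.size() < compactionThreshold) {
            return Collections.emptyList();
        }
        Collections.reverse(picked);
        return picked;
    }

    public static void main(String[] args) {
        long[] sizesMb = {256, 100, 10, 12, 8}; // oldest first, made-up sizes
        System.out.println("2x: " + select(sizesMb, 2.0, 3));
        System.out.println("4x: " + select(sizesMb, 4.0, 3));
    }
}
```

With these made-up sizes, the 2x multiplier stops at the 100 MB file and selects only the three newest files, while the 4x multiplier pulls in all five. That matches the comment's point that a larger multiplier is more aggressive about including older files.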
[jira] Created: (HBASE-3151) NPE when trying to read regioninfo from .META.
NPE when trying to read regioninfo from .META.

Key: HBASE-3151
URL: https://issues.apache.org/jira/browse/HBASE-3151
Project: HBase
Issue Type: Bug
Reporter: stack

This is an old issue, perhaps in a new guise. From the list, Sebastien Bauer reports:
{code}
2010-10-25 08:13:01,690 ERROR org.apache.hadoop.hbase.master.CatalogJanitor: Caught exception
java.lang.NullPointerException
2010-10-25 08:13:24,385 INFO org.apache.hadoop.hbase.master.ServerManager: regionservers=2, averageload=2538
2010-10-23 20:16:17,890 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Cached location for .META.,,1.1028785192 is db2a.goldenline.pl:60020
2010-10-23 20:16:18,432 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.lang.NullPointerException
    at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
    at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
    at org.apache.hadoop.hbase.client.MetaScanner$1.processRow(MetaScanner.java:188)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:69)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:54)
    at org.apache.hadoop.hbase.client.MetaScanner.listAllRegions(MetaScanner.java:195)
    at org.apache.hadoop.hbase.master.AssignmentManager.assignAllUserRegions(AssignmentManager.java:1048)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:379)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:265)
2010-10-23 20:16:18,433 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2010-10-23 20:16:18,433 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
{code}
I think he has an old master... checking.
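The issue does not confirm the root cause, so the following is only a sketch of a defensive pattern that might apply if the NPE comes from a catalog row with no regioninfo value. The class `MetaRowGuardSketch` and the `Map<String, byte[]>` representation of rows are hypothetical stand-ins, not the MetaScanner or Writables API. The point is to skip and log a row with missing bytes instead of dereferencing it and aborting the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a catalog scan; not the MetaScanner or Writables API. */
class MetaRowGuardSketch {

    /**
     * Returns the keys of rows that carry a regioninfo value.
     * Rows whose value is null or empty are logged and skipped.
     */
    static List<String> rowsWithRegionInfo(Map<String, byte[]> metaRows) {
        List<String> usable = new ArrayList<>();
        for (Map.Entry<String, byte[]> row : metaRows.entrySet()) {
            byte[] regionInfo = row.getValue();
            if (regionInfo == null || regionInfo.length == 0) {
                // Assumed failure mode: the row exists but has no regioninfo bytes.
                System.err.println("Skipping row with no regioninfo: " + row.getKey());
                continue;
            }
            usable.add(row.getKey());
        }
        return usable;
    }

    public static void main(String[] args) {
        Map<String, byte[]> rows = new LinkedHashMap<>();
        rows.put("row-a", new byte[] {1, 2, 3}); // made-up row keys and bytes
        rows.put("row-b", null);
        System.out.println(rowsWithRegionInfo(rows));
    }
}
```

Whether skipping is the right behaviour is a design choice: it keeps the master running, but it can also hide a damaged catalog row, so logging the skipped row matters.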