[jira] [Commented] (HBASE-5824) HRegion.incrementColumnValue is not used in trunk
[ https://issues.apache.org/jira/browse/HBASE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258297#comment-13258297 ]

stack commented on HBASE-5824:
--

This patch only makes sense in trunk, not in 0.94. Which exceptions are now different?

HRegion.incrementColumnValue is not used in trunk
-
Key: HBASE-5824
URL: https://issues.apache.org/jira/browse/HBASE-5824
Project: HBase
Issue Type: Bug
Reporter: Elliott Clark
Assignee: Jimmy Xiang
Fix For: 0.96.0
Attachments: hbase-5824.patch, hbase-5824_v2.patch, hbase_5824.addendum

On 0.94, a call to client.HTable#incrementColumnValue results in a call to HRegion#incrementColumnValue. On trunk, all calls to HTable.incrementColumnValue go to HRegion#increment. My guess is that HTable#incrementColumnValue and HTable#increment serialize to the same thing over the wire, so the remote HRegionServer no longer knows which HTable method was called.

To repro, I checked out trunk, put a breakpoint in HRegion#incrementColumnValue, and then ran TestFromClientSide. The breakpoint wasn't hit.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
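The guess in the description above (two client methods collapsing into one wire request) can be illustrated with a minimal sketch. `IncrementRequest` and `ClientSketch` are hypothetical stand-ins invented for this illustration; they are not HBase classes, and nothing here is confirmed to be how trunk actually serializes increments.

```java
import java.util.Objects;

/** Hypothetical wire-level request; a stand-in, not an HBase RPC type. */
final class IncrementRequest {
    final String row;
    final String family;
    final String qualifier;
    final long amount;

    IncrementRequest(String row, String family, String qualifier, long amount) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.amount = amount;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof IncrementRequest)) {
            return false;
        }
        IncrementRequest that = (IncrementRequest) other;
        return row.equals(that.row) && family.equals(that.family)
            && qualifier.equals(that.qualifier) && amount == that.amount;
    }

    @Override
    public int hashCode() {
        return Objects.hash(row, family, qualifier, amount);
    }
}

/** Hypothetical client with two entry points that build the same request. */
final class ClientSketch {
    /** General form: the only request shape that goes over the wire here. */
    static IncrementRequest increment(String row, String family, String qualifier, long amount) {
        return new IncrementRequest(row, family, qualifier, amount);
    }

    /** Convenience form: converted to the general form before serialization. */
    static IncrementRequest incrementColumnValue(String row, String family, String qualifier, long amount) {
        return increment(row, family, qualifier, amount);
    }
}
```

If both entry points produce an identical request, the server side has nothing to dispatch on, which would be consistent with the breakpoint never being hit.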
[jira] [Commented] (HBASE-5831) hadoopqa builds not completing
[ https://issues.apache.org/jira/browse/HBASE-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257647#comment-13257647 ]

stack commented on HBASE-5831:
--

@Todd We can do that and then require it. Nkeyway has a script for test categorizations. I've been talking w/ him about adding it to the general build. We could expand it to require that tests have timeouts too.

Let me try this patch again. I want another clean run w/o a hang to be convinced this is the problem test.

I also need to amend Ted's little script to look for tests that run 0 tests.

hadoopqa builds not completing
--
Key: HBASE-5831
URL: https://issues.apache.org/jira/browse/HBASE-5831
Project: HBase
Issue Type: Bug
Components: test
Reporter: stack
Assignee: stack
Priority: Blocker
Attachments: 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt

No test failures, but the build complains that it has failed. The trunk build seems to have the same affliction:

{code}
Results :

Tests run: 909, Failures: 0, Errors: 0, Skipped: 9

[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 41:19.273s
[INFO] Finished at: Wed Apr 18 21:54:31 UTC 2012
[INFO] Final Memory: 59M/451M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12-TRUNK-HBASE-2:test (secondPartTestsExecution) on project hbase: Failure or timeout -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12523250/5811+%281%29.txt
  against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests:
{code}

It's not apparent that any particular test is not finishing.
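The kind of check discussed in the comment above (find tests that never finish, and tests that run 0 tests) can be sketched as a scan over the console output. This is not Ted's script. It is a minimal illustration that assumes a sequential log in which each `Running <class>` line is followed by that class's own `Tests run: N, ...` line; `SurefireLogScan` and `Report` are hypothetical names, and interleaved output from parallel forks would need a different approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; assumes one "Running X" line followed by X's result line. */
final class SurefireLogScan {
    /** Test classes that never reported, and classes that reported zero tests. */
    static final class Report {
        final List<String> unfinished = new ArrayList<>();
        final List<String> ranZeroTests = new ArrayList<>();
    }

    private static final Pattern RUNNING = Pattern.compile("^Running (\\S+)$");
    private static final Pattern RESULT = Pattern.compile("^Tests run: (\\d+),.*");

    static Report scan(List<String> lines) {
        Report report = new Report();
        String current = null; // class that has started but not yet reported
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("Results")) {
                break; // build summary follows; its totals belong to no single class
            }
            Matcher running = RUNNING.matcher(line);
            if (running.matches()) {
                if (current != null) {
                    report.unfinished.add(current); // previous class never reported
                }
                current = running.group(1);
                continue;
            }
            Matcher result = RESULT.matcher(line);
            if (result.matches() && current != null) {
                if (Integer.parseInt(result.group(1)) == 0) {
                    report.ranZeroTests.add(current);
                }
                current = null;
            }
        }
        if (current != null) {
            report.unfinished.add(current); // started, then the log ended
        }
        return report;
    }
}
```

A class left in `unfinished` is only a candidate for the hang; a timeout that kills the fork can leave more than one class without a result line.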
[jira] [Commented] (HBASE-5829) Inconsistency between the regions map and the servers map in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257704#comment-13257704 ]

stack commented on HBASE-5829:
--

Maryann, please explain where in the code the disparity between this.servers and this.regions is.

Inconsistency between the regions map and the servers map in AssignmentManager
--
Key: HBASE-5829
URL: https://issues.apache.org/jira/browse/HBASE-5829
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6, 0.92.1
Reporter: Maryann Xue

There are occurrences in AM where this.servers is not kept consistent with this.regions. This might cause the balancer to offline a region from an RS that already returned NotServingRegionException at a previous offline attempt.

In AssignmentManager.unassign(HRegionInfo, boolean):

    try {
      // TODO: We should consider making this look more like it does for the
      // region open where we catch all throwables and never abort
      if (serverManager.sendRegionClose(server, state.getRegion(),
          versionOfClosingNode)) {
        LOG.debug("Sent CLOSE to " + server + " for region " +
          region.getRegionNameAsString());
        return;
      }
      // This never happens. Currently regionserver close always return true.
      LOG.warn("Server " + server + " region CLOSE RPC returned false for " +
        region.getRegionNameAsString());
    } catch (NotServingRegionException nsre) {
      LOG.info("Server " + server + " returned " + nsre + " for " +
        region.getRegionNameAsString());
      // Presume that master has stale data. Presume remote side just split.
      // Presume that the split message when it comes in will fix up the master's
      // in memory cluster state.
    } catch (Throwable t) {
      if (t instanceof RemoteException) {
        t = ((RemoteException)t).unwrapRemoteException();
        if (t instanceof NotServingRegionException) {
          if (checkIfRegionBelongsToDisabling(region)) {
            // Remove from the regionsinTransition map
            LOG.info("While trying to recover the table "
              + region.getTableNameAsString()
              + " to DISABLED state the region " + region
              + " was offlined but the table was in DISABLING state");
            synchronized (this.regionsInTransition) {
              this.regionsInTransition.remove(region.getEncodedName());
            }
            // Remove from the regionsMap
            synchronized (this.regions) {
              this.regions.remove(region);
            }
            deleteClosingOrClosedNode(region);
          }
        }
        // RS is already processing this region, only need to update the timestamp
        if (t instanceof RegionAlreadyInTransitionException) {
          LOG.debug("update " + state + " the timestamp.");
          state.update(state.getState());
        }
      }

In AssignmentManager.assign(HRegionInfo, RegionState, boolean, boolean, boolean):

    synchronized (this.regions) {
      this.regions.put(plan.getRegionInfo(), plan.getDestination());
    }
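The excerpts above touch this.regions without a matching update to this.servers. One way to rule that class of bug out, sketched here under assumptions, is to route every change through a single helper that updates both maps under one lock. `AssignmentBook` is a hypothetical name, and plain strings stand in for HRegionInfo and ServerName; this is not the AssignmentManager code or a proposed patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical bookkeeping class; strings stand in for HRegionInfo and ServerName. */
final class AssignmentBook {
    private final Map<String, String> regions = new HashMap<>();      // region -> server
    private final Map<String, Set<String>> servers = new HashMap<>(); // server -> regions

    /** Records the assignment in both maps, dropping any earlier assignment first. */
    synchronized void assign(String region, String server) {
        unassign(region);
        regions.put(region, server);
        servers.computeIfAbsent(server, s -> new HashSet<>()).add(region);
    }

    /** Removes the region from both maps in one step. */
    synchronized void unassign(String region) {
        String previous = regions.remove(region);
        if (previous != null) {
            Set<String> hosted = servers.get(previous);
            if (hosted != null) {
                hosted.remove(region);
            }
        }
    }

    /** Returns a copy of the regions currently recorded for the server. */
    synchronized Set<String> regionsOn(String server) {
        Set<String> hosted = servers.get(server);
        return hosted == null ? new HashSet<>() : new HashSet<>(hosted);
    }

    /** Checks that each map agrees with the other. */
    synchronized boolean consistent() {
        for (Map.Entry<String, String> e : regions.entrySet()) {
            Set<String> hosted = servers.get(e.getValue());
            if (hosted == null || !hosted.contains(e.getKey())) {
                return false;
            }
        }
        for (Map.Entry<String, Set<String>> e : servers.entrySet()) {
            for (String region : e.getValue()) {
                if (!e.getKey().equals(regions.get(region))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A `consistent()` style check is also the sort of assertion a unit test against a mocked AssignmentManager could make after each simulated failure.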
[jira] [Commented] (HBASE-5816) Balancer and ServerShutdownHandler concurrently reassigning the same region
[ https://issues.apache.org/jira/browse/HBASE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257729#comment-13257729 ]

stack commented on HBASE-5816:
--

@Maryann I agree with your 1. and 2. above.

It's possible to make a standalone AssignmentManager using mocks; see TestAssignmentManager. Maybe we should try out some of your suppositions in unit tests, Maryann, and find holes in AM that way?

Balancer and ServerShutdownHandler concurrently reassigning the same region
---
Key: HBASE-5816
URL: https://issues.apache.org/jira/browse/HBASE-5816
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6
Reporter: Maryann Xue
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Attachments: HBASE-5816.patch

The first assign thread exits with success after updating the RegionState to PENDING_OPEN, while the second assign follows immediately into assign and fails the RegionState check in setOfflineInZooKeeper(). This causes the master to abort.

In the case below, the two concurrent assigns occurred when AM tried to assign a region to a dying/dead RS, and meanwhile the ShutdownServerHandler tried to assign this region (from the region plan) spontaneously.

2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. (offlining)
2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=hadoop05.sh.intel.com,60020,1334544902186, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
2012-04-17 05:44:57,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b (region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING)
2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=CLOSED, ts=1334612697672, server=hadoop05.sh.intel.com,60020,1334544902186
2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x236b912e9b3000e Creating (or updating) unassigned node for fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state
2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.; plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:54:19,159 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=PENDING_OPEN, ts=1334613179096, server=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:54:59,033 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253, load=(requests=0, regions=0, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
java.net.SocketTimeoutException: Call to /10.239.47.87:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302 remote=/10.239.47.87:60020]
    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:805)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:778)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:283)
    at $Proxy7.openRegion(Unknown Source)
    at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:573)
    at
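The race described above is two threads both believing they may start an assign. A minimal sketch of one guard is a compare-and-set on the region state, so that exactly one caller wins the transition to PENDING_OPEN and the loser backs off instead of failing a later check. `RegionStateGuard` and its `State` enum are hypothetical; this is not what HBASE-5816.patch does, and the real RegionState carries more states and a ZooKeeper side that this ignores.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical guard around a single region's in-memory state. */
final class RegionStateGuard {
    enum State { OFFLINE, CLOSED, PENDING_OPEN, OPEN }

    private final AtomicReference<State> state = new AtomicReference<>(State.CLOSED);

    /**
     * Attempts the CLOSED/OFFLINE to PENDING_OPEN transition atomically.
     * Returns false when another caller already started an assign.
     */
    boolean tryBeginAssign() {
        return state.compareAndSet(State.CLOSED, State.PENDING_OPEN)
            || state.compareAndSet(State.OFFLINE, State.PENDING_OPEN);
    }

    /** Marks the region open once the region server confirms. */
    void markOpen() {
        state.set(State.OPEN);
    }

    State current() {
        return state.get();
    }
}
```

With a guard like this, the second assign sees `false` and can log and return rather than reach a state check that aborts the master.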
[jira] [Commented] (HBASE-5654) [findbugs] Address dodgy bugs
[ https://issues.apache.org/jira/browse/HBASE-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257743#comment-13257743 ]

stack commented on HBASE-5654:
--

@Jon Our hadoopqa has been hanging for a while. It's probably not this patch. Maybe compare to previous runs. In the meantime, I'm trying to figure out why it hangs.

[findbugs] Address dodgy bugs
-
Key: HBASE-5654
URL: https://issues.apache.org/jira/browse/HBASE-5654
Project: HBase
Issue Type: Sub-task
Components: scripts
Affects Versions: 0.96.0
Reporter: Jonathan Hsieh
Assignee: Ashutosh Jindal
Labels: patch
Fix For: 0.96.0
Attachments: Hbase 5654_v3.patch, Hbase-5654.patch, Hbase_5654_V2.patch

See https://builds.apache.org/job/PreCommit-HBASE-Build/1313//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html#Warnings_STYLE

This may be broken down further.
[jira] [Commented] (HBASE-5548) Add ability to get a table in the shell
[ https://issues.apache.org/jira/browse/HBASE-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257760#comment-13257760 ]

stack commented on HBASE-5548:
--

Sorry, Jesse, for taking a while to get back to this.

Patch looks good. I tried it some more and got this:

{code}
hbase(main):011:0> t.put 'x', 'y:x', 'x'
0 row(s) in 0.0110 seconds

hbase(main):012:0> t.get 'x'
COLUMN CELL

ERROR: undefined method `get_internal' for Hbase::Table - y:Hbase::Table

Here is some help for this command:
Get row or cell contents; pass table name, row, and optionally
a dictionary of column(s), timestamp, timerange and versions. Examples:

  hbase> get 't1', 'r1'
  hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> get 't1', 'r1', {COLUMN => 'c1'}
  hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> get 't1', 'r1', 'c1'
  hbase> get 't1', 'r1', 'c1', 'c2'
  hbase> get 't1', 'r1', ['c1', 'c2']

The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding commands would be:

  hbase> t.get 'r1'
  hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> t.get 'r1', {COLUMN => 'c1'}
  hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> t.get 'r1', 'c1'
  hbase> t.get 'r1', 'c1', 'c2'
  hbase> t.get 'r1', ['c1', 'c2']
{code}

Seems like an issue?

Also, the help talks about a table reference without explaining what it is (the general help does not seem to mention it either). Talking about a 't' w/o saying where it came from could be confusing.

I like the output of t.help. This is odd though:

{code}
  hbase> t.put 'r', 'c', 'q', 'v'

which puts a row 'r' with column family 'c', qualifier 'q' and value 'v' into table t.
{code}

In the rest of the shell, a column is a combination of family and qualifier delimited by ':'. You are changing that w/ the above.

Add ability to get a table in the shell
---
Key: HBASE-5548
URL: https://issues.apache.org/jira/browse/HBASE-5548
Project: HBase
Issue Type: Improvement
Components: shell
Reporter: Jesse Yates
Assignee: Jesse Yates
Fix For: 0.96.0, 0.94.1
Attachments: ruby_HBASE-5528-v0.patch, ruby_HBASE-5548-v1.patch, ruby_HBASE-5548-v2.patch, ruby_HBASE-5548-v3.patch

Currently, all the commands that operate on a table in the shell first have to take the table name as input.

There are two main considerations:
* It is annoying to have to write the table name every time, when you should just be able to get a reference to a table
* the current implementation is very wasteful - it creates a new HTable for each call (but reuses the connection since it uses the same configuration)

We should be able to get a handle to a single HTable and then operate on that.
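The convention the review points to (a column written as family and qualifier joined by ':') can be sketched as a small parser. `ColumnSpec` is a hypothetical class written for this illustration, not the shell's Ruby code or an HBase API, and splitting on the first ':' is an assumption made here so that qualifiers may themselves contain colons.

```java
/** Hypothetical holder for a parsed 'family:qualifier' column spec. */
final class ColumnSpec {
    final String family;
    final String qualifier;

    private ColumnSpec(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    /**
     * Splits on the first ':' only. A spec without ':' is treated as a bare
     * family with an empty qualifier.
     */
    static ColumnSpec parse(String column) {
        int delimiter = column.indexOf(':');
        if (delimiter < 0) {
            return new ColumnSpec(column, "");
        }
        return new ColumnSpec(column.substring(0, delimiter), column.substring(delimiter + 1));
    }
}
```

Under this convention, the `t.put 'x', 'y:x', 'x'` call from the transcript names family `y` and qualifier `x` in a single argument, which is the form the rest of the shell uses.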
[jira] [Commented] (HBASE-5831) hadoopqa builds not completing
[ https://issues.apache.org/jira/browse/HBASE-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257773#comment-13257773 ]

stack commented on HBASE-5831:
--

@Todd That'd be nice (smile)

This test run did 936 tests, which is more than normal. Let me try again.

hadoopqa builds not completing
--
Key: HBASE-5831
URL: https://issues.apache.org/jira/browse/HBASE-5831
Project: HBase
Issue Type: Bug
Components: test
Reporter: stack
Assignee: stack
Priority: Blocker
Attachments: 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt
[jira] [Commented] (HBASE-3614) Expose per-region request rate metrics
[ https://issues.apache.org/jira/browse/HBASE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257787#comment-13257787 ]

stack commented on HBASE-3614:
--

Since you renamed RegionOperationMetrics, is this right now:

{code}
+ private final OperationMetrics regionMetrics;
{code}

Should it be named metrics or operationMetrics?

What's 'unknown' in the following?

+ //null will be treated as unknown.

Are we updating metrics w/o attributing them to a cf?

Fix the misspelling 'Inctement' in the hbase-site change.

Patch is good to go after addressing the above. Good stuff.

Expose per-region request rate metrics
--
Key: HBASE-3614
URL: https://issues.apache.org/jira/browse/HBASE-3614
Project: HBase
Issue Type: Improvement
Components: metrics, regionserver
Reporter: Gary Helmling
Assignee: Elliott Clark
Priority: Minor
Attachments: HBASE-3614-0.patch, HBASE-3614-1.patch, HBASE-3614-2.patch, HBASE-3614-3.patch, HBASE-3614-4.patch, HBASE-3614-5.patch, HBASE-3614-6.patch, HBASE-3614-7.patch, Screen Shot 2012-04-17 at 2.41.27 PM.png

We currently export metrics on request rates for each region server, and this can help with identifying uneven load at a high level. But once you see a given server under high load, you're forced to extrapolate, based on your application patterns and the data it's serving, what the likely culprit is. This can and should be much easier if we just exported request rate metrics per-region on each server.

Dynamically updating the metrics keys based on assigned regions may pose some minor challenges, but this seems a very valuable diagnostic tool to have available.
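To make the two points above concrete (metric keys that come and go with assigned regions, and a null column family reported as "unknown"), here is a minimal sketch. `PerRegionRequestCounts` and its key layout are hypothetical and were invented for this illustration; they are not the OperationMetrics class from the patch or the metric names HBase exports.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-region request counters keyed by region, column family, and operation. */
final class PerRegionRequestCounts {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** Builds the metric key; a null column family is reported as "unknown". */
    static String key(String region, String columnFamily, String operation) {
        String cf = (columnFamily == null) ? "unknown" : columnFamily;
        return "region." + region + ".cf." + cf + "." + operation;
    }

    /** Counts one request, creating the key the first time it is seen. */
    void record(String region, String columnFamily, String operation) {
        counts.computeIfAbsent(key(region, columnFamily, operation), k -> new LongAdder()).increment();
    }

    long get(String region, String columnFamily, String operation) {
        LongAdder adder = counts.get(key(region, columnFamily, operation));
        return adder == null ? 0L : adder.sum();
    }

    /** Drops every key of a region that is no longer assigned to this server. */
    void regionClosed(String region) {
        String prefix = "region." + region + ".";
        counts.keySet().removeIf(k -> k.startsWith(prefix));
    }
}
```

Turning such counts into a rate would mean dividing by the reporting period, which this sketch leaves out.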
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257810#comment-13257810 ]

stack commented on HBASE-5547:
--

bq. To get a guaranteed consistent snapshot the RegionServers need to check for the znode's value synchronously in the delete path (or at least I see no other way). Otherwise there are times when the RegionServers do not agree and some files will be deleted and some will be backed up with no possibility for the client to know exactly as of when the backup would be consistent.

This would make for the narrowest possible window as regards whether backup is on or off.

Does it have to be a custom znode? If we had a Configuration or Table znode, could it read the content instead? Maybe checking existence is cheaper than reading znode content, though?

Don't delete HFiles when in backup mode
-
Key: HBASE-5547
URL: https://issues.apache.org/jira/browse/HBASE-5547
Project: HBase
Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jesse Yates

This came up in a discussion I had with Stack. It would be nice if HBase could be notified that a backup is in progress (via a znode for example) and in that case either:
1. rename HFiles to be deleted to file.bck
2. rename the HFiles into a special directory
3. rename them to a general trash directory (which would not need to be tied to backup mode).

That way it should be able to get a consistent backup based on HFiles (HDFS snapshots or hard links would be better options here, but we do not have those).

#1 makes cleanup a bit harder.
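The delete-path check discussed above can be sketched on a local filesystem. This is only an illustration under loud assumptions: a marker file stands in for the znode, `java.nio.file` stands in for HDFS, and `BackupAwareDeleter` is a hypothetical name. It shows option 3 from the description (move to a trash directory) with the existence check done synchronously on every delete.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical delete path; a marker file stands in for the backup znode. */
final class BackupAwareDeleter {
    private final Path backupMarker; // exists while a backup is in progress
    private final Path trashDir;     // where files are parked instead of deleted

    BackupAwareDeleter(Path backupMarker, Path trashDir) {
        this.backupMarker = backupMarker;
        this.trashDir = trashDir;
    }

    /**
     * Removes the file from its directory. Returns true when the file was
     * retained in the trash directory because backup mode was on, false when
     * it was actually deleted.
     */
    boolean remove(Path hfile) throws IOException {
        // Checked on every call, so the window for disagreement stays narrow.
        if (Files.exists(backupMarker)) {
            Files.createDirectories(trashDir);
            Files.move(hfile, trashDir.resolve(hfile.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
            return true;
        }
        Files.deleteIfExists(hfile);
        return false;
    }
}
```

An existence check like `Files.exists` mirrors the cheaper of the two options raised in the comment; reading a value out of a shared znode would need a read and a parse on each delete.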
[jira] [Commented] (HBASE-5824) HRegion.incrementColumnValue is not used in trunk
[ https://issues.apache.org/jira/browse/HBASE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257849#comment-13257849 ]

stack commented on HBASE-5824:
--

+1 on Jimmy's patch.

@Elliott I'd say at least add a deprecation pointing to the preferred code?

HRegion.incrementColumnValue is not used in trunk
-
Key: HBASE-5824
URL: https://issues.apache.org/jira/browse/HBASE-5824
Project: HBase
Issue Type: Bug
Reporter: Elliott Clark
Assignee: Jimmy Xiang
Attachments: hbase-5824.patch
[jira] [Commented] (HBASE-5836) Backport per region metrics from HBASE-3614 to 0.94.1
[ https://issues.apache.org/jira/browse/HBASE-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257886#comment-13257886 ]

stack commented on HBASE-5836:
--

+1

Backport per region metrics from HBASE-3614 to 0.94.1
-
Key: HBASE-5836
URL: https://issues.apache.org/jira/browse/HBASE-5836
Project: HBase
Issue Type: Task
Reporter: stack
Assignee: Elliott Clark
Fix For: 0.94.1

This would be good to have in 0.94. Can go into 0.94.1.
[jira] [Commented] (HBASE-5621) Convert admin protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257914#comment-13257914 ]

stack commented on HBASE-5621:
--

Want to put your patch up here Jimmy and run it by hadoopqa? Thanks.

Convert admin protocol of HRegionInterface to PB

Key: HBASE-5621
URL: https://issues.apache.org/jira/browse/HBASE-5621
Project: HBase
Issue Type: Sub-task
Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: 0.96.0
[jira] [Commented] (HBASE-5831) hadoopqa builds not completing
[ https://issues.apache.org/jira/browse/HBASE-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257925#comment-13257925 ]

stack commented on HBASE-5831:
--

Thanks Jon. I tried it over in HBASE-5794, setting it back down again, but it didn't seem to matter. I committed the patch there, which undoes the 100, anyway: Mikhail said the change was good for some 0.89fb tests, but he wasn't sure about trunk.

hadoopqa builds not completing
--
Key: HBASE-5831
URL: https://issues.apache.org/jira/browse/HBASE-5831
Project: HBase
Issue Type: Bug
Components: test
Reporter: stack
Assignee: stack
Priority: Blocker
Attachments: 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.all.mapreduce.txt, 5831.remove.all.mapreduce.txt
[jira] [Commented] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258006#comment-13258006 ]

stack commented on HBASE-5833:
--

Eh, Ted, builds.apache.org is a public web site. I do not need you echoing what's there in here.

0.92 build has been failing pretty consistently on TestMasterFailover
-
Key: HBASE-5833
URL: https://issues.apache.org/jira/browse/HBASE-5833
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Fix For: 0.92.2
Attachments: 5833.txt

Trunk seems fine, but 0.92 fails on this test pretty regularly. Running it locally, it seems to hang for me.
[jira] [Commented] (HBASE-5831) hadoopqa builds not completing
[ https://issues.apache.org/jira/browse/HBASE-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258008#comment-13258008 ] stack commented on HBASE-5831: -- @Jon Hmm.. yes. You are right. Both times it passed. It was worth committing hbase-5794 then. Now to find the other hanging tests... hadoopqa builds not completing -- Key: HBASE-5831 URL: https://issues.apache.org/jira/browse/HBASE-5831 Project: HBase Issue Type: Bug Components: test Reporter: stack Assignee: stack Priority: Blocker Attachments: 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.all.mapreduce.txt, 5831.remove.all.mapreduce.txt No test failures but build complains it has failed. trunk build seems to have the same affliction: {code} Results : Tests run: 909, Failures: 0, Errors: 0, Skipped: 9 [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 41:19.273s [INFO] Finished at: Wed Apr 18 21:54:31 UTC 2012 [INFO] Final Memory: 59M/451M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12-TRUNK-HBASE-2:test (secondPartTestsExecution) on project hbase: Failure or timeout - [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12523250/5811+%281%29.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. 
The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: {code} It's not apparent that any particular test is not finishing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3614) Expose per-region request rate metrics
[ https://issues.apache.org/jira/browse/HBASE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258011#comment-13258011 ] stack commented on HBASE-3614: -- @Todd This issue just exposes metrics that were already being collected per region. I believe it's over the metrics reporting period (5 seconds?). Want that changed? Metrics could do w/ a revamp/edit for sure. Expose per-region request rate metrics -- Key: HBASE-3614 URL: https://issues.apache.org/jira/browse/HBASE-3614 Project: HBase Issue Type: Improvement Components: metrics, regionserver Reporter: Gary Helmling Assignee: Elliott Clark Priority: Minor Fix For: 0.96.0 Attachments: HBASE-3614-0.patch, HBASE-3614-1.patch, HBASE-3614-2.patch, HBASE-3614-3.patch, HBASE-3614-4.patch, HBASE-3614-5.patch, HBASE-3614-6.patch, HBASE-3614-7.patch, HBASE-3614-8.patch, HBASE-3614-9.patch, Screen Shot 2012-04-17 at 2.41.27 PM.png We currently export metrics on request rates for each region server, and this can help with identifying uneven load at a high level. But once you see a given server under high load, you're forced to extrapolate based on your application patterns and the data it's serving what the likely culprit is. This can and should be much easier if we just exported request rate metrics per-region on each server. Dynamically updating the metrics keys based on assigned regions may pose some minor challenges, but this seems a very valuable diagnostic tool to have available.
[jira] [Commented] (HBASE-5782) Edits can be appended out of seqid order since HBASE-4487
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256788#comment-13256788 ] stack commented on HBASE-5782: -- You will need to pull in HLogPerformanceEvaluation. Copy it whole (don't do the hbase-5792 because it got mod'd a few times subsequent to commit). You could also just commit the unit test to trunk and not to 0.94; that should be fine as long as we hold to committing patches to trunk first. Edits can be appended out of seqid order since HBASE-4487 - Key: HBASE-5782 URL: https://issues.apache.org/jira/browse/HBASE-5782 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: Lars Hofhansl Priority: Blocker Fix For: 0.94.0 Attachments: 5782-lars-v2.txt, 5782-sketch.txt, 5782-v3.txt, 5782.txt, 5782.unfinished-stack.txt, 5782.unittest.txt, HBASE-5782.patch, hbase-5782.txt Create a table with 1000 splits. After the region assignment, kill the regionserver which hosts the META table. A few regions are missing after the log splitting and region assignment. The HBCK report shows that multiple region holes were created. The same scenario was verified multiple times in 0.92.1 with no issues.
[jira] [Commented] (HBASE-5816) Two concurrent assign would cause master to abort with msg Unexpected state trying to OFFLINE;
[ https://issues.apache.org/jira/browse/HBASE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256799#comment-13256799 ] stack commented on HBASE-5816: -- Thanks for filing the issue Maryann. I think we need to address the root problem of two threads in the master both at the same time trying to assign the same region rather than do as is done here where we just stop the abort. The patch as is will only move the problem down the line (we'll likely end up w/ a single region double assigned?). Let me update the issue title. This log snippet is a really good find. Two concurrent assign would cause master to abort with msg Unexpected state trying to OFFLINE; - Key: HBASE-5816 URL: https://issues.apache.org/jira/browse/HBASE-5816 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Maryann Xue Attachments: HBASE-5816.patch The first assign thread exits with success after updating the RegionState to PENDING_OPEN, while the second assign follows immediately into assign and fails the RegionState check in setOfflineInZooKeeper(). This causes the master to abort. In the below case, the two concurrent assigns occurred when AM tried to assign a region to a dying/dead RS, and meanwhile the ShutdownServerHandler tried to assign this region (from the region plan) spontaneously. 2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253 2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
(offlining) 2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=hadoop05.sh.intel.com,60020,1334544902186, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 2012-04-17 05:44:57,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b (region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING) 2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=CLOSED, ts=1334612697672, server=hadoop05.sh.intel.com,60020,1334544902186 2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x236b912e9b3000e Creating (or updating) unassigned node for fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state 2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.; plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253 2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to xmlqa-clv16.sh.intel.com,60020,1334612497253 2012-04-17 05:54:19,159 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
state=PENDING_OPEN, ts=1334613179096, server=xmlqa-clv16.sh.intel.com,60020,1334612497253 2012-04-17 05:54:59,033 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253, load=(requests=0, regions=0, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0 java.net.SocketTimeoutException: Call to /10.239.47.87:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302 remote=/10.239.47.87:60020] at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:805) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:778) at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:283) at $Proxy7.openRegion(Unknown Source) at
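A note on the root problem raised in the HBASE-5816 comment above (two master threads trying to assign the same region at the same time): one way to picture a fix is a per-region guard that lets only one assigner proceed. The sketch below is an illustration under that assumption only. RegionAssignGuard, tryBegin and end are hypothetical names and are not part of AssignmentManager.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical guard, not HBase's AssignmentManager API. */
class RegionAssignGuard {
  // Encoded region name -> thread that currently owns the assignment.
  private final ConcurrentMap<String, Thread> inFlight = new ConcurrentHashMap<>();

  /** Returns true if the caller now owns the assignment of this region. */
  boolean tryBegin(String encodedRegionName) {
    // putIfAbsent is atomic, so exactly one caller sees null and wins.
    return inFlight.putIfAbsent(encodedRegionName, Thread.currentThread()) == null;
  }

  /** Releases the region, but only if the calling thread is the owner. */
  void end(String encodedRegionName) {
    inFlight.remove(encodedRegionName, Thread.currentThread());
  }
}
```

A second caller that gets false from tryBegin would skip or queue its assign instead of running into the RegionState check. Note the guard is deliberately not re-entrant: a second tryBegin from the same thread also returns false.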
[jira] [Commented] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256827#comment-13256827 ] stack commented on HBASE-5737: -- I think this is weird '+this.balancer.setMasterServices(this);' but it's not your change. +1 on commit. Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch, HBASE-5737_3.patch Currently in Am.getAssignmentByTable() we use a result map which is a HashMap. It would be better to use a TreeMap. Even MetaReader.fullScan uses a TreeMap so that the naming order is maintained. I felt this change could be very useful in cases where we are extending the DefaultLoadBalancer.
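To make the HashMap versus TreeMap point in HBASE-5737 concrete, here is a minimal sketch. The class name OrderedAssignments and the simplified value type (a region count per table) are assumptions for illustration; the real result map in the master has a richer type.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper; the value type is simplified to a region count. */
class OrderedAssignments {
  /** Copies the assignments into a TreeMap so iteration follows table-name order. */
  static Map<String, Integer> sortedByTable(Map<String, Integer> regionCountByTable) {
    // A HashMap gives no iteration-order guarantee; a TreeMap iterates in key order.
    return new TreeMap<>(regionCountByTable);
  }
}
```

A balancer that extends the default one can then walk the tables in a stable, predictable order from run to run.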
[jira] [Commented] (HBASE-5823) Hbck should be able to print help
[ https://issues.apache.org/jira/browse/HBASE-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256835#comment-13256835 ] stack commented on HBASE-5823: -- +1 on patch. Hbck should be able to print help - Key: HBASE-5823 URL: https://issues.apache.org/jira/browse/HBASE-5823 Project: HBase Issue Type: Improvement Affects Versions: 0.92.1, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Minor Attachments: hbase-hbck.patch bin/hbase hbck -h and -help should print the help message. It used to print help when unrecognized options are passed. We can backport this to 0.92/0.94 branches as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256861#comment-13256861 ] stack commented on HBASE-5547: -- Is the problem in #1 the client waiting on acks from all the regionservers? Does it need to do this? Can it not just set the state up in zk and then just move on? (You have this in your patch already if I remember rightly.) Do you want the RS's acknowledging that they have been set into backup mode? They could set a flag up in zk but this gets torturous when say we add a new feature that wants to do something similar. If we had a dynamic Configuration system, one that didn't require roll of table to set the table 'read-only' or 'in-back-up mode', would that help here? On option #2, yeah, it's a pain going to zk for each WAL when there is this callback mechanism that all RS are subscribed to anyways. For sure could poll zk the first time but should then cache the setting and only drop it later if a callback says it changed. Agree roll of table to set the backup flag is much too heavyweight. Don't delete HFiles when in backup mode - Key: HBASE-5547 URL: https://issues.apache.org/jira/browse/HBASE-5547 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Jesse Yates This came up in a discussion I had with Stack. It would be nice if HBase could be notified that a backup is in progress (via a znode for example) and in that case either: 1. rename HFiles to be deleted to file.bck 2. rename the HFiles into a special directory 3. rename them to a general trash directory (which would not need to be tied to backup mode). That way it should be able to get a consistent backup based on HFiles (HDFS snapshots or hard links would be better options here, but we do not have those). #1 makes cleanup a bit harder.
[jira] [Commented] (HBASE-5823) Hbck should be able to print help
[ https://issues.apache.org/jira/browse/HBASE-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256880#comment-13256880 ] stack commented on HBASE-5823: -- I tried it. Seems to work. Hbck should be able to print help - Key: HBASE-5823 URL: https://issues.apache.org/jira/browse/HBASE-5823 Project: HBase Issue Type: Improvement Affects Versions: 0.92.1, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Minor Fix For: 0.92.2, 0.94.0 Attachments: hbase-hbck.patch bin/hbase hbck -h and -help should print the help message. It used to print help when unrecognized options are passed. We can backport this to 0.92/0.94 branches as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5790) ZKUtil deleteRecursively should be a recoverable operation
[ https://issues.apache.org/jira/browse/HBASE-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256889#comment-13256889 ] stack commented on HBASE-5790: -- @Ted It's not a runtime requirement that client or ensemble be 3.4.x. A 3.4.x client and ensemble are required if you run secure hbase; otherwise it's not necessary and we should be wary of requiring it; e.g. our ops didn't want to upgrade to 3.4.x ensemble just yet and so we run w/ a 3.4.x client against 3.3.x ensemble. @Jesse Sounds fine requiring 3.4.x in 0.96. Want to raise a conversation out on mailing list? ZKUtil deleteRecursively should be a recoverable operation -- Key: HBASE-5790 URL: https://issues.apache.org/jira/browse/HBASE-5790 Project: HBase Issue Type: Improvement Reporter: Jesse Yates Assignee: Jesse Yates Labels: zookeeper Fix For: 0.96.0, 0.94.1 Attachments: java_HBASE-5790-v1.patch, java_HBASE-5790.patch As of 3.4.3 Zookeeper now has full, multi-operation transactions. This means we can wholesale delete chunks of the zk tree and ensure that we don't have any pesky recursive delete issues where we delete the children of a node, but then a child joins before deletion of the parent. Even without transactions, this should be the behavior, but it is possible to make it much cleaner now that we have this new feature in zk.
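For the recursive delete described in HBASE-5790, the part that is easy to get wrong is the ordering: every child has to be deleted before its parent, and the whole batch should go in as one transaction. The sketch below only computes that ordering over an in-memory view of the tree. DeleteOrder, childrenFirst and the children map are hypothetical names for illustration; turning the resulting paths into delete operations and submitting them in a single ZooKeeper multi call is left out, and this is not the code from the attached patches.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that orders znode paths for a transactional recursive delete. */
class DeleteOrder {
  /** children maps a znode path to the names of its direct children (in-memory stand-in). */
  static List<String> childrenFirst(String root, Map<String, List<String>> children) {
    List<String> bfs = new ArrayList<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(root);
    // Breadth-first walk: every parent is visited before any of its children.
    while (!queue.isEmpty()) {
      String path = queue.poll();
      bfs.add(path);
      for (String child : children.getOrDefault(path, Collections.<String>emptyList())) {
        queue.add(path + "/" + child);
      }
    }
    // Reversing the walk puts every child before its parent, with the root last.
    Collections.reverse(bfs);
    return bfs;
  }
}
```

If the batch is applied atomically, a child that joins between listing and deleting makes the whole transaction fail rather than leaving a half-deleted subtree, so the caller can simply list again and retry.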
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256921#comment-13256921 ] stack commented on HBASE-5677: -- Xufeng So we should close this issue and backport hbase-5454 to 0.90 and to 0.92.2? Or would you rather make a new issue that adds check initialized to createTable for trunk and 0.94 and that has a new version of hbase-5454 that includes checkinitialized in the patch we put on 0.90 and 0.92? The master never does balance because duplicate openhandled the one region -- Key: HBASE-5677 URL: https://issues.apache.org/jira/browse/HBASE-5677 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Environment: 0.90 Reporter: xufeng Assignee: xufeng Fix For: 0.90.7, 0.92.2 Attachments: 5677-proposal.txt, 5677-proposal.txt, Backport-HBASE-5454-to-90.patch, Backport-HBASE-5454-to-92.patch, HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, surefire-report_patched_v1.html If a region is assigned while the master is initializing (before processFailover runs), the open of that region is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never balances.
[jira] [Commented] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.
[ https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256950#comment-13256950 ] stack commented on HBASE-5545: -- The additions to FSUtils are over the top but +1 on patch -- deleting tmp content on open seems useful. region can't be opened for a long time. Because the creating File failed. - Key: HBASE-5545 URL: https://issues.apache.org/jira/browse/HBASE-5545 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.6 Reporter: gaojinchao Assignee: ramkrishna.s.vasudevan Fix For: 0.90.7, 0.92.2, 0.94.0 Attachments: HBASE-5545.patch, HBASE-5545.patch Scenario: 1. The file is created. 2. While data is being written, all datanodes crash, so the write fails. 3. Even if close is called in the finally block, close also fails and throws an exception because the write failed. 4. If the RS then tries to create the same file again, AlreadyBeingCreatedException is thrown. Suggestion to handle this scenario. --- 1. Check for the existence of the file; if it exists, delete it and create a new file. The delete call does not check whether the file is open or closed. Overwrite Option: 1. The overwrite option is applicable only if you are trying to overwrite a closed file. 2. If the file is not closed, the same AlreadyBeingCreatedException is thrown even with the overwrite option. This is the expected behaviour, to avoid multiple clients writing to the same file. Region server logs: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hbase/test1/12c01902324218d14b17a5880f24f64b/.tmp/.regioninfo for DFSClient_hb_rs_158-1-131-48,20020,1331107668635_1331107669061_-252463556_25 on client 158.1.132.19 because current leaseholder is trying to recreate file. 
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1570) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1440) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1382) at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:658) at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1137) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1133) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1131) at org.apache.hadoop.ipc.Client.call(Client.java:961) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245) at $Proxy6.create(Unknown Source) at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at $Proxy6.create(Unknown Source) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.init(DFSClient.java:3643) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:778) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:364) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:630) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:611) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:518) at org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:424) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2672) at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2658) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:116) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) [2012-03-07 20:51:45,858] [WARN ] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-23] [com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker 131]
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256956#comment-13256956 ] stack commented on HBASE-5547: -- bq. ...but is definitely a concern and something that I've seen take up to a few seconds to propagate. Yeah. If you don't want a window, query the regionservers (you'll need to add something to query but...) bq. ... Are you basically talking about doing per-table configuration storage in the table znode? I was stating then that we already are doing a per table attribute up in zk -- whether enabled or disabled -- and that rather than do up new nodes for a new attribute that instead we should add to the table znode the new attribute. That was then. Now I'm suggesting we put all config up there. We could start w/ HTD if we want to keep it table scoped (we'd have another tier in front of the one Nicolas added, a dynamic one). If the above too ambitious, we should at least generalize the table znode so can add attributes and we might as well pb serialize the HTD as anything else? bq. ...If they are disabled, they need to check everytime to see if it has been enabled Or just watch the table znode and if it changes, check if backup has been flipped on. Don't delete HFiles when in backup mode - Key: HBASE-5547 URL: https://issues.apache.org/jira/browse/HBASE-5547 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Jesse Yates This came up in a discussion I had with Stack. It would be nice if HBase could be notified that a backup is in progress (via a znode for example) and in that case either: 1. rename HFiles to be delete to file.bck 2. rename the HFiles into a special directory 3. rename them to a general trash directory (which would not need to be tied to backup mode). That way it should be able to get a consistent backup based on HFiles (HDFS snapshots or hard links would be better options here, but we do not have those). #1 makes cleanup a bit harder. 
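The "read zk once, cache the setting, and only drop it when a callback says the znode changed" idea from the HBASE-5547 comments can be sketched roughly as follows. CachedBackupFlag and its methods are hypothetical names, and the BooleanSupplier stands in for the real ZooKeeper read; nothing here is taken from the patch.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical cache for a flag stored in ZooKeeper; the supplier stands in for the real read. */
class CachedBackupFlag {
  private final BooleanSupplier readFromZk;
  private Boolean cached; // null means "unknown, must re-read"

  CachedBackupFlag(BooleanSupplier readFromZk) {
    this.readFromZk = readFromZk;
  }

  /** Reads ZooKeeper only when no cached value is available. */
  synchronized boolean inBackupMode() {
    if (cached == null) {
      cached = readFromZk.getAsBoolean();
    }
    return cached;
  }

  /** To be called from the watcher callback when the table znode changes. */
  synchronized void onZnodeChanged() {
    cached = null;
  }
}
```

The window discussed in the thread is visible here: between the znode changing and the callback firing, inBackupMode() still returns the old value.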
[jira] [Commented] (HBASE-5782) Edits can be appended out of seqid order since HBASE-4487
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256968#comment-13256968 ] stack commented on HBASE-5782: -- Sorry. Dumb. The tool calls System.exit. Let me fix in another issue. Edits can be appended out of seqid order since HBASE-4487 - Key: HBASE-5782 URL: https://issues.apache.org/jira/browse/HBASE-5782 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: Lars Hofhansl Priority: Blocker Fix For: 0.94.0 Attachments: 5782-lars-v2.txt, 5782-sketch.txt, 5782-v3.txt, 5782.txt, 5782.unfinished-stack.txt, 5782.unittest.txt, HBASE-5782.patch, hbase-5782.txt Create a table with 1000 splits. After the region assignment, kill the regionserver which hosts the META table. A few regions are missing after the log splitting and region assignment. The HBCK report shows that multiple region holes were created. The same scenario was verified multiple times in 0.92.1 with no issues.
[jira] [Commented] (HBASE-5825) TestHLog not running any tests; fix
[ https://issues.apache.org/jira/browse/HBASE-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256973#comment-13256973 ] stack commented on HBASE-5825: -- The commit on HBASE-5782 broke TestHLog (It included a unit test of mine that calls HLogPerformanceEvaluation -- it calls System.exit when done). TestHLog not running any tests; fix --- Key: HBASE-5825 URL: https://issues.apache.org/jira/browse/HBASE-5825 Project: HBase Issue Type: Bug Reporter: stack -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
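The failure mode in HBASE-5825 is general: if a test calls into a tool whose code path ends in System.exit, the test JVM goes away and the remaining tests never run. A common way around it, sketched here with a hypothetical PerfTool class (not the actual HLogPerformanceEvaluation code or its eventual fix), is to have the tool return a status code and leave exiting to the command-line entry point.

```java
/** Hypothetical tool; stands in for a command-line class that a unit test also calls. */
class PerfTool {
  /** Does the work and reports a status code instead of exiting the JVM. */
  static int run(String[] args) {
    if (args.length == 0) {
      return 1; // usage error
    }
    // ... the real work would go here ...
    return 0;
  }

  // The command-line entry point is then the only place that exits:
  //   public static void main(String[] args) { System.exit(run(args)); }
}
```

A unit test calls run(...) and asserts on the returned code, so the surefire fork stays alive and the rest of the test class still executes.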
[jira] [Commented] (HBASE-5799) [89-fb] Multiget API may return incomplete resutls
[ https://issues.apache.org/jira/browse/HBASE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257036#comment-13257036 ] stack commented on HBASE-5799: -- @Liyin Do we need this out on trunk? What commit on 0.89fb was this fix? Thanks. [89-fb] Multiget API may return incomplete resutls -- Key: HBASE-5799 URL: https://issues.apache.org/jira/browse/HBASE-5799 Project: HBase Issue Type: Bug Reporter: Liyin Tang Assignee: Liyin Tang There is a serious bug in multiget which causes the multiget function to return only part of the results. In the process function, the initial region is set before sorting the input list. So after the input list has been sorted, the initial region may no longer be the correct region for the first row in the sorted list. The first row in the sorted list may therefore be sent to the wrong region server, which has no result for this row.
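The ordering bug described in HBASE-5799 can be shown with a toy region lookup. In the sketch below, MultiGetGrouping, regionFor and initialRegion are hypothetical names, and regions are modelled as a sorted map from start key to region name; it is not the 0.89-fb client code. The point is only that the initial region has to be resolved from the first row of the sorted list, not of the list as passed in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;

/** Hypothetical model of grouping multiget rows by region. */
class MultiGetGrouping {
  /**
   * regionStarts maps each region's start key to a region name.
   * Assumes the first region starts at the empty key, so a floor entry always exists.
   */
  static String regionFor(String row, NavigableMap<String, String> regionStarts) {
    return regionStarts.floorEntry(row).getValue();
  }

  /** Sorts first, then resolves the region of the first row of the sorted list. */
  static String initialRegion(List<String> rows, NavigableMap<String, String> regionStarts) {
    List<String> sorted = new ArrayList<>(rows);
    Collections.sort(sorted);
    return regionFor(sorted.get(0), regionStarts);
  }
}
```

With two regions split at "m" and input rows ["x", "a"], resolving the region from the unsorted first row gives the second region, while the first row after sorting ("a") lives in the first region. That mismatch is what sends the row to a server that has no result for it.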
[jira] [Commented] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257207#comment-13257207 ] stack commented on HBASE-5737: -- @Ram I do not follow. Please rephrase. Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.96.0 Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch, HBASE-5737_3.patch Currently in Am.getAssignmentByTable() we use a result map which is a HashMap. It would be better to use a TreeMap. Even MetaReader.fullScan uses a TreeMap so that the naming order is maintained. I felt this change could be very useful in cases where we are extending the DefaultLoadBalancer.
[jira] [Commented] (HBASE-5751) hbase master stop does not bring down backup masters
[ https://issues.apache.org/jira/browse/HBASE-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257216#comment-13257216 ] stack commented on HBASE-5751: -- When was it reverted Gregory? There was a long run of fails after its commit. Thanks. hbase master stop does not bring down backup masters -- Key: HBASE-5751 URL: https://issues.apache.org/jira/browse/HBASE-5751 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Gregory Chanan Fix For: 0.90.7 Carry forward the discussion from parent for 0.90 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5349) Automagically tweak global memstore and block cache sizes based on workload
[ https://issues.apache.org/jira/browse/HBASE-5349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257219#comment-13257219 ] stack commented on HBASE-5349: -- I wouldn't mind more detail. Can our LRU be resized? Memstore upper bound can vary but there are interesting effects like if it's too big, flushing can take so long that the memstore fills before we get around to flushing it again, so we block. Nit: 10 minutes seems like too coarse a granularity? Good stuff Enis. Automagically tweak global memstore and block cache sizes based on workload --- Key: HBASE-5349 URL: https://issues.apache.org/jira/browse/HBASE-5349 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Fix For: 0.96.0 Hypertable does a neat thing where it changes the size given to the CellCache (our MemStores) and Block Cache based on the workload. If you need an image, scroll down to the bottom of this link: http://www.hypertable.com/documentation/architecture/ That'd be one less thing to configure.
[jira] [Commented] (HBASE-3614) Expose per-region request rate metrics
[ https://issues.apache.org/jira/browse/HBASE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257226#comment-13257226 ]

stack commented on HBASE-3614:

This could be final: '+ private RegionOperationMetrics regionMetrics;'?

100 chars per line. Just pass HRegionInfo altogether to the below?
{code}
+this.regionMetrics = new RegionOperationMetrics(conf, this.regionInfo.getTableNameAsString(), this.regionInfo.getEncodedName());
{code}
Err... your replacement is better than what was there previously in the below:
{code}
-final String metricPrefix = SchemaMetrics.generateSchemaMetricsPrefix(
-    getTableDesc().getNameAsString(), familyMap.keySet());
-if (!metricPrefix.isEmpty()) {
-  RegionMetricsStorage.incrTimeVaryingMetric(metricPrefix + "delete_", after - now);
-}
+this.regionMetrics.updateDeleteMetrics(familyMap.keySet(), after-now);
{code}
What's happening here?
{code}
+if (cfSet == null) {
+  cfSet = put.getFamilyMap().keySet();
+} else {
+  cfSetConsistent = cfSetConsistent && put.equals(cfSet);
{code}
Do we have to get the column family set each time through? It never changes (currently) while the region is open. What's a cfSetConsistent? A comment would help?

Yeah, I don't follow this stuff:
{code}
+ //See if the column families were consistent through the whole thing.
+ //if they were then keep them. If they were not then pass a null.
+ //null will be treated as unknown.
{code}
Should be hbase.metrics.region.exposeOperationTimes instead of hbase.metrics.exposeOperationTimes to convey it's on/off for per-region metrics?

This patch is great.

Expose per-region request rate metrics

Key: HBASE-3614
URL: https://issues.apache.org/jira/browse/HBASE-3614
Project: HBase
Issue Type: Improvement
Components: metrics, regionserver
Reporter: Gary Helmling
Assignee: Elliott Clark
Priority: Minor
Attachments: HBASE-3614-0.patch, HBASE-3614-1.patch, HBASE-3614-2.patch, HBASE-3614-3.patch, HBASE-3614-4.patch, Screen Shot 2012-04-17 at 2.41.27 PM.png

We currently export metrics on request rates for each region server, and this can help with identifying uneven load at a high level. But once you see a given server under high load, you're forced to extrapolate based on your application patterns and the data it's serving what the likely culprit is. This can and should be much easier if we just exported request rate metrics per-region on each server.

Dynamically updating the metrics keys based on assigned regions may pose some minor challenges, but this seems a very valuable diagnostic tool to have available.
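For readers puzzling over the cfSetConsistent snippet questioned in the comment above, here is a minimal sketch of what that logic appears to be doing, assuming the intent stated in the patch's own code comment: remember the column-family set of the first put in a batch, and report null ("unknown") as soon as a later put touches a different set. The class and method names are hypothetical, and plain string sets stand in for put.getFamilyMap().keySet(); this is not the patch's code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone illustration; not part of HBase.
class CfSetConsistency {

  /**
   * Returns the column-family set shared by every put in the batch,
   * or null (meaning "unknown") if the batch is empty or the sets differ.
   */
  static Set<String> commonFamilies(List<Set<String>> familiesPerPut) {
    Set<String> cfSet = null;
    boolean cfSetConsistent = true;
    for (Set<String> families : familiesPerPut) {
      if (cfSet == null) {
        cfSet = families;  // first put seen: remember its families
      } else {
        // every later put must touch exactly the same families
        cfSetConsistent = cfSetConsistent && families.equals(cfSet);
      }
    }
    return cfSetConsistent ? cfSet : null;
  }

  public static void main(String[] args) {
    Set<String> ab = new HashSet<>(Arrays.asList("a", "b"));
    Set<String> a = new HashSet<>(Arrays.asList("a"));
    System.out.println(commonFamilies(Arrays.asList(ab, ab)));  // same set in both puts
    System.out.println(commonFamilies(Arrays.asList(ab, a)));   // mixed sets
  }
}
```

Under this reading, the set has to be fetched per put because each put in a batch can carry different families, even though the table's own family list does not change while the region is open.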
[jira] [Commented] (HBASE-5751) hbase master stop does not bring down backup masters
[ https://issues.apache.org/jira/browse/HBASE-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257230#comment-13257230 ]

stack commented on HBASE-5751:

https://builds.apache.org/view/G-L/view/HBase/job/hbase-0.90/456/console looks like a similar hang, though not on the same test; the tests are aborted midway through. I think your argument that it's unrelated holds, going by the fact that 471-473 fail TestLogRolling in the same manner in which they failed when the patch was in place. Let's commit hbase-5213 and figure this failing TestLogRolling out in a new issue.

hbase master stop does not bring down backup masters

Key: HBASE-5751
URL: https://issues.apache.org/jira/browse/HBASE-5751
Project: HBase
Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Gregory Chanan
Fix For: 0.90.7

Carry forward the discussion from parent for 0.90
[jira] [Commented] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257241#comment-13257241 ]

stack commented on HBASE-5737:

The above is from AM? If so, I'm not sure it's a bug. At the time, my sense is that the balancer ran w/o keeping context. What's changed is that you seem to have a LB that is doing this now. As to whether it's a bug or an improvement, it's your call, boss.

Minor Improvements related to balancer.

Key: HBASE-5737
URL: https://issues.apache.org/jira/browse/HBASE-5737
Project: HBase
Issue Type: Improvement
Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
Fix For: 0.96.0
Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch, HBASE-5737_3.patch

Currently in Am.getAssignmentByTable() we use a result map which is currently a hashmap. It could be better if we have a treeMap. Even in MetaReader.fullScan we have the treeMap only so that we have the naming order maintained. I felt this change could be very useful in cases where we are extending the DefaultLoadBalancer.
[jira] [Commented] (HBASE-5827) [Coprocessors] Observer notifications on exceptions
[ https://issues.apache.org/jira/browse/HBASE-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257244#comment-13257244 ]

stack commented on HBASE-5827:

This seems like something we need if cps are to be able to keep a running context.

[Coprocessors] Observer notifications on exceptions

Key: HBASE-5827
URL: https://issues.apache.org/jira/browse/HBASE-5827
Project: HBase
Issue Type: Improvement
Components: coprocessors
Reporter: Andrew Purtell

Benjamin Busjaeger wrote on dev@:
{quote}
Is there a reason that RegionObservers are not notified when a get/put/delete fails? Suppose I maintain some (transient) state in my Coprocessor that is created during preGet and discarded during postGet. If the get fails, postGet is not invoked, so I cannot remove the state. If there is a good reason, is there any other way to achieve the same thing? If not, would it be possible to add something like the snippet below to the code base?
{code}
// pre-get CP hook
if (withCoprocessor && (coprocessorHost != null)) {
  if (coprocessorHost.preGet(get, results)) {
    return results;
  }
}
+try {
  ...
+} catch (Throwable t) {
+  // failed-get CP hook
+  if (withCoprocessor && (coprocessorHost != null)) {
+    coprocessorHost.failedGet(get, results);
+  }
+  throw t;
+}
// post-get CP hook
if (withCoprocessor && (coprocessorHost != null)) {
  coprocessorHost.postGet(get, results);
}
{code}
{quote}
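The proposal quoted above can be tried out in isolation. Below is a minimal, self-contained sketch of the pre/failed/post hook pattern; the GetObserver and Store interfaces are hypothetical stand-ins invented for this illustration (they are not HBase's RegionObserver or coprocessor host), and failedGet is the hook being proposed, not an existing one.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the proposed hook; none of these types are HBase APIs.
class FailedHookSketch {

  interface GetObserver {
    boolean preGet(String row, List<String> results);  // true = skip the default read
    void postGet(String row, List<String> results);
    void failedGet(String row, List<String> results, Throwable cause);  // proposed addition
  }

  /** Stand-in for the region's real read path. */
  interface Store {
    List<String> read(String row) throws Exception;
  }

  static List<String> get(String row, Store store, GetObserver observer) throws Exception {
    List<String> results = new ArrayList<>();
    if (observer != null && observer.preGet(row, results)) {
      return results;
    }
    try {
      results.addAll(store.read(row));
    } catch (Throwable t) {
      // failed-get hook: gives the observer a chance to discard state created in preGet
      if (observer != null) {
        observer.failedGet(row, results, t);
      }
      throw t;  // rethrow so the caller still sees the original failure
    }
    if (observer != null) {
      observer.postGet(row, results);
    }
    return results;
  }

  public static void main(String[] args) {
    GetObserver logging = new GetObserver() {
      public boolean preGet(String row, List<String> results) {
        System.out.println("preGet " + row);
        return false;
      }
      public void postGet(String row, List<String> results) {
        System.out.println("postGet " + row);
      }
      public void failedGet(String row, List<String> results, Throwable cause) {
        System.out.println("failedGet " + row + ": " + cause.getMessage());
      }
    };
    try {
      get("row1", r -> { throw new IOException("simulated read failure"); }, logging);
    } catch (Exception e) {
      System.out.println("caller saw: " + e.getMessage());
    }
  }
}
```

One design question the sketch leaves open is what should happen when failedGet itself throws; as written, that exception would replace the original one.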
[jira] [Commented] (HBASE-5816) Balancer and ServerShutdownHandler concurrently reassigning the same region
[ https://issues.apache.org/jira/browse/HBASE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257252#comment-13257252 ]

stack commented on HBASE-5816:

Great stuff, Maryann. Where is the above bit of code from? I don't find it in trunk (could be me).

bq. It should be safe for the later thread just return or get an exception if the region has already been assigned by an earlier thread.

What are you thinking? When we go into the assign, we check if the region is in transition and, unless it's a force assign, just return? Or would you do this earlier?

Maybe the balancer should be more deferential? It could check if the regionserver it's been asked to move a region from is on the deadservers list. This would still be racy though. Would doing the check in the assign method be enough? (I've not looked at the code.)

Thanks for the help on this stuff.

Balancer and ServerShutdownHandler concurrently reassigning the same region

Key: HBASE-5816
URL: https://issues.apache.org/jira/browse/HBASE-5816
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6
Reporter: Maryann Xue
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Attachments: HBASE-5816.patch

The first assign thread exits with success after updating the RegionState to PENDING_OPEN, while the second assign follows immediately into assign and fails the RegionState check in setOfflineInZooKeeper(). This causes the master to abort.

In the below case, the two concurrent assigns occurred when AM tried to assign a region to a dying/dead RS, and meanwhile the ShutdownServerHandler tried to assign this region (from the region plan) spontaneously.
2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. (offlining)
2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=hadoop05.sh.intel.com,60020,1334544902186, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
2012-04-17 05:44:57,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b (region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING)
2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=CLOSED, ts=1334612697672, server=hadoop05.sh.intel.com,60020,1334544902186
2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x236b912e9b3000e Creating (or updating) unassigned node for fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state
2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.; plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:54:19,159 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=PENDING_OPEN, ts=1334613179096, server=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:54:59,033 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253, load=(requests=0, regions=0, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
java.net.SocketTimeoutException: Call to /10.239.47.87:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302
[jira] [Commented] (HBASE-5816) Balancer and ServerShutdownHandler concurrently reassigning the same region
[ https://issues.apache.org/jira/browse/HBASE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257255#comment-13257255 ]

stack commented on HBASE-5816:

Should we have the servershutdownhandler and the balancer feed a single queue that the assignment manager pulls from? If the region is already in the queue, then we'd favor the purposeful assignment (the balancer's?) rather than the random one?

Balancer and ServerShutdownHandler concurrently reassigning the same region

Key: HBASE-5816
URL: https://issues.apache.org/jira/browse/HBASE-5816
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6
Reporter: Maryann Xue
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Attachments: HBASE-5816.patch

The first assign thread exits with success after updating the RegionState to PENDING_OPEN, while the second assign follows immediately into assign and fails the RegionState check in setOfflineInZooKeeper(). This causes the master to abort.

In the below case, the two concurrent assigns occurred when AM tried to assign a region to a dying/dead RS, and meanwhile the ShutdownServerHandler tried to assign this region (from the region plan) spontaneously.

2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. (offlining)
2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=hadoop05.sh.intel.com,60020,1334544902186, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
2012-04-17 05:44:57,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b (region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING)
2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=CLOSED, ts=1334612697672, server=hadoop05.sh.intel.com,60020,1334544902186
2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x236b912e9b3000e Creating (or updating) unassigned node for fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state
2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.; plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:54:19,159 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=PENDING_OPEN, ts=1334613179096, server=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:54:59,033 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253, load=(requests=0, regions=0, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
java.net.SocketTimeoutException: Call to /10.239.47.87:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302 remote=/10.239.47.87:60020]
    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:805)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:778)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:283)
    at $Proxy7.openRegion(Unknown Source)
    at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:573)
    at
[jira] [Commented] (HBASE-5792) HLog Performance Evaluation Tool
[ https://issues.apache.org/jira/browse/HBASE-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255644#comment-13255644 ]

stack commented on HBASE-5792:

@Todd Thanks. I removed TestHLogBench over in HBASE-5808. The new test does verify and actually writes a log, which TestHLogBench does not.

HLog Performance Evaluation Tool

Key: HBASE-5792
URL: https://issues.apache.org/jira/browse/HBASE-5792
Project: HBase
Issue Type: Test
Components: wal
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
Labels: performance, wal
Fix For: 0.96.0
Attachments: HBASE-5792-v0.patch, HBASE-5792-v1.patch, HBASE-5792-v2.patch, verify.txt, verify.txt

Related to HDFS-3280 and the HBase WAL slowdown on 0.23+.

It would be nice to have a simple tool like HFilePerformanceEvaluation, ... to be able to easily check the HLog performance.
[jira] [Commented] (HBASE-5788) Move Dynamic Metrics storage off of HRegion.
[ https://issues.apache.org/jira/browse/HBASE-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255645#comment-13255645 ]

stack commented on HBASE-5788:

bq. TestRegionServerMetrics covers most of the functionality of the new class but I can create a new set of more explicit tests if you think that is needed.

Probably no need if we have some coverage already. Just want to make sure the class honors its basic contract. Easier figuring this stuff out in a unit test than up on a cluster, yadda, yadda, you know what I'm at.

Move Dynamic Metrics storage off of HRegion.

Key: HBASE-5788
URL: https://issues.apache.org/jira/browse/HBASE-5788
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: Elliott Clark
Assignee: Elliott Clark
Priority: Minor
Attachments: HBASE-5788-0.patch, HBASE-5788-1.patch, HBASE-5788-2.patch

HRegion right now has the responsibility of storing static counts and latency numbers for use by the metrics package. Since these maps are incremented and set from lots of places, it makes adding functionality hard. So move the metrics functionality into SchemaMetrics, making it more than just a class for naming. The next step will be to simplify the API exposed so that using it will be easier.
[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255667#comment-13255667 ]

stack commented on HBASE-5620:

@Jimmy So every invocation will throw an exception?
{code}
+// For protobuf protocols, ServiceException is expected
{code}
What's the Set in Invocation doing? You add it but don't seem to access it?

I like the removal of a call method down through the rpc stack.

Convert the client protocol of HRegionInterface to PB

Key: HBASE-5620
URL: https://issues.apache.org/jira/browse/HBASE-5620
Project: HBase
Issue Type: Sub-task
Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: 0.96.0
Attachments: hbase-5620-sec.patch, hbase-5620_v3.patch, hbase-5620_v4.patch, hbase-5620_v4.patch
[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255778#comment-13255778 ]

stack commented on HBASE-5620:

I made HBASE-5810 to apply this, Jimmy. Good stuff.

Convert the client protocol of HRegionInterface to PB

Key: HBASE-5620
URL: https://issues.apache.org/jira/browse/HBASE-5620
Project: HBase
Issue Type: Sub-task
Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: 0.96.0
Attachments: hbase-5620-sec.patch, hbase-5620_v3.patch, hbase-5620_v4.patch, hbase-5620_v4.patch
[jira] [Commented] (HBASE-5782) Edits can be appended out of seqid order since HBASE-4487
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255877#comment-13255877 ]

stack commented on HBASE-5782:

Looking at Lars' patch.

On your 1. and 2. above, apparently the append is also expensive according to Dhruba. Just saying.

Also on "...might lead to sync be issued multiple time when only one was necessary (it seems the same race condition existed before)." Yes, this we have always had.

I'd say kill this stuff... it looks like rubbish to me:
{code}
+ syncBatchSize.addAndGet(doneUpto - this.syncedTillHere);
{code}
It's not read by anyone, looks like the math can go wonky, and when it is read, it's set back to zero, which is probably unexpected. Kill it, I'd say.

I think this is ok:
{code}
+ this.syncedTillHere = Math.max(this.syncedTillHere, doneUpto);
{code}
but this is racy:
{code}
long doneUpto = this.unflushedEntries.get();
{code}
It could be low in number; i.e. we could be putting into hdfs more edits than the current value of unflushedEntries if we read after an edit has been added to the queue but before the above is updated. Is that ok? It's ok if this is a little sloppy, especially if it under-reports?

On tactic for 0.94, sure on doing this for 0.94, though I like Todd's fix better. The verification tool will help you figure out if this slows stuff much and if we are writing out of order. Let me know if you want me to run it for you. Let me add in log rolling too, as per Todd's suggestion.

Edits can be appended out of seqid order since HBASE-4487

Key: HBASE-5782
URL: https://issues.apache.org/jira/browse/HBASE-5782
Project: HBase
Issue Type: Bug
Components: wal
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
Fix For: 0.94.0
Attachments: 5782-lars-v2.txt, 5782-sketch.txt, 5782.txt, 5782.unfinished-stack.txt, HBASE-5782.patch, hbase-5782.txt

Create a table with 1000 splits; after the region assignment, kill the regionserver which contains the META table. A few regions are then missing after the log splitting and region assignment. The HBCK report shows that multiple region holes got created.

The same scenario was verified multiple times in 0.92.1 with no issues.
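To make the point about the Math.max update and the racy snapshot concrete, here is a small sketch of a monotonic "synced up to" mark. It is an illustration only: the class is hypothetical, the field names merely echo the quoted snippets, and it uses AtomicLong where the patch under review uses a plain field, which is a substitution made for this sketch. The idea it shows is that a sync which finishes late with a stale (lower) snapshot must not move the mark backwards, and that an under-reported mark only costs an extra sync, never a lost edit.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; field names echo the quoted snippets but this is not HLog.
class SyncHighWaterMark {
  private final AtomicLong unflushedEntries = new AtomicLong();
  private final AtomicLong syncedTillHere = new AtomicLong();

  /** Called by writers: returns the transaction id of the appended edit. */
  long append() {
    return unflushedEntries.incrementAndGet();
  }

  /** Snapshot taken before a sync; by the time the sync runs, more edits may have arrived. */
  long snapshot() {
    return unflushedEntries.get();
  }

  /** Records a completed sync; never moves the mark backwards. */
  void markSynced(long doneUpto) {
    syncedTillHere.accumulateAndGet(doneUpto, Math::max);
  }

  /** True if the edit with this transaction id is not yet known to be synced. */
  boolean needsSync(long txid) {
    return txid > syncedTillHere.get();
  }

  long syncedTillHere() {
    return syncedTillHere.get();
  }

  public static void main(String[] args) {
    SyncHighWaterMark wal = new SyncHighWaterMark();
    wal.append();
    long early = wal.snapshot();   // syncer A snapshots after one edit
    long second = wal.append();
    long late = wal.snapshot();    // syncer B snapshots after two edits
    wal.markSynced(late);          // B finishes first
    wal.markSynced(early);         // A finishes later with a lower value
    System.out.println(wal.syncedTillHere());
    System.out.println(wal.needsSync(second));
  }
}
```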
[jira] [Commented] (HBASE-5782) Edits can be appended out of seqid order since HBASE-4487
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255885#comment-13255885 ]

stack commented on HBASE-5782:

Can we try and make Todd's work? It does some nice cleanup.

Edits can be appended out of seqid order since HBASE-4487

Key: HBASE-5782
URL: https://issues.apache.org/jira/browse/HBASE-5782
Project: HBase
Issue Type: Bug
Components: wal
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
Fix For: 0.94.0
Attachments: 5782-lars-v2.txt, 5782-sketch.txt, 5782.txt, 5782.unfinished-stack.txt, HBASE-5782.patch, hbase-5782.txt

Create a table with 1000 splits; after the region assignment, kill the regionserver which contains the META table. A few regions are then missing after the log splitting and region assignment. The HBCK report shows that multiple region holes got created.

The same scenario was verified multiple times in 0.92.1 with no issues.
[jira] [Commented] (HBASE-5782) Edits can be appended out of seqid order since HBASE-4487
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255937#comment-13255937 ]

stack commented on HBASE-5782:

bq. We won't write more into the log (once we take the pendingWrites they are gone

Is that so? We don't get the pendingWrites until we are under the flush lock, but we've taken doneUpTo before we go under the lock.

Edits can be appended out of seqid order since HBASE-4487

Key: HBASE-5782
URL: https://issues.apache.org/jira/browse/HBASE-5782
Project: HBase
Issue Type: Bug
Components: wal
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
Fix For: 0.94.0
Attachments: 5782-lars-v2.txt, 5782-sketch.txt, 5782.txt, 5782.unfinished-stack.txt, HBASE-5782.patch, hbase-5782.txt

Create a table with 1000 splits; after the region assignment, kill the regionserver which contains the META table. A few regions are then missing after the log splitting and region assignment. The HBCK report shows that multiple region holes got created.

The same scenario was verified multiple times in 0.92.1 with no issues.
[jira] [Commented] (HBASE-5812) Add log rolling to HLogPerformanceEvaluation
[ https://issues.apache.org/jira/browse/HBASE-5812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255946#comment-13255946 ]

stack commented on HBASE-5812:

Verify can deal w/ multiple logs and verify all logs were written in sequence id order.

Add log rolling to HLogPerformanceEvaluation

Key: HBASE-5812
URL: https://issues.apache.org/jira/browse/HBASE-5812
Project: HBase
Issue Type: Task
Reporter: stack
Attachments: 5812.txt

Add the ability to ask that HLogPerformanceEvaluation roll logs while it's running.
[jira] [Commented] (HBASE-5782) Edits can be appended out of seqid order since HBASE-4487
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255959#comment-13255959 ]

stack commented on HBASE-5782:

@Lars But it won't matter, right, since the map we are getting from is not under our new flush lock? I think it's harmless. We will undercount what's been flushed, I believe; we'll not overcount (and so possibly lose data)?

I added log rolling and tested your patch using HLogPerformanceEvaluation. It 'works' at least. If you want me to compare before and after, just say.

Edits can be appended out of seqid order since HBASE-4487

Key: HBASE-5782
URL: https://issues.apache.org/jira/browse/HBASE-5782
Project: HBase
Issue Type: Bug
Components: wal
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
Fix For: 0.94.0
Attachments: 5782-lars-v2.txt, 5782-sketch.txt, 5782.txt, 5782.unfinished-stack.txt, HBASE-5782.patch, hbase-5782.txt

Create a table with 1000 splits; after the region assignment, kill the regionserver which contains the META table. A few regions are then missing after the log splitting and region assignment. The HBCK report shows that multiple region holes got created.

The same scenario was verified multiple times in 0.92.1 with no issues.
[jira] [Commented] (HBASE-5782) Edits can be appended out of seqid order since HBASE-4487
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256043#comment-13256043 ]

stack commented on HBASE-5782:

I made the hlog perf tool work on hdfs and ran some basic tests. Both Todd's and Lars' patches seem faster than what we have currently.

Running w/o a fix on hdfs w/ current trunk, I have to disable verify because it fails (verify happens after we print out test timings).

$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -conf /home/stack/hadoop-conf/core-site.xml -path hdfs://sv4r11s38:7000/tmp -threads 100 -roll 1
12/04/17 22:58:28 INFO wal.HLogPerformanceEvaluation: Summary: threads=100, iterations=1 took 100.630s 9937.395ops/s
12/04/17 23:00:33 INFO wal.HLogPerformanceEvaluation: Summary: threads=100, iterations=1 took 94.945s 10532.413ops/s

Todd patch on hdfs:

$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -conf /home/stack/hadoop-conf/core-site.xml -path hdfs://sv4r11s38:7000/tmp -threads 100 -roll 1 -verify
12/04/17 22:53:35 INFO wal.HLogPerformanceEvaluation: Summary: threads=100, iterations=1 took 81.202s 12314.967ops/s

Lars patch:

12/04/17 23:07:08 INFO wal.HLogPerformanceEvaluation: Summary: threads=100, iterations=1 took 76.800s 13020.833ops/s

For Todd and Lars, both pass verify, which checks that seqids are ordered and that we wrote as much as we think we did.

Edits can be appended out of seqid order since HBASE-4487

Key: HBASE-5782
URL: https://issues.apache.org/jira/browse/HBASE-5782
Project: HBase
Issue Type: Bug
Components: wal
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
Fix For: 0.94.0
Attachments: 5782-lars-v2.txt, 5782-sketch.txt, 5782.txt, 5782.unfinished-stack.txt, HBASE-5782.patch, hbase-5782.txt

Create a table with 1000 splits; after the region assignment, kill the regionserver which contains the META table. A few regions are then missing after the log splitting and region assignment. The HBCK report shows that multiple region holes got created.

The same scenario was verified multiple times in 0.92.1 with no issues.
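For a rough side-by-side of the runs reported above, the ops/s figures can be turned into relative speedups. The sketch below does just that arithmetic; the helper class is hypothetical, and taking the faster of the two unpatched runs as the baseline is an arbitrary (conservative) choice made for this illustration.

```java
// Hypothetical helper for comparing the throughput figures quoted in the comment above.
class WalThroughputComparison {

  /** Ratio of a patched run's ops/s to the baseline's ops/s. */
  static double speedup(double baselineOpsPerSec, double patchedOpsPerSec) {
    return patchedOpsPerSec / baselineOpsPerSec;
  }

  public static void main(String[] args) {
    double baseline = 10532.413;  // faster of the two unpatched runs
    double todd = 12314.967;      // run with Todd's patch
    double lars = 13020.833;      // run with Lars' patch
    System.out.printf("Todd: %.3fx%n", speedup(baseline, todd));
    System.out.printf("Lars: %.3fx%n", speedup(baseline, lars));
  }
}
```

Against that baseline the two patches come out roughly 17% and 24% faster. These are single runs, so the ratios are indicative only.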
[jira] [Commented] (HBASE-5782) Edits can be appended out of seqid order since HBASE-4487
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256044#comment-13256044 ] stack commented on HBASE-5782: -- Ok on Lars' patch into 0.94. Edits can be appended out of seqid order since HBASE-4487 - Key: HBASE-5782 URL: https://issues.apache.org/jira/browse/HBASE-5782 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0 Attachments: 5782-lars-v2.txt, 5782-sketch.txt, 5782.txt, 5782.unfinished-stack.txt, HBASE-5782.patch, hbase-5782.txt Create a table with 1000 splits; after the region assignment, kill the regionserver which contains the META table. A few regions are then missing after the log splitting and region assignment. The HBCK report shows that multiple region holes got created. The same scenario was verified multiple times in 0.92.1 with no issues.
[jira] [Commented] (HBASE-5782) Edits can be appended out of seqid order since HBASE-4487
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256085#comment-13256085 ] stack commented on HBASE-5782: -- I tried to reproduce what J-D is seeing on a cluster using same-sized keys and values, but Lars' patch completes before Todd's. My test run may be too small. I did thread dumps during the Lars and Todd runs. Both seem to be down in sync mostly, down here: 'org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(DFSClient.java:3789)', otherwise hung up on sync points around WAL append/sync. Let's go w/ the Lars patch because its changes are minimal. As per Todd, let's file an issue to clean up this stuff with his patch as seed. From J-D's work, any greased lightning we can apply around hlog append makes for a big difference in overall write throughput. Edits can be appended out of seqid order since HBASE-4487 - Key: HBASE-5782 URL: https://issues.apache.org/jira/browse/HBASE-5782 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0 Attachments: 5782-lars-v2.txt, 5782-sketch.txt, 5782.txt, 5782.unfinished-stack.txt, HBASE-5782.patch, hbase-5782.txt Create a table with 1000 splits; after the region assignment, kill the regionserver which contains the META table. A few regions are then missing after the log splitting and region assignment. The HBCK report shows that multiple region holes got created. The same scenario was verified multiple times in 0.92.1 with no issues.
[jira] [Commented] (HBASE-5782) Edits can be appended out of seqid order since HBASE-4487
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256087#comment-13256087 ] stack commented on HBASE-5782: -- @Lars As to your patch being 'slower' with fewer threads, I don't think you can make such a comparison. W/o your patch, we are broken. Edits can be appended out of seqid order since HBASE-4487 - Key: HBASE-5782 URL: https://issues.apache.org/jira/browse/HBASE-5782 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0 Attachments: 5782-lars-v2.txt, 5782-sketch.txt, 5782.txt, 5782.unfinished-stack.txt, HBASE-5782.patch, hbase-5782.txt Create a table with 1000 splits; after the region assignment, kill the regionserver which contains the META table. A few regions are then missing after the log splitting and region assignment. The HBCK report shows that multiple region holes got created. The same scenario was verified multiple times in 0.92.1 with no issues.
[jira] [Commented] (HBASE-5790) ZKUtil deleteRecursively should be a recoverable operation
[ https://issues.apache.org/jira/browse/HBASE-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256177#comment-13256177 ] stack commented on HBASE-5790: -- This patch requires zk 3.4.x, right? But it doesn't check which version is running before it goes and uses the new Transaction feature (I'm not sure you can even ask zk for its ensemble version from the client). If a user puts 3.3.x under hbase, will we hang doing this call? ZKUtil deleteRecursively should be a recoverable operation -- Key: HBASE-5790 URL: https://issues.apache.org/jira/browse/HBASE-5790 Project: HBase Issue Type: Improvement Reporter: Jesse Yates Assignee: Jesse Yates Labels: zookeeper Fix For: 0.96.0, 0.94.1 Attachments: java_HBASE-5790-v1.patch, java_HBASE-5790.patch As of 3.4.3, Zookeeper has full, multi-operation transactions. This means we can wholesale delete chunks of the zk tree and ensure that we don't have any pesky recursive delete issues where we delete the children of a node, but then a child joins before deletion of the parent. Even without transactions, this should be the behavior, but it is possible to make it much cleaner now that we have this new feature in zk.
[jira] [Commented] (HBASE-5782) Edits can be appended out of seqid order since HBASE-4487
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256185#comment-13256185 ] stack commented on HBASE-5782: -- Want me to make a simple test that runs three threads with just a few edits, say 1k, and then verifies that all are in order and all edits were written, so we notice a regression? Edits can be appended out of seqid order since HBASE-4487 - Key: HBASE-5782 URL: https://issues.apache.org/jira/browse/HBASE-5782 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0 Attachments: 5782-lars-v2.txt, 5782-sketch.txt, 5782.txt, 5782.unfinished-stack.txt, HBASE-5782.patch, hbase-5782.txt Create a table with 1000 splits; after the region assignment, kill the regionserver which contains the META table. A few regions are then missing after the log splitting and region assignment. The HBCK report shows that multiple region holes got created. The same scenario was verified multiple times in 0.92.1 with no issues.
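A rough idea of what the proposed three-thread check could look like, as a self-contained sketch only. `ToyLog` is a hypothetical stand-in and not HBase's HLog; it assumes the fix amounts to assigning the sequence id and appending the edit under the same lock, and the 1000 edits are taken here as per-thread purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SeqIdOrderSketch {
    /** Toy stand-in for a write-ahead log. Hypothetical; NOT HBase's HLog. */
    static final class ToyLog {
        private long nextSeqId = 0;
        private final List<Long> appended = new ArrayList<>();

        /** Assigns the sequence id and appends under one lock, so ids land in order. */
        synchronized long append() {
            long id = nextSeqId++;
            appended.add(id);
            return id;
        }

        /** Returns a copy of the ids in the order they were appended. */
        synchronized List<Long> snapshot() {
            return new ArrayList<>(appended);
        }
    }

    /** Returns true if every id is strictly greater than the one before it. */
    public static boolean isOrdered(List<Long> ids) {
        for (int i = 1; i < ids.size(); i++) {
            if (ids.get(i) <= ids.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    /** Runs the given number of writer threads and returns the ids in append order. */
    public static List<Long> run(int threads, int editsPerThread) {
        final ToyLog log = new ToyLog();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < editsPerThread; i++) {
                    log.append();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return log.snapshot();
    }

    public static void main(String[] args) {
        List<Long> ids = run(3, 1000);
        System.out.println("edits written: " + ids.size() + ", ordered: " + isOrdered(ids));
    }
}
```

The two checks mirror what the comment asks for: the count confirms every edit was written, and `isOrdered` confirms nothing was appended out of sequence. A real regression test would run against the actual WAL rather than this toy.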
[jira] [Commented] (HBASE-5795) hbase-3927 breaks 0.92-0.94 compatibility
[ https://issues.apache.org/jira/browse/HBASE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254794#comment-13254794 ] stack commented on HBASE-5795: -- v2 works out on a cluster for me hbase-3927 breaks 0.92-0.94 compatibility --- Key: HBASE-5795 URL: https://issues.apache.org/jira/browse/HBASE-5795 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.94.0 Attachments: 5795-v2.txt, 5795.unittest.txt This commit broke our 0.92/0.94 compatibility: {code} r1136686 | stack | 2011-06-16 14:18:08 -0700 (Thu, 16 Jun 2011) | 1 line HBASE-3927 display total uncompressed byte size of a region in web UI {code} I just tried the new RC for 0.94. I brought up a 0.94 master on a 0.92 cluster and rather than just digest version 1 of the HServerLoad, I get this: {code} 2012-04-14 22:47:59,752 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.4.14.38 java.io.IOException: Error in readFields at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:684) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1269) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1184) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:722) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:513) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:488) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: A record version mismatch occured. 
Expecting v2, found v1 at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46) at org.apache.hadoop.hbase.HServerLoad$RegionLoad.readFields(HServerLoad.java:379) at org.apache.hadoop.hbase.HServerLoad.readFields(HServerLoad.java:686) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:681) ... 9 more {code}
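For readers unfamiliar with this failure mode, the trace above comes from a reader that insists on one exact record version. Below is a minimal, self-contained sketch of the alternative, a reader that accepts both v1 and v2. The `Load` class and its field names are hypothetical and this is not the actual HServerLoad code or the attached patch; it only illustrates the general version-tolerant pattern.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class VersionTolerantReadSketch {
    /** Newest version this reader understands. */
    public static final byte VERSION = 2;

    /** Toy record. Field names are hypothetical, chosen only for illustration. */
    public static final class Load {
        public int storefileSizeMB;
        public int totalUncompressedMB; // assumed to have been added in version 2

        /** Writes the record in the requested wire version. */
        public void write(DataOutput out, byte version) throws IOException {
            out.writeByte(version);
            out.writeInt(storefileSizeMB);
            if (version >= 2) {
                out.writeInt(totalUncompressedMB);
            }
        }

        /** Reads v1 or v2: the v2-only field defaults to 0 when the sender is older. */
        public void readFields(DataInput in) throws IOException {
            byte version = in.readByte();
            if (version < 1 || version > VERSION) {
                throw new IOException("Unsupported version " + version);
            }
            storefileSizeMB = in.readInt();
            totalUncompressedMB = version >= 2 ? in.readInt() : 0;
        }
    }

    /** Serializes src in the given version and reads it back with the tolerant reader. */
    public static Load roundTrip(Load src, byte writeVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes), writeVersion);
            Load copy = new Load();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The design choice is that an old sender is treated as a valid peer whose missing fields get defaults, rather than as an error.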
[jira] [Commented] (HBASE-5798) NPE running hbck on 0.94 out of reportTablesInFlux
[ https://issues.apache.org/jira/browse/HBASE-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254811#comment-13254811 ] stack commented on HBASE-5798: -- Error is transient. Subsequent runs worked. NPE running hbck on 0.94 out of reportTablesInFlux -- Key: HBASE-5798 URL: https://issues.apache.org/jira/browse/HBASE-5798 Project: HBase Issue Type: Bug Reporter: stack Got this playing w/ hbck going against the 0.94RC: {code} 12/04/16 17:03:14 INFO util.HBaseFsck: getHTableDescriptors == tableNames = [] Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck.reportTablesInFlux(HBaseFsck.java:553) at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:344) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:380) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3033) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5792) HLog Performance Evaluation Tool
[ https://issues.apache.org/jira/browse/HBASE-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254827#comment-13254827 ] stack commented on HBASE-5792: -- This is great Matteo. We need this. Yeah, agree, this tool will have most value if it puts up nothing but a lone region (and WAL). A few minors below:

Missing annotations on audience.

Do you need these? IIRC, the default exists w/o need of definition: {code} + public HLogPerformanceEvaluation() { + } {code} You do it in at least one other place too.

No harm adding a bit of class doc on HLogPutBenchmark.

You don't want to use a command parser?

HLog Performance Evaluation Tool Key: HBASE-5792 URL: https://issues.apache.org/jira/browse/HBASE-5792 Project: HBase Issue Type: Test Components: wal Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: performance, wal Attachments: HBASE-5792-v0.patch, HBASE-5792-v1.patch Related to HDFS-3280 and the HBase WAL slowdown on 0.23+ It would be nice to have a simple tool like HFilePerformanceEvaluation, ... to be able to check easily the HLog performance.
[jira] [Commented] (HBASE-5792) HLog Performance Evaluation Tool
[ https://issues.apache.org/jira/browse/HBASE-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254869#comment-13254869 ] stack commented on HBASE-5792: -- @Matteo NVM. I want to use this tool now so I'll take care of the above. Good stuff. HLog Performance Evaluation Tool Key: HBASE-5792 URL: https://issues.apache.org/jira/browse/HBASE-5792 Project: HBase Issue Type: Test Components: wal Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: performance, wal Attachments: HBASE-5792-v0.patch, HBASE-5792-v1.patch, HBASE-5792-v2.patch Related to HDFS-3280 and the HBase WAL slowdown on 0.23+ It would be nice to have a simple tool like HFilePerformanceEvaluation, ... to be able to check easily the HLog performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5795) hbase-3927 breaks 0.92-0.94 compatibility
[ https://issues.apache.org/jira/browse/HBASE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254928#comment-13254928 ] stack commented on HBASE-5795: -- No. Please include the unit test on commit. hbase-3927 breaks 0.92-0.94 compatibility --- Key: HBASE-5795 URL: https://issues.apache.org/jira/browse/HBASE-5795 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.94.0, 0.96.0 Attachments: 5795-v2.txt, 5795.unittest.txt This commit broke our 0.92/0.94 compatibility: {code} r1136686 | stack | 2011-06-16 14:18:08 -0700 (Thu, 16 Jun 2011) | 1 line HBASE-3927 display total uncompressed byte size of a region in web UI {code} I just tried the new RC for 0.94. I brought up a 0.94 master on a 0.92 cluster and rather than just digest version 1 of the HServerLoad, I get this: {code} 2012-04-14 22:47:59,752 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.4.14.38 java.io.IOException: Error in readFields at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:684) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1269) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1184) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:722) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:513) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:488) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: A record version mismatch occured. 
Expecting v2, found v1 at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46) at org.apache.hadoop.hbase.HServerLoad$RegionLoad.readFields(HServerLoad.java:379) at org.apache.hadoop.hbase.HServerLoad.readFields(HServerLoad.java:686) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:681) ... 9 more {code}
[jira] [Commented] (HBASE-5782) Not all the regions are getting assigned after the log splitting.
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255034#comment-13255034 ] stack commented on HBASE-5782: -- I just committed a tool over on HBASE-5792. It tests WALs. If you pass the -verify flag, you'll see that even w/ just three threads, sequence ids are out of order. Could be useful for verifying whatever fix we have here. Not all the regions are getting assigned after the log splitting. - Key: HBASE-5782 URL: https://issues.apache.org/jira/browse/HBASE-5782 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0 Attachments: HBASE-5782.patch Create a table with 1000 splits; after the region assignment, kill the regionserver which contains the META table. A few regions are then missing after the log splitting and region assignment. The HBCK report shows that multiple region holes got created. The same scenario was verified multiple times in 0.92.1 with no issues.
[jira] [Commented] (HBASE-5634) document how to use uberhbck
[ https://issues.apache.org/jira/browse/HBASE-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255089#comment-13255089 ] stack commented on HBASE-5634: -- +1 Fix "The using the -details option will report". I'm glad you don't call it uberhbck in the doc (well, you joke about it -- that's ok). document how to use uberhbck Key: HBASE-5634 URL: https://issues.apache.org/jira/browse/HBASE-5634 Project: HBase Issue Type: Improvement Components: documentation, hbck Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: uber hbck docs.pdf The updated hbck from HBASE-5128 introduces many new repair options and, as a side effect, offers many new opportunities to durably shoot oneself in the foot. Docs need to be written and added to the ref guide to explain its usage and ramifications and discuss repair strategies.
[jira] [Commented] (HBASE-5782) Edits can be appended out of seqid order since HBASE-4487
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255201#comment-13255201 ] stack commented on HBASE-5782: -- Not sure I follow, but I do know this patch is more ambitious than what I was at.
+ You remove the 'other' sequence numbering system, unflushedEntries? That looks good.
+ Are asserts on by default? We disabled them a while back I believe? You run w/ asserts? (Yeah, that's a good thing to test -- should you use your guava test instead?)
+ It's ugly that we call it hlogFlush but internally we do appends (that's not your change).
+ I agree that the reset of the pending writes linked list needs to be done under the synchronization held by hlogFlush.
+ I like how you do pushback of edits if we fail in hlogFlush.
+ On this thing: {code} + // TODO: restore metric syncBatchSize.addAndGet(doneUpto - this.syncedTillHere); {code} It's not used anywhere and the math looked dodgy... then when you read it, it gets set to zero, so I'm not so sure it is of any use.

Edits can be appended out of seqid order since HBASE-4487 - Key: HBASE-5782 URL: https://issues.apache.org/jira/browse/HBASE-5782 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0 Attachments: 5782-sketch.txt, 5782.txt, HBASE-5782.patch Create a table with 1000 splits; after the region assignment, kill the regionserver which contains the META table. A few regions are then missing after the log splitting and region assignment. The HBCK report shows that multiple region holes got created. The same scenario was verified multiple times in 0.92.1 with no issues.
[jira] [Commented] (HBASE-5802) Change the default metrics class to NullContextWithUpdateThread
[ https://issues.apache.org/jira/browse/HBASE-5802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255295#comment-13255295 ] stack commented on HBASE-5802: -- So, enable jmx emissions? Would suggest that your patch include a comment explaining what the exotic-sounding NullContextWithUpdateThread does. Maybe copy the class comment into your patch somewhere:
{code}
 * A null context which has a thread calling
 * periodically when monitoring is started. This keeps the data sampled
 * correctly.
 * In all other respects, this is like the NULL context: No data is emitted.
 * This is suitable for Monitoring systems like JMX which reads the metrics
 * when someone reads the data from JMX.
 *
 * The default impl of start and stop monitoring:
 * is the AbstractMetricsContext is good enough.
{code}
Maybe update the reference guide too, especially if you are changing the default. Good stuff E.

Change the default metrics class to NullContextWithUpdateThread --- Key: HBASE-5802 URL: https://issues.apache.org/jira/browse/HBASE-5802 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Priority: Minor Attachments: HBASE-5802-0.patch Since lots more metrics are being placed into the Dynamic metrics bucket, changing the default class to NullContextWithUpdateThread seems like it might be worth it.
[jira] [Commented] (HBASE-3614) Expose per-region request rate metrics
[ https://issues.apache.org/jira/browse/HBASE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255304#comment-13255304 ] stack commented on HBASE-3614: -- FYI 100 chars per line max and space around operators (this won't fly: cfSetConsistent?cfSet:null).

I like how you are removing metrics stuff from HRegion out to a region-scoped metrics class.

'+public class RegionMetrics {' needs a class comment saying what it's all about. Does the class need to be public? Can it be scoped to this package only?

Collect all the data members at the top of the class. That's what's usually done in this code base. So put the tablename etc. in RegionMetrics before the constructor etc. rather than after.

Does generateRegionMetricsPrefix need to be public?

What do these new metrics look like? Is this all it takes to expose them? Some regionnames are going to be really long. Should you use the region encoded name instead of the full name? Do you think we even need the table name as prefix?

Good stuff Elliott.

Expose per-region request rate metrics -- Key: HBASE-3614 URL: https://issues.apache.org/jira/browse/HBASE-3614 Project: HBase Issue Type: Improvement Components: metrics, regionserver Reporter: Gary Helmling Assignee: Elliott Clark Priority: Minor Attachments: HBASE-3614-0.patch, HBASE-3614-1.patch We currently export metrics on request rates for each region server, and this can help with identifying uneven load at a high level. But once you see a given server under high load, you're forced to extrapolate based on your application patterns and the data it's serving what the likely culprit is. This can and should be much easier if we just exported request rate metrics per-region on each server. Dynamically updating the metrics keys based on assigned regions may pose some minor challenges, but this seems a very valuable diagnostic tool to have available.
[jira] [Commented] (HBASE-5733) AssignmentManager#processDeadServersAndRegionsInTransition can fail with NPE.
[ https://issues.apache.org/jira/browse/HBASE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255313#comment-13255313 ] stack commented on HBASE-5733: -- Patch looks good to me. I like the test. The LOG.fatal is redundant. The master abort does a log fatal. Else patch is good. AssignmentManager#processDeadServersAndRegionsInTransition can fail with NPE. - Key: HBASE-5733 URL: https://issues.apache.org/jira/browse/HBASE-5733 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HBASE-5733.patch, HBASE-5733.patch Found while going through the code... AssignmentManager#processDeadServersAndRegionsInTransition can fail with NPE as this is directly iterating the nodes from listChildrenAndWatchForNewChildren with-out checking for null. Here also we need to handle with null check like other places. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
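The null check described in this issue can be sketched in isolation as below. This is a hedged illustration only: `orEmpty` and `countNodes` are hypothetical helpers, not code from the patch, and the sketch simply assumes, as the issue description implies, that the listing call can hand back null instead of an empty list.

```java
import java.util.Collections;
import java.util.List;

public class NullSafeChildrenSketch {
    /** Hypothetical helper: treats a null listing as "no children". */
    public static List<String> orEmpty(List<String> nodes) {
        return nodes == null ? Collections.<String>emptyList() : nodes;
    }

    /** Iterates the (possibly null) listing without risking a NullPointerException. */
    public static int countNodes(List<String> nodes) {
        int count = 0;
        for (String node : orEmpty(nodes)) {
            if (node != null) {
                count++;
            }
        }
        return count;
    }
}
```

Whether a null listing should be skipped quietly or should abort the master is a policy question for the patch itself; the sketch only shows the guard.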
[jira] [Commented] (HBASE-5788) Move Dynamic Metrics storage off of HRegion.
[ https://issues.apache.org/jira/browse/HBASE-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255323#comment-13255323 ] stack commented on HBASE-5788: -- Is MetricsStorage for Region metrics only? If so, call it RegionMetrics? Or maybe it's generic metrics storage for this package? If so, the name is right. Should it be down in the metrics package? Regardless, the new class needs a class comment explaining class scope. Does it have to be public? Can it be private to the package at least?

Lines: 100 chars max.

Why are data members in this new class public rather than private? Even if they are static. And static data members probably ain't a good idea because then there is only one per JVM and there can be many regionservers in the one JVM; e.g. in testing.

Yeah, do its methods need to be public? Can these be package private? Hmm... maybe they need to be public because they are called from the metrics subpackage?

I like all the code that comes out of HRegion. That's good. And no harm in a basic unit test showing that your new class is basically working. Any worries w/ concurrent access?

Good stuff Elliott.

Move Dynamic Metrics storage off of HRegion. Key: HBASE-5788 URL: https://issues.apache.org/jira/browse/HBASE-5788 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Elliott Clark Assignee: Elliott Clark Priority: Minor Attachments: HBASE-5788-0.patch, HBASE-5788-1.patch, HBASE-5788-2.patch HRegion right now has the responsibility of storing static counts and latency numbers for use by the metrics package. Since these maps are incremented and set from lots of places, it makes adding functionality hard. So move the metrics functionality into SchemaMetrics, making it more than just a class for naming. The next step will be to simplify the api exposed so that using it will be easier.
[jira] [Commented] (HBASE-3585) isLegalFamilyName() can throw ArrayOutOfBoundException
[ https://issues.apache.org/jira/browse/HBASE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255325#comment-13255325 ] stack commented on HBASE-3585: -- Have a patch Uma? isLegalFamilyName() can throw ArrayOutOfBoundException -- Key: HBASE-3585 URL: https://issues.apache.org/jira/browse/HBASE-3585 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.1 Reporter: Prakash Khemani Priority: Minor org.apache.hadoop.hbase.HColumnDescriptor.isLegalFamilyName(byte[]) accesses byte[0] w/o first checking the array length. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
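The bug reported here is a first-byte access on an array that may be empty. A minimal sketch of the guard follows. The method name `isLegalFamilyNameSketch`, the boolean return type, and the leading-period rule are assumptions made for illustration and may differ from what HColumnDescriptor.isLegalFamilyName actually does.

```java
public class FamilyNameCheckSketch {
    /**
     * Hypothetical validity check: looks at name[0] only after confirming
     * that the array is non-null and non-empty.
     */
    public static boolean isLegalFamilyNameSketch(byte[] name) {
        if (name == null || name.length == 0) {
            return false; // nothing to inspect, so reject instead of indexing
        }
        // Illustrative rule (an assumption): names may not start with a period.
        return name[0] != '.';
    }
}
```

The only point of the sketch is the ordering: the length check must come before any indexed access.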
[jira] [Commented] (HBASE-5782) Edits can be appended out of seqid order since HBASE-4487
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255344#comment-13255344 ] stack commented on HBASE-5782: -- @Ram Read over the HLog comments. It's got stuff on why we want sequenceids in order and where we have a dependency on their being ordered; at least they are notes on how we used to think. I was wondering too about ordering today. If we didn't have to have order, then it would make stuff like running a regionserver with N WALs a bit easier, and we don't try to guarantee sequence order when replicating. But I'm wary of undoing order without our giving the issue a bunch of thought first (your patch above makes me nervous). On the patch, Todd's seems way superior to me. His is more radical, removing what seems to be a confusing sequenceid double, and it's more clear what's going on. Oh, and thanks to you fellas for finding this one. It's a good one. Edits can be appended out of seqid order since HBASE-4487 - Key: HBASE-5782 URL: https://issues.apache.org/jira/browse/HBASE-5782 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0 Attachments: 5782-sketch.txt, 5782.txt, 5782.unfinished-stack.txt, HBASE-5782.patch Create a table with 1000 splits; after the region assignment, kill the regionserver which contains the META table. A few regions are then missing after the log splitting and region assignment. The HBCK report shows that multiple region holes got created. The same scenario was verified multiple times in 0.92.1 with no issues.
[jira] [Commented] (HBASE-5795) hbase-3927 breaks 0.92-0.94 compatibility
[ https://issues.apache.org/jira/browse/HBASE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254441#comment-13254441 ]

stack commented on HBASE-5795:

I looked at Ted's patch. That should do it. See if it makes the unit test pass, I'd say. I can test on cluster tomorrow morning (will also finish my rolling restart and kill of meta on a cluster w/ 1k regions too...)

hbase-3927 breaks 0.92-0.94 compatibility

Key: HBASE-5795
URL: https://issues.apache.org/jira/browse/HBASE-5795
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Fix For: 0.94.0
Attachments: 5795-v1.txt, 5795.unittest.txt

This commit broke our 0.92/0.94 compatibility:

{code}
r1136686 | stack | 2011-06-16 14:18:08 -0700 (Thu, 16 Jun 2011) | 1 line

HBASE-3927 display total uncompressed byte size of a region in web UI
{code}

I just tried the new RC for 0.94. I brought up a 0.94 master on a 0.92 cluster and rather than just digest version 1 of the HServerLoad, I get this:

{code}
2012-04-14 22:47:59,752 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.4.14.38
java.io.IOException: Error in readFields
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:684)
    at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1269)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1184)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:722)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:513)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:488)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: A record version mismatch occured. Expecting v2, found v1
    at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)
    at org.apache.hadoop.hbase.HServerLoad$RegionLoad.readFields(HServerLoad.java:379)
    at org.apache.hadoop.hbase.HServerLoad.readFields(HServerLoad.java:686)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:681)
    ... 9 more
{code}
[jira] [Commented] (HBASE-5747) Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test
[ https://issues.apache.org/jira/browse/HBASE-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254520#comment-13254520 ]

stack commented on HBASE-5747:

But I didn't change anything! Does that mean Jon fixed it w/ his hbck commit? @Jon Let me ask @Mikhail why he went to 100 retries...

Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test

Key: HBASE-5747
URL: https://issues.apache.org/jira/browse/HBASE-5747
Project: HBase
Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Blocker
Fix For: 0.96.0
Attachments: 5474.txt, 5474v2.txt, 5474v3 (1).txt, 5474v3.txt, 5708v4.txt, 5708v4.txt

Forward port as much as we can of Mikhail's hard-won test cleanups over on the 0.89 branch. Will improve our being able to run unit tests in parallel. He also found a few bugs.
[jira] [Commented] (HBASE-5747) Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test
[ https://issues.apache.org/jira/browse/HBASE-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254213#comment-13254213 ]

stack commented on HBASE-5747:

@Jon I can put it back. I pulled that in from the original patch. Let me try setting it back and see if that helps w/ the test hangs. I ran TestSchemaMetrics locally and it runs fine. It also does not seem to be responsible for the test 'timeouts' that are subsequent to 2757.

Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test

Key: HBASE-5747
URL: https://issues.apache.org/jira/browse/HBASE-5747
Project: HBase
Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Blocker
Fix For: 0.96.0
Attachments: 5474.txt, 5474v2.txt, 5474v3 (1).txt, 5474v3.txt, 5708v4.txt, 5708v4.txt

Forward port as much as we can of Mikhail's hard-won test cleanups over on the 0.89 branch. Will improve our being able to run unit tests in parallel. He also found a few bugs.
[jira] [Commented] (HBASE-5795) hbase-3927 breaks 0.92-0.94 compatibility
[ https://issues.apache.org/jira/browse/HBASE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254224#comment-13254224 ]

stack commented on HBASE-5795:

Hmm... It's not hbase-3927 that broke compatibility; it seems rather to be this one, which changes the RegionLoad VERSION:

{code}
r1238873 | tedyu | 2012-01-31 16:12:36 -0800 (Tue, 31 Jan 2012) | 2 lines

HBASE-5256 Use WritableUtils.readVInt() in RegionLoad.readFields() (Mubarak)
{code}

Looking at the patch, it breaks compatibility in a pretty radical way, changing ints to vints on all RegionLoad members.

hbase-3927 breaks 0.92-0.94 compatibility

Key: HBASE-5795
URL: https://issues.apache.org/jira/browse/HBASE-5795
Project: HBase
Issue Type: Bug
Reporter: stack

This commit broke our 0.92/0.94 compatibility:

{code}
r1136686 | stack | 2011-06-16 14:18:08 -0700 (Thu, 16 Jun 2011) | 1 line

HBASE-3927 display total uncompressed byte size of a region in web UI
{code}

I just tried the new RC for 0.94. I brought up a 0.94 master on a 0.92 cluster and rather than just digest version 1 of the HServerLoad, I get this:

{code}
2012-04-14 22:47:59,752 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.4.14.38
java.io.IOException: Error in readFields
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:684)
    at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1269)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1184)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:722)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:513)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:488)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: A record version mismatch occured. Expecting v2, found v1
    at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)
    at org.apache.hadoop.hbase.HServerLoad$RegionLoad.readFields(HServerLoad.java:379)
    at org.apache.hadoop.hbase.HServerLoad.readFields(HServerLoad.java:686)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:681)
    ... 9 more
{code}
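To illustrate why switching fields from ints to vints changes the wire format, and one way a reader could stay compatible, here is a rough sketch. It is not the HServerLoad code and not the fix attached to this issue. LoadRecord, its two fields, and the varint helpers are hypothetical, and the variable-length encoding below is a simple 7-bits-per-byte scheme, not the format used by Hadoop's WritableUtils. The idea shown is a reader that branches on the version byte instead of rejecting every version but the newest.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical record standing in for a load report with two counters. */
class LoadRecord {
    static final byte V1 = 1; // counters written as fixed 4-byte ints
    static final byte V2 = 2; // counters written as variable-length ints

    int stores;
    int requests;

    void write(DataOutputStream out, byte version) throws IOException {
        out.writeByte(version);
        if (version == V1) {
            out.writeInt(stores);
            out.writeInt(requests);
        } else {
            writeVarInt(out, stores);
            writeVarInt(out, requests);
        }
    }

    /** Branches on the version byte so that both layouts can be read. */
    void readFields(DataInputStream in) throws IOException {
        byte version = in.readByte();
        if (version == V1) {
            stores = in.readInt();
            requests = in.readInt();
        } else if (version == V2) {
            stores = readVarInt(in);
            requests = readVarInt(in);
        } else {
            throw new IOException("Unsupported version " + version);
        }
    }

    // Illustrative 7-bits-per-byte encoding; NOT Hadoop's vint format.
    static void writeVarInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    static int readVarInt(DataInputStream in) throws IOException {
        int result = 0;
        int shift = 0;
        int b;
        do {
            b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }
}

class VersionedReadDemo {
    /** Serializes a record in the given version and reads it back. */
    static LoadRecord roundTrip(int stores, int requests, byte version) throws IOException {
        LoadRecord src = new LoadRecord();
        src.stores = stores;
        src.requests = requests;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        src.write(new DataOutputStream(bytes), version);
        LoadRecord dst = new LoadRecord();
        dst.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return dst;
    }

    public static void main(String[] args) throws IOException {
        LoadRecord oldFormat = roundTrip(3, 70000, LoadRecord.V1);
        LoadRecord newFormat = roundTrip(3, 70000, LoadRecord.V2);
        System.out.println(oldFormat.requests + " " + newFormat.requests);
    }
}
```

A reader that only accepts the newest version fails the way the stack trace above does ("Expecting v2, found v1") as soon as an older peer sends it a record.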
[jira] [Commented] (HBASE-5795) hbase-3927 breaks 0.92-0.94 compatibility
[ https://issues.apache.org/jira/browse/HBASE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254227#comment-13254227 ]

stack commented on HBASE-5795:

I'd suggest backing out HBASE-5256. It's a little weird in that it ups the VERSION on the inner class but not on the outer class. It's not a critical fix either, so we could probably do w/o it in 0.94. Let me try removing it.

hbase-3927 breaks 0.92-0.94 compatibility

Key: HBASE-5795
URL: https://issues.apache.org/jira/browse/HBASE-5795
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack

This commit broke our 0.92/0.94 compatibility:

{code}
r1136686 | stack | 2011-06-16 14:18:08 -0700 (Thu, 16 Jun 2011) | 1 line

HBASE-3927 display total uncompressed byte size of a region in web UI
{code}

I just tried the new RC for 0.94. I brought up a 0.94 master on a 0.92 cluster and rather than just digest version 1 of the HServerLoad, I get this:

{code}
2012-04-14 22:47:59,752 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.4.14.38
java.io.IOException: Error in readFields
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:684)
    at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1269)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1184)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:722)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:513)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:488)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: A record version mismatch occured. Expecting v2, found v1
    at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)
    at org.apache.hadoop.hbase.HServerLoad$RegionLoad.readFields(HServerLoad.java:379)
    at org.apache.hadoop.hbase.HServerLoad.readFields(HServerLoad.java:686)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:681)
    ... 9 more
{code}
[jira] [Commented] (HBASE-5795) hbase-3927 breaks 0.92-0.94 compatibility
[ https://issues.apache.org/jira/browse/HBASE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254229#comment-13254229 ]

stack commented on HBASE-5795:

Hmmm... not that easy. This one messes us up too...

{code}
r1239157 | tedyu | 2012-02-01 06:56:20 -0800 (Wed, 01 Feb 2012) | 2 lines

HBASE-5283 Request counters may become negative for heavily loaded regions (Mubarak)
{code}

The above commit depends on hbase-5256. If hbase-5256 were not in place, this would not break compatibility, but since we have to back out hbase-5256, it does. Looking...

hbase-3927 breaks 0.92-0.94 compatibility

Key: HBASE-5795
URL: https://issues.apache.org/jira/browse/HBASE-5795
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack

This commit broke our 0.92/0.94 compatibility:

{code}
r1136686 | stack | 2011-06-16 14:18:08 -0700 (Thu, 16 Jun 2011) | 1 line

HBASE-3927 display total uncompressed byte size of a region in web UI
{code}

I just tried the new RC for 0.94. I brought up a 0.94 master on a 0.92 cluster and rather than just digest version 1 of the HServerLoad, I get this:

{code}
2012-04-14 22:47:59,752 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.4.14.38
java.io.IOException: Error in readFields
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:684)
    at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1269)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1184)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:722)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:513)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:488)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: A record version mismatch occured. Expecting v2, found v1
    at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)
    at org.apache.hadoop.hbase.HServerLoad$RegionLoad.readFields(HServerLoad.java:379)
    at org.apache.hadoop.hbase.HServerLoad.readFields(HServerLoad.java:686)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:681)
    ... 9 more
{code}
[jira] [Commented] (HBASE-5778) Turn on WAL compression by default
[ https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253454#comment-13253454 ]

stack commented on HBASE-5778:

I backed it out of 0.94 and trunk.

Turn on WAL compression by default

Key: HBASE-5778
URL: https://issues.apache.org/jira/browse/HBASE-5778
Project: HBase
Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
Fix For: 0.94.0, 0.96.0
Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch

I ran some tests to verify whether WAL compression should be turned on by default.

For a use case where it's not very useful (values two orders of magnitude bigger than the keys), the insert time wasn't different and the CPU usage was 15% higher (150% CPU usage vs. 130% when not compressing the WAL).

When values are smaller than the keys, I saw a 38% improvement in the insert run time, and CPU usage was 33% higher (600% CPU usage vs. 450%). I'm not sure WAL compression accounts for all the additional CPU usage; it might just be that we're able to insert faster and so spend more time in the MemStore per second (because our MemStores are bad when they contain tens of thousands of values).

Those are two extremes, but they show that for the price of some CPU we can save a lot. My machines have 2 quads with HT, so I still had a lot of idle CPUs.
[jira] [Commented] (HBASE-5784) Enable mvn deploy of website
[ https://issues.apache.org/jira/browse/HBASE-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253691#comment-13253691 ]

stack commented on HBASE-5784:

Committed to trunk.

Enable mvn deploy of website

Key: HBASE-5784
URL: https://issues.apache.org/jira/browse/HBASE-5784
Project: HBase
Issue Type: Improvement
Reporter: stack
Assignee: stack
Fix For: 0.96.0
Attachments: 5784.txt

Up to now, deploying the website has meant building locally, copying the result up to apache, and putting it into place under /www/hbase.apache.org. Change it so we can have maven deploy the site. The good thing about having maven do it is that it's regular; permissions will always be the same, so Doug and I won't be fighting each other when we stick stuff up there. Also, it's a one-step process rather than multiple steps.
[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253708#comment-13253708 ]

stack commented on HBASE-5620:

It passed for me. Let me commit this monster.

Convert the client protocol of HRegionInterface to PB

Key: HBASE-5620
URL: https://issues.apache.org/jira/browse/HBASE-5620
Project: HBase
Issue Type: Sub-task
Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: 0.96.0
Attachments: hbase-5620_v3.patch, hbase-5620_v4.patch, hbase-5620_v4.patch
[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253712#comment-13253712 ]

stack commented on HBASE-5620:

I think src/main/java/org/apache/hadoop/hbase/protobuf/ClientProtocol.java is in the wrong package. Ditto for AdminProtocol. What do you think, Jimmy? Should we move them? Where should they go? At top level? Or into the client package?

Convert the client protocol of HRegionInterface to PB

Key: HBASE-5620
URL: https://issues.apache.org/jira/browse/HBASE-5620
Project: HBase
Issue Type: Sub-task
Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: 0.96.0
Attachments: hbase-5620_v3.patch, hbase-5620_v4.patch, hbase-5620_v4.patch
[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253714#comment-13253714 ]

stack commented on HBASE-5620:

Jimmy, mind opening new issues for the outstanding work, like unit tests?

Convert the client protocol of HRegionInterface to PB

Key: HBASE-5620
URL: https://issues.apache.org/jira/browse/HBASE-5620
Project: HBase
Issue Type: Sub-task
Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: 0.96.0
Attachments: hbase-5620_v3.patch, hbase-5620_v4.patch, hbase-5620_v4.patch
[jira] [Commented] (HBASE-4336) Convert source tree into maven modules
[ https://issues.apache.org/jira/browse/HBASE-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253734#comment-13253734 ]

stack commented on HBASE-4336:

+1

Tell us more about the issue, Jesse. When I do mvn compile on a project of many modules, it's fine except for the case where tests depend on the product of an earlier module?

Convert source tree into maven modules

Key: HBASE-4336
URL: https://issues.apache.org/jira/browse/HBASE-4336
Project: HBase
Issue Type: Task
Components: build
Reporter: Gary Helmling
Priority: Critical
Fix For: 0.96.0

When we originally converted the build to maven we had a single core module defined, but later reverted this to a module-less build for the sake of simplicity. It now looks like it's time to re-address this, as we have an actual need for modules to:
* provide a trimmed down client library that applications can make use of
* more cleanly support building against different versions of Hadoop, in place of some of the reflection machinations currently required
* incorporate the secure RPC engine that depends on some secure Hadoop classes

I propose we start simply by refactoring into two initial modules:
* core - common classes and utilities, and client-side code and interfaces
* server - master and region server implementations and supporting code

This would also lay the groundwork for incorporating the HBase security features that have been developed. Once the module structure is in place, security-related features could then be incorporated into a third module -- security -- after normal review and approval. The security module could then depend on secure Hadoop, without modifying the dependencies of the rest of the HBase code.
[jira] [Commented] (HBASE-5604) M/R tool to replay WAL files
[ https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253778#comment-13253778 ]

stack commented on HBASE-5604:

Remove the Date stuff. Just do basic ms.

M/R tool to replay WAL files

Key: HBASE-5604
URL: https://issues.apache.org/jira/browse/HBASE-5604
Project: HBase
Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.94.0, 0.96.0
Attachments: 5604-v10.txt, 5604-v11.txt, 5604-v4.txt, 5604-v6.txt, 5604-v7.txt, 5604-v8.txt, 5604-v9.txt, HLog-5604-v3.txt

Just an idea I had. Might be useful for restore of a backup using the HLogs. This could be an M/R job (with a mapper per HLog file). The tool would get a timerange and a (set of) table(s). We'd pick the right HLogs based on time before the M/R job is started and then have a mapper per HLog file. The mapper would then go through the HLog, filter out all WALEdits that don't fit into the time range or are not for any of the tables, and then use HFileOutputFormat to generate HFiles. Would need to indicate the splits we want, probably from a live table.
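The filtering step in the description above might look roughly like the following. This is a sketch under assumptions, not the attached patch: LogEdit and EditFilter are hypothetical stand-ins for the real WAL entry types, the time range is taken as plain epoch milliseconds (in the spirit of "just do basic ms"), and the choice of an inclusive start and exclusive end is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for one log entry: target table plus write time in ms. */
class LogEdit {
    final String table;
    final long timestampMs;

    LogEdit(String table, long timestampMs) {
        this.table = table;
        this.timestampMs = timestampMs;
    }
}

/** Keeps only edits for the wanted tables inside [startMs, endMs). */
class EditFilter {
    private final Set<String> tables;
    private final long startMs; // inclusive, epoch milliseconds (assumed convention)
    private final long endMs;   // exclusive, epoch milliseconds (assumed convention)

    EditFilter(Set<String> tables, long startMs, long endMs) {
        this.tables = tables;
        this.startMs = startMs;
        this.endMs = endMs;
    }

    boolean accept(LogEdit edit) {
        return tables.contains(edit.table)
            && edit.timestampMs >= startMs
            && edit.timestampMs < endMs;
    }

    List<LogEdit> filter(List<LogEdit> edits) {
        List<LogEdit> kept = new ArrayList<>();
        for (LogEdit edit : edits) {
            if (accept(edit)) {
                kept.add(edit);
            }
        }
        return kept;
    }
}

class EditFilterDemo {
    public static void main(String[] args) {
        EditFilter filter = new EditFilter(new HashSet<>(Arrays.asList("t1")), 100L, 200L);
        List<LogEdit> edits = Arrays.asList(
            new LogEdit("t1", 150L),  // kept
            new LogEdit("t1", 250L),  // outside the time range
            new LogEdit("t2", 150L)); // not one of the requested tables
        System.out.println(filter.filter(edits).size());
    }
}
```

In a real job each mapper would apply a check like accept() to the entries of its own log file before handing the survivors to the output format.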
[jira] [Commented] (HBASE-5747) Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test
[ https://issues.apache.org/jira/browse/HBASE-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253832#comment-13253832 ]

stack commented on HBASE-5747:

The TestWALPlayer failure is not because of this, and TestServerCustomProtocol passes locally. Going to commit this v4.

Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test

Key: HBASE-5747
URL: https://issues.apache.org/jira/browse/HBASE-5747
Project: HBase
Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Blocker
Attachments: 5474.txt, 5474v2.txt, 5474v3 (1).txt, 5474v3.txt, 5708v4.txt, 5708v4.txt

Forward port as much as we can of Mikhail's hard-won test cleanups over on the 0.89 branch. Will improve our being able to run unit tests in parallel. He also found a few bugs.
[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253843#comment-13253843 ]

stack commented on HBASE-5620:

I did too, and trunk is also no longer complaining. The rat test is a PITA. There was probably some detritus lying around that it picked up. I modified the trunk build to keep the rat.txt report next time.

@Jimmy I think top-level is better than where it currently is. What other Protocols would go up to the top level? None, I suppose. I suppose they should be in the client package, but it's a little perverse having the client stuff reaching into zk and util and protobuf...

Convert the client protocol of HRegionInterface to PB

Key: HBASE-5620
URL: https://issues.apache.org/jira/browse/HBASE-5620
Project: HBase
Issue Type: Sub-task
Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: 0.96.0
Attachments: hbase-5620_v3.patch, hbase-5620_v4.patch, hbase-5620_v4.patch
[jira] [Commented] (HBASE-5747) Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test
[ https://issues.apache.org/jira/browse/HBASE-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253894#comment-13253894 ]

stack commented on HBASE-5747:

@Mikhail I think it's fair in cases like this, where a bunch of the code base is touched, that us frontier folk more familiar w/ trunk pitch in. We probably know more about what's portable and what to drop.

Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test

Key: HBASE-5747
URL: https://issues.apache.org/jira/browse/HBASE-5747
Project: HBase
Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Blocker
Fix For: 0.96.0
Attachments: 5474.txt, 5474v2.txt, 5474v3 (1).txt, 5474v3.txt, 5708v4.txt, 5708v4.txt

Forward port as much as we can of Mikhail's hard-won test cleanups over on the 0.89 branch. Will improve our being able to run unit tests in parallel. He also found a few bugs.
[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253974#comment-13253974 ]

stack commented on HBASE-5620:

@Jimmy I think this is the biggest patch ever applied to HBase. Congrats!

Convert the client protocol of HRegionInterface to PB

Key: HBASE-5620
URL: https://issues.apache.org/jira/browse/HBASE-5620
Project: HBase
Issue Type: Sub-task
Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: 0.96.0
Attachments: hbase-5620_v3.patch, hbase-5620_v4.patch, hbase-5620_v4.patch
[jira] [Commented] (HBASE-4336) Convert source tree into maven modules
[ https://issues.apache.org/jira/browse/HBASE-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253985#comment-13253985 ]

stack commented on HBASE-4336:

So, running tests in a module scope, you cannot have dependencies outside of the module? (You can depend on 3rd party jars but not ones made by this maven build -- or is it just test stuff? Could security depend on hbase-common.jar in its tests?)

Convert source tree into maven modules

Key: HBASE-4336
URL: https://issues.apache.org/jira/browse/HBASE-4336
Project: HBase
Issue Type: Task
Components: build
Reporter: Gary Helmling
Priority: Critical
Fix For: 0.96.0

When we originally converted the build to maven we had a single core module defined, but later reverted this to a module-less build for the sake of simplicity. It now looks like it's time to re-address this, as we have an actual need for modules to:
* provide a trimmed down client library that applications can make use of
* more cleanly support building against different versions of Hadoop, in place of some of the reflection machinations currently required
* incorporate the secure RPC engine that depends on some secure Hadoop classes

I propose we start simply by refactoring into two initial modules:
* core - common classes and utilities, and client-side code and interfaces
* server - master and region server implementations and supporting code

This would also lay the groundwork for incorporating the HBase security features that have been developed. Once the module structure is in place, security-related features could then be incorporated into a third module -- security -- after normal review and approval. The security module could then depend on secure Hadoop, without modifying the dependencies of the rest of the HBase code.
[jira] [Commented] (HBASE-5747) Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test
[ https://issues.apache.org/jira/browse/HBASE-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252228#comment-13252228 ]

stack commented on HBASE-5747:

Not sure why the tests are not completing. Running on a mac, I see a problem in this test:

{code}
Running org.apache.hadoop.hbase.zookeeper.TestZKLeaderManager
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.415 sec FAILURE!

Results :

Failed tests:
  testLeaderSelection(org.apache.hadoop.hbase.zookeeper.TestZKLeaderManager): New leader should exist
{code}

Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test

Key: HBASE-5747
URL: https://issues.apache.org/jira/browse/HBASE-5747
Project: HBase
Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Blocker
Attachments: 5474.txt, 5474v2.txt, 5474v3 (1).txt, 5474v3.txt

Forward port as much as we can of Mikhail's hard-won test cleanups over on the 0.89 branch. Will improve our being able to run unit tests in parallel. He also found a few bugs.
[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)
[ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252504#comment-13252504 ]

stack commented on HBASE-5754:
--

Let me do the same. I did not match generator map tasks to verify reducers. Then let me recreate the split issue Eric describes above. Thanks lads.

data lost with gora continuous ingest test (goraci)
---
Key: HBASE-5754
URL: https://issues.apache.org/jira/browse/HBASE-5754
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1
Environment: 10 node test cluster
Reporter: Eric Newton
Assignee: stack

Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends. I put a billion entries into HBase, and ran the Verify map/reduce job. The verification failed because about 21K entries were missing. The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data.

I re-ran the test with 100 million entries, and it verified successfully. Both of the times I tested using a billion entries, the verification failed. If I run the verification step twice, the results are consistent, so the problem is probably not in the verify step.

Here are the versions of the various packages:
||package||version||
|hadoop|0.20.205.0|
|hbase|0.92.1|
|gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
|goraci|https://github.com/ericnewton/goraci tagged 2012-04-08|

The change I made to goraci was to configure it for hbase and to allow it to build properly.
[jira] [Commented] (HBASE-5773) HtablePool constructor not reading config files in certain cases
[ https://issues.apache.org/jira/browse/HBASE-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252586#comment-13252586 ]

stack commented on HBASE-5773:
--

It doesn't apply to 0.90 branch.

HtablePool constructor not reading config files in certain cases
Key: HBASE-5773
URL: https://issues.apache.org/jira/browse/HBASE-5773
Project: HBase
Issue Type: Bug
Components: client
Affects Versions: 0.90.6, 0.92.1, 0.94.1
Reporter: Ioan Eugen Stan
Priority: Minor
Fix For: 0.92.2, 0.94.0
Attachments: different-config-behaviour.patch

Creating an HTablePool can behave in two different ways depending on the constructor called.

Case 1: loads the configs from hbase-site

public HTablePool() {
  this(HBaseConfiguration.create(), Integer.MAX_VALUE);
}

Case 2: calling this constructor with a null value for Configuration:

public HTablePool(final Configuration config, final int maxSize) {
  this(config, maxSize, null, null);
}

will end up in:

public HTablePool(final Configuration config, final int maxSize,
    final HTableInterfaceFactory tableFactory, PoolType poolType) {
  // Make a new configuration instance so I can safely cleanup when
  // done with the pool.
  this.config = config == null ? new Configuration() : config;

which does not read the hbase-site config files as HBaseConfiguration.create() does. I've tracked this problem to all versions of hbase.
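The difference between the two fallbacks in HBASE-5773 can be sketched without HBase on the classpath. This is only a toy model: SketchConfig and SketchPool are hypothetical stand-ins for Configuration/HBaseConfiguration and HTablePool, and the "hbaseAware" variant is an assumption about one possible fix, not a statement of what the attached patch does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Configuration / HBaseConfiguration; not the real classes.
final class SketchConfig {
    private final List<String> resources = new ArrayList<>();

    // Models `new Configuration()`: only the Hadoop core resources are registered.
    static SketchConfig plain() {
        SketchConfig c = new SketchConfig();
        c.resources.add("core-default.xml");
        c.resources.add("core-site.xml");
        return c;
    }

    // Models `HBaseConfiguration.create()`: the HBase resources are added on top.
    static SketchConfig hbase() {
        SketchConfig c = plain();
        c.resources.add("hbase-default.xml");
        c.resources.add("hbase-site.xml");
        return c;
    }

    List<String> resources() {
        return Collections.unmodifiableList(resources);
    }
}

// Hypothetical stand-in for HTablePool, reduced to the null-config fallback.
final class SketchPool {
    final SketchConfig config;

    private SketchPool(SketchConfig config) {
        this.config = config;
    }

    // Behaviour described in the report: null falls back to a plain configuration.
    static SketchPool reported(SketchConfig config) {
        return new SketchPool(config == null ? SketchConfig.plain() : config);
    }

    // One possible fix (assumption): null falls back to the HBase-aware factory.
    static SketchPool hbaseAware(SketchConfig config) {
        return new SketchPool(config == null ? SketchConfig.hbase() : config);
    }
}

final class HTablePoolNullConfigSketch {
    public static void main(String[] args) {
        // Does the pool's configuration know about hbase-site.xml?
        System.out.println(SketchPool.reported(null).config.resources().contains("hbase-site.xml"));
        System.out.println(SketchPool.hbaseAware(null).config.resources().contains("hbase-site.xml"));
    }
}
```

In this model the first line printed is false and the second is true, which matches the complaint that the null path never sees hbase-site.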
[jira] [Commented] (HBASE-3443) ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix
[ https://issues.apache.org/jira/browse/HBASE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252709#comment-13252709 ]

stack commented on HBASE-3443:
--

0.94?

ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix
--
Key: HBASE-3443
URL: https://issues.apache.org/jira/browse/HBASE-3443
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.90.0, 0.90.1, 0.90.2, 0.90.3, 0.90.4, 0.90.5, 0.90.6, 0.92.0, 0.92.1
Reporter: Kannan Muthukkaruppan
Assignee: Lars Hofhansl
Priority: Critical
Labels: corruption
Fix For: 0.96.0
Attachments: 3443.txt

For incrementColumnValue(), HBASE-3082 adds an optimization to check the memstore first, and to check the store files only if the column is not present in the memstore. In the presence of deletes, this optimization is not reliable. If the column is marked as deleted in the memstore, one should not look further into the store files. But currently, the code does so.

Sample test code outline:
{code}
admin.createTable(desc)
table = HTable.new(conf, tableName)
# First increment, then flush so the value 5 lands in a store file.
table.incrementColumnValue(Bytes.toBytes(row), cf1name, Bytes.toBytes(column), 5);
admin.flush(tableName)
sleep(2)
# Delete the row; the delete marker now sits in the memstore.
del = Delete.new(Bytes.toBytes(row))
table.delete(del)
# Second increment should start again from zero.
table.incrementColumnValue(Bytes.toBytes(row), cf1name, Bytes.toBytes(column), 5);
get = Get.new(Bytes.toBytes(row))
keyValues = table.get(get).raw()
keyValues.each do |keyValue|
  puts "Expect 5; Got Value=#{Bytes.toLong(keyValue.getValue())}";
end
{code}

The above prints:
{code}
Expect 5; Got Value=10
{code}
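The failure mode described in HBASE-3443 can be reproduced in a small stand-alone model. This is a sketch under stated assumptions, not HBase code: the memstore and the store file are plain maps, TOMBSTONE is a hypothetical sentinel standing in for a delete marker, and flush() is simplified.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of one region holding counter columns; not the real HRegion.
final class IcvLookupSketch {
    // Hypothetical sentinel standing in for a delete marker held in the memstore.
    static final Long TOMBSTONE = Long.MIN_VALUE;

    final Map<String, Long> memstore = new HashMap<>();
    final Map<String, Long> storeFile = new HashMap<>();

    // Simplified flush: values move to the store file, delete markers erase them.
    void flush() {
        for (Map.Entry<String, Long> e : memstore.entrySet()) {
            if (e.getValue().equals(TOMBSTONE)) {
                storeFile.remove(e.getKey());
            } else {
                storeFile.put(e.getKey(), e.getValue());
            }
        }
        memstore.clear();
    }

    void delete(String col) {
        memstore.put(col, TOMBSTONE);
    }

    // Memstore-first read that skips over a delete marker and falls through to the store file.
    long readIgnoringDeletes(String col) {
        Long v = memstore.get(col);
        if (v != null && !v.equals(TOMBSTONE)) {
            return v;
        }
        Long onDisk = storeFile.get(col);
        return onDisk == null ? 0L : onDisk;
    }

    // Memstore-first read that stops as soon as the memstore says the column is deleted.
    long readHonoringDeletes(String col) {
        Long v = memstore.get(col);
        if (v != null) {
            return v.equals(TOMBSTONE) ? 0L : v;
        }
        Long onDisk = storeFile.get(col);
        return onDisk == null ? 0L : onDisk;
    }

    long increment(String col, long amount, boolean honorDeletes) {
        long current = honorDeletes ? readHonoringDeletes(col) : readIgnoringDeletes(col);
        long updated = current + amount;
        memstore.put(col, updated);
        return updated;
    }

    // Same sequence as the test outline: increment, flush, delete, increment.
    static long scenario(boolean honorDeletes) {
        IcvLookupSketch region = new IcvLookupSketch();
        region.increment("c", 5, honorDeletes);
        region.flush();
        region.delete("c");
        return region.increment("c", 5, honorDeletes);
    }

    public static void main(String[] args) {
        System.out.println("ignoring deletes: " + scenario(false)); // 10, as in the report
        System.out.println("honoring deletes: " + scenario(true));  // 5, the expected value
    }
}
```

The only difference between the two reads is whether a delete marker found in the memstore ends the lookup.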
[jira] [Commented] (HBASE-3443) ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix
[ https://issues.apache.org/jira/browse/HBASE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252744#comment-13252744 ]

stack commented on HBASE-3443:
--

Oh, and if you don't fix it, you'll have to explain why you didn't to Benoît.

ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix
--
Key: HBASE-3443
URL: https://issues.apache.org/jira/browse/HBASE-3443
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.90.0, 0.90.1, 0.90.2, 0.90.3, 0.90.4, 0.90.5, 0.90.6, 0.92.0, 0.92.1
Reporter: Kannan Muthukkaruppan
Assignee: Lars Hofhansl
Priority: Critical
Labels: corruption
Fix For: 0.96.0
Attachments: 3443.txt

For incrementColumnValue(), HBASE-3082 adds an optimization to check the memstore first, and to check the store files only if the column is not present in the memstore. In the presence of deletes, this optimization is not reliable. If the column is marked as deleted in the memstore, one should not look further into the store files. But currently, the code does so.

Sample test code outline:
{code}
admin.createTable(desc)
table = HTable.new(conf, tableName)
table.incrementColumnValue(Bytes.toBytes(row), cf1name, Bytes.toBytes(column), 5);
admin.flush(tableName)
sleep(2)
del = Delete.new(Bytes.toBytes(row))
table.delete(del)
table.incrementColumnValue(Bytes.toBytes(row), cf1name, Bytes.toBytes(column), 5);
get = Get.new(Bytes.toBytes(row))
keyValues = table.get(get).raw()
keyValues.each do |keyValue|
  puts "Expect 5; Got Value=#{Bytes.toLong(keyValue.getValue())}";
end
{code}

The above prints:
{code}
Expect 5; Got Value=10
{code}
[jira] [Commented] (HBASE-5777) MiniHBaseCluster cannot start multiple region servers
[ https://issues.apache.org/jira/browse/HBASE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252881#comment-13252881 ] stack commented on HBASE-5777: -- We have an hbase-site.xml at src/test that is used when we run tests. It disables the UI. You think we should apply this patch too Jimmy? MiniHBaseCluster cannot start multiple region servers - Key: HBASE-5777 URL: https://issues.apache.org/jira/browse/HBASE-5777 Project: HBase Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: hbase-5777.patch MiniHBaseCluster can try to start multiple region servers. But all of them except one will die in putting up the web UI because of BindException since HConstants.REGIONSERVER_INFO_PORT_AUTO is set to false by default. This issue will make many unit tests depending on multiple region servers flaky, such as TestAdmin. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5778) Turn on WAL compression by default
[ https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252947#comment-13252947 ]

stack commented on HBASE-5778:
--

+1

Add release note w/ how to turn it off

Turn on WAL compression by default
--
Key: HBASE-5778
URL: https://issues.apache.org/jira/browse/HBASE-5778
Project: HBase
Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Priority: Blocker
Fix For: 0.94.0, 0.96.0
Attachments: HBASE-5778.patch

I ran some tests to verify if WAL compression should be turned on by default.

For a use case where it's not very useful (values two orders of magnitude bigger than the keys), the insert time wasn't different and the CPU usage was 15% higher (150% CPU usage vs. 130% when not compressing the WAL).

When values are smaller than the keys, I saw a 38% improvement in the insert run time and CPU usage was 33% higher (600% CPU usage vs. 450%). I'm not sure WAL compression accounts for all the additional CPU usage; it might just be that we're able to insert faster and we spend more time in the MemStore per second (because our MemStores are bad when they contain tens of thousands of values).

Those are two extremes, but it shows that for the price of some CPU we can save a lot. My machines have 2 quads with HT, so I still had a lot of idle CPUs.
[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)
[ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253135#comment-13253135 ] stack commented on HBASE-5754: -- I ran w/ 10 generators and 10 slots for the verify step and got the below which doesn't prints out only a REFERENCED count. Running these recent tests I let it do its natural splitting so it grew from zero to 260odd regions so maybe the issue you see Eric comes of manual splits coming out of the UI. Let me try that next. Thanks lads. {code} 12/04/13 05:16:23 INFO mapred.JobClient: map 100% reduce 99% 12/04/13 05:16:54 INFO mapred.JobClient: map 100% reduce 100% 12/04/13 05:16:59 INFO mapred.JobClient: Job complete: job_201204092039_0046 12/04/13 05:16:59 INFO mapred.JobClient: Counters: 30 12/04/13 05:16:59 INFO mapred.JobClient: Job Counters 12/04/13 05:16:59 INFO mapred.JobClient: Launched reduce tasks=10 12/04/13 05:16:59 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=30125694 12/04/13 05:16:59 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/04/13 05:16:59 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/04/13 05:16:59 INFO mapred.JobClient: Rack-local map tasks=6 12/04/13 05:16:59 INFO mapred.JobClient: Launched map tasks=256 12/04/13 05:16:59 INFO mapred.JobClient: Data-local map tasks=250 12/04/13 05:16:59 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=5832198 12/04/13 05:16:59 INFO mapred.JobClient: goraci.Verify$Counts 12/04/13 05:16:59 INFO mapred.JobClient: REFERENCED=10 12/04/13 05:16:59 INFO mapred.JobClient: File Output Format Counters 12/04/13 05:16:59 INFO mapred.JobClient: Bytes Written=0 12/04/13 05:16:59 INFO mapred.JobClient: FileSystemCounters 12/04/13 05:16:59 INFO mapred.JobClient: FILE_BYTES_READ=83022967343 12/04/13 05:16:59 INFO mapred.JobClient: HDFS_BYTES_READ=156414 12/04/13 05:16:59 INFO mapred.JobClient: FILE_BYTES_WRITTEN=112881560332 12/04/13 05:16:59 INFO 
mapred.JobClient: File Input Format Counters 12/04/13 05:16:59 INFO mapred.JobClient: Bytes Read=0 12/04/13 05:16:59 INFO mapred.JobClient: Map-Reduce Framework 12/04/13 05:16:59 INFO mapred.JobClient: Map output materialized bytes=29992170602 12/04/13 05:16:59 INFO mapred.JobClient: Map input records=10 12/04/13 05:16:59 INFO mapred.JobClient: Reduce shuffle bytes=29874879887 12/04/13 05:16:59 INFO mapred.JobClient: Spilled Records=7527086436 12/04/13 05:16:59 INFO mapred.JobClient: Map output bytes=25992155242 12/04/13 05:16:59 INFO mapred.JobClient: CPU time spent (ms)=20182570 12/04/13 05:16:59 INFO mapred.JobClient: Total committed heap usage (bytes)=99953082368 12/04/13 05:16:59 INFO mapred.JobClient: Combine input records=0 12/04/13 05:16:59 INFO mapred.JobClient: SPLIT_RAW_BYTES=156414 12/04/13 05:16:59 INFO mapred.JobClient: Reduce input records=20 12/04/13 05:16:59 INFO mapred.JobClient: Reduce input groups=10 12/04/13 05:16:59 INFO mapred.JobClient: Combine output records=0 12/04/13 05:16:59 INFO mapred.JobClient: Physical memory (bytes) snapshot=91762372608 12/04/13 05:16:59 INFO mapred.JobClient: Reduce output records=0 12/04/13 05:16:59 INFO mapred.JobClient: Virtual memory (bytes) snapshot=391126540288 12/04/13 05:16:59 INFO mapred.JobClient: Map output records=20 {code} data lost with gora continuous ingest test (goraci) --- Key: HBASE-5754 URL: https://issues.apache.org/jira/browse/HBASE-5754 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Environment: 10 node test cluster Reporter: Eric Newton Assignee: stack Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends. I put a billion entries into HBase, and ran the Verify map/reduce job. The verification failed because about 21K entries were missing. The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data. I re-ran the test with 100 million entries, and it verified successfully. 
Both of the times I tested using a billion entries, the verification failed. If I run the verification step twice, the results are consistent, so the problem is probably not on the verify step. Here's the versions of the various packages: ||package||version|| |hadoop|0.20.205.0| |hbase|0.92.1| |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277| |goraci|https://github.com/ericnewton/goraci tagged 2012-04-08| The change I made to goraci was to configure it for hbase and to allow it to build properly. -- This message is automatically generated by JIRA. If you think it was
[jira] [Commented] (HBASE-5756) we can change defalult File Appender to RFA instead of DRFA.
[ https://issues.apache.org/jira/browse/HBASE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251950#comment-13251950 ]

stack commented on HBASE-5756:
--

Default is DRFA in 0.94 and before. RFA after (0.96)

we can change defalult File Appender to RFA instead of DRFA.
Key: HBASE-5756
URL: https://issues.apache.org/jira/browse/HBASE-5756
Project: HBase
Issue Type: Bug
Reporter: rohithsharma
Priority: Minor

This can be a concern when, on a given day, more logging happens because of increased activity. In that case the log file for that day can grow huge. Such logs cannot be opened for analysis because they are too large.
[jira] [Commented] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252207#comment-13252207 ]

stack commented on HBASE-5737:
--

Ram, the AM#setBalancer is not right? Doesn't AM make a balancer instance of its own up in its constructor? We should at least remove that. Could we pass the load balancer to use into the AM's constructor rather than call a setBalancer method?

Minor Improvements related to balancer.
---
Key: HBASE-5737
URL: https://issues.apache.org/jira/browse/HBASE-5737
Project: HBase
Issue Type: Improvement
Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch

Currently in Am.getAssignmentByTable() we use a result map which is a HashMap. It would be better to use a TreeMap. MetaReader.fullScan already uses a TreeMap so that the naming order is maintained. I felt this change could be very useful in cases where we are extending the DefaultLoadBalancer.
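The ordering argument for a TreeMap can be shown with a short stand-alone sketch. The byTable helper and the table and region names below are hypothetical and only illustrate the map choice; this is not the AssignmentManager code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SortedAssignmentSketch {
    // Hypothetical helper: groups region names by table name.
    // A TreeMap keeps the table names in sorted order no matter how the pairs arrive;
    // a HashMap would give no ordering guarantee at all.
    static Map<String, List<String>> byTable(String[][] tableRegionPairs) {
        Map<String, List<String>> result = new TreeMap<>();
        for (String[] pair : tableRegionPairs) {
            result.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"usertable", "r3"}, {"accounts", "r1"}, {"metrics", "r2"}, {"accounts", "r4"}
        };
        // Keys come back sorted: [accounts, metrics, usertable]
        System.out.println(byTable(pairs).keySet());
    }
}
```

A balancer subclass that iterates over such a map sees tables in the same order on every run, which is the property the description asks for.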
[jira] [Commented] (HBASE-4109) Hostname returned via reverse dns lookup contains trailing period if configured interface is not default
[ https://issues.apache.org/jira/browse/HBASE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250729#comment-13250729 ]

stack commented on HBASE-4109:
--

@Adrian Forward port is over in hbase-5758. I will commit later today.

Hostname returned via reverse dns lookup contains trailing period if configured interface is not default
--
Key: HBASE-4109
URL: https://issues.apache.org/jira/browse/HBASE-4109
Project: HBase
Issue Type: Bug
Components: master, regionserver
Affects Versions: 0.90.3
Reporter: Shrijeet Paliwal
Assignee: Shrijeet Paliwal
Fix For: 0.90.4
Attachments: 0001-HBASE-4109-Sanitize-hostname-returned-from-DNS-class.patch

If you are using an interface other than 'default' (literally that keyword), getDefaultHost in DNS.java will return a string with a trailing period.

It seems the javadoc of reverseDns in DNS.java (see below) conflicts with what that function actually does. It returns a PTR record while claiming to return a hostname. The PTR record always has a period at the end, RFC: http://irbs.net/bog-4.9.5/bog47.html

We call DNS.getDefaultHost in more than one place and treat the result as the actual hostname. Quoting HRegionServer for example:
{code}
String machineName = DNS.getDefaultHost(
    conf.get("hbase.regionserver.dns.interface", "default"),
    conf.get("hbase.regionserver.dns.nameserver", "default"));
{code}

This causes inconsistencies. An example of such an inconsistency was observed while debugging the issue "Regions not getting reassigned if RS is brought down". More here: http://search-hadoop.com/m/CANUA1qRCkQ1

We may want to sanitize the string returned from the DNS class. Or, better, we could overhaul the way we do DNS name matching all over.
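Sanitizing the value returned by the DNS class could look roughly like the following. This is a minimal sketch: stripTrailingDot is a hypothetical helper name, and this is not necessarily what the attached patch does.

```java
final class HostnameSanitizerSketch {
    // Hypothetical helper: drops a single trailing '.' such as the one a PTR record carries.
    // Null and already-clean names are returned unchanged.
    static String stripTrailingDot(String host) {
        if (host != null && host.endsWith(".")) {
            return host.substring(0, host.length() - 1);
        }
        return host;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingDot("rs1.example.com.")); // rs1.example.com
        System.out.println(stripTrailingDot("rs1.example.com"));  // rs1.example.com
    }
}
```

Applying such a helper at every call site of DNS.getDefaultHost would make the two spellings of a hostname compare equal, which is the inconsistency the report describes.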
[jira] [Commented] (HBASE-5728) Methods Missing in HTableInterface
[ https://issues.apache.org/jira/browse/HBASE-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250740#comment-13250740 ]

stack commented on HBASE-5728:
--

@Bing Yes. If you are up for it.

@Lars Thanks for doing the research.

Methods Missing in HTableInterface
--
Key: HBASE-5728
URL: https://issues.apache.org/jira/browse/HBASE-5728
Project: HBase
Issue Type: Improvement
Components: client
Reporter: Bing Li

Dear all,

I found that some methods that exist in HTable are not in HTableInterface:
setAutoFlush
setWriteBufferSize
...

In most cases, I manipulate HBase through HTableInterface from HTablePool. If I need to use the above methods, how do I do that? I am considering writing my own table pool if there is no proper way. Is that fine?

Thanks so much!

Best regards,
Bing
[jira] [Commented] (HBASE-5756) we can change defalult File Appender to RFA instead of DRFA.
[ https://issues.apache.org/jira/browse/HBASE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250796#comment-13250796 ]

stack commented on HBASE-5756:
--

It's not clear what you are asking for, Rohit. Does this recent commit to TRUNK give you what you want? HBASE-5655

we can change defalult File Appender to RFA instead of DRFA.
Key: HBASE-5756
URL: https://issues.apache.org/jira/browse/HBASE-5756
Project: HBase
Issue Type: Bug
Reporter: rohithsharma
Priority: Minor

This can be a concern when, on a given day, more logging happens because of increased activity. In that case the log file for that day can grow huge. Such logs cannot be opened for analysis because they are too large.
[jira] [Commented] (HBASE-4336) Convert source tree into maven modules
[ https://issues.apache.org/jira/browse/HBASE-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250808#comment-13250808 ]

stack commented on HBASE-4336:
--

Can we just have hbase-common? No hbase-core. No hbase-security (looks like security might be getting smashed into hbase-common).

Why do we need hbase-assemble? It takes all that has gone before to package? Is this a common pattern?

What about the profiles we currently have? Like -Phadoop 0.23. Will those go away?

Thanks for doing this Jesse. I think we should commit the refactor as long as it's basically working. We can fine tune later as we go.

Convert source tree into maven modules
--
Key: HBASE-4336
URL: https://issues.apache.org/jira/browse/HBASE-4336
Project: HBase
Issue Type: Task
Components: build
Reporter: Gary Helmling
Priority: Critical
Fix For: 0.96.0

When we originally converted the build to maven we had a single core module defined, but later reverted this to a module-less build for the sake of simplicity. It now looks like it's time to re-address this, as we have an actual need for modules to:
* provide a trimmed down client library that applications can make use of
* more cleanly support building against different versions of Hadoop, in place of some of the reflection machinations currently required
* incorporate the secure RPC engine that depends on some secure Hadoop classes

I propose we start simply by refactoring into two initial modules:
* core - common classes and utilities, and client-side code and interfaces
* server - master and region server implementations and supporting code

This would also lay the groundwork for incorporating the HBase security features that have been developed. Once the module structure is in place, security-related features could then be incorporated into a third module -- security -- after normal review and approval. The security module could then depend on secure Hadoop, without modifying the dependencies of the rest of the HBase code.