[jira] [Created] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.
If getTaskList() returns null splitlogWorker is down. It wont serve any requests. -- Key: HBASE-5635 URL: https://issues.apache.org/jira/browse/HBASE-5635 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.92.1 Reporter: Kristam Subba Swathi During the hlog split operation if all the zookeepers are down ,then the paths will be returned as null and the splitworker thread wil be exited Now this regionserver wil not be able to acquire any other tasks since the splitworker thread is exited Please find the attached code for more details -- private ListString getTaskList() { for (int i = 0; i zkretries; i++) { try { return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher, this.watcher.splitLogZNode)); } catch (KeeperException e) { LOG.warn(Could not get children of znode + this.watcher.splitLogZNode, e); try { Thread.sleep(1000); } catch (InterruptedException e1) { LOG.warn(Interrupted while trying to get task list ..., e1); Thread.currentThread().interrupt(); return null; } } } in the org.apache.hadoop.hbase.regionserver.SplitLogWorker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-5517) Region Server Coprocessor : Suggestion for change when next() call with nbRows1
[ https://issues.apache.org/jira/browse/HBASE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John resolved HBASE-5517. --- Resolution: Won't Fix As per Andrew's comment above, we can have the alternate solution of creating the wrapper for RegionScanner and get the functionality. So marking this issue as closed and wont fix Region Server Coprocessor : Suggestion for change when next() call with nbRows1 Key: HBASE-5517 URL: https://issues.apache.org/jira/browse/HBASE-5517 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.92.0 Reporter: Anoop Sam John Originated from the discussion under HBASE-2038 [Coprocessor based IHBase] Currently preNext() and postNext() will be called once for a next() call into HRegionServer. But if the next() is being called with nbRows1, co processor should provide a chance to do some operation before, after every next() calls into region as part of call next(int scannerId, int nbRows). In case of usage of coprocessor with IHBase, before making any calls of next() into a Region, we need to make a reseek() to a row based on the index information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.
[ https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238328#comment-13238328 ] Uma Maheswara Rao G commented on HBASE-5635: Yes, I think, continuing without SplitLogWroker may not be a good behaviour. Because that particular regionServer may have more capacity to take up the new regions. With the current behaviour it may not compete for taking any new splilog work. I feel we can retry for some times and then we can shutdown regionServer? or other option is to retry forever on any ZK exception. And can exit only on interrupted exception. Also i am seeing this issue may be bit dangerous bacause, if ZK is not available for some time, all RegionServer may face this problem and no one will take up the splitlog work. listChildrenAndWatchForNewChildren will return null only if node does not exist. If it is not able to find any children then it will return empty list. So, zookeeper.znode.splitlog will be always set. On Other keeperExceptions like ZK unavalability and all, we have to handle. If getTaskList() returns null splitlogWorker is down. It wont serve any requests. -- Key: HBASE-5635 URL: https://issues.apache.org/jira/browse/HBASE-5635 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.92.1 Reporter: Kristam Subba Swathi During the hlog split operation if all the zookeepers are down ,then the paths will be returned as null and the splitworker thread wil be exited Now this regionserver wil not be able to acquire any other tasks since the splitworker thread is exited Please find the attached code for more details -- private ListString getTaskList() { for (int i = 0; i zkretries; i++) { try { return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher, this.watcher.splitLogZNode)); } catch (KeeperException e) { LOG.warn(Could not get children of znode + this.watcher.splitLogZNode, e); try { Thread.sleep(1000); } catch (InterruptedException e1) { LOG.warn(Interrupted while trying to get task list ..., e1); Thread.currentThread().interrupt(); return null; } } } in the org.apache.hadoop.hbase.regionserver.SplitLogWorker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laxman updated HBASE-5564: -- Status: Open (was: Patch Available) Bulkload is discarding duplicate records Key: HBASE-5564 URL: https://issues.apache.org/jira/browse/HBASE-5564 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Environment: HBase 0.92 Reporter: Laxman Assignee: Laxman Labels: bulkloader Fix For: 0.96.0 Attachments: 5564.lint, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.patch Duplicate records are getting discarded when duplicate records exists in same input file and more specifically if they exists in same split. Duplicate records are considered if the records are from diffrent different splits. Version under test: HBase 0.92 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laxman updated HBASE-5564: -- Attachment: HBASE-5564_trunk.2.patch Bulkload is discarding duplicate records Key: HBASE-5564 URL: https://issues.apache.org/jira/browse/HBASE-5564 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Environment: HBase 0.92 Reporter: Laxman Assignee: Laxman Labels: bulkloader Fix For: 0.96.0 Attachments: 5564.lint, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.2.patch, HBASE-5564_trunk.patch Duplicate records are getting discarded when duplicate records exists in same input file and more specifically if they exists in same split. Duplicate records are considered if the records are from diffrent different splits. Version under test: HBase 0.92 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laxman updated HBASE-5564: -- Status: Patch Available (was: Open) Bulkload is discarding duplicate records Key: HBASE-5564 URL: https://issues.apache.org/jira/browse/HBASE-5564 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Environment: HBase 0.92 Reporter: Laxman Assignee: Laxman Labels: bulkloader Fix For: 0.96.0 Attachments: 5564.lint, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.2.patch, HBASE-5564_trunk.patch Duplicate records are getting discarded when duplicate records exists in same input file and more specifically if they exists in same split. Duplicate records are considered if the records are from diffrent different splits. Version under test: HBase 0.92 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238452#comment-13238452 ] Laxman commented on HBASE-5564: --- @Stack, updated the patch after fixing your comments. Thanks for the review. Bulkload is discarding duplicate records Key: HBASE-5564 URL: https://issues.apache.org/jira/browse/HBASE-5564 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Environment: HBase 0.92 Reporter: Laxman Assignee: Laxman Labels: bulkloader Fix For: 0.96.0 Attachments: 5564.lint, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.2.patch, HBASE-5564_trunk.patch Duplicate records are getting discarded when duplicate records exists in same input file and more specifically if they exists in same split. Duplicate records are considered if the records are from diffrent different splits. Version under test: HBase 0.92 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238506#comment-13238506 ] Hadoop QA commented on HBASE-5564: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519960/HBASE-5564_trunk.2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1305//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1305//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1305//console This message is automatically generated. Bulkload is discarding duplicate records Key: HBASE-5564 URL: https://issues.apache.org/jira/browse/HBASE-5564 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Environment: HBase 0.92 Reporter: Laxman Assignee: Laxman Labels: bulkloader Fix For: 0.96.0 Attachments: 5564.lint, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.2.patch, HBASE-5564_trunk.patch Duplicate records are getting discarded when duplicate records exists in same input file and more specifically if they exists in same split. Duplicate records are considered if the records are from diffrent different splits. Version under test: HBase 0.92 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238513#comment-13238513 ] Alex Newman commented on HBASE-2600: I really have no idea what's going on here. I can't seem to create patch from svn or git. Also, I've noticed the patch does have the binary snippet, svn and patch just aren't applying it. My jenkins job runs out of memory(/dev/shm). So I gave the build machine a reboot and the branch at http://github.com/posix4e/hbase (branch jenkins) built fine. Can someone with a commit bit just pull it into a svn branch? Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid Key: HBASE-2600 URL: https://issues.apache.org/jira/browse/HBASE-2600 Project: HBase Issue Type: Bug Reporter: stack Assignee: Alex Newman Attachments: 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, HBASE-2600+5217-Sun-Mar-25-2012-v4.patch, jenkins.pdf This is an idea that Ryan and I have been kicking around on and off for a while now. If regionnames were made of tablename+endrow instead of tablename+startrow, then in the metatables, doing a search for the region that contains the wanted row, we'd just have to open a scanner using passed row and the first row found by the scan would be that of the region we need (If offlined parent, we'd have to scan to the next row). If we redid the meta tables in this format, we'd be using an access that is natural to hbase, a scan as opposed to the perverse, expensive getClosestRowBefore we currently have that has to walk backward in meta finding a containing region. This issue is about changing the way we name regions. If we were using scans, prewarming client cache would be near costless (as opposed to what we'll currently have to do which is first a getClosestRowBefore and then a scan from the closestrowbefore forward). Converting to the new method, we'd have to run a migration on startup changing the content in meta. Up to this, the randomid component of a region name has been the timestamp of region creation. HBASE-2531 32-bit encoding of regionnames waaay too susceptible to hash clashes proposes changing the randomid so that it contains actual name of the directory in the filesystem that hosts the region. If we had this in place, I think it would help with the migration to this new way of doing the meta because as is, the region name in fs is a hash of regionname... changing the format of the regionname would mean we generate a different hash... so we'd need hbase-2531 to be in place before we could do this change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE
[ https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238532#comment-13238532 ] Lars Hofhansl commented on HBASE-5097: -- Feel free to put it in 0.94.0. Will cut an RC today. RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE --- Key: HBASE-5097 URL: https://issues.apache.org/jira/browse/HBASE-5097 Project: HBase Issue Type: Bug Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5097.patch, HBASE-5097_1.patch, HBASE-5097_2.patch In HRegionServer.java openScanner() {code} r.prepareScanner(scan); RegionScanner s = null; if (r.getCoprocessorHost() != null) { s = r.getCoprocessorHost().preScannerOpen(scan); } if (s == null) { s = r.getScanner(scan); } if (r.getCoprocessorHost() != null) { s = r.getCoprocessorHost().postScannerOpen(scan, s); } {code} If we dont have implemention for postScannerOpen the RegionScanner is null and so throwing nullpointer {code} java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881) at org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326) {code} Making this defect as blocker.. Pls feel free to change the priority if am wrong. Also correct me if my way of trying out coprocessors without implementing postScannerOpen is wrong. Am just a learner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238534#comment-13238534 ] Lars Hofhansl commented on HBASE-5623: -- @Enis: Please have a look when you get a chance. I would like to cut an RC of 0.94 today. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA
[jira] [Updated] (HBASE-5633) NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5633: - Resolution: Fixed Status: Resolved (was: Patch Available) Let's move the 0.90 and 0.92 into a sub-task, so that this issue can be kept closed. NPE reading ZK config in HBase -- Key: HBASE-5633 URL: https://issues.apache.org/jira/browse/HBASE-5633 Project: HBase Issue Type: Bug Components: zookeeper Reporter: Matteo Bertozzi Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5633-0.90.patch, HBASE-5633-0.92.patch, HBASE-5633-v1.patch, HBASE-5633-v2.patch If zoo.cfg contains server.* (server.0=server0:2888:3888\n) and cluster.distributed property (in hbase-site.xml) is empty we get an NPE in parseZooCfg(). The easy way to reproduce the bug is running org.apache.hbase.zookeeper.TestHQuorumPeer with hbase-site.xml containing: {code} property namehbase.cluster.distributed/name value/value /property {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5615) the master never does balance because of balancing the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5615: - Resolution: Fixed Status: Resolved (was: Patch Available) the master never does balance because of balancing the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: xufeng Assignee: xufeng Priority: Critical Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html the master never do balance becauseof when master do rebuildUserRegions(),it will add the parent region into AssignmentManager#servers, if balancer let the parent region to move,the parent will in RIT forever.thus balance will never be executed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5636) TestTableMapReduce doesn't work properly.
TestTableMapReduce doesn't work properly. - Key: HBASE-5636 URL: https://issues.apache.org/jira/browse/HBASE-5636 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.1 Reporter: Takuya Ueshin No map function is called because there are no test data put before test starts. The following three tests are in the same situation: - org.apache.hadoop.hbase.mapred.TestTableMapReduce - org.apache.hadoop.hbase.mapreduce.TestTableMapReduce - org.apache.hadoop.hbase.mapreduce.TestMulitthreadedTableMapper -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
[ https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5637: -- Attachment: hbase-5637.patch Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563. Key: HBASE-5637 URL: https://issues.apache.org/jira/browse/HBASE-5637 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5637.patch After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed that TestHMsg#getList started to fail consistently. This updates the test to deal with the updated equality semantics in HBASE-5563. This fixes that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
[ https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238565#comment-13238565 ] Jonathan Hsieh commented on HBASE-5637: --- This test case does not exist in 0.92/0.94/trunk and is not relevent to it. Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563. Key: HBASE-5637 URL: https://issues.apache.org/jira/browse/HBASE-5637 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5637.patch After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed that TestHMsg#getList started to fail consistently. This updates the test to deal with the updated equality semantics in HBASE-5563. This fixes that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
[ https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5637: -- Status: Patch Available (was: Open) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563. Key: HBASE-5637 URL: https://issues.apache.org/jira/browse/HBASE-5637 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5637.patch After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed that TestHMsg#getList started to fail consistently. This updates the test to deal with the updated equality semantics in HBASE-5563. This fixes that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
[ https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238568#comment-13238568 ] stack commented on HBASE-5637: -- +1 Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563. Key: HBASE-5637 URL: https://issues.apache.org/jira/browse/HBASE-5637 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5637.patch After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed that TestHMsg#getList started to fail consistently. This updates the test to deal with the updated equality semantics in HBASE-5563. This fixes that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
[ https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238571#comment-13238571 ] David S. Wang commented on HBASE-5637: -- This looks good to me. Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563. Key: HBASE-5637 URL: https://issues.apache.org/jira/browse/HBASE-5637 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5637.patch After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed that TestHMsg#getList started to fail consistently. This updates the test to deal with the updated equality semantics in HBASE-5563. This fixes that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-3813) Change RPC callQueue size from handlerCount * MAX_QUEUE_SIZE_PER_HANDLER;
[ https://issues.apache.org/jira/browse/HBASE-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans reassigned HBASE-3813: - Assignee: stack Change RPC callQueue size from handlerCount * MAX_QUEUE_SIZE_PER_HANDLER; --- Key: HBASE-3813 URL: https://issues.apache.org/jira/browse/HBASE-3813 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 0.92.0 Attachments: 3813.txt Yesterday debugging w/ Jack we noticed that with few handlers on a big box, he was seeing stats like this: {code} 2011-04-21 11:54:49,451 DEBUG org.apache.hadoop.ipc.HBaseServer: Server connection from X.X.X.X:60931; # active connections: 11; # queued calls: 2500 {code} We had 2500 items in the rpc queue waiting to be processed. Turns out he had too few handlers for number of clients (but also, it seems like he figured hw issues in that his RAM bus was running at 1/4 the rate that it should have been running at). Chatting w/ J-D this morning, he asked if the queues hold 'data'. The queues hold 'Calls'. Calls are the client request. They contain data. Jack had 2500 items queued. If each item to insert was 1MB, thats 25k * 1MB of memory that is outside of our generally accounting. Currently the queue size is handlers * MAX_QUEUE_SIZE_PER_HANDLER where MAX_QUEUE_SIZE_PER_HANDLER is hardcoded to be 100. If the queue is full we block (LinkedBlockingQueue). Going to change the queue size from 100 to 10 by default -- but also will make it configurable and will doc. this as possible cause of OOME. Will try it on production here before committing patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5596) Few minor bugs from HBASE-5209
[ https://issues.apache.org/jira/browse/HBASE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238591#comment-13238591 ] Jonathan Hsieh commented on HBASE-5596: --- +1, lgtm. Since this is an addendum to HBASE-5209, I've tweaked it to work on 0.94 and 0.92. I'll run them through test suites and if they pass tests I'll commit. Few minor bugs from HBASE-5209 -- Key: HBASE-5596 URL: https://issues.apache.org/jira/browse/HBASE-5596 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1, 0.94.0 Reporter: David S. Wang Assignee: David S. Wang Priority: Minor Attachments: HBASE-5596.patch, hbase-5596-0.94.patch A few leftover bugs from HBASE-5209. Comments are documented here: https://reviews.apache.org/r/3892/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5596) Few minor bugs from HBASE-5209
[ https://issues.apache.org/jira/browse/HBASE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5596: -- Attachment: hbase-5596-0.94.patch one-line change to apply to 0.94/0.92 Few minor bugs from HBASE-5209 -- Key: HBASE-5596 URL: https://issues.apache.org/jira/browse/HBASE-5596 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1, 0.94.0 Reporter: David S. Wang Assignee: David S. Wang Priority: Minor Attachments: HBASE-5596.patch, hbase-5596-0.94.patch A few leftover bugs from HBASE-5209. Comments are documented here: https://reviews.apache.org/r/3892/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
[ https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238589#comment-13238589 ] Hadoop QA commented on HBASE-5637: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519976/hbase-5637.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1306//console This message is automatically generated. Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563. Key: HBASE-5637 URL: https://issues.apache.org/jira/browse/HBASE-5637 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5637.patch After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed that TestHMsg#getList started to fail consistently. This updates the test to deal with the updated equality semantics in HBASE-5563. This fixes that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-5190) Limit the IPC queue size based on calls' payload size
[ https://issues.apache.org/jira/browse/HBASE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans resolved HBASE-5190. --- Resolution: Fixed Committed the addendum, thanks for looking at this Ted. Also, sorry I forgot about security. Limit the IPC queue size based on calls' payload size - Key: HBASE-5190 URL: https://issues.apache.org/jira/browse/HBASE-5190 Project: HBase Issue Type: Improvement Affects Versions: 0.90.5 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.94.0, 0.96.0 Attachments: 5190.addendum, HBASE-5190-v2.patch, HBASE-5190-v3.patch, HBASE-5190.patch Currently we limit the number of calls in the IPC queue only on their count. It used to be really high and was dropped down recently to num_handlers * 10 (so 100 by default) because it was easy to OOME yourself when huge calls were being queued. It's still possible to hit this problem if you use really big values and/or a lot of handlers, so the idea is that we should take into account the payload size. I can see 3 solutions: - Do the accounting outside of the queue itself for all calls coming in and out and when a call doesn't fit, throw a retryable exception. - Same accounting but instead block the call when it comes in until space is made available. - Add a new parameter for the maximum size (in bytes) of a Call and then set the size the IPC queue (in terms of the number of items) so that it could only contain as many items as some predefined maximum size (in bytes) for the whole queue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
[ https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5637: -- Comment: was deleted (was: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519976/hbase-5637.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1306//console This message is automatically generated.) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563. Key: HBASE-5637 URL: https://issues.apache.org/jira/browse/HBASE-5637 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5637.patch After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed that TestHMsg#getList started to fail consistently. This updates the test to deal with the updated equality semantics in HBASE-5563. This fixes that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
[ https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5637: -- Resolution: Fixed Fix Version/s: 0.90.7 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563. Key: HBASE-5637 URL: https://issues.apache.org/jira/browse/HBASE-5637 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.90.7 Attachments: hbase-5637.patch After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed that TestHMsg#getList started to fail consistently. This updates the test to deal with the updated equality semantics in HBASE-5563. This fixes that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
[ https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238609#comment-13238609 ] Jonathan Hsieh commented on HBASE-5637: --- Committed. Dave and Stack, thanks for review. Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563. Key: HBASE-5637 URL: https://issues.apache.org/jira/browse/HBASE-5637 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.90.7 Attachments: hbase-5637.patch After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed that TestHMsg#getList started to fail consistently. This updates the test to deal with the updated equality semantics in HBASE-5563. This fixes that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5596) Few minor bugs from HBASE-5209
[ https://issues.apache.org/jira/browse/HBASE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238621#comment-13238621 ] Hadoop QA commented on HBASE-5596: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519979/hbase-5596-0.94.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1308//console This message is automatically generated. Few minor bugs from HBASE-5209 -- Key: HBASE-5596 URL: https://issues.apache.org/jira/browse/HBASE-5596 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1, 0.94.0 Reporter: David S. Wang Assignee: David S. Wang Priority: Minor Attachments: HBASE-5596.patch, hbase-5596-0.94.patch A few leftover bugs from HBASE-5209. Comments are documented here: https://reviews.apache.org/r/3892/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5638) Backport HBASE-5633 to 0.90 and 0.92 - NPE reading ZK config in HBase
Backport HBASE-5633 to 0.90 and 0.92 - NPE reading ZK config in HBase - Key: HBASE-5638 URL: https://issues.apache.org/jira/browse/HBASE-5638 Project: HBase Issue Type: Sub-task Components: zookeeper Affects Versions: 0.92.1, 0.90.6 Reporter: Matteo Bertozzi Priority: Minor Fix For: 0.90.7, 0.92.2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5638) Backport to 0.90 and 0.92 - NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-5638: --- Summary: Backport to 0.90 and 0.92 - NPE reading ZK config in HBase (was: Backport HBASE-5633 to 0.90 and 0.92 - NPE reading ZK config in HBase) Backport to 0.90 and 0.92 - NPE reading ZK config in HBase -- Key: HBASE-5638 URL: https://issues.apache.org/jira/browse/HBASE-5638 Project: HBase Issue Type: Sub-task Components: zookeeper Affects Versions: 0.90.6, 0.92.1 Reporter: Matteo Bertozzi Priority: Minor Fix For: 0.90.7, 0.92.2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5638) Backport to 0.90 and 0.92 - NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-5638: --- Status: Patch Available (was: Open) Backport to 0.90 and 0.92 - NPE reading ZK config in HBase -- Key: HBASE-5638 URL: https://issues.apache.org/jira/browse/HBASE-5638 Project: HBase Issue Type: Sub-task Components: zookeeper Affects Versions: 0.92.1, 0.90.6 Reporter: Matteo Bertozzi Priority: Minor Fix For: 0.90.7, 0.92.2 Attachments: HBASE-5633-0.90.patch, HBASE-5633-0.92.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5638) Backport to 0.90 and 0.92 - NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-5638: --- Attachment: HBASE-5633-0.92.patch HBASE-5633-0.90.patch I've attached the 0.90 and 0.92 patches to backport HBASE-5633 fix Backport to 0.90 and 0.92 - NPE reading ZK config in HBase -- Key: HBASE-5638 URL: https://issues.apache.org/jira/browse/HBASE-5638 Project: HBase Issue Type: Sub-task Components: zookeeper Affects Versions: 0.90.6, 0.92.1 Reporter: Matteo Bertozzi Priority: Minor Fix For: 0.90.7, 0.92.2 Attachments: HBASE-5633-0.90.patch, HBASE-5633-0.92.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5638) Backport to 0.90 and 0.92 - NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238634#comment-13238634 ] Hadoop QA commented on HBASE-5638: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519985/HBASE-5633-0.92.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1309//console This message is automatically generated. Backport to 0.90 and 0.92 - NPE reading ZK config in HBase -- Key: HBASE-5638 URL: https://issues.apache.org/jira/browse/HBASE-5638 Project: HBase Issue Type: Sub-task Components: zookeeper Affects Versions: 0.90.6, 0.92.1 Reporter: Matteo Bertozzi Priority: Minor Fix For: 0.90.7, 0.92.2 Attachments: HBASE-5633-0.90.patch, HBASE-5633-0.92.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding
[ https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238645#comment-13238645 ] stack commented on HBASE-4676: -- bq. The KeyValueScanner and related classes that iterate through the trie will eventually need to be smarter and have methods to do things like skipping to the next row of results without scanning every cell in between. Do we not have this now Matt? As said before, 'So 40-80x faster seeks at medium block size.', is pretty nice improvement. Ditto on 'Seek speed is ~15x NONE seek speed while expanding the effective block cache size by 4x.' What are you thinking these times Matt? The slow write and scan speeds would tend to rule it out for anything but a random read workload but when random reading only, you'll want to use prefixtrie (smile). Do you think it possible to approach prefix compression scan and write speeds? Regards your working set read testing, did it all fit in memory? How did you measure cycles per seek? What is numBlocks? The total number of blocks the dataset fit in? Throughput is better for sure.. what is latency like? Excellent stuff Matt. Prefix Compression - Trie data block encoding - Key: HBASE-4676 URL: https://issues.apache.org/jira/browse/HBASE-4676 Project: HBase Issue Type: New Feature Components: io, performance, regionserver Affects Versions: 0.90.6 Reporter: Matt Corgan Attachments: PrefixTrie_Format_v1.pdf, PrefixTrie_Performance_v1.pdf, SeeksPerSec by blockSize.png The HBase data block format has room for 2 significant improvements for applications that have high block cache hit ratios. First, there is no prefix compression, and the current KeyValue format is somewhat metadata heavy, so there can be tremendous memory bloat for many common data layouts, specifically those with long keys and short values. Second, there is no random access to KeyValues inside data blocks. This means that every time you double the datablock size, average seek time (or average cpu consumption) goes up by a factor of 2. The standard 64KB block size is ~10x slower for random seeks than a 4KB block size, but block sizes as small as 4KB cause problems elsewhere. Using block sizes of 256KB or 1MB or more may be more efficient from a disk access and block-cache perspective in many big-data applications, but doing so is infeasible from a random seek perspective. The PrefixTrie block encoding format attempts to solve both of these problems. Some features: * trie format for row key encoding completely eliminates duplicate row keys and encodes similar row keys into a standard trie structure which also saves a lot of space * the column family is currently stored once at the beginning of each block. this could easily be modified to allow multiple family names per block * all qualifiers in the block are stored in their own trie format which caters nicely to wide rows. duplicate qualifers between rows are eliminated. the size of this trie determines the width of the block's qualifier fixed-width-int * the minimum timestamp is stored at the beginning of the block, and deltas are calculated from that. the maximum delta determines the width of the block's timestamp fixed-width-int The block is structured with metadata at the beginning, then a section for the row trie, then the column trie, then the timestamp deltas, and then then all the values. Most work is done in the row trie, where every leaf node (corresponding to a row) contains a list of offsets/references corresponding to the cells in that row. Each cell is fixed-width to enable binary searching and is represented by [1 byte operationType, X bytes qualifier offset, X bytes timestamp delta offset]. If all operation types are the same for a block, there will be zero per-cell overhead. Same for timestamps. Same for qualifiers when i get a chance. So, the compression aspect is very strong, but makes a few small sacrifices on VarInt size to enable faster binary searches in trie fan-out nodes. A more compressed but slower version might build on this by also applying further (suffix, etc) compression on the trie nodes at the cost of slower write speed. Even further compression could be obtained by using all VInts instead of FInts with a sacrifice on random seek speed (though not huge). One current drawback is the current write speed. While programmed with good constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not programmed with the same level of optimization as the read path. Work will need to be done to optimize the data structures used for encoding and could probably show a 10x increase. It will still be slower than delta encoding,
[jira] [Commented] (HBASE-5533) Add more metrics to HBase
[ https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238647#comment-13238647 ] Hadoop QA commented on HBASE-5533: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519846/HBASE-5533-TRUNK-v6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster org.apache.hadoop.hbase.client.TestFromClientSide Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1307//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1307//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1307//console This message is automatically generated. Add more metrics to HBase - Key: HBASE-5533 URL: https://issues.apache.org/jira/browse/HBASE-5533 Project: HBase Issue Type: Improvement Affects Versions: 0.92.2, 0.94.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, HBASE-5533-TRUNK-v6.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java, hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, hbase5533-0.92-v5.patch, histogram_web_ui.png To debug/monitor production clusters, there are some more metrics I wish I had available. In particular: - Although the average FS latencies are useful, a 'histogram' of recent latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) would be more useful - Similar histograms of latencies on common operations (GET, PUT, DELETE) would be useful - Counting the number of accesses to each region to detect hotspotting - Exposing the current number of HLog files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238664#comment-13238664 ] Lars Hofhansl commented on HBASE-5623: -- I do not want to wait with the RC any longer. @Stack or @Ted: Please have a look at the last patch (will add LOG.debug message as indicated in the last comment). Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by
[jira] [Assigned] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file
[ https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G reassigned HBASE-5598: -- Assignee: Uma Maheswara Rao G Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file -- Key: HBASE-5598 URL: https://issues.apache.org/jira/browse/HBASE-5598 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor There are many findbugs errors reporting by HbaseQA. HBASE-5597 is going to up the OK count. This may lead to other issues when we re-factor the code, if we induce new valid ones and remove invalid bugs also can not be reported by QA. So, I would propose to add the exclude filter file for findbugs(for the invalid bugs). If we find any valid ones, we can fix under this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238675#comment-13238675 ] Ted Yu commented on HBASE-5623: --- Last patch should be good to go. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238680#comment-13238680 ] stack commented on HBASE-5623: -- Is this right? {code} +Writer tempWriter; synchronized (this.updateLock) { if (this.closed) return; + tempWriter = this.writer; // guaranteed non-null } {code} When we go to use tempWriter, it may be closed. We're fine w/ that? We'll get an IOE about it being closed Yeah, add a log here instead: {code} +} catch (IOException x) { + // Ignore. + // Writer might have been closed. + // In any case, we either don't have to do anything, + // or the log will be rolled the next time. } {code} I'm +1 on commit. Its way to easy reproducing the NPE. The changes in SequenceFileLogWriter are an improvement. I'm down w/ the retry on IOE (long as the retry is logged). Good stuff. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238689#comment-13238689 ] Lars Hofhansl commented on HBASE-5623: -- Re: tempWriter. Yep, the idea is the *only* error-condition for tempWriter is that it could have been closed. I'll add LOG.debug(Log roll failed and will be retried. (This is not an error)) Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message
[jira] [Commented] (HBASE-5190) Limit the IPC queue size based on calls' payload size
[ https://issues.apache.org/jira/browse/HBASE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238697#comment-13238697 ] Hudson commented on HBASE-5190: --- Integrated in HBase-0.94 #56 (See [https://builds.apache.org/job/HBase-0.94/56/]) HBASE-5190 Limit the IPC queue size based on calls' payload size (Ted's addendum) (Revision 1305469) Result = FAILURE jdcryans : Files : * /hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/ipc/SecureServer.java Limit the IPC queue size based on calls' payload size - Key: HBASE-5190 URL: https://issues.apache.org/jira/browse/HBASE-5190 Project: HBase Issue Type: Improvement Affects Versions: 0.90.5 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.94.0, 0.96.0 Attachments: 5190.addendum, HBASE-5190-v2.patch, HBASE-5190-v3.patch, HBASE-5190.patch Currently we limit the number of calls in the IPC queue only on their count. It used to be really high and was dropped down recently to num_handlers * 10 (so 100 by default) because it was easy to OOME yourself when huge calls were being queued. It's still possible to hit this problem if you use really big values and/or a lot of handlers, so the idea is that we should take into account the payload size. I can see 3 solutions: - Do the accounting outside of the queue itself for all calls coming in and out and when a call doesn't fit, throw a retryable exception. - Same accounting but instead block the call when it comes in until space is made available. - Add a new parameter for the maximum size (in bytes) of a Call and then set the size the IPC queue (in terms of the number of items) so that it could only contain as many items as some predefined maximum size (in bytes) for the whole queue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238699#comment-13238699 ] jirapos...@reviews.apache.org commented on HBASE-5619: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4054/#review6354 --- I'm looking at differences between r1 and r4 and I don't see anything new in HRegionProtocol.proto ..? pom.xml https://reviews.apache.org/r/4054/#comment13781 Can you please kill all the trailing whitespaces while you're at it? pom.xml https://reviews.apache.org/r/4054/#comment13779 Just do if which cygpath 2/dev/null instead of doing it on 2 lines and testing $? pom.xml https://reviews.apache.org/r/4054/#comment13782 Just do for PROTO_FILE in $UNIX_PROTO_DIR/*.proto, I don't think the ls is really necessary here. src/main/proto/HRegionProtocol.proto https://reviews.apache.org/r/4054/#comment13783 I'm not seeing that you added it. - Benoit On 2012-03-23 19:29:52, Jimmy Xiang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4054/ bq. --- bq. bq. (Updated 2012-03-23 19:29:52) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This is the first draft of the ProtoBuff HRegionProtocol. The corresponding java vs pb method mapping is attached to the jira: https://issues.apache.org/jira/browse/HBASE-5443 bq. bq. Please review. I'd like to move ahead after we get to some agreement. bq. bq. bq. This addresses bug HBASE-5619. bq. https://issues.apache.org/jira/browse/HBASE-5619 bq. bq. bq. Diffs bq. - bq. bq.pom.xml 10b13ef bq.src/main/proto/RegionAdmin.proto PRE-CREATION bq.src/main/proto/RegionClient.proto PRE-CREATION bq.src/main/proto/hbase.proto PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/4054/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Jimmy bq. bq. Create PB protocols for HRegionInterface Key: HBASE-5619 URL: https://issues.apache.org/jira/browse/HBASE-5619 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5619.patch Subtask of HBase-5443, separate HRegionInterface into admin protocol and client protocol, create the PB protocol buffer files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238704#comment-13238704 ] jirapos...@reviews.apache.org commented on HBASE-5619: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4054/#review6356 --- Oh my, it's just that review board sucks, it's still showing the file as being added when comparing r1 and r4, even though the file is no longer in r4. - Benoit On 2012-03-23 19:29:52, Jimmy Xiang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4054/ bq. --- bq. bq. (Updated 2012-03-23 19:29:52) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This is the first draft of the ProtoBuff HRegionProtocol. The corresponding java vs pb method mapping is attached to the jira: https://issues.apache.org/jira/browse/HBASE-5443 bq. bq. Please review. I'd like to move ahead after we get to some agreement. bq. bq. bq. This addresses bug HBASE-5619. bq. https://issues.apache.org/jira/browse/HBASE-5619 bq. bq. bq. Diffs bq. - bq. bq.pom.xml 10b13ef bq.src/main/proto/RegionAdmin.proto PRE-CREATION bq.src/main/proto/RegionClient.proto PRE-CREATION bq.src/main/proto/hbase.proto PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/4054/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Jimmy bq. bq. Create PB protocols for HRegionInterface Key: HBASE-5619 URL: https://issues.apache.org/jira/browse/HBASE-5619 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5619.patch Subtask of HBase-5443, separate HRegionInterface into admin protocol and client protocol, create the PB protocol buffer files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5533) Add more metrics to HBase
[ https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5533: - Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk. The two tests that failed seemed to have nought to do w/ this patch (I tried them locally and they both passed). Thanks for the nice addition Shaneal. Add more metrics to HBase - Key: HBASE-5533 URL: https://issues.apache.org/jira/browse/HBASE-5533 Project: HBase Issue Type: Improvement Affects Versions: 0.92.2, 0.94.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Fix For: 0.96.0 Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, HBASE-5533-TRUNK-v6.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java, hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, hbase5533-0.92-v5.patch, histogram_web_ui.png To debug/monitor production clusters, there are some more metrics I wish I had available. In particular: - Although the average FS latencies are useful, a 'histogram' of recent latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) would be more useful - Similar histograms of latencies on common operations (GET, PUT, DELETE) would be useful - Counting the number of accesses to each region to detect hotspotting - Exposing the current number of HLog files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5628) Improve performance of uberhbck
[ https://issues.apache.org/jira/browse/HBASE-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238721#comment-13238721 ] Jonathan Hsieh commented on HBASE-5628: --- Minor optimization: in HBaseFsck#loadHdfsRegionInfos we could consolidate. {code} if (modTInfo = null) { ... } .. tablesInfo.put(tableNAme, modTInfo); {code} to be {code} if (modTInfo = null) { ... tablesInfo.put(tableNAme, modTInfo); } .. {code} Improve performance of uberhbck --- Key: HBASE-5628 URL: https://issues.apache.org/jira/browse/HBASE-5628 Project: HBase Issue Type: Improvement Components: hbck Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh During reviews of HBASE-5128 there are several opportunities investigate for improving the performance of the tool. - Change regionInfoMap and tablesInfo from TreeMap to HashMap. - Change some full region set reloads to be incremental to require fewer passes. - Cache meta for subsequent calls of closeRegionSileneglyAndWait -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238766#comment-13238766 ] jirapos...@reviews.apache.org commented on HBASE-5619: -- bq. On 2012-03-26 19:15:13, Benoit Sigoure wrote: bq. pom.xml, line 780 bq. https://reviews.apache.org/r/4054/diff/1-4/?file=86002#file86002line780 bq. bq. Just do if which cygpath 2/dev/null instead of doing it on 2 lines and testing $? Sure, will fix. bq. On 2012-03-26 19:15:13, Benoit Sigoure wrote: bq. pom.xml, line 787 bq. https://reviews.apache.org/r/4054/diff/1-4/?file=86002#file86002line787 bq. bq. Just do for PROTO_FILE in $UNIX_PROTO_DIR/*.proto, I don't think the ls is really necessary here. You are right. It works without ls. Fixed. bq. On 2012-03-26 19:15:13, Benoit Sigoure wrote: bq. pom.xml, line 334 bq. https://reviews.apache.org/r/4054/diff/1-4/?file=86002#file86002line334 bq. bq. Can you please kill all the trailing whitespaces while you're at it? Sure, I will do that. - Jimmy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4054/#review6354 --- On 2012-03-23 19:29:52, Jimmy Xiang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4054/ bq. --- bq. bq. (Updated 2012-03-23 19:29:52) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This is the first draft of the ProtoBuff HRegionProtocol. The corresponding java vs pb method mapping is attached to the jira: https://issues.apache.org/jira/browse/HBASE-5443 bq. bq. Please review. I'd like to move ahead after we get to some agreement. bq. bq. bq. This addresses bug HBASE-5619. bq. https://issues.apache.org/jira/browse/HBASE-5619 bq. bq. bq. Diffs bq. - bq. bq.pom.xml 10b13ef bq.src/main/proto/RegionAdmin.proto PRE-CREATION bq.src/main/proto/RegionClient.proto PRE-CREATION bq.src/main/proto/hbase.proto PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/4054/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Jimmy bq. bq. Create PB protocols for HRegionInterface Key: HBASE-5619 URL: https://issues.apache.org/jira/browse/HBASE-5619 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5619.patch Subtask of HBase-5443, separate HRegionInterface into admin protocol and client protocol, create the PB protocol buffer files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238768#comment-13238768 ] jirapos...@reviews.apache.org commented on HBASE-5619: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4054/ --- (Updated 2012-03-26 20:14:22.677676) Review request for hbase. Changes --- Addressed Benoit's comments. Summary --- This is the first draft of the ProtoBuff HRegionProtocol. The corresponding java vs pb method mapping is attached to the jira: https://issues.apache.org/jira/browse/HBASE-5443 Please review. I'd like to move ahead after we get to some agreement. This addresses bug HBASE-5619. https://issues.apache.org/jira/browse/HBASE-5619 Diffs (updated) - pom.xml 10b13ef src/main/proto/RegionAdmin.proto PRE-CREATION src/main/proto/RegionClient.proto PRE-CREATION src/main/proto/hbase.proto PRE-CREATION Diff: https://reviews.apache.org/r/4054/diff Testing --- Thanks, Jimmy Create PB protocols for HRegionInterface Key: HBASE-5619 URL: https://issues.apache.org/jira/browse/HBASE-5619 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5619.patch Subtask of HBase-5443, separate HRegionInterface into admin protocol and client protocol, create the PB protocol buffer files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5544) Add metrics to HRegion.processRow()
[ https://issues.apache.org/jira/browse/HBASE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-5544: -- Hadoop Flags: Reviewed Status: Patch Available (was: Open) Add metrics to HRegion.processRow() --- Key: HBASE-5544 URL: https://issues.apache.org/jira/browse/HBASE-5544 Project: HBase Issue Type: New Feature Reporter: Scott Chen Assignee: Scott Chen Attachments: HBASE-5544.D2457.1.patch, HBASE-5544.D2457.2.patch Add metrics of 1. time for waiting for the lock 2. processing time (scan time) 3. time spent while holding the lock 4. total call time 5. number of failures / calls -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5544) Add metrics to HRegion.processRow()
[ https://issues.apache.org/jira/browse/HBASE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-5544: -- Fix Version/s: 0.96.0 Add metrics to HRegion.processRow() --- Key: HBASE-5544 URL: https://issues.apache.org/jira/browse/HBASE-5544 Project: HBase Issue Type: New Feature Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.96.0 Attachments: HBASE-5544.D2457.1.patch, HBASE-5544.D2457.2.patch Add metrics of 1. time for waiting for the lock 2. processing time (scan time) 3. time spent while holding the lock 4. total call time 5. number of failures / calls -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost
[ https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-5606: --- Attachment: 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch Do not do any error processing if the getDataSetWatch() call from SplitLogManager timeoutMonitor fails SplitLogManger async delete node hangs log splitting when ZK connection is lost Key: HBASE-5606 URL: https://issues.apache.org/jira/browse/HBASE-5606 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.92.0 Reporter: Gopinathan A Priority: Critical Fix For: 0.92.2 Attachments: 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 5606.txt 1. One rs died, the servershutdownhandler found it out and started the distributed log splitting; 2. All tasks are failed due to ZK connection lost, so the all the tasks were deleted asynchronously; 3. Servershutdownhandler retried the log splitting; 4. The asynchronously deletion in step 2 finally happened for new task 5. This made the SplitLogManger in hanging state. This leads to .META. region not assigened for long time {noformat} hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 {noformat} {noformat} hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 19:34:31,196 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 19:34:32,497 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost
[ https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238794#comment-13238794 ] Prakash Khemani commented on HBASE-5606: @Jimmy This is similar to HBASE-5081 w.r.t what goes wrong - a pending delete creates havoc on the next create. But it is different from HBASE-5081 because the pending Delete is created at a different point in the code - in the timeoutMonitor and not when the task actually fails ... SplitLogManger async delete node hangs log splitting when ZK connection is lost Key: HBASE-5606 URL: https://issues.apache.org/jira/browse/HBASE-5606 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.92.0 Reporter: Gopinathan A Priority: Critical Fix For: 0.92.2 Attachments: 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 5606.txt 1. One rs died, the servershutdownhandler found it out and started the distributed log splitting; 2. All tasks are failed due to ZK connection lost, so the all the tasks were deleted asynchronously; 3. Servershutdownhandler retried the log splitting; 4. The asynchronously deletion in step 2 finally happened for new task 5. This made the SplitLogManger in hanging state. This leads to .META. region not assigened for long time {noformat} hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 {noformat} {noformat} hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 19:34:31,196 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 19:34:32,497 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5623: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to 0.94 and 0.96. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238800#comment-13238800 ] Lars Hofhansl commented on HBASE-5623: -- Thanks for the patch Enis! And thanks for the reviews Stack and Ted. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238802#comment-13238802 ] Jean-Daniel Cryans commented on HBASE-5623: --- Funny, I just saw that NPE for the first time in my testing. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file
[ https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238804#comment-13238804 ] Uma Maheswara Rao G commented on HBASE-5598: I am not so favor of adding directly static tools related annotations into code. In Hadoop projects(HDFS, Mapreduce) we are using this exclude-filter for invalid find bugs. So, I proposed this here. I think we can decide first how we will exclude the invalid bugs, then we can start working on the bug fix directly. Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file -- Key: HBASE-5598 URL: https://issues.apache.org/jira/browse/HBASE-5598 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor There are many findbugs errors reporting by HbaseQA. HBASE-5597 is going to up the OK count. This may lead to other issues when we re-factor the code, if we induce new valid ones and remove invalid bugs also can not be reported by QA. So, I would propose to add the exclude filter file for findbugs(for the invalid bugs). If we find any valid ones, we can fix under this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3134) [replication] Add the ability to enable/disable streams
[ https://issues.apache.org/jira/browse/HBASE-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238807#comment-13238807 ] Jean-Daniel Cryans commented on HBASE-3134: --- +1 on the patch you put up on review board Teruyoshi, can you attach it here and grant the license so that I can commit? Thanks! [replication] Add the ability to enable/disable streams --- Key: HBASE-3134 URL: https://issues.apache.org/jira/browse/HBASE-3134 Project: HBase Issue Type: New Feature Components: replication Reporter: Jean-Daniel Cryans Assignee: Teruyoshi Zenmyo Priority: Minor Labels: replication Fix For: 0.94.1 Attachments: 3134-v2.txt, 3134-v3.txt, 3134.txt, HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch This jira was initially in the scope of HBASE-2201, but was pushed out since it has low value compared to the required effort (and when want to ship 0.90.0 rather soonish). We need to design a way to enable/disable replication streams in a determinate fashion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5533) Add more metrics to HBase
[ https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5533: - Fix Version/s: 0.94.0 Patch applies almost cleanup to 0.94. Will pull it in. Add more metrics to HBase - Key: HBASE-5533 URL: https://issues.apache.org/jira/browse/HBASE-5533 Project: HBase Issue Type: Improvement Affects Versions: 0.92.2, 0.94.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Fix For: 0.94.0, 0.96.0 Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, HBASE-5533-TRUNK-v6.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java, hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, hbase5533-0.92-v5.patch, histogram_web_ui.png To debug/monitor production clusters, there are some more metrics I wish I had available. In particular: - Although the average FS latencies are useful, a 'histogram' of recent latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) would be more useful - Similar histograms of latencies on common operations (GET, PUT, DELETE) would be useful - Counting the number of accesses to each region to detect hotspotting - Exposing the current number of HLog files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost
[ https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238811#comment-13238811 ] Jimmy Xiang commented on HBASE-5606: @Prakash, could there be other places which failed delete can cause this issue? Is it a cleaner fix to change async delete to sync delete? With sync delete, we can avoid all these havoc racing problems, and the retry will get a fresh start each time. SplitLogManger async delete node hangs log splitting when ZK connection is lost Key: HBASE-5606 URL: https://issues.apache.org/jira/browse/HBASE-5606 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.92.0 Reporter: Gopinathan A Priority: Critical Fix For: 0.92.2 Attachments: 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 5606.txt 1. One rs died, the servershutdownhandler found it out and started the distributed log splitting; 2. All tasks are failed due to ZK connection lost, so the all the tasks were deleted asynchronously; 3. Servershutdownhandler retried the log splitting; 4. The asynchronously deletion in step 2 finally happened for new task 5. This made the SplitLogManger in hanging state. This leads to .META. region not assigened for long time {noformat} hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 {noformat} {noformat} hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 19:34:31,196 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 19:34:32,497 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5533) Add more metrics to HBase
[ https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238817#comment-13238817 ] Lars Hofhansl commented on HBASE-5533: -- Committed to 0.94 as well. Add more metrics to HBase - Key: HBASE-5533 URL: https://issues.apache.org/jira/browse/HBASE-5533 Project: HBase Issue Type: Improvement Affects Versions: 0.92.2, 0.94.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Fix For: 0.94.0, 0.96.0 Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, HBASE-5533-TRUNK-v6.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java, hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, hbase5533-0.92-v5.patch, histogram_web_ui.png To debug/monitor production clusters, there are some more metrics I wish I had available. In particular: - Although the average FS latencies are useful, a 'histogram' of recent latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) would be more useful - Similar histograms of latencies on common operations (GET, PUT, DELETE) would be useful - Counting the number of accesses to each region to detect hotspotting - Exposing the current number of HLog files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file
[ https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238820#comment-13238820 ] Uma Maheswara Rao G commented on HBASE-5598: Who ever wants to generate the findbugs html report locally, follow the below steps. 1) add the below target in pom.xml. Already there is one transformationSet available in pom.xml. just we can place this after that transformationSet. transformationSet dir${basedir}/target//dir includes includefindbugsXml.xml/include /includes stylesheetE:/SoftWares/findbugs-1.3.9/findbugs-1.3.9/src/xsl/default.xsl/stylesheet outputDir${basedir}/target//outputDir /transformationSet 2) Make sure to update the above findbugs xsl path correctly referring to your local path of findbugs. 3) run 'mvn findbugs:findbugs' 4) run 'mvn site' now ${basedir}/target/findbugsXml.xml will be replaced with html report. rename to ${basedir}/target/findbugsXml.html and open. Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file -- Key: HBASE-5598 URL: https://issues.apache.org/jira/browse/HBASE-5598 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor There are many findbugs errors reporting by HbaseQA. HBASE-5597 is going to up the OK count. This may lead to other issues when we re-factor the code, if we induce new valid ones and remove invalid bugs also can not be reported by QA. So, I would propose to add the exclude filter file for findbugs(for the invalid bugs). If we find any valid ones, we can fix under this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5619) Create PB protocols for HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5619: --- Status: Open (was: Patch Available) Create PB protocols for HRegionInterface Key: HBASE-5619 URL: https://issues.apache.org/jira/browse/HBASE-5619 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5619.patch, hbase-5619_v3.patch Subtask of HBase-5443, separate HRegionInterface into admin protocol and client protocol, create the PB protocol buffer files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5619) Create PB protocols for HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5619: --- Status: Patch Available (was: Open) Create PB protocols for HRegionInterface Key: HBASE-5619 URL: https://issues.apache.org/jira/browse/HBASE-5619 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5619.patch, hbase-5619_v3.patch Subtask of HBase-5443, separate HRegionInterface into admin protocol and client protocol, create the PB protocol buffer files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5619) Create PB protocols for HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5619: --- Attachment: hbase-5619_v3.patch Create PB protocols for HRegionInterface Key: HBASE-5619 URL: https://issues.apache.org/jira/browse/HBASE-5619 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5619.patch, hbase-5619_v3.patch Subtask of HBase-5443, separate HRegionInterface into admin protocol and client protocol, create the PB protocol buffer files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms
[ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238838#comment-13238838 ] Nicolas Spiegelberg commented on HBASE-5626: How is this different from the compaction simulation python script? The unit of measurement should be a flush, since we flush after a certain memstore memory size, regardless of flow rate or KV length. Compactions simulator tool for proofing algorithms -- Key: HBASE-5626 URL: https://issues.apache.org/jira/browse/HBASE-5626 Project: HBase Issue Type: Task Reporter: stack Priority: Minor Labels: noob A tool to run compaction simulations would be a nice to have. We could use it to see how well an algo ran under different circumstances loaded w/ different value types with different rates of flushes and splits, etc. HBASE-2462 had one (see in patch). Or we could try doing it using something like this: http://en.wikipedia.org/wiki/Discrete_event_simulation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file
[ https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238849#comment-13238849 ] Uma Maheswara Rao G commented on HBASE-5598: Also another case to use exclude filter is, we have many findbugs reported from generated code (from thrift). We can just add that package to the filter file. ex: {code} FindBugsFilter Match Package name=org.apache.hadoop.hbase.thrift.generated / /Match /FindBugsFilter {code} Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file -- Key: HBASE-5598 URL: https://issues.apache.org/jira/browse/HBASE-5598 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor There are many findbugs errors reporting by HbaseQA. HBASE-5597 is going to up the OK count. This may lead to other issues when we re-factor the code, if we induce new valid ones and remove invalid bugs also can not be reported by QA. So, I would propose to add the exclude filter file for findbugs(for the invalid bugs). If we find any valid ones, we can fix under this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms
[ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238853#comment-13238853 ] stack commented on HBASE-5626: -- Where is the python simulation script? Is it uploaded anywhere? (Pardon me if I missed it) Simulator needs to also factor in splitting. Compactions simulator tool for proofing algorithms -- Key: HBASE-5626 URL: https://issues.apache.org/jira/browse/HBASE-5626 Project: HBase Issue Type: Task Reporter: stack Priority: Minor Labels: noob A tool to run compaction simulations would be a nice to have. We could use it to see how well an algo ran under different circumstances loaded w/ different value types with different rates of flushes and splits, etc. HBASE-2462 had one (see in patch). Or we could try doing it using something like this: http://en.wikipedia.org/wiki/Discrete_event_simulation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5544) Add metrics to HRegion.processRow()
[ https://issues.apache.org/jira/browse/HBASE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238856#comment-13238856 ] Hadoop QA commented on HBASE-5544: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519753/HBASE-5544.D2457.2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.util.TestHBaseFsck Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1311//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1311//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1311//console This message is automatically generated. Add metrics to HRegion.processRow() --- Key: HBASE-5544 URL: https://issues.apache.org/jira/browse/HBASE-5544 Project: HBase Issue Type: New Feature Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.96.0 Attachments: HBASE-5544.D2457.1.patch, HBASE-5544.D2457.2.patch Add metrics of 1. time for waiting for the lock 2. processing time (scan time) 3. time spent while holding the lock 4. total call time 5. number of failures / calls -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost
[ https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238861#comment-13238861 ] Hadoop QA commented on HBASE-5606: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12520003/0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.master.TestSplitLogManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1310//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1310//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1310//console This message is automatically generated. SplitLogManger async delete node hangs log splitting when ZK connection is lost Key: HBASE-5606 URL: https://issues.apache.org/jira/browse/HBASE-5606 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.92.0 Reporter: Gopinathan A Priority: Critical Fix For: 0.92.2 Attachments: 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 5606.txt 1. One rs died, the servershutdownhandler found it out and started the distributed log splitting; 2. All tasks are failed due to ZK connection lost, so the all the tasks were deleted asynchronously; 3. Servershutdownhandler retried the log splitting; 4. The asynchronously deletion in step 2 finally happened for new task 5. This made the SplitLogManger in hanging state. This leads to .META. region not assigened for long time {noformat} hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 {noformat} {noformat} hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 19:34:31,196 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 19:34:32,497 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits
[ https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-5618: --- Attachment: 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch update heartbeat time as soon as possible and as often as one can. SplitLogManager - prevent unnecessary attempts to resubmits --- Key: HBASE-5618 URL: https://issues.apache.org/jira/browse/HBASE-5618 Project: HBase Issue Type: Improvement Components: wal, zookeeper Reporter: Prakash Khemani Attachments: 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch Currently once a watch fires that the task node has been updated (hearbeated) by the worker, the splitlogmanager still quite some time before it updates the last heard from time. This is because the manager currently schedules another getDataSetWatch() and only after that finishes will it update the task's last heard from time. This leads to a large number of zk-BadVersion warnings when resubmission is continuously attempted and it fails. Two changes should be made (1) On a resubmission failure because of BadVersion the task's lastUpdate time should get upped. (2) The task's lastUpdate time should get upped as soon as the nodeDataChanged() watch fires and without waiting for getDataSetWatch() to complete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs
[ https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238893#comment-13238893 ] jirapos...@reviews.apache.org commented on HBASE-5451: -- bq. On 2012-03-24 07:38:03, Benoit Sigoure wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java, line 102 bq. https://reviews.apache.org/r/4096/diff/2/?file=86903#file86903line102 bq. bq. Argh, no, don't change this! I got other HBase devs to promise to not change this as it makes backwards compatible clients impossibly complicated. I see. This was the basis of the graceful failure for current clients that are not aware of PB (clients would bail out if the versions of RPC don't match, right). The response to your comment below I don't see how this is graceful. is actually this change in the version. bq. On 2012-03-24 07:38:03, Benoit Sigoure wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto, line 34 bq. https://reviews.apache.org/r/4096/diff/2/?file=86905#file86905line34 bq. bq. Why keep this oddity of Hadoop RPC? Either rely on TCP keepalive, or add a Ping method to the RPC interface. Note that this is just documentation. Ping is already done in hbase RPC, and I thought I'd document it. I haven't done anything in the PB stuff for handling this. I agree with you this is odd/special-case and IMO a topic for a separate jira. bq. On 2012-03-24 07:38:03, Benoit Sigoure wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto, line 72 bq. https://reviews.apache.org/r/4096/diff/2/?file=86905#file86905line72 bq. bq. Why is this optional? General comment on the optional vs required PB fields... I have made most of the fields as optional since it makes the specification flexible and makes compatibility very easy. Once we are somewhat certain of the PB fields in the RPC we can finalize on the labeling of optional/required on the fields. Does this make sense? bq. On 2012-03-24 07:38:03, Benoit Sigoure wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto, line 71 bq. https://reviews.apache.org/r/4096/diff/2/?file=86905#file86905line71 bq. bq. What's the point of this message? Why not just put the callId in RpcRequestProto and be done with it? The main reason being I wanted to clearly separate what comes from the application and what's put in by the RPC layer. The client would frame a PB object (RpcRequestProto) and send it down to the RPC layer. Currently, the RpcRequestProto is mostly a placeholder with only one field called 'bytes'. Once I implement the ProtoBufRpcEngine (as in Hadoop core) in a follow-up jira, I'll have fields like methodname', 'protocolname', etc. and they would be encoded as RpcRequestProto objects. Similarly, on the response side. bq. On 2012-03-24 07:38:03, Benoit Sigoure wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto, line 25 bq. https://reviews.apache.org/r/4096/diff/2/?file=86905#file86905line25 bq. bq. I don't see how this is graceful. I answered this above. - Devaraj --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4096/#review6302 --- On 2012-03-01 03:40:14, Devaraj Das wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4096/ bq. --- bq. bq. (Updated 2012-03-01 03:40:14) bq. bq. bq. Review request for . bq. bq. bq. Summary bq. --- bq. bq. Switch RPC call envelope/headers to PBs bq. bq. bq. This addresses bug HBASE-5451. bq. https://issues.apache.org/jira/browse/HBASE-5451 bq. bq. bq. Diffs bq. - bq. bq.http://svn.apache.org/repos/asf/hbase/trunk/pom.xml 1294899 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/DataOutputOutputStream.java 1294899 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java 1294899 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java 1294899 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/User.java 1294899 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/4096/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Devaraj bq. bq.
[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238891#comment-13238891 ] Ted Yu commented on HBASE-2600: --- @Alex: Can you attach hbase-2600-root.dir.tgz to this JIRA ? Please briefly describe how you generated the tar ball. Thanks Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid Key: HBASE-2600 URL: https://issues.apache.org/jira/browse/HBASE-2600 Project: HBase Issue Type: Bug Reporter: stack Assignee: Alex Newman Attachments: 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, HBASE-2600+5217-Sun-Mar-25-2012-v4.patch, jenkins.pdf This is an idea that Ryan and I have been kicking around on and off for a while now. If regionnames were made of tablename+endrow instead of tablename+startrow, then in the metatables, doing a search for the region that contains the wanted row, we'd just have to open a scanner using passed row and the first row found by the scan would be that of the region we need (If offlined parent, we'd have to scan to the next row). If we redid the meta tables in this format, we'd be using an access that is natural to hbase, a scan as opposed to the perverse, expensive getClosestRowBefore we currently have that has to walk backward in meta finding a containing region. This issue is about changing the way we name regions. If we were using scans, prewarming client cache would be near costless (as opposed to what we'll currently have to do which is first a getClosestRowBefore and then a scan from the closestrowbefore forward). Converting to the new method, we'd have to run a migration on startup changing the content in meta. Up to this, the randomid component of a region name has been the timestamp of region creation. HBASE-2531 32-bit encoding of regionnames waaay too susceptible to hash clashes proposes changing the randomid so that it contains actual name of the directory in the filesystem that hosts the region. If we had this in place, I think it would help with the migration to this new way of doing the meta because as is, the region name in fs is a hash of regionname... changing the format of the regionname would mean we generate a different hash... so we'd need hbase-2531 to be in place before we could do this change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5626) Compactions simulator tool for proofing algorithms
[ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-5626: --- Attachment: cf_compact.py Attached the current python script that I use to emulate compactions given different params. Compactions simulator tool for proofing algorithms -- Key: HBASE-5626 URL: https://issues.apache.org/jira/browse/HBASE-5626 Project: HBase Issue Type: Task Reporter: stack Priority: Minor Labels: noob Attachments: cf_compact.py A tool to run compaction simulations would be a nice to have. We could use it to see how well an algo ran under different circumstances loaded w/ different value types with different rates of flushes and splits, etc. HBASE-2462 had one (see in patch). Or we could try doing it using something like this: http://en.wikipedia.org/wiki/Discrete_event_simulation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms
[ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238907#comment-13238907 ] Nicolas Spiegelberg commented on HBASE-5626: A little more explanation. Basic Concept: We wish to model the amount of compaction IO and file dispersion. The unit of measurement for compactions is a flush. This is because a flush is always 64MB (or whatever you configure) regardless of other properties about the CF/KV. Column families might trigger flushes at different intervals, but they usually flush a consistent amount of data. You can understand the behavior of a compaction algorithm based upon how it behaves over X amount of flushes. Does this test make a lot of assumptions and simplifications? Yes! Inputs: 1. ratio = compaction.ratio between files. (same as the HBase config) 2. min.files = minimum count of files that must be selected for a compaction to occur (same as HBase config) 3. duplication = percentage of KVs within a file that are mutations and will be deduped on compaction (0 = DUPLICATION = 1) 4. iterations = number of flushes to simulate Output: 1. The StoreFile dispersion after every flush (and, possibly, compaction triggered by that flush) 2. The average storefile count over iterations flushes 3. The amount of IO consumed by compactions after those iterations flushes. Compactions simulator tool for proofing algorithms -- Key: HBASE-5626 URL: https://issues.apache.org/jira/browse/HBASE-5626 Project: HBase Issue Type: Task Reporter: stack Priority: Minor Labels: noob Attachments: cf_compact.py A tool to run compaction simulations would be a nice to have. We could use it to see how well an algo ran under different circumstances loaded w/ different value types with different rates of flushes and splits, etc. HBASE-2462 had one (see in patch). Or we could try doing it using something like this: http://en.wikipedia.org/wiki/Discrete_event_simulation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms
[ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238908#comment-13238908 ] stack commented on HBASE-5626: -- Nice. Let me take a looksee... Compactions simulator tool for proofing algorithms -- Key: HBASE-5626 URL: https://issues.apache.org/jira/browse/HBASE-5626 Project: HBase Issue Type: Task Reporter: stack Priority: Minor Labels: noob Attachments: cf_compact.py A tool to run compaction simulations would be a nice to have. We could use it to see how well an algo ran under different circumstances loaded w/ different value types with different rates of flushes and splits, etc. HBASE-2462 had one (see in patch). Or we could try doing it using something like this: http://en.wikipedia.org/wiki/Discrete_event_simulation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238910#comment-13238910 ] Hudson commented on HBASE-5623: --- Integrated in HBase-0.94 #57 (See [https://builds.apache.org/job/HBase-0.94/57/]) HBASE-5623 Race condition when rolling the HLog and hlogFlush (Enis Soztutar and LarsH) (Revision 1305549) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code}
[jira] [Updated] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-2600: --- Attachment: hbase-2600-root.dir.tgz generate-hbase-2600-root-in-tmp.sh was used to generate this tarball Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid Key: HBASE-2600 URL: https://issues.apache.org/jira/browse/HBASE-2600 Project: HBase Issue Type: Bug Reporter: stack Assignee: Alex Newman Attachments: 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, HBASE-2600+5217-Sun-Mar-25-2012-v4.patch, hbase-2600-root.dir.tgz, jenkins.pdf This is an idea that Ryan and I have been kicking around on and off for a while now. If regionnames were made of tablename+endrow instead of tablename+startrow, then in the metatables, doing a search for the region that contains the wanted row, we'd just have to open a scanner using passed row and the first row found by the scan would be that of the region we need (If offlined parent, we'd have to scan to the next row). If we redid the meta tables in this format, we'd be using an access that is natural to hbase, a scan as opposed to the perverse, expensive getClosestRowBefore we currently have that has to walk backward in meta finding a containing region. This issue is about changing the way we name regions. If we were using scans, prewarming client cache would be near costless (as opposed to what we'll currently have to do which is first a getClosestRowBefore and then a scan from the closestrowbefore forward). Converting to the new method, we'd have to run a migration on startup changing the content in meta. Up to this, the randomid component of a region name has been the timestamp of region creation. HBASE-2531 32-bit encoding of regionnames waaay too susceptible to hash clashes proposes changing the randomid so that it contains actual name of the directory in the filesystem that hosts the region. If we had this in place, I think it would help with the migration to this new way of doing the meta because as is, the region name in fs is a hash of regionname... changing the format of the regionname would mean we generate a different hash... so we'd need hbase-2531 to be in place before we could do this change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file
[ https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HBASE-5598: --- Attachment: HBASE-5598.patch Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file -- Key: HBASE-5598 URL: https://issues.apache.org/jira/browse/HBASE-5598 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HBASE-5598.patch There are many findbugs errors reporting by HbaseQA. HBASE-5597 is going to up the OK count. This may lead to other issues when we re-factor the code, if we induce new valid ones and remove invalid bugs also can not be reported by QA. So, I would propose to add the exclude filter file for findbugs(for the invalid bugs). If we find any valid ones, we can fix under this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost
[ https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-5606: -- Attachment: 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch Re-attaching Prakash's patch. SplitLogManger async delete node hangs log splitting when ZK connection is lost Key: HBASE-5606 URL: https://issues.apache.org/jira/browse/HBASE-5606 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.92.0 Reporter: Gopinathan A Priority: Critical Fix For: 0.92.2 Attachments: 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch 1. One rs died, the servershutdownhandler found it out and started the distributed log splitting; 2. All tasks are failed due to ZK connection lost, so the all the tasks were deleted asynchronously; 3. Servershutdownhandler retried the log splitting; 4. The asynchronously deletion in step 2 finally happened for new task 5. This made the SplitLogManger in hanging state. This leads to .META. region not assigened for long time {noformat} hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 {noformat} {noformat} hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 19:34:31,196 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 19:34:32,497 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file
[ https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238924#comment-13238924 ] Uma Maheswara Rao G commented on HBASE-5598: As a initial start I updated the patch with filter file. 1) added filter file 2) excluded generated packges. org.apache.hadoop.hbase.thrift2.generated, org.apache.hadoop.hbase.thrift.generated, org.apache.hadoop.hbase.rest.protobuf.generated This reduces the findbugs count from 772 to 601. {quote} [WARNING] [INFO] [INFO] [INFO] Building HBase 0.95-SNAPSHOT [INFO] [INFO] [INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase --- [INFO] Fork Value is true [java] Warnings generated: 601 [INFO] Done FindBugs Analysis [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 3:19.270s [INFO] Finished at: Tue Mar 27 03:56:20 IST 2012 [INFO] Final Memory: 13M/55M [INFO] {quote} note: currently we have already increased the count of findbugs in test-patch.properties. While verifying, I just reverted back to 0 for testing. Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file -- Key: HBASE-5598 URL: https://issues.apache.org/jira/browse/HBASE-5598 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HBASE-5598.patch There are many findbugs errors reporting by HbaseQA. HBASE-5597 is going to up the OK count. This may lead to other issues when we re-factor the code, if we induce new valid ones and remove invalid bugs also can not be reported by QA. So, I would propose to add the exclude filter file for findbugs(for the invalid bugs). If we find any valid ones, we can fix under this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4348) Add metrics for regions in transition
[ https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238923#comment-13238923 ] jirapos...@reviews.apache.org commented on HBASE-4348: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4402/#review6204 --- Himanshu, please address the potential NPE issue. I've added some suggestions to keep names consistent with HBase's conventions. It would be really nice if you could do a test that would exercise some of the new code (test updates don't really seem do it). See TestRpcMetrics, or TestMetricsMBeanBase as possible modes. I won't block committing if this doesn't happen, but it would be nice. :) src/main/jamon/org/apache/hadoop/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon https://reviews.apache.org/r/4402/#comment13873 let's rename this to be consistent with other Configuration fields. Check out HConstants.java to see the names of quite a few configuration variables to get a general idea of the pattern. My suggestion is something like: 'hbase.metrics.rit.threshold.time' src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java https://reviews.apache.org/r/4402/#comment13857 Either: * javadoc to say this must not be null and add 'Preconditions.assertNotNull(masterMetrics,master metrics should never be null) on initialization * add guards where masterMetrics is deref'ed to see if null. Seems like with your tests, adding the guard 'if !=null' guard to masterMetrics derefs in the next comment is easier. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java https://reviews.apache.org/r/4402/#comment13874 ditto. Actually, since it is used in a few places, you should probably to add this to the HConstants file. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java https://reviews.apache.org/r/4402/#comment13858 master metrics could npe here. src/main/java/org/apache/hadoop/hbase/master/HMaster.java https://reviews.apache.org/r/4402/#comment13875 nit: spitting? (kind gross) maybe emitting (you use that word below) or publishing? src/main/java/org/apache/hadoop/hbase/master/HMaster.java https://reviews.apache.org/r/4402/#comment13363 nit: funny spacing src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMetrics.java https://reviews.apache.org/r/4402/#comment13876 maybe rename to put rit in front so that it is consistent and will sort nicely? maybe 'ritOldestAge'? - jmhsieh On 2012-03-20 21:44:10, Himanshu Vashishtha wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4402/ bq. --- bq. bq. (Updated 2012-03-20 21:44:10) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This patch is for adding Region in transition metrics to the HMaster metrics system. It also adds these metrics in the master ui, in the Region in transition section. I have attached the proposed new format in the jira 4348. bq. bq. bq. This addresses bug HBase-4348. bq. https://issues.apache.org/jira/browse/HBase-4348 bq. bq. bq. Diffs bq. - bq. bq. src/main/jamon/org/apache/hadoop/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon 0dc0691 bq.src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java ae468ca bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java c4b4d30 bq.src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMetrics.java 83abc52 bq.src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java d68ce33 bq. bq. Diff: https://reviews.apache.org/r/4402/diff bq. bq. bq. Testing bq. --- bq. bq. Ran on a 5 node cluster and kill region servers randomly to observe the changes in the RIT metrics as emitted out by the Master's mxbean; bq. bq. mvn test passes without any failure. bq. bq. bq. Thanks, bq. bq. Himanshu bq. bq. Add metrics for regions in transition - Key: HBASE-4348 URL: https://issues.apache.org/jira/browse/HBASE-4348 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Himanshu Vashishtha Priority: Minor Labels: noob Attachments: 4348-metrics-v3.patch, 4348-v1.patch, 4348-v2.patch, RITs.png, RegionInTransitions2.png, metrics-v2.patch The following metrics would be
[jira] [Updated] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file
[ https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HBASE-5598: --- Status: Patch Available (was: Open) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file -- Key: HBASE-5598 URL: https://issues.apache.org/jira/browse/HBASE-5598 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HBASE-5598.patch There are many findbugs errors reporting by HbaseQA. HBASE-5597 is going to up the OK count. This may lead to other issues when we re-factor the code, if we induce new valid ones and remove invalid bugs also can not be reported by QA. So, I would propose to add the exclude filter file for findbugs(for the invalid bugs). If we find any valid ones, we can fix under this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238938#comment-13238938 ] Enis Soztutar commented on HBASE-5623: -- Thanks Lars for pushing this. Just as a note, I just tested the final version of the patch on a 4 node cluster with ycsb -threads 100. No problems. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you
[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding
[ https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238948#comment-13238948 ] Matt Corgan commented on HBASE-4676: {quote}Do we not have this now Matt?{quote} I don't believe so. Right now, HFileReaderV2.blockSeek(..) does a brute force loop through each KV in the block and does a full KV comparison at each KV. Say there are 1000 KVs in the block and 20/row, on average you will have to do 500 full KV comparisons just to get to your key, and even if you are selecting all 20 KVs in the row you still do 500 comparisons on average to get to the start of the row. It's also very important to remember that these are long keys and they are most likely to share a common prefix since they got sorted to the same block, so you're comparators are churning the same prefix bytes over and over. {quote}The slow write and scan speeds would tend to rule it out for anything but a random read workload but when random reading only{quote} Yeah, it's not really targeted at long scans on cold data, but there are some cases in between long cold scans and hot point queries. Some factors: If it's warm data your block cache is now 6x bigger. Then there are many scans that are really sub-block prefix scans to get all the keys in a row (the 20 of 1000 cells i mentioned above). If your scan will have a low hit ratio then the ability to jump between rows inside a block will lessen CPU usage. It performs best when doing lots of random Gets on hot data (in block cache). Possibly a persistent memcached alternative, especially with SSDs. I believe the current system is actually limited by CPU when doing high throughput Gets on data with long keys and short values. The trie's individual request latency may not be much different, but a single server could serve many more parallel requests before maxing out cpu. The bigger the value/key ratio of your data the smaller the difference between trie and anything else. Seems like many people now have big values so I doubt they would see a difference. I'm more trying to enable smarter schemas with compound primary keys. {quote}Regards your working set read testing, did it all fit in memory?{quote} yes, i'm trying to do the purest comparison of cpu usage possible. leaving it up to others to extrapolate the results to what happens with the effectively bigger block cache, more rows/block fetched from disk for equivalent size block, etc. i'm currently just standing up a StoreFile and using it directly. see https://github.com/hotpads/hbase-prefix-trie/blob/hcell-scanners/test/org/apache/hadoop/hbase/cell/pt/test/performance/seek/SeekBenchmarkMain.java. i'll try to factor in network and disk latencies in later tests (did some preliminary tests friday but am on vacation this week). {quote}How did you measure cycles per seek?{quote} simple assumption of 2 billion/second. was just trying to emphasize how many cycles a seek currently takes {quote}What is numBlocks? The total number of blocks the dataset fit in?{quote} number of data blocks in the HFile i fed in Prefix Compression - Trie data block encoding - Key: HBASE-4676 URL: https://issues.apache.org/jira/browse/HBASE-4676 Project: HBase Issue Type: New Feature Components: io, performance, regionserver Affects Versions: 0.90.6 Reporter: Matt Corgan Attachments: PrefixTrie_Format_v1.pdf, PrefixTrie_Performance_v1.pdf, SeeksPerSec by blockSize.png The HBase data block format has room for 2 significant improvements for applications that have high block cache hit ratios. First, there is no prefix compression, and the current KeyValue format is somewhat metadata heavy, so there can be tremendous memory bloat for many common data layouts, specifically those with long keys and short values. Second, there is no random access to KeyValues inside data blocks. This means that every time you double the datablock size, average seek time (or average cpu consumption) goes up by a factor of 2. The standard 64KB block size is ~10x slower for random seeks than a 4KB block size, but block sizes as small as 4KB cause problems elsewhere. Using block sizes of 256KB or 1MB or more may be more efficient from a disk access and block-cache perspective in many big-data applications, but doing so is infeasible from a random seek perspective. The PrefixTrie block encoding format attempts to solve both of these problems. Some features: * trie format for row key encoding completely eliminates duplicate row keys and encodes similar row keys into a standard trie structure which also saves a lot of space * the column family is currently stored once at the beginning of each block. this could easily be modified
[jira] [Updated] (HBASE-5596) Few minor bugs from HBASE-5209
[ https://issues.apache.org/jira/browse/HBASE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5596: -- Resolution: Fixed Fix Version/s: 0.96.0 0.94.0 0.92.2 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Few minor bugs from HBASE-5209 -- Key: HBASE-5596 URL: https://issues.apache.org/jira/browse/HBASE-5596 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1, 0.94.0 Reporter: David S. Wang Assignee: David S. Wang Priority: Minor Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: HBASE-5596.patch, hbase-5596-0.94.patch A few leftover bugs from HBASE-5209. Comments are documented here: https://reviews.apache.org/r/3892/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4993) Performance regression in minicluster creation
[ https://issues.apache.org/jira/browse/HBASE-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238962#comment-13238962 ] Jean-Daniel Cryans commented on HBASE-4993: --- It seems this would work better, also reviewing the 0.90/0.92 code it think we should keep the new logic you introduced in this jira (with the fixed code). I opened HBASE-5639 and assigned it to you. Performance regression in minicluster creation -- Key: HBASE-4993 URL: https://issues.apache.org/jira/browse/HBASE-4993 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Fix For: 0.94.0 Attachments: 4993.patch, 4993.v3.patch Side effect of 4610: the mini cluster needs 4,5 seconds to start -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5639) The logic used in waiting for region servers during startup is broken
The logic used in waiting for region servers during startup is broken - Key: HBASE-5639 URL: https://issues.apache.org/jira/browse/HBASE-5639 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Assignee: nkeywal Priority: Blocker Fix For: 0.94.0 See the tail of HBASE-4993, which I'll report here: Me: {quote} I think a bug was introduced here. Here's the new waiting logic in waitForRegionServers: the 'hbase.master.wait.on.regionservers.mintostart' is reached AND there have been no new region server in for 'hbase.master.wait.on.regionservers.interval' time And the code that verifies that: !(lastCountChange+interval now count = minToStart) {quote} Nic: {quote} It seems that changing the code to (count minToStart || lastCountChange+interval now) would make the code works as documented. If you have 0 region servers that checked in and you are under the interval, you wait: (true or true) = true. If you have 0 region servers but you are above the interval, you wait: (true or false) = true. If you have 1 or more region servers that checked in and you are under the interval, you wait: (false or true) = true. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238965#comment-13238965 ] Jesse Yates commented on HBASE-5547: Thoughts on how long we should keep around files? Indefinitely? The latter seems a bit excessive, especially if a 'backup mode' ensures you run every X minutes (and exports to another cluster, moves the files to another backup directory). 'Cleanup' in implies you want to remove the file when no one care about the hfiles anymore - thinking maybe a periodic chore on the rs? With snapshots, I was expecting to add an file reference feature - essentially doing impl hardlinks for files we care about keeping around. Was thinking we could add a CP hook and impl that would let you add a checks (config based?) for if you want to keep a reference around for the file being cleaned up. In the backup situation, you would have a timer or (maybe check for a backup completed file/meta row) and see if you had elapsed that time or not; if not, you would add a reference, if so, do nothing and let the file get cleaned up. Don't delete HFiles when in backup mode - Key: HBASE-5547 URL: https://issues.apache.org/jira/browse/HBASE-5547 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl This came up in a discussion I had with Stack. It would be nice if HBase could be notified that a backup is in progress (via a znode for example) and in that case either: 1. rename HFiles to be delete to file.bck 2. rename the HFiles into a special directory 3. rename them to a general trash directory (which would not need to be tied to backup mode). That way it should be able to get a consistent backup based on HFiles (HDFS snapshots or hard links would be better options here, but we do not have those). #1 makes cleanup a bit harder. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238963#comment-13238963 ] Lars Hofhansl commented on HBASE-5623: -- Good to know :) Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238968#comment-13238968 ] Lars Hofhansl commented on HBASE-5547: -- Was thinking that these files are up grabs unless HBase is in backup mode, maybe by the existence of a specific zNode, although other models are possible as well. The details are a bit tricky, of course. Do we need a full cleanup between backup runs so that we do not confuse the backup files? If not, do we tag with a backup number or a with timestamp (like the HLogs)? If we do HBASE-50 we won't need this one methinks. This might get us to workable solution more quickly. Don't delete HFiles when in backup mode - Key: HBASE-5547 URL: https://issues.apache.org/jira/browse/HBASE-5547 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl This came up in a discussion I had with Stack. It would be nice if HBase could be notified that a backup is in progress (via a znode for example) and in that case either: 1. rename HFiles to be delete to file.bck 2. rename the HFiles into a special directory 3. rename them to a general trash directory (which would not need to be tied to backup mode). That way it should be able to get a consistent backup based on HFiles (HDFS snapshots or hard links would be better options here, but we do not have those). #1 makes cleanup a bit harder. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238972#comment-13238972 ] jirapos...@reviews.apache.org commented on HBASE-5619: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4054/#review6365 --- Excellent. It looks like we can commit this w/o breaking whats currently there? src/main/proto/RegionAdmin.proto https://reviews.apache.org/r/4054/#comment13813 What did we figure on the package name? Shouldn't it agree w/ the dir that holds the .proto files at src/main? Currently one is protobuf and the other is proto. src/main/proto/RegionAdmin.proto https://reviews.apache.org/r/4054/#comment13835 What are these two booleans broken out? Aren't they in they attributes of HRI already? Why repeat them? src/main/proto/RegionAdmin.proto https://reviews.apache.org/r/4054/#comment13836 Good. Its kinda dumb the way our HRegionInterface is now where it has an override, one that takes single family and another that takes an array. Thanks for collapsing. src/main/proto/RegionAdmin.proto https://reviews.apache.org/r/4054/#comment13837 Should we be repeating the API documentation that is up in the HRegionInterface that this .proto replaces here? If its not here, where will it be? Not all of the javadoc should make it over -- the stuff that says nothing shouldn't but some is of worth. What you think? src/main/proto/RegionAdmin.proto https://reviews.apache.org/r/4054/#comment13838 Again, thanks for doing the work collapsing the overrides that are up in HRegionInterface. src/main/proto/RegionAdmin.proto https://reviews.apache.org/r/4054/#comment13839 Isn't the response currently a void? And isnt' flush async (IIRC). If so, under what circumstance would we be able to fill out this response? src/main/proto/RegionAdmin.proto https://reviews.apache.org/r/4054/#comment13842 WALKey maps to HLogKey? Maybe add a comment to this effect? src/main/proto/RegionAdmin.proto https://reviews.apache.org/r/4054/#comment13846 Good. I like this method name better. Should be a comment which points back to the old name? Or what you think? src/main/proto/RegionAdmin.proto https://reviews.apache.org/r/4054/#comment13847 Yeah, this proto is missing commentary. I mean, the return here should be explained? src/main/proto/RegionAdmin.proto https://reviews.apache.org/r/4054/#comment13850 This will for sure grow w/ time. src/main/proto/RegionAdmin.proto https://reviews.apache.org/r/4054/#comment13855 Nice. So you are splitting HRegionInterface into admin and client? Good. src/main/proto/RegionClient.proto https://reviews.apache.org/r/4054/#comment13862 This is new? Being able to do this? How will it be used? src/main/proto/RegionClient.proto https://reviews.apache.org/r/4054/#comment13863 This is new feature on get? Or just special handling of an attribute? src/main/proto/RegionClient.proto https://reviews.apache.org/r/4054/#comment13864 We don't have this in the java Result. I don't understand why this is making its way into the object. src/main/proto/RegionClient.proto https://reviews.apache.org/r/4054/#comment13865 ditto src/main/proto/RegionClient.proto https://reviews.apache.org/r/4054/#comment13866 what is processed? src/main/proto/RegionClient.proto https://reviews.apache.org/r/4054/#comment13867 Why we need to add a region to the Get even optionally? src/main/proto/RegionClient.proto https://reviews.apache.org/r/4054/#comment13868 Why is the Get polluted by multiGet stuff? src/main/proto/RegionClient.proto https://reviews.apache.org/r/4054/#comment13869 I thought we could set this into the Get above? Why have it here as separate param? src/main/proto/RegionClient.proto https://reviews.apache.org/r/4054/#comment13870 Good stuff src/main/proto/RegionClient.proto https://reviews.apache.org/r/4054/#comment13872 This is a new feature? src/main/proto/RegionClient.proto https://reviews.apache.org/r/4054/#comment13881 A type rather than have the mutation type specified as a subclass? A mutate is a delete or put? src/main/proto/RegionClient.proto https://reviews.apache.org/r/4054/#comment13882 Again, does it make sense having this in here? I mean regions come and go -- e.g. split -- so you could have reference to non-existent region. This stuff of tying a mutation to a particular region should be done external to the mutations themselves? Is this trying to do multiaction? Maybe that should be a message apart from mutate? The new message would have region name and the mutate?
[jira] [Commented] (HBASE-5639) The logic used in waiting for region servers during startup is broken
[ https://issues.apache.org/jira/browse/HBASE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238973#comment-13238973 ] Jean-Daniel Cryans commented on HBASE-5639: --- Oh I forgot to mention that I'm marking this as a blocker for 0.94.0 because right now if you start a sizable cluster you may end up with region servers that checkin too late and miss the re-assignment of regions. The logic used in waiting for region servers during startup is broken - Key: HBASE-5639 URL: https://issues.apache.org/jira/browse/HBASE-5639 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Assignee: nkeywal Priority: Blocker Fix For: 0.94.0 See the tail of HBASE-4993, which I'll report here: Me: {quote} I think a bug was introduced here. Here's the new waiting logic in waitForRegionServers: the 'hbase.master.wait.on.regionservers.mintostart' is reached AND there have been no new region server in for 'hbase.master.wait.on.regionservers.interval' time And the code that verifies that: !(lastCountChange+interval now count = minToStart) {quote} Nic: {quote} It seems that changing the code to (count minToStart || lastCountChange+interval now) would make the code works as documented. If you have 0 region servers that checked in and you are under the interval, you wait: (true or true) = true. If you have 0 region servers but you are above the interval, you wait: (true or false) = true. If you have 1 or more region servers that checked in and you are under the interval, you wait: (false or true) = true. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238977#comment-13238977 ] stack commented on HBASE-5547: -- bq. Thoughts on how long we should keep around files? Indefinitely? Yes. I'd think some other process other than hbase would be responsible for their cleanup. Don't delete HFiles when in backup mode - Key: HBASE-5547 URL: https://issues.apache.org/jira/browse/HBASE-5547 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl This came up in a discussion I had with Stack. It would be nice if HBase could be notified that a backup is in progress (via a znode for example) and in that case either: 1. rename HFiles to be delete to file.bck 2. rename the HFiles into a special directory 3. rename them to a general trash directory (which would not need to be tied to backup mode). That way it should be able to get a consistent backup based on HFiles (HDFS snapshots or hard links would be better options here, but we do not have those). #1 makes cleanup a bit harder. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238990#comment-13238990 ] Hadoop QA commented on HBASE-2600: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12520024/hbase-2600-root.dir.tgz against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1315//console This message is automatically generated. Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid Key: HBASE-2600 URL: https://issues.apache.org/jira/browse/HBASE-2600 Project: HBase Issue Type: Bug Reporter: stack Assignee: Alex Newman Attachments: 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, HBASE-2600+5217-Sun-Mar-25-2012-v4.patch, hbase-2600-root.dir.tgz, jenkins.pdf This is an idea that Ryan and I have been kicking around on and off for a while now. If regionnames were made of tablename+endrow instead of tablename+startrow, then in the metatables, doing a search for the region that contains the wanted row, we'd just have to open a scanner using passed row and the first row found by the scan would be that of the region we need (If offlined parent, we'd have to scan to the next row). If we redid the meta tables in this format, we'd be using an access that is natural to hbase, a scan as opposed to the perverse, expensive getClosestRowBefore we currently have that has to walk backward in meta finding a containing region. This issue is about changing the way we name regions. If we were using scans, prewarming client cache would be near costless (as opposed to what we'll currently have to do which is first a getClosestRowBefore and then a scan from the closestrowbefore forward). Converting to the new method, we'd have to run a migration on startup changing the content in meta. Up to this, the randomid component of a region name has been the timestamp of region creation. HBASE-2531 32-bit encoding of regionnames waaay too susceptible to hash clashes proposes changing the randomid so that it contains actual name of the directory in the filesystem that hosts the region. If we had this in place, I think it would help with the migration to this new way of doing the meta because as is, the region name in fs is a hash of regionname... changing the format of the regionname would mean we generate a different hash... so we'd need hbase-2531 to be in place before we could do this change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5596) Few minor bugs from HBASE-5209
[ https://issues.apache.org/jira/browse/HBASE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239002#comment-13239002 ] Hudson commented on HBASE-5596: --- Integrated in HBase-0.94-security #3 (See [https://builds.apache.org/job/HBase-0.94-security/3/]) HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305662) Result = ABORTED jmhsieh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ServerName.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Few minor bugs from HBASE-5209 -- Key: HBASE-5596 URL: https://issues.apache.org/jira/browse/HBASE-5596 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1, 0.94.0 Reporter: David S. Wang Assignee: David S. Wang Priority: Minor Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: HBASE-5596.patch, hbase-5596-0.94.patch A few leftover bugs from HBASE-5209. Comments are documented here: https://reviews.apache.org/r/3892/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5533) Add more metrics to HBase
[ https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239000#comment-13239000 ] Hudson commented on HBASE-5533: --- Integrated in HBase-0.94-security #3 (See [https://builds.apache.org/job/HBase-0.94-security/3/]) HBASE-5533 Add more metrics to HBase (Shaneal Manek) (Revision 1305582) Result = ABORTED larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/ExactCounterMetric.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/histogram * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/histogram/ExponentiallyDecayingSample.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/histogram/MetricsHistogram.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/histogram/Sample.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/histogram/Snapshot.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/histogram/UniformSample.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/Threads.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/metrics/TestExactCounterMetric.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/metrics/TestExponentiallyDecayingSample.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsHistogram.java Add more metrics to HBase - Key: HBASE-5533 URL: https://issues.apache.org/jira/browse/HBASE-5533 Project: HBase Issue Type: Improvement Affects Versions: 0.92.2, 0.94.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Fix For: 0.94.0, 0.96.0 Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, HBASE-5533-TRUNK-v6.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java, hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, hbase5533-0.92-v5.patch, histogram_web_ui.png To debug/monitor production clusters, there are some more metrics I wish I had available. In particular: - Although the average FS latencies are useful, a 'histogram' of recent latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) would be more useful - Similar histograms of latencies on common operations (GET, PUT, DELETE) would be useful - Counting the number of accesses to each region to detect hotspotting - Exposing the current number of HLog files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239001#comment-13239001 ] Hudson commented on HBASE-5623: --- Integrated in HBase-0.94-security #3 (See [https://builds.apache.org/job/HBase-0.94-security/3/]) HBASE-5623 Race condition when rolling the HLog and hlogFlush (Enis Soztutar and LarsH) (Revision 1305549) Result = ABORTED larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls
[jira] [Commented] (HBASE-5190) Limit the IPC queue size based on calls' payload size
[ https://issues.apache.org/jira/browse/HBASE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238999#comment-13238999 ] Hudson commented on HBASE-5190: --- Integrated in HBase-0.94-security #3 (See [https://builds.apache.org/job/HBase-0.94-security/3/]) HBASE-5190 Limit the IPC queue size based on calls' payload size (Ted's addendum) (Revision 1305469) Result = ABORTED jdcryans : Files : * /hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/ipc/SecureServer.java Limit the IPC queue size based on calls' payload size - Key: HBASE-5190 URL: https://issues.apache.org/jira/browse/HBASE-5190 Project: HBase Issue Type: Improvement Affects Versions: 0.90.5 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.94.0, 0.96.0 Attachments: 5190.addendum, HBASE-5190-v2.patch, HBASE-5190-v3.patch, HBASE-5190.patch Currently we limit the number of calls in the IPC queue only on their count. It used to be really high and was dropped down recently to num_handlers * 10 (so 100 by default) because it was easy to OOME yourself when huge calls were being queued. It's still possible to hit this problem if you use really big values and/or a lot of handlers, so the idea is that we should take into account the payload size. I can see 3 solutions: - Do the accounting outside of the queue itself for all calls coming in and out and when a call doesn't fit, throw a retryable exception. - Same accounting but instead block the call when it comes in until space is made available. - Add a new parameter for the maximum size (in bytes) of a Call and then set the size the IPC queue (in terms of the number of items) so that it could only contain as many items as some predefined maximum size (in bytes) for the whole queue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5615) the master never does balance because of balancing the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239003#comment-13239003 ] Hudson commented on HBASE-5615: --- Integrated in HBase-0.94-security #3 (See [https://builds.apache.org/job/HBase-0.94-security/3/]) HBASE-5615 the master never does balance because of balancing the parent region (Xufeng) (Revision 1305172) Result = ABORTED tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java the master never does balance because of balancing the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: xufeng Assignee: xufeng Priority: Critical Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html the master never do balance becauseof when master do rebuildUserRegions(),it will add the parent region into AssignmentManager#servers, if balancer let the parent region to move,the parent will in RIT forever.thus balance will never be executed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup
[ https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239008#comment-13239008 ] Hudson commented on HBASE-5209: --- Integrated in HBase-0.94 #58 (See [https://builds.apache.org/job/HBase-0.94/58/]) HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305662) Result = ABORTED jmhsieh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ServerName.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup Key: HBASE-5209 URL: https://issues.apache.org/jira/browse/HBASE-5209 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.5, 0.92.0, 0.94.0 Reporter: Aditya Acharya Assignee: David S. Wang Fix For: 0.92.1, 0.94.0 Attachments: 5209.addendum, HBASE_5209_v5.diff I have a multi-master HBase set up, and I'm trying to programmatically determine which of the masters is currently active. But the API does not allow me to do this. There is a getMaster() method in the HConnection class, but it returns an HMasterInterface, whose methods do not allow me to find out which master won the last race. The API should have a getActiveMasterHostname() or something to that effect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5596) Few minor bugs from HBASE-5209
[ https://issues.apache.org/jira/browse/HBASE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239007#comment-13239007 ] Hudson commented on HBASE-5596: --- Integrated in HBase-0.94 #58 (See [https://builds.apache.org/job/HBase-0.94/58/]) HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305662) Result = ABORTED jmhsieh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ServerName.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Few minor bugs from HBASE-5209 -- Key: HBASE-5596 URL: https://issues.apache.org/jira/browse/HBASE-5596 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1, 0.94.0 Reporter: David S. Wang Assignee: David S. Wang Priority: Minor Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: HBASE-5596.patch, hbase-5596-0.94.patch A few leftover bugs from HBASE-5209. Comments are documented here: https://reviews.apache.org/r/3892/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira