[jira] [Commented] (HBASE-10169) Batch coprocessor
[ https://issues.apache.org/jira/browse/HBASE-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848909#comment-13848909 ]

Hadoop QA commented on HBASE-10169:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12618869/Batch%20Coprocessor%20Design%20Document.docx
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8175//console

This message is automatically generated.

Batch coprocessor
-----------------

Key: HBASE-10169
URL: https://issues.apache.org/jira/browse/HBASE-10169
Project: HBase
Issue Type: Sub-task
Components: Coprocessors
Affects Versions: 0.99.0
Reporter: Jingcheng Du
Assignee: Jingcheng Du
Attachments: Batch Coprocessor Design Document.docx, HBASE-10169.patch

This is designed to improve coprocessor invocation on the client side. Currently each coprocessor invocation sends a separate call to every region. If there is one region server hosting 100 regions, a single coprocessor invocation sends 100 calls, and each call occupies a thread on the client side. Those threads run out quickly when coprocessor invocations are heavy. In this design, all the calls destined for the same region server are grouped into one call per coprocessor invocation. That call is fanned out to each region on the server side, and the results are merged on the server side before being returned to the client.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
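The per-server grouping the description proposes can be sketched with plain collections (class and method names here are hypothetical, not the HBASE-10169 API): instead of one RPC per region, regions are bucketed by their hosting server so each invocation sends one call per server.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchGrouping {
    // Hypothetical sketch of the batching idea: group the per-region calls
    // by the server that hosts each region, yielding one call per server.
    public static Map<String, List<String>> groupByServer(Map<String, String> regionToServer) {
        Map<String, List<String>> callsPerServer = new HashMap<>();
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            callsPerServer.computeIfAbsent(e.getValue(), s -> new ArrayList<>()).add(e.getKey());
        }
        return callsPerServer;
    }

    public static void main(String[] args) {
        Map<String, String> regionToServer = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            regionToServer.put("region-" + i, "rs-1"); // 100 regions, all on one server
        }
        // One grouped call instead of 100 per-region calls:
        System.out.println(groupByServer(regionToServer).size()); // prints 1
    }
}
```

With this shape, the server-side handler would iterate its local regions and merge results before replying, as the design document describes.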
[jira] [Commented] (HBASE-10137) GeneralBulkAssigner with retain assignment plan can be used in EnableTableHandler to bulk assign the regions
[ https://issues.apache.org/jira/browse/HBASE-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849025#comment-13849025 ]

ramkrishna.s.vasudevan commented on HBASE-10137:
------------------------------------------------

As this is fixed in trunk, this change should be fine. +1 for being only on trunk. One question:
bq. In case of any failures retrying assignments and wait in GeneralBulkAssigner#waitUntilDone.
Yes, this would happen now. But if the table was disabled and, before enabling, a region server goes down while we are going with retainAssignment, how would this work? META may have the dead RS entry, but the ServerManager won't be giving the old ones. Will this be handled here?

GeneralBulkAssigner with retain assignment plan can be used in EnableTableHandler to bulk assign the regions
------------------------------------------------------------------------------------------------------------

Key: HBASE-10137
URL: https://issues.apache.org/jira/browse/HBASE-10137
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 0.96.0, 0.94.14
Reporter: rajeshbabu
Assignee: rajeshbabu
Fix For: 0.98.0, 0.96.2, 0.99.0
Attachments: HBASE-10137.patch

Currently in BulkEnabler we assign one region at a time; instead we can use GeneralBulkAssigner to bulk assign multiple regions at a time.
{code}
for (HRegionInfo region : regions) {
  if (assignmentManager.getRegionStates().isRegionInTransition(region)) {
    continue;
  }
  final HRegionInfo hri = region;
  pool.execute(Trace.wrap("BulkEnabler.populatePool", new Runnable() {
    public void run() {
      assignmentManager.assign(hri, true);
    }
  }));
}
{code}
[jira] [Commented] (HBASE-10137) GeneralBulkAssigner with retain assignment plan can be used in EnableTableHandler to bulk assign the regions
[ https://issues.apache.org/jira/browse/HBASE-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849040#comment-13849040 ]

rajeshbabu commented on HBASE-10137:
------------------------------------

@Ram,
bq. But after disabled had happened before enabling, if a region server goes down and going with retainAssignment? how would this work? META may have the dead RS entry but the servermanager won't be giving the old ones. Will this be handled here?
If a region server is dead and no new RS has started on the same host, then one server is selected at random from the online servers (the list from ServerManager) for the region. That should be fine.
{code}
if (localServers.isEmpty()) {
  // No servers on the new cluster match up with this hostname,
  // assign randomly.
  ServerName randomServer = servers.get(RANDOM.nextInt(servers.size()));
  assignments.get(randomServer).add(region);
  numRandomAssignments++;
{code}
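The fallback rajeshbabu quotes can be sketched as follows (hypothetical names; the real logic lives in the balancer's retainAssignment): retain the region on its old host if any online server is back on that host, otherwise pick one of the online servers at random.

```java
import java.util.List;
import java.util.Random;

public class RetainAssignmentSketch {
    // Sketch of the quoted fallback. Servers are "host,port" strings here,
    // loosely mirroring ServerName's textual form.
    public static String pickServer(String oldHost, List<String> onlineServers, Random random) {
        for (String server : onlineServers) {
            if (server.startsWith(oldHost + ",")) {
                return server; // a server is back on the old host: retain the assignment
            }
        }
        // Old host is gone and nothing restarted there: assign randomly,
        // mirroring servers.get(RANDOM.nextInt(servers.size())) in the quote.
        return onlineServers.get(random.nextInt(onlineServers.size()));
    }
}
```

So a dead RS with no replacement on its host simply degrades to a random assignment rather than blocking the bulk enable.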
[jira] [Commented] (HBASE-10137) GeneralBulkAssigner with retain assignment plan can be used in EnableTableHandler to bulk assign the regions
[ https://issues.apache.org/jira/browse/HBASE-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849045#comment-13849045 ]

ramkrishna.s.vasudevan commented on HBASE-10137:
------------------------------------------------

+1 on commit then. [~jxiang] Want to take a look before commit?
Re: HBase upgrade error
The exception happened in protobuf. Can you upgrade hadoop to release 2.2?

Cheers

On Dec 15, 2013, at 11:58 PM, hzwangxx <whwangxingx...@163.com> wrote:

Hi,
I want to upgrade HBase from 0.94 to 0.96, following http://hbase.apache.org/book/upgrade0.96.html. I shut down the 0.94 cluster and ran bin/hbase upgrade -check, and hit this problem:

13/12/16 15:43:08 INFO util.HFileV1Detector: Target dir is: hdfs://node01:9000/hbase
13/12/16 15:43:08 ERROR util.HFileV1Detector: java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
13/12/16 15:43:08 WARN migration.UpgradeTo96: There are some HFileV1, or corrupt files (files with incorrect major version).

Running bin/hbase upgrade -execute produced this stack trace:

13/12/16 15:45:56 INFO zookeeper.ClientCnxn: EventThread shut down
13/12/16 15:45:56 INFO zookeeper.ZooKeeper: Session: 0x142f975e49e0003 closed
13/12/16 15:45:56 INFO migration.UpgradeTo96: Starting Namespace upgrade
13/12/16 15:45:57 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
Exception in thread "main" java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
    at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetListingRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:18005)
    at com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:149)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:193)
    at sun.proxy.$Proxy10.getListing(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at sun.proxy.$Proxy10.getListing(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:440)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1526)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1509)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:407)
    at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:473)
    at org.apache.hadoop.hbase.migration.NamespaceUpgrade.verifyNSUpgrade(NamespaceUpgrade.java:488)
    at org.apache.hadoop.hbase.migration.NamespaceUpgrade.upgradeTableDirs(NamespaceUpgrade.java:127)
    at org.apache.hadoop.hbase.migration.NamespaceUpgrade.run(NamespaceUpgrade.java:502)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.migration.UpgradeTo96.executeTool(UpgradeTo96.java:222)
    at org.apache.hadoop.hbase.migration.UpgradeTo96.executeUpgrade(UpgradeTo96.java:212)
    at org.apache.hadoop.hbase.migration.UpgradeTo96.run(UpgradeTo96.java:134)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.migration.UpgradeTo96.main(UpgradeTo96.java:258)

My hadoop version is hadoop-2.0.0-cdh4.2.1 and my HBase version is 0.94.0-cdh4.2.1.

Waiting for your help. Thanks~
Best Wishes!
hzwangxx
[jira] [Commented] (HBASE-10087) Store should be locked during a memstore snapshot
[ https://issues.apache.org/jira/browse/HBASE-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849215#comment-13849215 ]

Nicolas Liochon commented on HBASE-10087:
-----------------------------------------

Sorry for the delay. Theoretically, we have an issue:
{code}
@Override
public void prepare() {
  memstore.snapshot();
  this.snapshot = memstore.getSnapshot();
  this.snapshotTimeRangeTracker = memstore.getSnapshotTimeRangeTracker();
}
{code}
Without a lock, we could have an inconsistency between the snapshot and the snapshotTimeRangeTracker. As well, we could have an inconsistency inside memstore.snapshot() itself. But it seems it cannot happen, because:
- Store#snapshot is called only in the tests.
- Store#prepare is called with the write lock on the HRegion.
- The functions that modify the store also hold the read lock on the HRegion, so we can't be inconsistent in practice.

That said, adding the lock in prepare() would be more consistent imho. I would propose to add it on trunk.

Store should be locked during a memstore snapshot
-------------------------------------------------

Key: HBASE-10087
URL: https://issues.apache.org/jira/browse/HBASE-10087
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.98.0, 0.96.1, 0.94.14
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Fix For: 0.98.0, 0.96.1, 0.94.15
Attachments: 10079.v1.patch

Regression from HBASE-9963, found while looking at HBASE-10079.
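The race Nicolas describes can be sketched with a toy stand-in (not the Store/MemStore API): without exclusion, a write landing between the two reads in prepare() could leave the snapshot and its time-range tracker disagreeing; taking the write lock makes the pair atomic, mirroring the HRegion read-lock/write-lock convention mentioned above.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class PrepareLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long memstoreState = 0; // stand-in for the memstore contents
    private long snapshot;          // stand-in for memstore.getSnapshot()
    private long snapshotTracker;   // stand-in for getSnapshotTimeRangeTracker()

    public void write(long value) {
        lock.readLock().lock();     // mutators share the read lock, as on HRegion
        try {
            memstoreState = value;
        } finally {
            lock.readLock().unlock();
        }
    }

    public void prepare() {
        lock.writeLock().lock();    // exclusive: both reads observe the same state
        try {
            snapshot = memstoreState;
            snapshotTracker = memstoreState;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public boolean consistent() {
        return snapshot == snapshotTracker;
    }
}
```

In HBase today the HRegion lock already provides this exclusion on the flush path, which is why the comment calls the in-store lock a consistency nicety rather than a live bug.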
[jira] [Resolved] (HBASE-10172) hbase upgrade -check error
[ https://issues.apache.org/jira/browse/HBASE-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-10172.
---------------------------
Resolution: Won't Fix

What [~himan...@cloudera.com] says. Check the protobuf versions in your classpath. It looks like the first pb lib encountered is not 2.5.x as we expect (perhaps you have an old pb lib still in your cp?).

hbase upgrade -check error
--------------------------

Key: HBASE-10172
URL: https://issues.apache.org/jira/browse/HBASE-10172
Project: HBase
Issue Type: Bug
Reporter: wang xiyi

Hi, I want to upgrade HBase from 0.94 to 0.96. Following http://hbase.apache.org/book/upgrade0.96.html, I encountered this problem:
{code}
Exception in thread "main" java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
    at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetListingRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:18005)
    .
{code}
hadoop version: hadoop-2.0.0-cdh4.2.1
hbase version: hbase-0.94.0-cdh4.2.1
Thanks~
[jira] [Updated] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
[ https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-10174:
---------------------------
Attachment: 9667-0.94.patch

Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
---------------------------------------------------------------------

Key: HBASE-10174
URL: https://issues.apache.org/jira/browse/HBASE-10174
Project: HBase
Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
Attachments: 9667-0.94.patch

On the user mailing list, under the thread 'Guava 15', Kristoffer Sjögren reported a NoClassDefFoundError when he used Guava 15. The issue has been fixed in 0.96+ by HBASE-9667. This JIRA ports the fix to the 0.94 branch.
[jira] [Created] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
Ted Yu created HBASE-10174:
---------------------------
Summary: Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
Key: HBASE-10174
URL: https://issues.apache.org/jira/browse/HBASE-10174
Project: HBase
Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
Attachments: 9667-0.94.patch
[jira] [Commented] (HBASE-10137) GeneralBulkAssigner with retain assignment plan can be used in EnableTableHandler to bulk assign the regions
[ https://issues.apache.org/jira/browse/HBASE-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849334#comment-13849334 ]

Jimmy Xiang commented on HBASE-10137:
-------------------------------------

I was thinking about the same too, just haven't gotten a chance to do it. Good stuff. +1. Should we remove the BulkEnabler class if it isn't used any more?
[jira] [Commented] (HBASE-10163) Example Thrift DemoClient is flaky
[ https://issues.apache.org/jira/browse/HBASE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849369#comment-13849369 ]

Sergey Shelukhin commented on HBASE-10163:
------------------------------------------

Maybe add explicit timestamps?

Example Thrift DemoClient is flaky
----------------------------------

Key: HBASE-10163
URL: https://issues.apache.org/jira/browse/HBASE-10163
Project: HBase
Issue Type: Bug
Components: Thrift
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Trivial
Fix For: 0.98.0, 0.96.2, 0.99.0
Attachments: hbase-10163_v1.patch

The DemoClient for Thrift under hbase-examples fails sometimes because an earlier delete eclipses future puts with the same timestamp.
{code}
row: 6, cols: unused: = DELETE_ME;
row: 6, cols: entry:num = -1;
row: 6, cols: entry:num = 6; entry:sqr = 36;
row: 6, cols: entry:num = 6; entry:sqr = 36;
row: 6, values: 6; -1;
row: 5, cols: unused: = DELETE_ME;
row: 5, values:
FATAL: wrong # of versions
{code}
Similar to the one we have, we can add another small sleep between the delete and the subsequent put.
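The eclipse described above can be modeled with a toy single-cell store (this is not the Thrift or HBase API, just the timestamp semantics): a delete at timestamp T masks any later put that reuses T, which is why the DemoClient needs either a sleep (so the wall-clock timestamp advances) or the explicit, strictly larger timestamps Sergey suggests.

```java
import java.util.Map;
import java.util.TreeMap;

public class TimestampEclipseSketch {
    private final TreeMap<Long, String> cells = new TreeMap<>(); // ts -> value
    private long deleteTs = Long.MIN_VALUE;                      // highest delete marker seen

    public void put(long ts, String value) {
        cells.put(ts, value);
    }

    public void delete(long ts) {
        deleteTs = Math.max(deleteTs, ts);
        cells.headMap(ts, true).clear(); // drop versions at or below the delete
    }

    public String get() {
        // A put at or below the delete timestamp stays eclipsed even if it
        // was written after the delete, matching HBase's version semantics.
        Map.Entry<Long, String> latest = cells.lastEntry();
        return (latest == null || latest.getKey() <= deleteTs) ? null : latest.getValue();
    }
}
```

With this model, put(5), delete(5), put(5) reads back nothing, while a subsequent put(6) is visible, which is exactly the DemoClient's failure and its fix.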
[jira] [Commented] (HBASE-10138) incorrect or confusing test value is used in block caches
[ https://issues.apache.org/jira/browse/HBASE-10138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849384#comment-13849384 ]

Sergey Shelukhin commented on HBASE-10138:
------------------------------------------

[~lhofhansl] [~ndimiduk] there's no clear component owner for this; do you guys want to review?

incorrect or confusing test value is used in block caches
---------------------------------------------------------

Key: HBASE-10138
URL: https://issues.apache.org/jira/browse/HBASE-10138
Project: HBase
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HBASE-10138.patch

DEFAULT_BLOCKSIZE_SMALL is described as:
{code}
// Make default block size for StoreFiles 8k while testing. TODO: FIX!
// Need to make it 8k for testing.
public static final int DEFAULT_BLOCKSIZE_SMALL = 8 * 1024;
{code}
This value is used on the production path in CacheConfig through HStore/HRegion, and is passed to various cache objects. We should change it to the actual block size, or, if it is somehow by design, at least clarify it and remove the comment.
[jira] [Commented] (HBASE-10137) GeneralBulkAssigner with retain assignment plan can be used in EnableTableHandler to bulk assign the regions
[ https://issues.apache.org/jira/browse/HBASE-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849386#comment-13849386 ]

rajeshbabu commented on HBASE-10137:
------------------------------------

Thanks [~jxiang] for the review. With the patch it won't be used any more. I will remove it and upload a new patch.
[jira] [Commented] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table
[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849395#comment-13849395 ]

Sergey Shelukhin commented on HBASE-10136:
------------------------------------------

Agree with Jon; it looks like an implementation detail to me.

Alter table conflicts with concurrent snapshot attempt on that table
--------------------------------------------------------------------

Key: HBASE-10136
URL: https://issues.apache.org/jira/browse/HBASE-10136
Project: HBase
Issue Type: Bug
Components: snapshots
Affects Versions: 0.96.0, 0.98.1, 0.99.0
Reporter: Aleksandr Shulman
Assignee: Matteo Bertozzi
Labels: online_schema_change

Expected behavior: a user can issue a request for a snapshot of a table while that table is undergoing an online schema change and expect that snapshot request to complete correctly. The same is true if a user issues an online schema change request while a snapshot attempt is ongoing.

Observed behavior: snapshot attempts time out when there is an ongoing online schema change, because the region is closed and opened during the snapshot. As a side note, I would expect the attempt to fail quickly as opposed to timing out. Further, what I have seen is that subsequent attempts to snapshot the table fail because of some state/cleanup issues. This is also concerning.

Immediate error:
{code}
type=FLUSH }' is still in progress!
2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) Sleeping: 1ms while waiting for snapshot completion.
2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting current status of snapshot from master...
2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done
2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in progress!
Snapshot failure occurred
org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 'snapshot0' wasn't completed in expectedTime:6 ms
    at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713)
    at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638)
    at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602)
    at org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974)
{code}

Likely root cause of error:
{code}
Exception in SnapshotSubprocedurePool
java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
    at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
    at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing
    at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327)
    at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at
{code}
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849396#comment-13849396 ]

Sergey Shelukhin commented on HBASE-5487:
-----------------------------------------

That's an interesting one. Given that snapshots by default have no guarantees w.r.t. consistent writes between regions (or do they?), it seems like the snapshot should get the latest schema in case of a concurrent alter. Is there any consideration (other than the arguably implementation issues of not recovering from close-open) that would prevent that? For consistent snapshots, presumably the schema can be snapshotted first; I am assuming they don't stop the world and just take a seqId/mvcc/ts or something, so the newer values with the new schema will simply not exist.

Generic framework for Master-coordinated tasks
----------------------------------------------

Key: HBASE-5487
URL: https://issues.apache.org/jira/browse/HBASE-5487
Project: HBase
Issue Type: New Feature
Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
Attachments: Entity management in Master - part 1.pdf, Entity management in Master - part 1.pdf, Is the FATE of Assignment Manager FATE.pdf, Region management in Master.pdf, Region management in Master5.docx, hbckMasterV2-long.pdf, hbckMasterV2b-long.pdf

Need a framework to execute master-coordinated tasks in a fault-tolerant manner. Master-coordinated tasks such as online schema change and delete-range (deleting region(s) based on start/end key) can make use of this framework.

The advantages of the framework are:
1. Eliminate repeated code in the Master, ZooKeeper tracker, and region server for master-coordinated tasks
2. Ability to abstract the common functions across Master - ZK and RS - ZK
3. Easy to plug in new master-coordinated tasks without adding code to core components
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849403#comment-13849403 ]

Jonathan Hsieh commented on HBASE-5487:
---------------------------------------

The problem isn't that you would get snapshots with inconsistent schemas if the two operations were issued concurrently. It is that open is async and outside the table write lock, which means the snapshot would fail because the region may not have been opened yet. This is a particular case where we would want the open routines to act synchronously with table alters and split daughter region opens (both open before the table lock is released and the snapshot can happen).
[jira] [Commented] (HBASE-10137) GeneralBulkAssigner with retain assignment plan can be used in EnableTableHandler to bulk assign the regions
[ https://issues.apache.org/jira/browse/HBASE-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849406#comment-13849406 ]

stack commented on HBASE-10137:
-------------------------------

Should we rename GeneralBulkEnabler as BulkEnabler rather than just remove BulkEnabler (if a 'GeneralBulkEnabler', folks will go looking for SpecialBulkEnablers, or SergeantBulkEnablers...)?
[jira] [Commented] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table
[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849407#comment-13849407 ]

Matteo Bertozzi commented on HBASE-10136:
-----------------------------------------

As I pointed out in my comment above, we can fix things case by case, but the main problems are still there. If you implement a new handler, you have to keep in mind the rules needed to make everything work. In this case, for example, the end of the handler is not the end of the operation, so the lock is released early. Also in this case, the master call uses handler.process() instead of the executor pool to make the client operation synchronous. In the delete-table case, the last operation must be the removal of the table descriptor, otherwise the client call will not be synchronous. ...and so on with other implementation details.

I've pointed at the new master design to discuss this set of rules and make them part of the design. We must be able to know when an operation ends, and not just guess based on the result state of an operation. And this is a must for both the server side (e.g. releasing the lock) and the client side (e.g. sync operations).
[jira] [Commented] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table
[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849412#comment-13849412 ] Jonathan Hsieh commented on HBASE-10136: I agree with needing rules -- the invariant I think we need here is that if an operation starts with a region in open state and is supposed to complete with the regions in open state, it must be open. (or a suitable replacement must be open). Currently I only see open/close/open conflicts. (splits/alters, likely merges). can we get away with just fixing those three operations so that their respective table locks are held until the opens complete? Is the wait until handler completion needed for any other operations? Alter table conflicts with concurrent snapshot attempt on that table Key: HBASE-10136 URL: https://issues.apache.org/jira/browse/HBASE-10136 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.96.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Matteo Bertozzi Labels: online_schema_change Expected behavior: A user can issue a request for a snapshot of a table while that table is undergoing an online schema change and expect that snapshot request to complete correctly. Also, the same is true if a user issues a online schema change request while a snapshot attempt is ongoing. Observed behavior: Snapshot attempts time out when there is an ongoing online schema change because the region is closed and opened during the snapshot. As a side-note, I would expect that the attempt should fail quickly as opposed to timing out. Further, what I have seen is that subsequent attempts to snapshot the table fail because of some state/cleanup issues. This is also concerning. Immediate error: {code}type=FLUSH }' is still in progress! 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) Sleeping: 1ms while waiting for snapshot completion. 
2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting current status of snapshot from master... 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in progress! Snapshot failure occurred org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 'snapshot0' wasn't completed in expectedTime:6 ms at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602) at org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code} Likely root cause of error: {code}Exception in SnapshotSubprocedurePool java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. 
is closing at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327) at
[jira] [Commented] (HBASE-9927) ReplicationLogCleaner#stop() calls HConnectionManager#deleteConnection() unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849413#comment-13849413 ] Hudson commented on HBASE-9927: --- SUCCESS: Integrated in HBase-0.94 #1228 (See [https://builds.apache.org/job/HBase-0.94/1228/]) HBASE-9927 ReplicationLogCleaner#stop() calls HConnectionManager#deleteConnection() unnecessarily (tedyu: rev 1551273) * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java ReplicationLogCleaner#stop() calls HConnectionManager#deleteConnection() unnecessarily -- Key: HBASE-9927 URL: https://issues.apache.org/jira/browse/HBASE-9927 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.94.15 Attachments: 9927.txt When inspecting log, I found the following: {code} 2013-11-08 18:23:48,472 ERROR [M:0;kiyo:42380.oldLogCleaner] client.HConnectionManager(468): Connection not found in the list, can't delete it (connection key=HConnectionKey{properties={hbase.rpc.timeout=6, hbase.zookeeper.property.clientPort=59832, hbase.client.pause=100, zookeeper.znode.parent=/hbase, hbase.client.retries.number=350, hbase.zookeeper.quorum=localhost}, username='zy'}). May be the key was modified? java.lang.Exception at org.apache.hadoop.hbase.client.HConnectionManager.deleteConnection(HConnectionManager.java:468) at org.apache.hadoop.hbase.client.HConnectionManager.deleteConnection(HConnectionManager.java:404) at org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner.stop(ReplicationLogCleaner.java:141) at org.apache.hadoop.hbase.master.cleaner.CleanerChore.cleanup(CleanerChore.java:276) {code} The call to HConnectionManager#deleteConnection() is not needed. 
Here is the related code, which has a comment to this effect: {code} // Not sure why we're deleting a connection that we never acquired or used HConnectionManager.deleteConnection(this.getConf()); {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-8701) distributedLogReplay need to apply wal edits in the receiving order of those edits
[ https://issues.apache.org/jira/browse/HBASE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-8701: - Attachment: hbase-8701-tag.patch Port the old patch to use tag. 1) During KeyValue comparison, after rowkey, colfam/qual, timestamp and type are compared (same key), the log sequence number will be used if there is one; otherwise mvcc will be used. So the extra cost to fetch the log sequence number from the tag is trivial, because the comparison normally terminates much earlier, before reaching the log sequence number or mvcc. 2) Only edits being replayed during recovery will be tagged with their own original log sequence number, therefore the extra tag storage overhead is insignificant, considering recovery doesn't happen often and tagging is only applied to unflushed edits. distributedLogReplay need to apply wal edits in the receiving order of those edits -- Key: HBASE-8701 URL: https://issues.apache.org/jira/browse/HBASE-8701 Project: HBase Issue Type: Bug Components: MTTR Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0 Attachments: 8701-v3.txt, hbase-8701-tag.patch, hbase-8701-v4.patch, hbase-8701-v5.patch, hbase-8701-v6.patch, hbase-8701-v7.patch, hbase-8701-v8.patch This issue happens in distributedLogReplay mode when recovering multiple puts of the same key + version (timestamp). After replay, the value of the key is nondeterministic. h5. The original concern situation, raised by [~eclark]: For all edits the rowkey is the same. There's a log with: [ A (ts = 0), B (ts = 0) ] Replay the first half of the log. A user puts in C (ts = 0) Memstore has to flush A new Hfile will be created with [ C, A ] and MaxSequenceId = C's seqid. Replay the rest of the Log. Flush The issue will happen in similar situations, like Put(key, t=T) in WAL1 and Put(key, t=T) in WAL2 h5. 
Below is the option (proposed by Ted) I'd like to use: a) During replay, we pass the original wal sequence number of each edit to the receiving RS b) In the receiving RS, we store the negative original sequence number of wal edits into the mvcc field of the KVs of wal edits c) Add handling of negative MVCC in KVScannerComparator and KVComparator d) In the receiving RS, write the original sequence number into an optional field of the wal file for the chained RS failure situation e) When opening a region, we add a safety bumper (a large number) so that the new sequence number of a newly opened region does not collide with old sequence numbers. In the future, when we store sequence numbers along with KVs, we can adjust the above solution a little by avoiding overloading the MVCC field. h5. The other alternative options are listed below for reference: Option one a) disallow writes during recovery b) during replay, we pass original wal sequence ids c) hold flush till all wals of a recovering region are replayed. Memstore should hold, because we only recover unflushed wal edits. For edits with the same key + version, whichever has the larger sequence id wins. Option two a) During replay, we pass original wal sequence ids b) for each wal edit, we store the edit's original sequence id along with its key. c) during scanning, we use the original sequence id if it's present, otherwise the store file sequence id d) compaction can just keep the put with the max sequence id Please let me know if you have better ideas. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
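The ordering rule behind the tag approach — compare the key fields first, and fall back to the original log sequence number only on a full key tie — can be sketched outside HBase like this (illustrative Python, not the patch's Java; the tuple layout is a stand-in for KeyValue fields):

```python
from functools import cmp_to_key

def compare_cells(a, b):
    """Order cells like a KeyValue comparator: key fields first, then the
    (descending) original WAL sequence number as the final tiebreaker.
    `a` and `b` are (row, column, timestamp, type, seq_id) tuples."""
    for i in (0, 1):                      # row, then column
        if a[i] != b[i]:
            return -1 if a[i] < b[i] else 1
    if a[2] != b[2]:                      # newer timestamp sorts first
        return -1 if a[2] > b[2] else 1
    if a[3] != b[3]:                      # type
        return -1 if a[3] < b[3] else 1
    # Same key + version: the edit with the larger original WAL sequence
    # number (carried in a tag during replay) should be seen first.
    return -1 if a[4] > b[4] else (1 if a[4] < b[4] else 0)

# Two puts to the same key + timestamp, replayed out of order:
edits = [("row1", "fa1:col1", 0, "Put", 3),   # written first  (seq 3)
         ("row1", "fa1:col1", 0, "Put", 5)]   # written second (seq 5)
ordered = sorted(edits, key=cmp_to_key(compare_cells))
print(ordered[0][4])   # 5 -- the later edit sorts first, so its value wins
```

Since the full key rarely ties, the seq-id comparison at the end is almost never reached, which is the point made above about the tag fetch being cheap.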
[jira] [Updated] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
[ https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10174: --- Fix Version/s: 0.94.15 Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94 - Key: HBASE-10174 URL: https://issues.apache.org/jira/browse/HBASE-10174 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.94.15 Attachments: 9667-0.94.patch On user mailing list under the thread 'Guava 15', Kristoffer Sjögren reported NoClassDefFoundError when he used Guava 15. The issue has been fixed in 0.96 + by HBASE-9667 This JIRA ports the fix to 0.94 branch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
[ https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849431#comment-13849431 ] Ted Yu commented on HBASE-10174: Test suite passed with patch: {code} [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 1:14:33.284s [INFO] Finished at: Mon Dec 16 18:23:47 UTC 2013 [INFO] Final Memory: 36M/642M {code} Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94 - Key: HBASE-10174 URL: https://issues.apache.org/jira/browse/HBASE-10174 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.94.15 Attachments: 9667-0.94.patch On user mailing list under the thread 'Guava 15', Kristoffer Sjögren reported NoClassDefFoundError when he used Guava 15. The issue has been fixed in 0.96 + by HBASE-9667 This JIRA ports the fix to 0.94 branch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10142) TestLogRolling#testLogRollOnDatanodeDeath test failure
[ https://issues.apache.org/jira/browse/HBASE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849429#comment-13849429 ] Andrew Purtell commented on HBASE-10142: This doesn't manifest on reasonably endowed VMs. However, we see two instances of different problems with this test on builds.apache.org. From the Hadoop 2 based build https://builds.apache.org/job/hbase-0.98/14/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRolling/testLogRollOnDatanodeDeath/: {noformat} java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.getLogReplication(FSHLog.java:1391) at org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:400) [...] Caused by: java.nio.channels.ClosedChannelException at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1317) at org.apache.hadoop.hdfs.DFSOutputStream.getCurrentBlockReplication(DFSOutputStream.java:1762) at org.apache.hadoop.hdfs.DFSOutputStream.getNumCurrentReplicas(DFSOutputStream.java:1751) {noformat} And from the Hadoop 1 based build https://builds.apache.org/job/hbase-0.98-on-hadoop-1.1/10/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRolling/testLogRollOnDatanodeDeath/: {noformat} java.lang.AssertionError: Missing datanode should've triggered a log roll at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:388) {noformat} So I am strongly inclined to disable this test as a flapper. 
TestLogRolling#testLogRollOnDatanodeDeath test failure -- Key: HBASE-10142 URL: https://issues.apache.org/jira/browse/HBASE-10142 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Fix For: 0.98.0 This is a demanding unit test, which fails fairly often as software versions (JVM, Hadoop) and system load change. Currently when testing 0.98 branch I see this failure: {noformat} Failed tests: testLogRollOnDatanodeDeath(org.apache.hadoop.hbase.regionserver.wal.TestLogRolling): LowReplication Roller should've been disabled, current replication=1 {noformat} Could be a timing issue after the recent switch to Hadoop 2 as default build/test profile. Let's see if more leniency makes sense and if it can stabilize the test before disabling it. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10142) TestLogRolling#testLogRollOnDatanodeDeath test failure
[ https://issues.apache.org/jira/browse/HBASE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849435#comment-13849435 ] Andrew Purtell commented on HBASE-10142: Could not reproduce on a m3.4xlarge build box running Amazon Linux (RHEL derived) and Oracle JDK 7u21 after 100 iterations. TestLogRolling#testLogRollOnDatanodeDeath test failure -- Key: HBASE-10142 URL: https://issues.apache.org/jira/browse/HBASE-10142 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Fix For: 0.98.0 This is a demanding unit test, which fails fairly often as software versions (JVM, Hadoop) and system load change. Currently when testing 0.98 branch I see this failure: {noformat} Failed tests: testLogRollOnDatanodeDeath(org.apache.hadoop.hbase.regionserver.wal.TestLogRolling): LowReplication Roller should've been disabled, current replication=1 {noformat} Could be a timing issue after the recent switch to Hadoop 2 as default build/test profile. Let's see if more leniency makes sense and if it can stabilize the test before disabling it. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-10087) Store should be locked during a memstore snapshot
[ https://issues.apache.org/jira/browse/HBASE-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10087: -- Fix Version/s: (was: 0.96.1) 0.99.0 Store should be locked during a memstore snapshot - Key: HBASE-10087 URL: https://issues.apache.org/jira/browse/HBASE-10087 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0, 0.96.1, 0.94.14 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.98.0, 0.94.15, 0.99.0 Attachments: 10079.v1.patch regression from HBASE-9963, found while looking at HBASE-10079. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-9942) hbase Scanner specifications accepting wrong specifier and then after scan using correct specifier returning unexpected result
[ https://issues.apache.org/jira/browse/HBASE-9942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9942: - Fix Version/s: (was: 0.96.1) 0.99.0 hbase Scanner specifications accepting wrong specifier and then after scan using correct specifier returning unexpected result --- Key: HBASE-9942 URL: https://issues.apache.org/jira/browse/HBASE-9942 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.96.0, 0.94.13 Reporter: Deepak Sharma Priority: Minor Fix For: 0.99.0 check the given scenario: 1. log in to hbase client -- ./hbase shell 2. created table 'tab1' hbase(main):001:0> create 'tab1' , 'fa1' 3. put some 10 rows (row1 to row10) in table 'tab1' 4. run the scan for table 'tab1' as follows: hbase(main):013:0> scan 'tab1' , { STARTROW => 'row4' , STOPROW => 'row9' } ROW COLUMN+CELL row4 column=fa1:col1, timestamp=1384164182738, value=value1 row5 column=fa1:col1, timestamp=1384164188396, value=value1 row6 column=fa1:col1, timestamp=1384164192395, value=value1 row7 column=fa1:col1, timestamp=1384164197693, value=value1 row8 column=fa1:col1, timestamp=1384164203237, value=value1 5 row(s) in 0.0540 seconds so the result was as expected: rows from 'row4' to 'row8' are displayed 5. 
then run the scan using the wrong specifier ( '=' instead of '=>' ) and get a wrong result hbase(main):014:0> scan 'tab1' , { STARTROW = 'row4' , STOPROW = 'row9' } ROW COLUMN+CELL row1 column=fa1:col1, timestamp=1384164167838, value=value1 row10 column=fa1:col1, timestamp=1384164212615, value=value1 row2 column=fa1:col1, timestamp=1384164175337, value=value1 row3 column=fa1:col1, timestamp=1384164179068, value=value1 row4 column=fa1:col1, timestamp=1384164182738, value=value1 row5 column=fa1:col1, timestamp=1384164188396, value=value1 row6 column=fa1:col1, timestamp=1384164192395, value=value1 row7 column=fa1:col1, timestamp=1384164197693, value=value1 row8 column=fa1:col1, timestamp=1384164203237, value=value1 row9 column=fa1:col1, timestamp=1384164208375, value=value1 10 row(s) in 0.0390 seconds 6. now performed the correct scan query with the correct specifier ( used '=>' as specifier) hbase(main):015:0> scan 'tab1' , { STARTROW => 'row4' , STOPROW => 'row9' } ROW COLUMN+CELL row1 column=fa1:col1, timestamp=1384164167838, value=value1 row10 column=fa1:col1, timestamp=1384164212615, value=value1 row2
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9969: - Fix Version/s: (was: 0.96.1) 0.99.0 Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.98.1, 0.99.0 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is a better data structure than a binary heap. It saves half of the comparisons on each next(), though the time complexity is still O(logN). Currently a scan or get will go through two KeyValueHeaps: one merges KVs read from multiple HFiles in a single store, the other merges results from multiple stores. This patch should improve both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks cleaner and is simpler to understand. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
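KeyValueHeap's job is a k-way merge of sorted scanners; the sketch below does the same merge with Python's binary heap (heapq). A loser tree performs this merge with roughly half the comparisons per next(), which is the improvement proposed here (illustrative sketch, not the patch's code):

```python
import heapq

def kway_merge(sorted_streams):
    """Merge k sorted lists -- what KeyValueHeap does over per-HFile (and
    per-store) scanners -- using a binary heap keyed on the head element."""
    # Seed the heap with the first element of each non-empty stream.
    heap = [(s[0], idx, 0) for idx, s in enumerate(sorted_streams) if s]
    heapq.heapify(heap)
    out = []
    while heap:
        val, idx, pos = heapq.heappop(heap)   # smallest current head
        out.append(val)
        if pos + 1 < len(sorted_streams[idx]):
            # Advance the stream we just consumed from; a loser tree would
            # re-seed its leaf and replay one root path with ~half the
            # comparisons a heap sift needs.
            heapq.heappush(heap, (sorted_streams[idx][pos + 1], idx, pos + 1))
    return out

streams = [["a", "d", "g"], ["b", "e"], ["c", "f", "h"]]
print(kway_merge(streams))   # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
```

The public behavior (fully ordered output) is identical either way, which is why the patch can swap the internal structure without changing KeyValueHeap's interfaces.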
[jira] [Updated] (HBASE-10018) Change the location prefetch
[ https://issues.apache.org/jira/browse/HBASE-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10018: -- Fix Version/s: (was: 0.96.1) 0.99.0 Change the location prefetch Key: HBASE-10018 URL: https://issues.apache.org/jira/browse/HBASE-10018 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Minor Fix For: 0.98.0, 0.99.0 Issues with prefetching are: - we do two calls to meta: one for the exact row, one for the prefetch - it's done in a lock - we take the next 10 regions. Why 10, and why the next 10? - is it useful if the table has 100K regions? Options are: - just remove it - replace it with a reverse scan: this would save a call -- This message was sent by Atlassian JIRA (v6.1.4#6159)
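The reverse-scan option saves a call because locating a row's region means finding the last region whose start key is <= the row, which a single reverse lookup answers directly, with no separate prefetch. A minimal sketch with bisect standing in for the meta reverse scan (illustrative Python; the keys are hypothetical):

```python
import bisect

# Sorted region start keys, as meta would hold them ("" = first region).
start_keys = ["", "d", "h", "m", "t"]

def region_for_row(row):
    """Last region whose start key <= row -- what one reverse scan over
    meta starting at `row` returns, with no prefetch of 'the next 10'."""
    i = bisect.bisect_right(start_keys, row) - 1
    return start_keys[i]

print(region_for_row("g"))   # 'd': the region ['d', 'h') holds row 'g'
print(region_for_row("m"))   # 'm': exact start-key match
```

One lookup per miss, no guess about how many neighboring regions to cache, and nothing special for tables with 100K regions.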
[jira] [Updated] (HBASE-8329) Limit compaction speed
[ https://issues.apache.org/jira/browse/HBASE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8329: - Fix Version/s: (was: 0.96.1) 0.99.0 Limit compaction speed -- Key: HBASE-8329 URL: https://issues.apache.org/jira/browse/HBASE-8329 Project: HBase Issue Type: Improvement Components: Compaction Reporter: binlijin Assignee: binlijin Fix For: 0.99.0 Attachments: HBASE-8329-2-trunk.patch, HBASE-8329-3-trunk.patch, HBASE-8329-4-trunk.patch, HBASE-8329-5-trunk.patch, HBASE-8329-6-trunk.patch, HBASE-8329-7-trunk.patch, HBASE-8329-8-trunk.patch, HBASE-8329-trunk.patch There is no speed or resource limit for compaction. I think we should add this feature, especially during request bursts. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
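A common way to implement such a limit is to throttle compaction I/O to a target bytes-per-second rate, sleeping whenever the writer runs ahead of the budget. A sketch of that idea (illustrative Python; not necessarily the mechanism of the attached patches):

```python
import time

class ThroughputLimiter:
    """Cap a byte stream at `max_bytes_per_sec` by sleeping when ahead."""
    def __init__(self, max_bytes_per_sec):
        self.rate = max_bytes_per_sec
        self.start = time.monotonic()
        self.total = 0

    def throttle(self, nbytes):
        self.total += nbytes
        expected = self.total / self.rate       # seconds the bytes *should* take
        elapsed = time.monotonic() - self.start
        if expected > elapsed:                  # running ahead of budget
            time.sleep(expected - elapsed)

limiter = ThroughputLimiter(max_bytes_per_sec=1_000_000)
t0 = time.monotonic()
for _ in range(10):
    limiter.throttle(100_000)      # "compact" ten 100 KB chunks
elapsed = time.monotonic() - t0
print(f"1 MB at 1 MB/s took ~{elapsed:.2f}s")
```

Compaction keeps making progress during a burst, but its disk bandwidth is bounded, leaving headroom for foreground requests.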
[jira] [Updated] (HBASE-9151) HBCK cannot fix when meta server znode deleted, this can happen if all region servers stopped and there are no logs to split.
[ https://issues.apache.org/jira/browse/HBASE-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9151: - Fix Version/s: (was: 0.96.1) 0.99.0 HBCK cannot fix when meta server znode deleted, this can happen if all region servers stopped and there are no logs to split. - Key: HBASE-9151 URL: https://issues.apache.org/jira/browse/HBASE-9151 Project: HBase Issue Type: Bug Components: hbck Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.0, 0.99.0 When the meta server znode is deleted and meta is in FAILED_OPEN state, hbck cannot fix it. This scenario can occur when all region servers are stopped by the stop command and no RS is started within 10 secs (with default configurations). {code} public void assignMeta() throws KeeperException { MetaRegionTracker.deleteMetaLocation(this.watcher); assign(HRegionInfo.FIRST_META_REGIONINFO, true); } {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-8859) truncate_preserve should get table split keys as it is instead of converting them to string type and then again to bytes
[ https://issues.apache.org/jira/browse/HBASE-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8859: - Fix Version/s: (was: 0.96.1) 0.99.0 truncate_preserve should get table split keys as it is instead of converting them to string type and then again to bytes Key: HBASE-8859 URL: https://issues.apache.org/jira/browse/HBASE-8859 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.95.1 Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.98.0, 0.99.0 Attachments: HBASE-8859-Test_to_reproduce.patch, HBASE-8859_trunk.patch, HBASE-8859_trunk_2.patch If we take int,long or double bytes as split keys then we are not creating table with same split keys because converting them to strings directly and to bytes is giving different split keys, sometimes getting IllegalArgument exception because of same split keys(converted). Instead we can get split keys directly from HTable and pass them while creating table. {code} h_table = org.apache.hadoop.hbase.client.HTable.new(conf, table_name) splits = h_table.getRegionLocations().keys().map{|i| i.getStartKey} :byte splits = org.apache.hadoop.hbase.util.Bytes.toByteArrays(splits) {code} {code} Truncating 'emp3' table (it may take a while): - Disabling table... - Dropping table... - Creating table with region boundaries... ERROR: java.lang.IllegalArgumentException: All split keys must be unique, found duplicate: B\x11S\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\x00\x00, B\x11S\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\x00\x00 {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8803: - Fix Version/s: (was: 0.96.1) 0.99.0 region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803 Project: HBase Issue Type: Bug Components: Usability Affects Versions: 0.98.0, 0.94.8, 0.95.1 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Fix For: 0.94.15, 0.99.0 Attachments: 8803v5.txt, HBASE-8803-v0-trunk.patch, HBASE-8803-v1-0.94.patch, HBASE-8803-v1-trunk.patch, HBASE-8803-v2-0.94.patch, HBASE-8803-v2-0.94.patch, HBASE-8803-v3-0.94.patch, HBASE-8803-v4-0.94.patch, HBASE-8803-v4-trunk.patch Original Estimate: 48h Remaining Estimate: 48h When there are many regions in a cluster, rolling_restart can take hours because region_mover is moving the regions one by one. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
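The speedup here is simply moving regions concurrently instead of one by one; a thread-pool sketch of the idea (illustrative Python — the real script is JRuby driving the HBase admin API, and `move_region` below is a hypothetical stub):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def move_region(region):
    """Stand-in for a move + wait-until-opened round trip per region."""
    time.sleep(0.01)
    return region

regions = [f"region-{i}" for i in range(40)]

# One at a time this is 40 sequential round trips; with a pool of 8
# concurrent movers the wall time drops roughly 8x, which is the point
# of moving multiple regions at a time.
with ThreadPoolExecutor(max_workers=8) as pool:
    moved = list(pool.map(move_region, regions))

print(len(moved))   # 40
```

The pool size would be a tunable, since each concurrent move adds load on the source and destination region servers.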
[jira] [Updated] (HBASE-9738) Delete table and loadbalancer interference
[ https://issues.apache.org/jira/browse/HBASE-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9738: - Fix Version/s: (was: 0.96.1) 0.99.0 Delete table and loadbalancer interference -- Key: HBASE-9738 URL: https://issues.apache.org/jira/browse/HBASE-9738 Project: HBase Issue Type: Bug Reporter: Devaraj Das Priority: Critical Fix For: 0.99.0 I have noticed that when the balancer is computing a plan for region moves, and a delete table is issued, there is some interference. 1. At time t1, user deleted the table. 2. This led to the master updating the meta table to remove the line for the regioninfo for a region f2a9e2e9d70894c03f54ee5902bebee6. {noformat} 2013-10-04 08:42:52,495 INFO [MASTER_TABLE_OPERATIONS-hor15n05:6-0] catalog.MetaEditor: Deleted [{ENCODED = f2a9e2e9d70894c03f54ee5902bebee6, NAME = 'usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6.', STARTKEY = '', ENDKEY = ''}] {noformat} 3. However around the same time, the balancer kicked in, and reassigned the region and made it online somewhere. It didn't check the fact (nor anyone else did) that the table was indeed deleted. {noformat} 2013-10-04 08:42:53,215 INFO [hor15n05.gq1.ygridcore.net,6,1380869262259-BalancerChore] master.HMaster: balance hri=usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6., src=hor15n09.gq1.ygridcore.net,60020,1380869263722, dest=hor15n11.gq1.ygridcore.net,60020,1380869263682 {noformat} . {noformat} 2013-10-04 08:42:53,592 INFO [AM.ZK.Worker-pool2-t829] master.RegionStates: Onlined f2a9e2e9d70894c03f54ee5902bebee6 on hor15n11.gq1.ygridcore.net,60020,1380869263682 {noformat} 4. 
Henceforth, all the drop tables started giving warnings like {noformat} 2013-10-04 08:45:17,587 INFO [RpcServer.handler=8,port=6] master.HMaster: Client=hrt_qa//68.142.246.151 delete usertable 2013-10-04 08:45:17,631 DEBUG [RpcServer.handler=8,port=6] lock.ZKInterProcessLockBase: Acquired a lock for /hbase/table-lock/usertable/write-master:600 2013-10-04 08:45:17,637 WARN [RpcServer.handler=8,port=6] catalog.MetaReader: No serialized HRegionInfo in keyvalues={usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6./info:seqnumDuringOpen/1380876173509/Put/vlen=8/mvcc=0, usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6./info:server/1380876173509/Put/vlen=32/mvcc=0, usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6./info:serverstartcode/1380876173509/Put/vlen=8/mvcc=0} {noformat} 5. The create of the same table also fails since there is still state (reincarnated, maybe) about the table in the master. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
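The missing guard is essentially: before applying a balancer plan, check that the region's table is not being deleted. A sketch of such a filter (illustrative Python; the `deleting_tables` set and the name parsing are hypothetical stand-ins for master-side state):

```python
deleting_tables = {"usertable"}          # tables with a delete in progress

def table_of(region_name):
    # HBase region names look like "usertable,,1380876170581.f2a9e...";
    # the table name is the prefix before the first comma.
    return region_name.split(",", 1)[0]

def filter_plans(plans):
    """Drop balancer moves that target regions of tables being deleted,
    so the balancer cannot re-online a region the delete just removed."""
    return [p for p in plans if table_of(p) not in deleting_tables]

plans = ["usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6.",
         "othertable,,1380876170999.aaaa."]
print(filter_plans(plans))   # only the othertable move survives
```

A check like this closes the window between the meta delete at t1 and the balancer's stale plan, at the cost of the balancer consulting table state before each move.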
[jira] [Updated] (HBASE-5583) Master restart on create table with splitkeys does not recreate table with all the splitkey regions
[ https://issues.apache.org/jira/browse/HBASE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5583: - Fix Version/s: (was: 0.96.1) 0.99.0 Master restart on create table with splitkeys does not recreate table with all the splitkey regions --- Key: HBASE-5583 URL: https://issues.apache.org/jira/browse/HBASE-5583 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.99.0 Attachments: HBASE-5583_new_1.patch, HBASE-5583_new_1_review.patch, HBASE-5583_new_2.patch, HBASE-5583_new_4_WIP.patch, HBASE-5583_new_5_WIP_using_tableznode.patch - Create table using splitkeys - Master goes down before all regions are added to meta - On master restart the table is again enabled, but with fewer regions than specified in splitkeys Anyway, the client will get an exception if it had called sync create table. But a "does the table exist" check will say the table exists. Is this scenario to be handled by the client only, or can we have some mechanism on the master side for this? Pls suggest. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-8495) Change ownership of the directory to bulk load
[ https://issues.apache.org/jira/browse/HBASE-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8495: - Fix Version/s: (was: 0.96.1) 0.99.0 Change ownership of the directory to bulk load -- Key: HBASE-8495 URL: https://issues.apache.org/jira/browse/HBASE-8495 Project: HBase Issue Type: Improvement Components: mapreduce Affects Versions: 0.94.7, 0.95.0 Reporter: Matteo Bertozzi Priority: Trivial Fix For: 0.99.0 To bulk load something you need to change the ownership of the data directory to allow the hbase user to read and move the files, also in the split case you must use the hbase user to run the LoadIncrementalHFiles tool, since internally some directories _tmp are created to add the split reference files. In a secure cluster, the SecureBulkLoadEndPoint will take care of this problem by doing a chmod 777 on the directory to bulk load. NOTE that a chown is not possible since you must be a super user to change the ownership, a change group may be possible but the user must be in the hbase group... and anyway it will require a chmod to allow the group to perform the move. {code} Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=WRITE, inode=/test/cf:th30z:supergroup:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205) Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException: Exception in rename at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.rename(HRegionFileSystem.java:928) at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.commitStoreFile(HRegionFileSystem.java:340) at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.bulkLoadStoreFile(HRegionFileSystem.java:414) {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8942: - Fix Version/s: (was: 0.96.1) 0.99.0 DFS errors during a read operation (get/scan), may cause write outliers --- Key: HBASE-8942 URL: https://issues.apache.org/jira/browse/HBASE-8942 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb, 0.95.2 Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb, 0.99.0 Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt This is a similar issue as discussed in HBASE-8228. 1) A scanner holds the Store.ReadLock() while opening the store files ... encounters errors. Thus, it takes a long time to finish. 2) A flush completes in the meantime. It needs the write lock to commit() and update scanners. Hence it ends up waiting. 3+) All Puts (and also Gets) to the CF, which will need a read lock, have to wait for 1) and 2) to complete. This blocks updates to the system for the duration of the DFS timeout. Fix: Open Store files outside the read lock. getScanners() already tries to do this optimisation. However, Store.getScanner(), which calls this function through the StoreScanner constructor, redundantly tries to grab the readLock, causing the readLock to be held while the storeFiles are being opened and seeked. We should get rid of the readLock() in Store.getScanner(). This is not required. The constructor for StoreScanner calls getScanners(xxx, xxx, xxx). This has the required locking already. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
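The proposed fix (do the slow store-file opening before taking the read lock, so a stuck DFS read cannot block a flush waiting on the write lock) can be sketched in plain Java. This is a minimal, hypothetical illustration with made-up class and method names, not the actual HBase Store code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: the slow file opening happens OUTSIDE the read lock,
// so an open that hits DFS errors cannot hold the lock for the DFS timeout.
public class StoreScannerSketch {
    private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Stand-in for the slow, possibly-failing open of store files (DFS reads).
    static List<String> openStoreFiles(List<String> files) {
        return new ArrayList<>(files);
    }

    // Fixed pattern: do the slow work first, take the lock only to publish.
    static List<String> getScanners(List<String> files) {
        List<String> opened = openStoreFiles(files); // no lock held here
        lock.readLock().lock();
        try {
            return opened; // short critical section: just registering scanners
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The key point is that the critical section shrinks from "open + seek + register" to "register" only, which is what removing the redundant readLock() in Store.getScanner() achieves.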
[jira] [Updated] (HBASE-6970) hbase-deamon.sh creates/updates pid file even when that start failed.
[ https://issues.apache.org/jira/browse/HBASE-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6970: - Fix Version/s: (was: 0.96.1) 0.99.0 hbase-deamon.sh creates/updates pid file even when that start failed. - Key: HBASE-6970 URL: https://issues.apache.org/jira/browse/HBASE-6970 Project: HBase Issue Type: Bug Components: Usability Reporter: Lars Hofhansl Fix For: 0.99.0 We just ran into a strange issue where we could neither start nor stop services with hbase-deamon.sh. The problem is this: {code} nohup nice -n $HBASE_NICENESS $HBASE_HOME/bin/hbase \ --config ${HBASE_CONF_DIR} \ $command "$@" $startStop > "$logout" 2>&1 < /dev/null & echo $! > $pid {code} So the pid file is created or updated even when the start of the service failed. The next stop command will then fail, because the pid file has the wrong pid in it. Edit: Spelling and more spelling errors. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-9484) Backport 8534 Fix coverage for org.apache.hadoop.hbase.mapreduce to 0.96
[ https://issues.apache.org/jira/browse/HBASE-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9484: - Fix Version/s: (was: 0.96.1) 0.99.0 Backport 8534 Fix coverage for org.apache.hadoop.hbase.mapreduce to 0.96 -- Key: HBASE-9484 URL: https://issues.apache.org/jira/browse/HBASE-9484 Project: HBase Issue Type: Test Components: mapreduce, test Reporter: Nick Dimiduk Priority: Minor Fix For: 0.99.0 Attachments: 0001-HBASE-9484-backport-8534-Fix-coverage-for-org.apache.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-9047) Tool to handle finishing replication when the cluster is offline
[ https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9047: - Fix Version/s: (was: 0.96.1) Tool to handle finishing replication when the cluster is offline Key: HBASE-9047 URL: https://issues.apache.org/jira/browse/HBASE-9047 Project: HBase Issue Type: New Feature Affects Versions: 0.96.0 Reporter: Jean-Daniel Cryans Assignee: Demai Ni Fix For: 0.98.0, 0.94.15, 0.99.0 Attachments: HBASE-9047-0.94-v1.patch, HBASE-9047-0.94.9-v0.PATCH, HBASE-9047-trunk-v0.patch, HBASE-9047-trunk-v1.patch, HBASE-9047-trunk-v2.patch, HBASE-9047-trunk-v3.patch, HBASE-9047-trunk-v4.patch, HBASE-9047-trunk-v4.patch, HBASE-9047-trunk-v5.patch, HBASE-9047-trunk-v6.patch, HBASE-9047-trunk-v7.patch, HBASE-9047-trunk-v7.patch We're having a discussion on the mailing list about replicating the data on a cluster that was shut down in an offline fashion. The motivation could be that you don't want to bring HBase back up but still need that data on the slave. So I have this idea of a tool that would be running on the master cluster while it is down, although it could also run at any time. Basically it would be able to read the replication state of each master region server, finish replicating what's missing to all the slaves, and then clear that state in zookeeper. The code that handles replication does most of that already, see ReplicationSourceManager and ReplicationSource. Basically when ReplicationSourceManager.init() is called, it will check all the queues in ZK and try to grab those that aren't attached to a region server. If the whole cluster is down, it will grab all of them. The beautiful thing here is that you could start that tool on all your machines and the load will be spread out, but that might not be a big concern if replication wasn't lagging since it would take a few seconds to finish replicating the missing data for each region server.
I'm guessing when starting ReplicationSourceManager you'd give it a fake region server ID, and you'd tell it not to start its own source. FWIW the main difference in how replication is handled between Apache's HBase and Facebook's is that the latter is always done separately from HBase itself. This jira isn't about doing that. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-9473) Change UI to list 'system tables' rather than 'catalog tables'.
[ https://issues.apache.org/jira/browse/HBASE-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9473: - Fix Version/s: (was: 0.96.1) 0.99.0 Change UI to list 'system tables' rather than 'catalog tables'. --- Key: HBASE-9473 URL: https://issues.apache.org/jira/browse/HBASE-9473 Project: HBase Issue Type: Bug Components: UI Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 9473.txt Minor, one-line, bit of polishing. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-6416) hbck dies on NPE when a region folder exists but the table does not
[ https://issues.apache.org/jira/browse/HBASE-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6416: - Fix Version/s: (was: 0.96.1) 0.99.0 hbck dies on NPE when a region folder exists but the table does not --- Key: HBASE-6416 URL: https://issues.apache.org/jira/browse/HBASE-6416 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Fix For: 0.99.0 Attachments: hbase-6416-v1.patch, hbase-6416.patch This is what I'm getting for leftover data that has no .regioninfo First: {quote} 12/07/17 23:13:37 WARN util.HBaseFsck: Failed to read .regioninfo file for region null java.io.FileNotFoundException: File does not exist: /hbase/stumble_info_urlid_user/bd5f6cfed674389b4d7b8c1be227cb46/.regioninfo at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1813) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456) at org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegioninfo(HBaseFsck.java:611) at org.apache.hadoop.hbase.util.HBaseFsck.access$2200(HBaseFsck.java:140) at org.apache.hadoop.hbase.util.HBaseFsck$WorkItemHdfsRegionInfo.call(HBaseFsck.java:2882) at org.apache.hadoop.hbase.util.HBaseFsck$WorkItemHdfsRegionInfo.call(HBaseFsck.java:2866) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {quote} Then it hangs on: {quote} 12/07/17 23:13:39 INFO util.HBaseFsck: Attempting to handle orphan hdfs dir: hdfs://sfor3s24:10101/hbase/stumble_info_urlid_user/bd5f6cfed674389b4d7b8c1be227cb46 12/07/17 23:13:39 INFO util.HBaseFsck: checking orphan for table null Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$100(HBaseFsck.java:1634) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:435) at org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:408) at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:529) at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:313) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:386) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3227) {quote} The NPE is sent by: {code} Preconditions.checkNotNull("Table '" + tableName + "' not present!", tableInfo); {code} I wonder why the condition checking was added if we don't handle it... In any case hbck dies, but then it hangs because there are some non-daemon threads hanging around. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-9591) [replication] getting Current list of sinks is out of date all the time when a source is recovered
[ https://issues.apache.org/jira/browse/HBASE-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9591: - Fix Version/s: (was: 0.96.1) 0.99.0 [replication] getting Current list of sinks is out of date all the time when a source is recovered Key: HBASE-9591 URL: https://issues.apache.org/jira/browse/HBASE-9591 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Jean-Daniel Cryans Priority: Minor Fix For: 0.99.0 I tried killing a region server when the slave cluster was down, from that point on my log was filled with: {noformat} 2013-09-20 00:31:03,942 INFO [regionserver60020.replicationSource,1] org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: Current list of sinks is out of date, updating 2013-09-20 00:31:04,226 INFO [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-4,60020,1379636329634] org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: Current list of sinks is out of date, updating {noformat} The first log line is from the normal source, the second is the recovered one. When we try to replicate, we call replicationSinkMgr.getReplicationSink() and if the list of machines was refreshed since the last time then we call chooseSinks() which in turn refreshes the list of sinks and resets our lastUpdateToPeers. The next source will notice the change, and will call chooseSinks() too. The first source is coming for another round, sees the list was refreshed, calls chooseSinks() again. It happens forever until the recovered queue is gone. We could have all the sources going to the same cluster share a thread-safe ReplicationSinkManager. We could also manage the same cluster separately for each source. Or even easier, if the list we get in chooseSinks() is the same we had before, consider it a noop. What do you think [~gabriel.reid]? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
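The "consider it a noop" option at the end can be sketched in a few lines: only treat the sink list as out of date when the slave's server list actually differs from the cached one. This is a hypothetical toy model, not the real ReplicationSinkManager:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the no-op idea: chooseSinks() skips the refresh
// (and so does not invalidate other sources' views) when the latest slave
// server list is identical to the one already chosen.
public class SinkManagerSketch {
    private List<String> currentSinks = new ArrayList<>();
    private int refreshCount = 0;

    // Returns true only if the sinks were actually re-chosen.
    public boolean chooseSinks(List<String> latestSlaveServers) {
        if (latestSlaveServers.equals(currentSinks)) {
            return false; // same list as before: no refresh, no churn
        }
        currentSinks = new ArrayList<>(latestSlaveServers);
        refreshCount++;
        return true;
    }

    public int getRefreshCount() { return refreshCount; }
}
```

With this check, the normal source and the recovered source stop invalidating each other's sink lists on every round, which is what produces the endless "out of date, updating" log lines.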
[jira] [Updated] (HBASE-7245) Recovery on failed snapshot restore
[ https://issues.apache.org/jira/browse/HBASE-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7245: - Fix Version/s: (was: 0.96.1) 0.99.0 Recovery on failed snapshot restore --- Key: HBASE-7245 URL: https://issues.apache.org/jira/browse/HBASE-7245 Project: HBase Issue Type: Bug Components: Client, master, regionserver, snapshots, Zookeeper Reporter: Jonathan Hsieh Assignee: Matteo Bertozzi Fix For: 0.99.0 Restore will do updates to the file system and to meta. It seems that an inopportune failure before meta is completely updated could result in an inconsistent state that would require hbck to fix. We should define what the semantics are for recovering from this. Some suggestions: 1) Fail forward (see some log saying restore's meta edits are not completed, then gather the information necessary to rebuild it all from the fs, and complete the meta edits). 2) Fail backwards (see some log saying restore's meta edits are not completed, delete incomplete snapshot region entries from meta). I think I prefer 1 -- if two processes somehow end up updating (say the original master didn't die, and a new one started up), the updates would be idempotent. If we used 2, we could still have a race and still be in a bad place. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-8713) [hadoop1] Log spam each time the WAL is rolled
[ https://issues.apache.org/jira/browse/HBASE-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8713: - Fix Version/s: (was: 0.96.1) 0.99.0 [hadoop1] Log spam each time the WAL is rolled -- Key: HBASE-8713 URL: https://issues.apache.org/jira/browse/HBASE-8713 Project: HBase Issue Type: Bug Affects Versions: 0.95.1 Reporter: Jean-Daniel Cryans Fix For: 0.99.0 Testing 0.95.1 RC1, I see these 2 lines every time the WAL is rolled: {noformat} 2013-06-07 17:19:33,182 INFO [RS_CLOSE_REGION-ip-10-20-46-44:50653-1] util.FSUtils: FileSystem doesn't support getDefaultReplication 2013-06-07 17:19:33,182 INFO [RS_CLOSE_REGION-ip-10-20-46-44:50653-1] util.FSUtils: FileSystem doesn't support getDefaultBlockSize {noformat} It only happens on hadoop1. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-9866) Support the mode where REST server authorizes proxy users
[ https://issues.apache.org/jira/browse/HBASE-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9866: - Fix Version/s: (was: 0.96.1) 0.99.0 Support the mode where REST server authorizes proxy users - Key: HBASE-9866 URL: https://issues.apache.org/jira/browse/HBASE-9866 Project: HBase Issue Type: Improvement Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.99.0 Attachments: 9866-1.txt, 9866-2.txt, 9866-3.txt, 9866-4.txt In one use case, someone was trying to authorize with the REST server as a proxy user. That mode is not supported today. The curl request would be something like (assuming SPNEGO auth) - {noformat} curl -i --negotiate -u : http://HOST:PORT/version/cluster?doas=USER {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-8593) Type support in ImportTSV tool
[ https://issues.apache.org/jira/browse/HBASE-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8593: - Fix Version/s: (was: 0.96.1) 0.99.0 Type support in ImportTSV tool -- Key: HBASE-8593 URL: https://issues.apache.org/jira/browse/HBASE-8593 Project: HBase Issue Type: Sub-task Components: mapreduce Reporter: Anoop Sam John Assignee: rajeshbabu Fix For: 0.99.0 Attachments: HBASE-8593.patch, HBASE-8593_v2.patch, HBASE-8593_v4.patch, ReportMapper.java Currently the ImportTSV tool treats every table column as type String: it converts the input data into bytes assuming the type is String. Sometimes a user will need a value of another type, say int or float, to be added to the table using this tool. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
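The type-aware conversion this issue asks for can be sketched in a few lines of plain Java. This is a hypothetical illustration (the class name and the type-tag strings are made up, and it uses java.nio rather than HBase's Bytes utility):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of type-aware conversion for an ImportTSV-like tool:
// instead of always storing the raw string bytes, honor a declared column type.
public class TypedTsvSketch {
    public static byte[] toBytes(String field, String type) {
        switch (type) {
            case "int":   // 4-byte big-endian int instead of the digits' bytes
                return ByteBuffer.allocate(4).putInt(Integer.parseInt(field)).array();
            case "float": // 4-byte IEEE-754 float
                return ByteBuffer.allocate(4).putFloat(Float.parseFloat(field)).array();
            default:      // everything else stays a plain string, as today
                return field.getBytes(StandardCharsets.UTF_8);
        }
    }
}
```

With such a mapping, "1234567" declared as int occupies 4 bytes and sorts numerically, whereas today's string encoding occupies 7 bytes and sorts lexicographically.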
[jira] [Updated] (HBASE-9343) Implement stateless scanner for Stargate
[ https://issues.apache.org/jira/browse/HBASE-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9343: - Fix Version/s: (was: 0.96.1) 0.99.0 Implement stateless scanner for Stargate Key: HBASE-9343 URL: https://issues.apache.org/jira/browse/HBASE-9343 Project: HBase Issue Type: Improvement Components: REST Affects Versions: 0.94.11 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Priority: Minor Fix For: 0.98.1, 0.99.0 Attachments: HBASE-9343_94.00.patch, HBASE-9343_94.01.patch, HBASE-9343_trunk.00.patch, HBASE-9343_trunk.01.patch, HBASE-9343_trunk.01.patch, HBASE-9343_trunk.02.patch The current scanner implementation stores state on the server and hence is not well suited to REST server failure scenarios. This JIRA proposes to implement a stateless scanner. In the first version of the patch, a new resource class ScanResource has been added and all the scan parameters are specified as query params. The following are the scan parameters: startrow - The start row for the scan. endrow - The end row for the scan. columns - The columns to scan. starttime, endtime - To only retrieve columns within a specific range of version timestamps, both start and end time must be specified. maxversions - To limit the number of versions of each column to be returned. batchsize - To limit the maximum number of values returned for each call to next(). limit - The number of rows to return in the scan operation. More on the start row, end row and limit parameters: 1. If start row, end row and limit are not specified, then the whole table will be scanned. 2. If start row and limit (say N) are specified, then the scan operation will return N rows from the start row specified. 3. If only the limit parameter is specified, then the scan operation will return N rows from the start of the table. 4. If limit and end row are specified, then the scan operation will return N rows from the start of the table up to the end row.
If the end row is reached before N rows (say after M rows, with M < N), then M rows will be returned to the user. 5. If start row, end row and limit (say N) are specified and N < the number of rows between start row and end row, then N rows from the start row will be returned to the user. If N > the number of rows between start row and end row (say M), then M rows will be returned to the user. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
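Rules 4 and 5 above reduce to one invariant: the scan returns the smaller of the requested limit and the number of rows actually available between the start and end rows. A trivial sketch (hypothetical helper name, not part of the patch):

```java
// Hypothetical helper capturing rules 4 and 5 above: the scan returns
// min(limit, rows available between the start row and the end row).
public class ScanLimitSketch {
    public static int rowsReturned(int limit, int rowsAvailable) {
        return Math.min(limit, rowsAvailable);
    }
}
```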
[jira] [Updated] (HBASE-9889) Make sure we clean up scannerReadPoints upon any exceptions
[ https://issues.apache.org/jira/browse/HBASE-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9889: - Fix Version/s: (was: 0.96.1) 0.99.0 Make sure we clean up scannerReadPoints upon any exceptions --- Key: HBASE-9889 URL: https://issues.apache.org/jira/browse/HBASE-9889 Project: HBase Issue Type: Sub-task Affects Versions: 0.89-fb, 0.94.12, 0.96.0 Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.99.0 Attachments: hbase-9889.diff If there is an exception in the creation of a RegionScanner (for example, an exception while opening store files), the scannerReadPoints entry is not cleaned up. Having an unused old entry in the scannerReadPoints means that flushes and compactions cannot garbage-collect older versions. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
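The cleanup being asked for is a register-then-unregister-on-failure pattern. A minimal, hypothetical sketch in plain Java (the class, ids, and failure flag are made up; the real RegionScanner code differs):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fix: register the scanner's read point up front,
// and if opening the scanner fails, remove the entry in the catch block so a
// stale read point cannot pin old versions against flush/compaction GC.
public class ReadPointCleanupSketch {
    private final Map<Long, Long> scannerReadPoints = new HashMap<>();
    private long nextScannerId = 0;

    public long openScanner(long readPoint, boolean openFails) {
        long id = nextScannerId++;
        scannerReadPoints.put(id, readPoint); // registered before the risky work
        try {
            openStoreFiles(openFails);        // may throw, e.g. on DFS errors
        } catch (RuntimeException e) {
            scannerReadPoints.remove(id);     // the cleanup this issue adds
            throw e;
        }
        return id;
    }

    // Stand-in for opening store files, which can fail.
    private void openStoreFiles(boolean fail) {
        if (fail) throw new RuntimeException("simulated store open failure");
    }

    public int trackedScanners() { return scannerReadPoints.size(); }
}
```

Without the remove() in the catch block, a failed open would leave an orphan entry behind forever, which is exactly the leak described above.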
[jira] [Updated] (HBASE-9527) Review all old api that takes a table name as a byte array and ensure none can pass ns + tablename
[ https://issues.apache.org/jira/browse/HBASE-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9527: - Fix Version/s: (was: 0.96.1) 0.99.0 Review all old api that takes a table name as a byte array and ensure none can pass ns + tablename -- Key: HBASE-9527 URL: https://issues.apache.org/jira/browse/HBASE-9527 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Fix For: 0.98.0, 0.99.0 Go over all old APIs that take a table name and ensure that it is not possible to pass in a byte array that is a namespace + tablename; instead throw an exception. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-8028) Append, Increment: Adding rollback support
[ https://issues.apache.org/jira/browse/HBASE-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8028: - Fix Version/s: (was: 0.96.1) 0.99.0 Append, Increment: Adding rollback support -- Key: HBASE-8028 URL: https://issues.apache.org/jira/browse/HBASE-8028 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.5 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.99.0 Attachments: HBase-8028-v1.patch, HBase-8028-v2.patch, HBase-8028-with-Increments-v1.patch, HBase-8028-with-Increments-v2.patch In case there is an exception while doing the log-sync, the memstore is not rolled back, while the mvcc is _always_ forwarded to the writeentry created at the beginning of the operation. This may lead to scanners seeing results which are not synced to the fs. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
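The desired behavior — roll the edit back out of the memstore when the WAL sync fails, before the MVCC write entry completes — can be sketched in plain Java. This is a hypothetical toy model (a list stands in for the memstore, a counter for the MVCC read point), not the actual HRegion code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the append/increment write path: the edit goes into
// the memstore first, and if the WAL sync throws it must be rolled back
// before the MVCC write entry is completed.
public class MemstoreRollbackSketch {
    private final List<String> memstore = new ArrayList<>();
    private long mvccReadPoint = 0;

    public void append(String edit, boolean walSyncFails) {
        memstore.add(edit);            // 1. write to the memstore
        try {
            syncWal(walSyncFails);     // 2. sync the WAL
        } catch (RuntimeException e) {
            memstore.remove(edit);     // 3. rollback: scanners must not see it
            throw e;
        }
        mvccReadPoint++;               // 4. only now make the edit visible
    }

    // Stand-in for the log-sync, which can fail.
    private void syncWal(boolean fail) {
        if (fail) throw new RuntimeException("simulated WAL sync failure");
    }

    public boolean isVisible(String edit) { return memstore.contains(edit); }
    public long getReadPoint() { return mvccReadPoint; }
}
```

The bug described above is the absence of step 3 combined with step 4 always running: the MVCC advances past an edit that never reached the filesystem, so scanners can read unsynced data.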
[jira] [Updated] (HBASE-7840) Enhance the java it framework to start stop a distributed hbase hadoop cluster
[ https://issues.apache.org/jira/browse/HBASE-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7840: - Fix Version/s: (was: 0.96.1) 0.99.0 Enhance the java it framework to start stop a distributed hbase hadoop cluster --- Key: HBASE-7840 URL: https://issues.apache.org/jira/browse/HBASE-7840 Project: HBase Issue Type: New Feature Components: test Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Minor Fix For: 0.99.0 Attachments: 7840.v1.patch, 7840.v3.patch, 7840.v4.patch Needs are to use a development version of HBase with HDFS 1 and 2. Ideally, should be nicely backportable to 0.94 to allow comparisons and regression tests between versions. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.
[ https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5617: - Fix Version/s: (was: 0.96.1) 0.99.0 Provide coprocessor hooks in put flow while rollbackMemstore. - Key: HBASE-5617 URL: https://issues.apache.org/jira/browse/HBASE-5617 Project: HBase Issue Type: Improvement Components: Coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.99.0 Attachments: HBASE-5617_1.patch, HBASE-5617_2.patch With coprocessor hooks, while a put happens we have the provision to create new puts to other tables or regions. These puts can be done with writeToWal as false. In 0.94 and above, puts are first written to the memstore and then to the WAL. If there is any failure in the WAL append or sync, the memstore is rolled back. Now the problem is that if the put that happens in the main flow fails, there is no way to roll back the puts that happened in the prePut. We can add coprocessor hooks like pre/postRollbackMemStore. Is one hook enough here? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-9431) Set 'hbase.bulkload.retries.number' to 10 as HBASE-8450 claims
[ https://issues.apache.org/jira/browse/HBASE-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9431: - Fix Version/s: (was: 0.96.1) 0.99.0 Set 'hbase.bulkload.retries.number' to 10 as HBASE-8450 claims --- Key: HBASE-9431 URL: https://issues.apache.org/jira/browse/HBASE-9431 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.99.0 Attachments: 9431.txt HBASE-8450 claims 'hbase.bulkload.retries.number' is set to 10 when it is still 0 ([~jeffreyz] noticed). Fix. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-5839) Backup master not starting up due to Bind Exception while starting HttpServer
[ https://issues.apache.org/jira/browse/HBASE-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5839: - Fix Version/s: (was: 0.96.1) 0.99.0 Backup master not starting up due to Bind Exception while starting HttpServer - Key: HBASE-5839 URL: https://issues.apache.org/jira/browse/HBASE-5839 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Fix For: 0.99.0 The backup master tries to bind to the info port 60010. This is done once the backup master becomes active. Even before that, the Xceiver threads (IPC handlers) are started, at a random port. If 60010 is already in use, the standby master fails with a bind exception when it comes up. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-6562) Fake KVs are sometimes passed to filters
[ https://issues.apache.org/jira/browse/HBASE-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6562: - Fix Version/s: (was: 0.96.1) 0.99.0 Fake KVs are sometimes passed to filters Key: HBASE-6562 URL: https://issues.apache.org/jira/browse/HBASE-6562 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.99.0 Attachments: 6562-0.94-v1.txt, 6562-0.96-v1.txt, 6562-v2.txt, 6562-v3.txt, 6562-v4.txt, 6562-v5.txt, 6562.txt, minimalTest.java In internal tests at Salesforce we found that fake row keys are sometimes passed to filters (Filter.filterRowKey(...) specifically). The KVs are eventually filtered by the StoreScanner/ScanQueryMatcher, but the row key is passed to filterRowKey in RegionScannerImpl *before* that happens. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-8533) HBaseAdmin does not ride over cluster restart
[ https://issues.apache.org/jira/browse/HBASE-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8533: - Fix Version/s: (was: 0.96.1) 0.99.0 HBaseAdmin does not ride over cluster restart - Key: HBASE-8533 URL: https://issues.apache.org/jira/browse/HBASE-8533 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 0.98.0, 0.95.0 Reporter: Julian Zhou Assignee: Julian Zhou Priority: Minor Fix For: 0.99.0 Attachments: 8533-0.95-v1.patch, 8533-trunk-v1.patch, hbase-8533-trunk-v0.patch For the RESTful servlet (org.apache.hadoop.hbase.rest.Main (0.94), org.apache.hadoop.hbase.rest.RESTServer (trunk)) on Jetty, we need to first explicitly start the service (% ./bin/hbase-daemon.sh start rest -p 8000) for applications to use. Here is a scenario: sometimes the HBase cluster is stopped/started for maintenance, but rest is a separate standalone process, which binds the HBaseAdmin in its constructor. An HBase stop/start causes this binding to be lost for the existing rest servlet. The rest servlet keeps trying the old bound HBaseAdmin until, a long time later, an Unavailable is surfaced via an IOException caught in, for example, RootResource. Could we pair the HBase service with the HBase rest service via some start/stop options, since there seems to be no reason to keep the rest servlet process running after HBase has stopped? When HBase restarts, the original rest service cannot resume by binding to the new HBase service via its old HBaseAdmin reference. So could we stop rest when hbase is stopped, and, even if hbase was killed by accident, have a restart of hbase with the rest option detect the old rest process, kill it, and start a new one? From this point of view, an application relying on the rest api in the previous scenario could immediately detect the problem when setting up the http connection session, instead of wasting a long time failing back from an IOException with Unavailable from the rest servlet.
Current options from the discussion history (from Andrew, Stack and Jean-Daniel): 1) create an HBaseAdmin on demand in the rest servlet instead of keeping a singleton instance (another possible enhancement for the HBase client: automatic reconnection of an open HBaseAdmin handle after a cluster bounce?); 2) pair the rest webapp with the hbase webui so that rest is always on with the HBase service; 3) add an option for the rest service (such as HBASE_MANAGES_REST) in hbase-env.sh; with HBASE_MANAGES_REST set to true, the scripts will start/stop the REST server. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
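Option 1 (create the admin handle on demand per request instead of caching one for the servlet's lifetime) is a generic pattern that can be sketched without HBase at all. A hypothetical illustration with a Supplier standing in for an HBaseAdmin factory:

```java
import java.util.function.Supplier;

// Hypothetical sketch of option 1 above: instead of caching one handle for
// the servlet's lifetime, fetch a fresh one per request from a factory, so a
// cluster bounce never leaves the REST server holding a dead connection.
public class OnDemandHandleSketch {
    private final Supplier<String> factory; // stand-in for an HBaseAdmin factory
    private int created = 0;

    public OnDemandHandleSketch(Supplier<String> factory) {
        this.factory = factory;
    }

    // Each request gets a freshly created handle; nothing stale is cached.
    public String handleRequest() {
        created++;
        return factory.get();
    }

    public int handlesCreated() { return created; }
}
```

The trade-off, of course, is per-request construction cost, which is why options 2 and 3 (tying the rest process lifecycle to HBase's) were also on the table.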
[jira] [Updated] (HBASE-9513) Why is PE#RandomSeekScanTest way slower in 0.96 than in 0.94?
[ https://issues.apache.org/jira/browse/HBASE-9513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9513: - Fix Version/s: (was: 0.96.1) 0.99.0 Why is PE#RandomSeekScanTest way slower in 0.96 than in 0.94? - Key: HBASE-9513 URL: https://issues.apache.org/jira/browse/HBASE-9513 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Fix For: 0.98.0, 0.99.0 Our JMS reported this on the 0.96.0RC0 thread. Our Matteo found similar results on an offline thread. What's up here? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-9409) Backport 'HBASE-9391 Compilation problem in AccessController with JDK 6 (Andrew Purtell)'
[ https://issues.apache.org/jira/browse/HBASE-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9409: - Fix Version/s: (was: 0.96.1) 0.99.0 Backport 'HBASE-9391 Compilation problem in AccessController with JDK 6 (Andrew Purtell)' -- Key: HBASE-9409 URL: https://issues.apache.org/jira/browse/HBASE-9409 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Fix For: 0.99.0 Issue to add fix to next 0.96.0RC or to 0.96.1 -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-7108) Don't use legal family name for system folder at region level
[ https://issues.apache.org/jira/browse/HBASE-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7108: - Fix Version/s: (was: 0.96.1) 0.99.0 Don't use legal family name for system folder at region level - Key: HBASE-7108 URL: https://issues.apache.org/jira/browse/HBASE-7108 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.2, 0.94.2, 0.95.2 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.99.0 Attachments: HBASE-7108-v0.patch CHANGED, was: Don't allow recovered.edits as legal family name Region directories can contain folders called recovered.edits, log splitting related. But there's nothing that prevents a user from creating a family with that name... HLog.RECOVERED_EDITS_DIR = "recovered.edits"; HRegion.MERGEDIR = "merges"; // fixed with HBASE-6158 SplitTransaction.SPLITDIR = "splits"; // fixed with HBASE-6158 -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-9445) Snapshots should create column family dirs for empty regions
[ https://issues.apache.org/jira/browse/HBASE-9445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9445: - Fix Version/s: (was: 0.96.1) 0.99.0 Snapshots should create column family dirs for empty regions Key: HBASE-9445 URL: https://issues.apache.org/jira/browse/HBASE-9445 Project: HBase Issue Type: Bug Components: snapshots Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.99.0 Attachments: hbase-9445_v1.patch, hbase-9445_v2.patch, hbase-9445_v3.patch Currently, taking a snapshot will not create the family directory under a region if the family does not have any files in it. Subsequent verification fails because of this. There is some logic in the SnapshotTestingUtils.confirmSnapshotValid() to deal with empty family directories, but I think we should create the family directories regardless of whether there are any hfiles referencing them. {code} 2013-09-05 11:07:21,566 DEBUG [Thread-208] util.FSUtils(1687): |-data/ 2013-09-05 11:07:21,567 DEBUG [Thread-208] util.FSUtils(1687): |default/ 2013-09-05 11:07:21,568 DEBUG [Thread-208] util.FSUtils(1687): |---test/ 2013-09-05 11:07:21,569 DEBUG [Thread-208] util.FSUtils(1687): |--.tabledesc/ 2013-09-05 11:07:21,570 DEBUG [Thread-208] util.FSUtils(1690): |-.tableinfo.01 2013-09-05 11:07:21,570 DEBUG [Thread-208] util.FSUtils(1687): |--.tmp/ 2013-09-05 11:07:21,571 DEBUG [Thread-208] util.FSUtils(1687): |--accd6e55887057888de758df44dacda7/ 2013-09-05 11:07:21,572 DEBUG [Thread-208] util.FSUtils(1690): |-.regioninfo 2013-09-05 11:07:21,572 DEBUG [Thread-208] util.FSUtils(1687): |-fam/ 2013-09-05 11:07:21,555 DEBUG [Thread-208] util.FSUtils(1687): |-.hbase-snapshot/ 2013-09-05 11:07:21,556 DEBUG [Thread-208] util.FSUtils(1687): |.tmp/ 2013-09-05 11:07:21,557 DEBUG [Thread-208] util.FSUtils(1687): |offlineTableSnapshot/ 2013-09-05 11:07:21,558 DEBUG [Thread-208] util.FSUtils(1690): |---.snapshotinfo 2013-09-05 11:07:21,558 DEBUG [Thread-208] util.FSUtils(1687): 
|---.tabledesc/ 2013-09-05 11:07:21,558 DEBUG [Thread-208] util.FSUtils(1690): |--.tableinfo.01 2013-09-05 11:07:21,559 DEBUG [Thread-208] util.FSUtils(1687): |---.tmp/ 2013-09-05 11:07:21,559 DEBUG [Thread-208] util.FSUtils(1687): |---accd6e55887057888de758df44dacda7/ 2013-09-05 11:07:21,560 DEBUG [Thread-208] util.FSUtils(1690): |--.regioninfo {code} I think this is important for 0.96.0. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
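The fix proposed above amounts to creating the family directory unconditionally, whether or not any hfiles reference it. A minimal sketch of that idea, using java.nio.file on a local directory as a stand-in for the HDFS FileSystem API (the class and method names here are hypothetical, not the actual patch):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Illustrative sketch only: a snapshot (or any table-layout writer) should
 * create a column-family directory under every region directory even when
 * the family holds no hfiles. java.nio.file stands in for HDFS here.
 */
public class EmptyFamilyDirs {
    /** Create each family dir under each region dir, regardless of contents. */
    public static void ensureFamilyDirs(Path tableDir, List<String> regions,
                                        List<String> families) throws IOException {
        for (String region : regions) {
            for (String family : families) {
                // createDirectories is a no-op if the directory already exists
                Files.createDirectories(tableDir.resolve(region).resolve(family));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tableDir = Files.createTempDirectory("test-table");
        ensureFamilyDirs(tableDir,
            List.of("accd6e55887057888de758df44dacda7"), List.of("fam"));
        System.out.println(Files.isDirectory(
            tableDir.resolve("accd6e55887057888de758df44dacda7").resolve("fam")));
    }
}
```

With this in place, a verification walk like SnapshotTestingUtils.confirmSnapshotValid() no longer needs special-case logic for empty families.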
[jira] [Updated] (HBASE-8642) [Snapshot] List and delete snapshot by table
[ https://issues.apache.org/jira/browse/HBASE-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8642: - Fix Version/s: (was: 0.96.1) 0.99.0 [Snapshot] List and delete snapshot by table Key: HBASE-8642 URL: https://issues.apache.org/jira/browse/HBASE-8642 Project: HBase Issue Type: Improvement Components: snapshots Affects Versions: 0.98.0, 0.95.0, 0.95.1, 0.95.2 Reporter: Julian Zhou Assignee: Julian Zhou Priority: Minor Fix For: 0.99.0 Attachments: 8642-trunk-0.95-v0.patch, 8642-trunk-0.95-v1.patch, 8642-trunk-0.95-v2.patch Support list and delete snapshot by table name. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9892: - Fix Version/s: (was: 0.96.1) Add info port to ServerName to support multi instances in a node Key: HBASE-9892 URL: https://issues.apache.org/jira/browse/HBASE-9892 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, HBASE-9892-trunk-v1.patch, HBASE-9892-trunk-v2.patch, HBASE-9892-v5.txt The full GC time of a regionserver with a big heap (30G) usually cannot be kept under 30s, while the servers have 64G of memory. So we try to deploy multiple RS instances (2-3) in a single node, with the heap of each RS at about 20G ~ 24G. Most things work fine, except the hbase web ui. The master gets the RS info port from conf, which is not suitable for this situation of multiple RS instances in a node. So we add the info port to ServerName: a. At startup, the RS reports its info port to HMaster. b. For the root region, the RS writes the servername with info port to the zookeeper root-region-server node. c. For meta regions, the RS writes the servername with info port to the root region. d. For user regions, the RS writes the servername with info port to the meta regions. So HMaster and clients can get the info port from the servername. To test this feature, I changed the RS num from 1 to 3 in standalone mode, so we can test it in standalone mode. I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know how Hoya handles this? PS: There are different formats for servername in the zk node and the meta table; I think we need to unify them and refactor the code.
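The scheme above boils down to a small encode/decode exercise on the server name. The comma-separated 4-field format below is an assumption for illustration only, not the actual HBASE-9892 wire format:

```java
import java.util.Optional;

/**
 * Sketch of the idea: carry the info (web UI) port alongside the existing
 * "host,rpcPort,startcode" server name so that multiple region server
 * instances on one node can each advertise their own UI port. The 4-field
 * layout is a hypothetical stand-in for the real patch's encoding.
 */
public class ServerNameWithInfoPort {
    public static String encode(String host, int rpcPort, long startcode, int infoPort) {
        return host + "," + rpcPort + "," + startcode + "," + infoPort;
    }

    /** Returns the info port if present; older 3-field names carry none. */
    public static Optional<Integer> infoPort(String serverName) {
        String[] parts = serverName.split(",");
        return parts.length >= 4 ? Optional.of(Integer.parseInt(parts[3]))
                                 : Optional.empty();
    }

    public static void main(String[] args) {
        String sn = encode("rs1.example.com", 60020, 1377031955847L, 60030);
        System.out.println(sn);                 // rs1.example.com,60020,1377031955847,60030
        System.out.println(infoPort(sn).get()); // 60030
    }
}
```

Note how the decoder tolerates the old 3-field form, which matters for the compatibility concern raised in the PS about differing servername formats in zk and meta.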
[jira] [Updated] (HBASE-7579) HTableDescriptor equals method fails if results are returned in a different order
[ https://issues.apache.org/jira/browse/HBASE-7579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7579: - Fix Version/s: (was: 0.96.1) 0.99.0 HTableDescriptor equals method fails if results are returned in a different order - Key: HBASE-7579 URL: https://issues.apache.org/jira/browse/HBASE-7579 Project: HBase Issue Type: Bug Components: Admin Affects Versions: 0.94.6, 0.95.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: HBASE-7579-0.94.patch, HBASE-7579-v1.patch, HBASE-7579-v2.patch, HBASE-7579-v3.patch, HBASE-7579-v4.patch, HBASE-7579-v5.patch HTableDescriptor's compareTo function compares a set of HColumnDescriptors against another set of HColumnDescriptors. It iterates through both, relying on the fact that they will be in the same order. In my testing, I may have seen this issue come up, so I decided to fix it. It's a straightforward fix. I convert the sets into a hashset for O(1) lookups (at least in theory), then I check that all items in the first set are found in the second. Since the sizes are the same, we know that if all elements showed up in the second set, then they must be equal. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
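The order-independent comparison described above can be sketched in a few lines, with strings standing in for HColumnDescriptor (this mirrors the reasoning in the report, not the literal patch):

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

/**
 * Illustration of the HBASE-7579 fix idea: compare two collections of
 * column descriptors as sets rather than iterating them in parallel, so
 * equality no longer depends on iteration order.
 */
public class OrderIndependentEquals {
    public static <T> boolean sameFamilies(Collection<T> a, Collection<T> b) {
        if (a.size() != b.size()) return false;
        // HashSet gives O(1) expected lookups; since the sizes match and
        // descriptors are distinct, containment implies set equality.
        return new HashSet<>(b).containsAll(a);
    }

    public static void main(String[] args) {
        System.out.println(sameFamilies(List.of("cf1", "cf2"), List.of("cf2", "cf1"))); // true
        System.out.println(sameFamilies(List.of("cf1", "cf2"), List.of("cf1", "cf3"))); // false
    }
}
```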
[jira] [Updated] (HBASE-8091) Goraci test might rewrite data in case of map task failures, which eclipses data loss problems
[ https://issues.apache.org/jira/browse/HBASE-8091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8091: - Fix Version/s: (was: 0.96.1) 0.99.0 Goraci test might rewrite data in case of map task failures, which eclipses data loss problems -- Key: HBASE-8091 URL: https://issues.apache.org/jira/browse/HBASE-8091 Project: HBase Issue Type: Bug Components: test Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.99.0 As discussed in HBASE-8031, goraci map tasks will rewrite the same data if the map task fails. Under some conditions this can overwrite data that would otherwise have been reported as lost.
[jira] [Updated] (HBASE-9542) Have Get and MultiGet do cellblocks, currently they are pb all the time
[ https://issues.apache.org/jira/browse/HBASE-9542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9542: - Fix Version/s: (was: 0.96.1) 0.99.0 Have Get and MultiGet do cellblocks, currently they are pb all the time --- Key: HBASE-9542 URL: https://issues.apache.org/jira/browse/HBASE-9542 Project: HBase Issue Type: Improvement Reporter: stack Priority: Critical Fix For: 0.99.0 Probably better if we cellblock Gets and MultiGets rather than pb the results. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-9292) Syncer fails but we won't go down
[ https://issues.apache.org/jira/browse/HBASE-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9292: - Fix Version/s: (was: 0.96.1) 0.99.0 Syncer fails but we won't go down - Key: HBASE-9292 URL: https://issues.apache.org/jira/browse/HBASE-9292 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.95.2 Environment: hadoop-2.1.0-beta and tip of 0.95 branch Reporter: stack Fix For: 0.99.0 Running some simple loading tests i ran into the following running on hadoop-2.1.0-beta. {code} 2013-08-20 16:51:56,310 DEBUG [regionserver60020.logRoller] regionserver.LogRoller: HLog roll requested 2013-08-20 16:51:56,314 DEBUG [regionserver60020.logRoller] wal.FSHLog: cleanupCurrentWriter waiting for transactions to get synced total 655761 synced till here 655750 2013-08-20 16:51:56,360 INFO [regionserver60020.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/a2434.halxg.cloudera.com,60020,1377031955847/a2434.halxg.cloudera.com%2C60020%2C1377031955847.1377042714402 with entries=985, filesize=122.5 M; new WAL /hbase/WALs/a2434.halxg.cloudera.com,60020,1377031955847/a2434.halxg.cloudera.com%2C60020%2C1377031955847.1377042716311 2013-08-20 16:51:56,378 WARN [Thread-4788] hdfs.DFSClient: DataStreamer Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hbase/WALs/a2434.halxg.cloudera.com,60020,1377031955847/a2434.halxg.cloudera.com%2C60020%2C1377031955847.1377042716311 could only be replicated to 0 nodes instead of minReplication (=1). There are 5 datanode(s) running and no node(s) are excluded in this operation. 
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2458) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:525) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034) at org.apache.hadoop.ipc.Client.call(Client.java:1347) at org.apache.hadoop.ipc.Client.call(Client.java:1300) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at $Proxy13.addBlock(Unknown Source) at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at $Proxy13.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown 
Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266) at $Proxy14.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1220) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1073) ... {code} Thereafter the server is up but useless and can't go down because it just keeps doing this: {code} 2013-08-20 16:51:56,380 FATAL [RpcServer.handler=3,port=60020] wal.FSHLog: Could not sync. Requesting roll of hlog org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
[jira] [Updated] (HBASE-8529) checkOpen is missing from multi, mutate, get and multiGet etc.
[ https://issues.apache.org/jira/browse/HBASE-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8529: - Fix Version/s: (was: 0.96.1) 0.99.0 checkOpen is missing from multi, mutate, get and multiGet etc. -- Key: HBASE-8529 URL: https://issues.apache.org/jira/browse/HBASE-8529 Project: HBase Issue Type: Bug Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Minor Fix For: 0.98.0, 0.99.0 I saw we have checkOpen in all those functions in 0.94 while they're missing from trunk. Does anyone know why? For multi and mutate, if we don't call checkOpen we could flood our logs with a bunch of "DFSOutputStream is closed" errors when we sync the WAL.
[jira] [Updated] (HBASE-7115) [shell] Provide a way to register custom filters with the Filter Language Parser
[ https://issues.apache.org/jira/browse/HBASE-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7115: - Fix Version/s: (was: 0.96.1) 0.99.0 [shell] Provide a way to register custom filters with the Filter Language Parser Key: HBASE-7115 URL: https://issues.apache.org/jira/browse/HBASE-7115 Project: HBase Issue Type: Improvement Components: Filters, shell Affects Versions: 0.95.2 Reporter: Aditya Kishore Assignee: Aditya Kishore Fix For: 0.99.0 Attachments: HBASE-7115_trunk.patch, HBASE-7115_trunk.patch, HBASE-7115_trunk_v2.patch HBASE-5428 added this capability to the thrift interface but the configuration parameter name is thrift specific. This patch introduces a more generic parameter, hbase.user.filters, using which user-defined custom filters can be specified in the configuration and loaded in any client that needs to use the filter language parser. The patch then uses this new parameter to register any user-specified filters while invoking the HBase shell. Example usage: Let's say I have written a couple of custom filters with class names {{org.apache.hadoop.hbase.filter.custom.SuperDuperFilter}} and {{org.apache.hadoop.hbase.filter.custom.SilverBulletFilter}} and I want to use them from the HBase shell using the filter language. To do that, I would add the following configuration to {{hbase-site.xml}}:
{panel}
<property>
  <name>hbase.user.filters</name>
  <value>SuperDuperFilter:org.apache.hadoop.hbase.filter.custom.SuperDuperFilter,SilverBulletFilter:org.apache.hadoop.hbase.filter.custom.SilverBulletFilter</value>
</property>
{panel}
Once this is configured, I can launch the HBase shell and use these filters in my {{get}} or {{scan}} just the way I would use a built-in filter.
{code}
hbase(main):001:0> scan 't', {FILTER => SuperDuperFilter(true) AND SilverBulletFilter(42)}
ROW                COLUMN+CELL
 status            column=cf:a, timestamp=30438552, value=world_peace
1 row(s) in 0. seconds
{code}
To use this feature in any client, the client needs to make the following function call as part of its initialization. {code} ParseFilter.registerUserFilters(configuration); {code}
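The alias-to-class mapping carried by hbase.user.filters could be parsed along the lines below. This is illustrative only: the real registration happens inside ParseFilter.registerUserFilters(configuration), and the parsing details here are assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: turn a "Alias:fully.qualified.ClassName,..." config
 * value into an alias -> class-name map, the shape a filter parser would
 * need before it can resolve SuperDuperFilter(true) in a scan expression.
 */
public class UserFilterConfig {
    public static Map<String, String> parse(String value) {
        Map<String, String> filters = new LinkedHashMap<>();
        for (String entry : value.split(",")) {
            // split on the first ':' only; everything after it is the class name
            int i = entry.indexOf(':');
            filters.put(entry.substring(0, i).trim(), entry.substring(i + 1).trim());
        }
        return filters;
    }

    public static void main(String[] args) {
        Map<String, String> m = parse(
            "SuperDuperFilter:org.apache.hadoop.hbase.filter.custom.SuperDuperFilter,"
          + "SilverBulletFilter:org.apache.hadoop.hbase.filter.custom.SilverBulletFilter");
        System.out.println(m.get("SilverBulletFilter"));
    }
}
```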
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849452#comment-13849452 ] Sergey Shelukhin commented on HBASE-5487: - IMHO this, in case of opens, promotes not being fault tolerant. In large clusters you cannot get around servers failing and regions closing and reopening. Snapshot should just be able to ride over that. Splits are more interesting. Esp. if snapshots are used more (MR over snapshots), it may be nonviable to prevent splits and other operations for the duration of every snapshot, alter, ... Generic framework for Master-coordinated tasks -- Key: HBASE-5487 URL: https://issues.apache.org/jira/browse/HBASE-5487 Project: HBase Issue Type: New Feature Components: master, regionserver, Zookeeper Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Sergey Shelukhin Priority: Critical Attachments: Entity management in Master - part 1.pdf, Entity management in Master - part 1.pdf, Is the FATE of Assignment Manager FATE.pdf, Region management in Master.pdf, Region management in Master5.docx, hbckMasterV2-long.pdf, hbckMasterV2b-long.pdf Need a framework to execute master-coordinated tasks in a fault-tolerant manner. Master-coordinated tasks such as online-scheme change and delete-range (deleting region(s) based on start/end key) can make use of this framework. The advantages of framework are 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for master-coordinated tasks 2. Ability to abstract the common functions across Master - ZK and RS - ZK 3. Easy to plugin new master-coordinated tasks without adding code to core components -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table
[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849583#comment-13849583 ] Sergey Shelukhin commented on HBASE-10136: -- If a server fails the region will not stay open... I don't think it's a good idea to rely on that. Locking would work as a temporary fix I guess, for this particular interaction. But why cannot snapshot handle the general case of regions becoming unavailable? It's not like close-open takes time like recovery does during alter table. Alter table conflicts with concurrent snapshot attempt on that table Key: HBASE-10136 URL: https://issues.apache.org/jira/browse/HBASE-10136 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.96.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Matteo Bertozzi Labels: online_schema_change Expected behavior: A user can issue a request for a snapshot of a table while that table is undergoing an online schema change and expect that snapshot request to complete correctly. Also, the same is true if a user issues a online schema change request while a snapshot attempt is ongoing. Observed behavior: Snapshot attempts time out when there is an ongoing online schema change because the region is closed and opened during the snapshot. As a side-note, I would expect that the attempt should fail quickly as opposed to timing out. Further, what I have seen is that subsequent attempts to snapshot the table fail because of some state/cleanup issues. This is also concerning. Immediate error: {code}type=FLUSH }' is still in progress! 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) Sleeping: 1ms while waiting for snapshot completion. 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting current status of snapshot from master... 
2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in progress! Snapshot failure occurred org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 'snapshot0' wasn't completed in expectedTime:6 ms at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602) at org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code} Likely root cause of error: {code}Exception in SnapshotSubprocedurePool java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. 
is closing at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. is closing at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327) at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
[jira] [Commented] (HBASE-9047) Tool to handle finishing replication when the cluster is offline
[ https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849587#comment-13849587 ] stack commented on HBASE-9047: -- [~lhofhansl] You going to commit or would you like me too? Tool to handle finishing replication when the cluster is offline Key: HBASE-9047 URL: https://issues.apache.org/jira/browse/HBASE-9047 Project: HBase Issue Type: New Feature Affects Versions: 0.96.0 Reporter: Jean-Daniel Cryans Assignee: Demai Ni Fix For: 0.98.0, 0.94.15, 0.99.0 Attachments: HBASE-9047-0.94-v1.patch, HBASE-9047-0.94.9-v0.PATCH, HBASE-9047-trunk-v0.patch, HBASE-9047-trunk-v1.patch, HBASE-9047-trunk-v2.patch, HBASE-9047-trunk-v3.patch, HBASE-9047-trunk-v4.patch, HBASE-9047-trunk-v4.patch, HBASE-9047-trunk-v5.patch, HBASE-9047-trunk-v6.patch, HBASE-9047-trunk-v7.patch, HBASE-9047-trunk-v7.patch We're having a discussion on the mailing list about replicating the data on a cluster that was shut down in an offline fashion. The motivation could be that you don't want to bring HBase back up but still need that data on the slave. So I have this idea of a tool that would be running on the master cluster while it is down, although it could also run at any time. Basically it would be able to read the replication state of each master region server, finish replicating what's missing to all the slave, and then clear that state in zookeeper. The code that handles replication does most of that already, see ReplicationSourceManager and ReplicationSource. Basically when ReplicationSourceManager.init() is called, it will check all the queues in ZK and try to grab those that aren't attached to a region server. If the whole cluster is down, it will grab all of them. 
The beautiful thing here is that you could start that tool on all your machines and the load will be spread out, but that might not be a big concern if replication wasn't lagging since it would take a few seconds to finish replicating the missing data for each region server. I'm guessing when starting ReplicationSourceManager you'd give it a fake region server ID, and you'd tell it not to start its own source. FWIW the main difference in how replication is handled between Apache's HBase and Facebook's is that the latter is always done separately of HBase itself. This jira isn't about doing that. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
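The claim-drain-clear flow described above can be modeled in miniature. A plain in-memory map stands in for ZooKeeper and "shipping" is simulated, so this sketches only the control flow, not the real ReplicationSourceManager:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Toy model of the proposed offline-replication tool: each dead region
 * server left behind a queue of WALs to replicate; the tool claims every
 * unattached queue, drains it to the slave, then clears the state. All
 * names and the Map-as-ZooKeeper stand-in are illustrative assumptions.
 */
public class OfflineReplicationSketch {
    private final Map<String, Deque<String>> queues = new HashMap<>(); // serverId -> WALs

    public void addQueue(String deadServer, List<String> wals) {
        queues.put(deadServer, new ArrayDeque<>(wals));
    }

    /** Claim every queue, "ship" its WALs in order, then clear the state. */
    public List<String> drainAll() {
        List<String> shipped = new ArrayList<>();
        Iterator<Map.Entry<String, Deque<String>>> it = queues.entrySet().iterator();
        while (it.hasNext()) {
            Deque<String> q = it.next().getValue();
            while (!q.isEmpty()) shipped.add(q.poll()); // simulated replication
            it.remove();                                // clear the "znode"
        }
        return shipped;
    }

    public static void main(String[] args) {
        OfflineReplicationSketch tool = new OfflineReplicationSketch();
        tool.addQueue("rs1,60020,123", List.of("wal-1", "wal-2"));
        tool.addQueue("rs2,60020,456", List.of("wal-3"));
        System.out.println(tool.drainAll().size()); // 3
    }
}
```

Running several copies of such a tool spreads the load across machines, as noted above, since each copy only claims queues that no one else has grabbed.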
[jira] [Commented] (HBASE-9484) Backport 8534 Fix coverage for org.apache.hadoop.hbase.mapreduce to 0.96
[ https://issues.apache.org/jira/browse/HBASE-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849590#comment-13849590 ] stack commented on HBASE-9484: -- [~ndimiduk] Looks like this issue should be resolved? It was committed to 0.96? Backport 8534 Fix coverage for org.apache.hadoop.hbase.mapreduce to 0.96 -- Key: HBASE-9484 URL: https://issues.apache.org/jira/browse/HBASE-9484 Project: HBase Issue Type: Test Components: mapreduce, test Reporter: Nick Dimiduk Priority: Minor Fix For: 0.99.0 Attachments: 0001-HBASE-9484-backport-8534-Fix-coverage-for-org.apache.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HBASE-10175) 2-thread ChaosMonkey steps on its own toes
Sergey Shelukhin created HBASE-10175: Summary: 2-thread ChaosMonkey steps on its own toes Key: HBASE-10175 URL: https://issues.apache.org/jira/browse/HBASE-10175 Project: HBase Issue Type: Improvement Components: test Reporter: Sergey Shelukhin Priority: Minor A ChaosMonkey with one destructive thread and one volatility (flush-compact-split-etc.) thread steps on its own toes and logs a lot of exceptions. A simple solution would be to catch most (or all) of them, like NotServingRegionException, and log less (not a full callstack, for example; it's not very useful anyway). A more complicated/complementary one would be to keep track of which regions the destructive thread affects and use other regions for the volatile one.
[jira] [Resolved] (HBASE-10143) Clean up dead local stores in FSUtils
[ https://issues.apache.org/jira/browse/HBASE-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark resolved HBASE-10143. --- Resolution: Fixed Fix Version/s: 0.99.0 0.96.2 0.98.0 Clean up dead local stores in FSUtils - Key: HBASE-10143 URL: https://issues.apache.org/jira/browse/HBASE-10143 Project: HBase Issue Type: Sub-task Affects Versions: 0.98.0, 0.96.0, 0.99.0 Reporter: Elliott Clark Assignee: Elliott Clark Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10143-0.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-8701) distributedLogReplay need to apply wal edits in the receiving order of those edits
[ https://issues.apache.org/jira/browse/HBASE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849591#comment-13849591 ] Ted Yu commented on HBASE-8701: --- {code} - return Longs.compare(right.getMvccVersion(), left.getMvccVersion()); + long leftChangeSeqNum = getReplaySeqNum(left); + if (leftChangeSeqNum 0) { +leftChangeSeqNum = left.getMvccVersion(); + } + long RightChangeSeqNum = getReplaySeqNum(right); + if (RightChangeSeqNum 0) { +RightChangeSeqNum = right.getMvccVersion(); {code} What would happen if one Cell has sequence Id but the other cell doesn't have sequence Id ? Can you put the patch on review board ? Thanks distributedLogReplay need to apply wal edits in the receiving order of those edits -- Key: HBASE-8701 URL: https://issues.apache.org/jira/browse/HBASE-8701 Project: HBase Issue Type: Bug Components: MTTR Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0 Attachments: 8701-v3.txt, hbase-8701-tag.patch, hbase-8701-v4.patch, hbase-8701-v5.patch, hbase-8701-v6.patch, hbase-8701-v7.patch, hbase-8701-v8.patch This issue happens in distributedLogReplay mode when recovering multiple puts of the same key + version(timestamp). After replay, the value is nondeterministic of the key h5. The original concern situation raised from [~eclark]: For all edits the rowkey is the same. There's a log with: [ A (ts = 0), B (ts = 0) ] Replay the first half of the log. A user puts in C (ts = 0) Memstore has to flush A new Hfile will be created with [ C, A ] and MaxSequenceId = C's seqid. Replay the rest of the Log. Flush The issue will happen in similar situation like Put(key, t=T) in WAL1 and Put(key,t=T) in WAL2 h5. 
Below is the option(proposed by Ted) I'd like to use: a) During replay, we pass original wal sequence number of each edit to the receiving RS b) In receiving RS, we store negative original sequence number of wal edits into mvcc field of KVs of wal edits c) Add handling of negative MVCC in KVScannerComparator and KVComparator d) In receiving RS, write original sequence number into an optional field of wal file for chained RS failure situation e) When opening a region, we add a safety bumper(a large number) in order for the new sequence number of a newly opened region not to collide with old sequence numbers. In the future, when we stores sequence number along with KVs, we can adjust the above solution a little bit by avoiding to overload MVCC field. h5. The other alternative options are listed below for references: Option one a) disallow writes during recovery b) during replay, we pass original wal sequence ids c) hold flush till all wals of a recovering region are replayed. Memstore should hold because we only recover unflushed wal edits. For edits with same key + version, whichever with larger sequence Id wins. Option two a) During replay, we pass original wal sequence ids b) for each wal edit, we store each edit's original sequence id along with its key. c) during scanning, we use the original sequence id if it's present otherwise its store file sequence Id d) compaction can just leave put with max sequence id Please let me know if you have better ideas. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
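The mvcc-overloading in options (b) and (c) above can be illustrated with a toy comparator. The Cell type and the negative-value convention below are assumptions for illustration, not the actual HBASE-8701 patch:

```java
import java.util.Comparator;

/**
 * Sketch: a cell carries either a positive mvcc version or, during
 * distributed log replay, the negated original WAL sequence number stashed
 * in the same field. The comparator orders the newest change first, using
 * the decoded replay number when one is present.
 */
public class ReplayAwareComparator {
    /** mvccOrReplay > 0: normal mvcc; < 0: negated original WAL sequence number. */
    public record Cell(byte[] key, long mvccOrReplay) {}

    /** Effective change number: decode the negated replay seq num if present. */
    static long changeSeqNum(Cell c) {
        return c.mvccOrReplay() < 0 ? -c.mvccOrReplay() : c.mvccOrReplay();
    }

    /** Largest effective sequence number (newest change) sorts first. */
    public static final Comparator<Cell> NEWEST_FIRST =
        (left, right) -> Long.compare(changeSeqNum(right), changeSeqNum(left));

    public static void main(String[] args) {
        Cell replayed = new Cell(new byte[]{1}, -42); // original WAL seq 42
        Cell fresh    = new Cell(new byte[]{1}, 7);   // plain mvcc 7
        // replayed sorts before fresh because 42 > 7
        System.out.println(NEWEST_FIRST.compare(replayed, fresh) < 0);
    }
}
```

This is the kind of handling KVScannerComparator and KVComparator would need under option (c), so that replayed and freshly written cells for the same key+version order deterministically.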
[jira] [Commented] (HBASE-10136) Alter table conflicts with concurrent snapshot attempt on that table
[ https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849592#comment-13849592 ] Matteo Bertozzi commented on HBASE-10136: - [~sershe] we're not talking about snapshots here. Currently snapshot are built to fail if a region is moving or is down, and this is by design. If you want to talk about how to fix this open another jira. The problem here is the TableEventHandler and when the table lock is released, for example if you call modifyTable() twice or you have a split concurrently with modifyTable() you don't get the expected behavior that we want with the table lock, which should be an operation on the table is locked until the other is completed. also the other problem, not completly related, that I'm pointing out is that since we have this async complete the client is not synchronous Alter table conflicts with concurrent snapshot attempt on that table Key: HBASE-10136 URL: https://issues.apache.org/jira/browse/HBASE-10136 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.96.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Matteo Bertozzi Labels: online_schema_change Expected behavior: A user can issue a request for a snapshot of a table while that table is undergoing an online schema change and expect that snapshot request to complete correctly. Also, the same is true if a user issues a online schema change request while a snapshot attempt is ongoing. Observed behavior: Snapshot attempts time out when there is an ongoing online schema change because the region is closed and opened during the snapshot. As a side-note, I would expect that the attempt should fail quickly as opposed to timing out. Further, what I have seen is that subsequent attempts to snapshot the table fail because of some state/cleanup issues. This is also concerning. Immediate error: {code}type=FLUSH }' is still in progress! 
2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11) Sleeping: 1ms while waiting for snapshot completion. 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting current status of snapshot from master... 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3] snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0 table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in progress! Snapshot failure occurred org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot 'snapshot0' wasn't completed in expectedTime:6 ms at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638) at org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602) at org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code} Likely root cause of error: {code}Exception in SnapshotSubprocedurePool java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. 
is closing at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.NotServingRegionException: changeSchemaDuringSnapshot1386806258640,,1386806258720.ea776db51749e39c956d771a7d17a0f3. is
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849602#comment-13849602 ] Enis Soztutar commented on HBASE-9892: -- Go for it! Add info port to ServerName to support multi instances in a node Key: HBASE-9892 URL: https://issues.apache.org/jira/browse/HBASE-9892 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, HBASE-9892-trunk-v1.patch, HBASE-9892-trunk-v2.patch, HBASE-9892-v5.txt The full GC time of a regionserver with a big heap (30G) usually cannot be kept within 30s, while servers with 64G memory are fine. So we try to deploy multiple RS instances (2-3) in a single node, with each RS heap at about 20G ~ 24G. Most things work fine except the HBase web UI: the master gets the RS info port from conf, which is not suitable when multiple RS instances run in one node. So we add the info port to ServerName. a. At startup, the RS reports its info port to HMaster. b. For the root region, the RS writes the servername with info port to the zookeeper root-region-server node. c. For meta regions, the RS writes the servername with info port to the root region. d. For user regions, the RS writes the servername with info port to meta regions. So HMaster and clients can get the info port from the servername. To test this feature, I changed the RS count from 1 to 3 in standalone mode, so we can test it in standalone mode. I think Hoya (HBase on YARN) will encounter the same problem; does anyone know how Hoya handles it? PS: There are different formats for the servername in the zk node and the meta table; I think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-9399) Up the memstore flush size
[ https://issues.apache.org/jira/browse/HBASE-9399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849603#comment-13849603 ] Elliott Clark commented on HBASE-9399: -- I finally got to try this, and upping the memstore flush size to 512mb gave us about a 1% perf gain when running our IT tests. Nothing huge, so we can take it or leave it. Up the memstore flush size -- Key: HBASE-9399 URL: https://issues.apache.org/jira/browse/HBASE-9399 Project: HBase Issue Type: Task Components: regionserver Affects Versions: 0.98.0, 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 0.98.0 As heap sizes get bigger we are still recommending that users keep their number of regions to a minimum. This leads to lots of unused memstore memory. For example, I have a region server with 48 gigs of RAM, 30 gigs of which go to the region server. With current defaults the global memstore size reserved is 8 gigs, and the per-region memstore size is 128mb right now. That means I need 80 regions actively taking writes to reach the global memstore size. That number is way out of line with what our split policies currently give users; they are given far fewer regions by default. We should up hbase.hregion.memstore.flush.size. Ideally we should auto-tune everything, but until then I think something like 512mb would help a lot with our write throughput on clusters that don't have several hundred regions per RS. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
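The change under discussion would be made in hbase-site.xml, roughly as follows; the property name comes from the issue text, and the 512 MB figure (expressed in bytes, as the property requires) is the value Elliott benchmarked:

```xml
<!-- hbase-site.xml: raise the per-region memstore flush threshold
     from the 128 MB default to 512 MB. The value is in bytes:
     512 * 1024 * 1024 = 536870912. -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>536870912</value>
</property>
```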
[jira] [Updated] (HBASE-10146) Bump HTrace version to 2.04
[ https://issues.apache.org/jira/browse/HBASE-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-10146: -- Resolution: Fixed Fix Version/s: 0.99.0 0.96.2 0.98.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Bump HTrace version to 2.04 --- Key: HBASE-10146 URL: https://issues.apache.org/jira/browse/HBASE-10146 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1, 0.99.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: HBASE-10146-0.patch 2.04 has been released with a bug fix for what happens when htrace fails. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HBASE-10176) Canary#sniff() should close the HTable instance
Ted Yu created HBASE-10176: -- Summary: Canary#sniff() should close the HTable instance Key: HBASE-10176 URL: https://issues.apache.org/jira/browse/HBASE-10176 Project: HBase Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} table = new HTable(admin.getConfiguration(), tableDesc.getName()); {code} The HTable instance should be closed by the end of the method. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
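A minimal sketch of the fix being asked for: guarantee close() runs even when the probe throws. HTable is stubbed here with a plain Closeable stand-in (FakeTable, sniff, and probeFails are illustrative names, not the actual Canary code):

```java
import java.io.Closeable;

public class CloseSketch {
    // Stand-in for HTable; the real class holds RPC and buffer resources
    // that leak if close() is never called.
    static class FakeTable implements Closeable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    // Pattern for Canary#sniff(): open the table, probe it, close in finally.
    static FakeTable sniff(boolean probeFails) {
        FakeTable table = new FakeTable();
        try {
            if (probeFails) throw new RuntimeException("probe failed");
        } catch (RuntimeException e) {
            // swallowed for the sketch; real code would log the failure
        } finally {
            table.close(); // runs on both the success and failure paths
        }
        return table;
    }

    public static void main(String[] args) {
        assert sniff(false).closed;
        assert sniff(true).closed;
        System.out.println("table closed on both paths");
    }
}
```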
[jira] [Created] (HBASE-10177) Fix the netty dependency issue
Gaurav Menghani created HBASE-10177: --- Summary: Fix the netty dependency issue Key: HBASE-10177 URL: https://issues.apache.org/jira/browse/HBASE-10177 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb Reporter: Gaurav Menghani Fix For: 0.89-fb The netty developers changed their group id from org.jboss.netty to io.netty. As a result, the zookeeper and hadoop dependencies pull in the older netty (3.2.2) while the swift-related dependencies pull in the newer netty (3.7.0), and we get ClassNotFoundExceptions when the older 3.2.2 jar is picked up in place of 3.7.0. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
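One conventional way to resolve this kind of conflict is a Maven exclusion on the transitive dependency, so the old org.jboss.netty artifact never reaches the classpath. This is a sketch of the standard mechanism, not necessarily the actual HBASE-10177 patch:

```xml
<!-- Illustrative pom.xml fragment: exclude the old org.jboss.netty jar
     pulled in transitively by zookeeper so that only io.netty's 3.7.0
     artifact is resolved. Hadoop dependencies would need the same
     exclusion. -->
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.jboss.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```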
[jira] [Commented] (HBASE-9927) ReplicationLogCleaner#stop() calls HConnectionManager#deleteConnection() unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849607#comment-13849607 ] Hudson commented on HBASE-9927: --- FAILURE: Integrated in HBase-0.94-security #361 (See [https://builds.apache.org/job/HBase-0.94-security/361/]) HBASE-9927 ReplicationLogCleaner#stop() calls HConnectionManager#deleteConnection() unnecessarily (tedyu: rev 1551273) * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java ReplicationLogCleaner#stop() calls HConnectionManager#deleteConnection() unnecessarily -- Key: HBASE-9927 URL: https://issues.apache.org/jira/browse/HBASE-9927 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.94.15 Attachments: 9927.txt When inspecting log, I found the following: {code} 2013-11-08 18:23:48,472 ERROR [M:0;kiyo:42380.oldLogCleaner] client.HConnectionManager(468): Connection not found in the list, can't delete it (connection key=HConnectionKey{properties={hbase.rpc.timeout=6, hbase.zookeeper.property.clientPort=59832, hbase.client.pause=100, zookeeper.znode.parent=/hbase, hbase.client.retries.number=350, hbase.zookeeper.quorum=localhost}, username='zy'}). May be the key was modified? java.lang.Exception at org.apache.hadoop.hbase.client.HConnectionManager.deleteConnection(HConnectionManager.java:468) at org.apache.hadoop.hbase.client.HConnectionManager.deleteConnection(HConnectionManager.java:404) at org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner.stop(ReplicationLogCleaner.java:141) at org.apache.hadoop.hbase.master.cleaner.CleanerChore.cleanup(CleanerChore.java:276) {code} The call to HConnectionManager#deleteConnection() is not needed. 
Here is the related code, which carries a comment to this effect: {code}
// Not sure why we're deleting a connection that we never acquired or used
HConnectionManager.deleteConnection(this.getConf());
{code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HBASE-10178) Potential null object dereference in TablePermission#equals()
Ted Yu created HBASE-10178: -- Summary: Potential null object dereference in TablePermission#equals() Key: HBASE-10178 URL: https://issues.apache.org/jira/browse/HBASE-10178 Project: HBase Issue Type: Bug Reporter: Ted Yu At line 326: {code} ((namespace == null && other.getNamespace() == null) || namespace.equals(other.getNamespace())) {code} If namespace is null but other.getNamespace() is not null, we would dereference a null object. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
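The failure mode can be sketched with plain Strings: if the first operand of the || is false and namespace is null, evaluation falls through to namespace.equals(...) and throws. Objects.equals (JDK 7+) is one common null-safe fix, though the actual patch may guard differently:

```java
import java.util.Objects;

public class NullSafeEqualsSketch {
    // Mirrors the reported expression: NPE when a is null and b is not.
    static boolean buggy(String a, String b) {
        return (a == null && b == null) || a.equals(b);
    }

    // Null-safe alternative: Objects.equals never dereferences a null argument.
    static boolean fixed(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        assert fixed(null, null);
        assert !fixed(null, "ns");
        assert fixed("ns", "ns");
        boolean threw = false;
        try {
            buggy(null, "ns");
        } catch (NullPointerException e) {
            threw = true; // the dereference the JIRA describes
        }
        assert threw;
        System.out.println("NPE reproduced; fixed variant is null-safe");
    }
}
```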
[jira] [Commented] (HBASE-10044) test-patch.sh should accept documents by known file extensions
[ https://issues.apache.org/jira/browse/HBASE-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849617#comment-13849617 ] Jesse Yates commented on HBASE-10044: - seems reasonable to me. thx ted test-patch.sh should accept documents by known file extensions -- Key: HBASE-10044 URL: https://issues.apache.org/jira/browse/HBASE-10044 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.99.0 Attachments: 10044-v1.txt, 10044-v2.txt, 10044-v3.txt Currently only htm[l] files are filtered out when test-patch.sh looks for patch attachment. In the email thread, 'Extensions for patches accepted by QA bot', consensus was to accept the following file extensions only: .patch .txt .diff -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
[ https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849621#comment-13849621 ] Lars Hofhansl commented on HBASE-10174: --- Nit: {code} - private final Map<String, Counter> counts; + private final ConcurrentMap<String, Counter> counts = new ConcurrentHashMap<String, Counter>(); {code} Declaration can remain Map. When exactly does that become an issue? In the pom we require Guava 11.0.2. Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94 - Key: HBASE-10174 URL: https://issues.apache.org/jira/browse/HBASE-10174 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.94.15 Attachments: 9667-0.94.patch On user mailing list under the thread 'Guava 15', Kristoffer Sjögren reported NoClassDefFoundError when he used Guava 15. The issue has been fixed in 0.96 + by HBASE-9667 This JIRA ports the fix to 0.94 branch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
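Lars's nit can be illustrated in isolation: the field keeps the Map interface type while the instance is a ConcurrentHashMap. This is a sketch using Long counters rather than HBase's Counter class:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InterfaceTypedField {
    // Declared as Map, backed by a thread-safe implementation. Only code
    // that needs ConcurrentMap-specific operations (e.g. putIfAbsent before
    // JDK 8 added it to Map) would have to declare the narrower type.
    private final Map<String, Long> counts = new ConcurrentHashMap<String, Long>();

    void increment(String key) {
        // Read-modify-write; not atomic, but sufficient for the sketch.
        Long prev = counts.get(key);
        counts.put(key, prev == null ? 1L : prev + 1L);
    }

    long get(String key) {
        Long v = counts.get(key);
        return v == null ? 0L : v;
    }

    public static void main(String[] args) {
        InterfaceTypedField f = new InterfaceTypedField();
        f.increment("rows");
        f.increment("rows");
        assert f.get("rows") == 2L;
        assert f.get("missing") == 0L;
        System.out.println("rows = " + f.get("rows"));
    }
}
```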
[jira] [Commented] (HBASE-9047) Tool to handle finishing replication when the cluster is offline
[ https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849627#comment-13849627 ] Lars Hofhansl commented on HBASE-9047: -- I'll do it today. Tool to handle finishing replication when the cluster is offline Key: HBASE-9047 URL: https://issues.apache.org/jira/browse/HBASE-9047 Project: HBase Issue Type: New Feature Affects Versions: 0.96.0 Reporter: Jean-Daniel Cryans Assignee: Demai Ni Fix For: 0.98.0, 0.94.15, 0.99.0 Attachments: HBASE-9047-0.94-v1.patch, HBASE-9047-0.94.9-v0.PATCH, HBASE-9047-trunk-v0.patch, HBASE-9047-trunk-v1.patch, HBASE-9047-trunk-v2.patch, HBASE-9047-trunk-v3.patch, HBASE-9047-trunk-v4.patch, HBASE-9047-trunk-v4.patch, HBASE-9047-trunk-v5.patch, HBASE-9047-trunk-v6.patch, HBASE-9047-trunk-v7.patch, HBASE-9047-trunk-v7.patch We're having a discussion on the mailing list about replicating the data on a cluster that was shut down in an offline fashion. The motivation could be that you don't want to bring HBase back up but still need that data on the slave. So I have this idea of a tool that would be running on the master cluster while it is down, although it could also run at any time. Basically it would be able to read the replication state of each master region server, finish replicating what's missing to all the slave, and then clear that state in zookeeper. The code that handles replication does most of that already, see ReplicationSourceManager and ReplicationSource. Basically when ReplicationSourceManager.init() is called, it will check all the queues in ZK and try to grab those that aren't attached to a region server. If the whole cluster is down, it will grab all of them. The beautiful thing here is that you could start that tool on all your machines and the load will be spread out, but that might not be a big concern if replication wasn't lagging since it would take a few seconds to finish replicating the missing data for each region server. 
I'm guessing when starting ReplicationSourceManager you'd give it a fake region server ID, and you'd tell it not to start its own source. FWIW, the main difference in how replication is handled between Apache's HBase and Facebook's is that the latter is always done separately from HBase itself. This jira isn't about doing that. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-8701) distributedLogReplay need to apply wal edits in the receiving order of those edits
[ https://issues.apache.org/jira/browse/HBASE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-8701: - Attachment: hbase-8701-tag-v1.patch [~te...@apache.org] A good point. I've updated the patch and moved it to the review board (https://reviews.apache.org/r/16304/). Thanks. distributedLogReplay need to apply wal edits in the receiving order of those edits -- Key: HBASE-8701 URL: https://issues.apache.org/jira/browse/HBASE-8701 Project: HBase Issue Type: Bug Components: MTTR Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0 Attachments: 8701-v3.txt, hbase-8701-tag-v1.patch, hbase-8701-tag.patch, hbase-8701-v4.patch, hbase-8701-v5.patch, hbase-8701-v6.patch, hbase-8701-v7.patch, hbase-8701-v8.patch This issue happens in distributedLogReplay mode when recovering multiple puts of the same key + version (timestamp). After replay, the value of the key is nondeterministic. The original concern, raised by [~eclark]: For all edits the rowkey is the same. There's a log with: [ A (ts = 0), B (ts = 0) ] Replay the first half of the log. A user puts in C (ts = 0). Memstore has to flush. A new HFile will be created with [ C, A ] and MaxSequenceId = C's seqid. Replay the rest of the log. Flush. The issue will happen in similar situations, like Put(key, t=T) in WAL1 and Put(key, t=T) in WAL2. Below is the option (proposed by Ted) I'd like to use: a) During replay, we pass the original wal sequence number of each edit to the receiving RS b) In the receiving RS, we store the negative original sequence number of wal edits into the mvcc field of the KVs of wal edits c) Add handling of negative MVCC in KVScannerComparator and KVComparator d) In the receiving RS, write the original sequence number into an optional field of the wal file for the chained RS failure situation e) When opening a region, we add a safety bumper (a large number) so the new sequence number of a newly opened region does not collide with old sequence numbers.
In the future, when we store sequence numbers along with KVs, we can adjust the above solution a little by avoiding overloading the MVCC field. The other alternative options are listed below for reference: Option one a) disallow writes during recovery b) during replay, we pass original wal sequence ids c) hold flush till all wals of a recovering region are replayed. Memstore should hold because we only recover unflushed wal edits. For edits with the same key + version, whichever has the larger sequence id wins. Option two a) During replay, we pass original wal sequence ids b) for each wal edit, we store the edit's original sequence id along with its key. c) during scanning, we use the original sequence id if it's present, otherwise its store file sequence id d) compaction can just keep the put with the max sequence id Please let me know if you have better ideas. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-9399) Up the memstore flush size
[ https://issues.apache.org/jira/browse/HBASE-9399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849635#comment-13849635 ] Lars Hofhansl commented on HBASE-9399: -- I am starting to benchmark the memstore. What I found so far is that (not too surprisingly) a lot of CPU time during an insert is spent managing the CSLS. Making that larger should have minimal impact (if any). We might get better IO since we're flushing larger initial files, but even that should be negligible unless we're IO bound on write. For reads I almost want to flush sooner so that the data gets into the more scan-friendly block format. Up the memstore flush size -- Key: HBASE-9399 URL: https://issues.apache.org/jira/browse/HBASE-9399 Project: HBase Issue Type: Task Components: regionserver Affects Versions: 0.98.0, 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 0.98.0 As heap sizes get bigger we are still recommending that users keep their number of regions to a minimum. This leads to lots of unused memstore memory. For example, I have a region server with 48 gigs of RAM, 30 gigs of which go to the region server. With current defaults the global memstore size reserved is 8 gigs, and the per-region memstore size is 128mb right now. That means I need 80 regions actively taking writes to reach the global memstore size. That number is way out of line with what our split policies currently give users; they are given far fewer regions by default. We should up hbase.hregion.memstore.flush.size. Ideally we should auto-tune everything, but until then I think something like 512mb would help a lot with our write throughput on clusters that don't have several hundred regions per RS. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
[ https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10174: --- Attachment: 10174-v2.txt Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94 - Key: HBASE-10174 URL: https://issues.apache.org/jira/browse/HBASE-10174 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.94.15 Attachments: 10174-v2.txt, 9667-0.94.patch On user mailing list under the thread 'Guava 15', Kristoffer Sjögren reported NoClassDefFoundError when he used Guava 15. The issue has been fixed in 0.96 + by HBASE-9667 This JIRA ports the fix to 0.94 branch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
[ https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849641#comment-13849641 ] Ted Yu commented on HBASE-10174: Patch v2 addresses comment on using Map. The issue would surface if user wants to upgrade to Guava 15 - e.g. if some user code uses Guava 15 and he/she wants to use one release of Guava in deployment. Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94 - Key: HBASE-10174 URL: https://issues.apache.org/jira/browse/HBASE-10174 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.94.15 Attachments: 10174-v2.txt, 9667-0.94.patch On user mailing list under the thread 'Guava 15', Kristoffer Sjögren reported NoClassDefFoundError when he used Guava 15. The issue has been fixed in 0.96 + by HBASE-9667 This JIRA ports the fix to 0.94 branch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849648#comment-13849648 ] stack commented on HBASE-9892: -- [~liushaohui] One thought I had before commit: what happens in, say, the case where it is an existing cluster and the znode has empty data? Will we use the info port from the configuration? What if we do a rolling restart when the znode has no data in it? Who writes the znode data? Will new servers be able to work if the znode has no data in it? Thanks. Add info port to ServerName to support multi instances in a node Key: HBASE-9892 URL: https://issues.apache.org/jira/browse/HBASE-9892 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, HBASE-9892-trunk-v1.patch, HBASE-9892-trunk-v2.patch, HBASE-9892-v5.txt The full GC time of a regionserver with a big heap (30G) usually cannot be kept within 30s, while servers with 64G memory are fine. So we try to deploy multiple RS instances (2-3) in a single node, with each RS heap at about 20G ~ 24G. Most things work fine except the HBase web UI: the master gets the RS info port from conf, which is not suitable when multiple RS instances run in one node. So we add the info port to ServerName. a. At startup, the RS reports its info port to HMaster. b. For the root region, the RS writes the servername with info port to the zookeeper root-region-server node. c. For meta regions, the RS writes the servername with info port to the root region. d. For user regions, the RS writes the servername with info port to meta regions. So HMaster and clients can get the info port from the servername.
To test this feature, I changed the RS count from 1 to 3 in standalone mode, so we can test it in standalone mode. I think Hoya (HBase on YARN) will encounter the same problem; does anyone know how Hoya handles it? PS: There are different formats for the servername in the zk node and the meta table; I think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
[ https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849652#comment-13849652 ] stack commented on HBASE-10174: --- So, a client wants to do G15 so we hack on hbase to accommodate? For sure this is the only issue when we jump G11 to G15? We've tried on h1 and h2 cluster deploys? No conflicts? Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94 - Key: HBASE-10174 URL: https://issues.apache.org/jira/browse/HBASE-10174 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.94.15 Attachments: 10174-v2.txt, 9667-0.94.patch On user mailing list under the thread 'Guava 15', Kristoffer Sjögren reported NoClassDefFoundError when he used Guava 15. The issue has been fixed in 0.96 + by HBASE-9667 This JIRA ports the fix to 0.94 branch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-8701) distributedLogReplay need to apply wal edits in the receiving order of those edits
[ https://issues.apache.org/jira/browse/HBASE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849651#comment-13849651 ] Hadoop QA commented on HBASE-8701: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618942/hbase-8701-tag.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//console This message is automatically generated. 
distributedLogReplay need to apply wal edits in the receiving order of those edits -- Key: HBASE-8701 URL: https://issues.apache.org/jira/browse/HBASE-8701 Project: HBase Issue Type: Bug Components: MTTR Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0 Attachments: 8701-v3.txt, hbase-8701-tag-v1.patch, hbase-8701-tag.patch, hbase-8701-v4.patch, hbase-8701-v5.patch, hbase-8701-v6.patch, hbase-8701-v7.patch, hbase-8701-v8.patch This issue happens in distributedLogReplay mode when recovering multiple puts of the same key + version(timestamp). After replay, the value is nondeterministic of the key h5. The original concern situation raised from [~eclark]: For all edits the rowkey is the same. There's a log with: [ A (ts = 0), B (ts = 0) ] Replay the first half of the log. A user puts in C (ts = 0) Memstore has to flush A new Hfile will be created with [ C, A ] and MaxSequenceId = C's seqid. Replay the rest of the Log. Flush The issue will happen in similar situation like Put(key, t=T) in WAL1 and Put(key,t=T) in WAL2 h5. Below is the option(proposed by Ted) I'd like to use: a) During replay, we pass original wal sequence number of each edit to the receiving RS b) In receiving RS, we store negative original sequence number of wal edits into mvcc field of KVs of wal edits c) Add handling of negative MVCC in KVScannerComparator and KVComparator d) In receiving RS, write original sequence number into an optional field of wal file for chained RS failure situation e) When opening a region, we add a safety bumper(a large number) in order for the new sequence number of a newly opened region not to collide with old sequence numbers. In the
[jira] [Created] (HBASE-10179) HRegionServer underreports readRequestCounts by 1 under certain conditions
Perry Trolard created HBASE-10179: - Summary: HRegionServer underreports readRequestCounts by 1 under certain conditions Key: HBASE-10179 URL: https://issues.apache.org/jira/browse/HBASE-10179 Project: HBase Issue Type: Bug Components: metrics Affects Versions: 0.94.6, 0.99.0 Reporter: Perry Trolard Priority: Minor In HRegionServer.scan(), if (a) the number of results returned, n, is greater than zero (b) but less than the size of the batch (nbRows) (c) and the size in bytes is smaller than the max size (maxScannerResultSize) then the readRequestCount will be reported as n - 1 rather than n. (This is because the for-loop counter i is used to update the readRequestCount, and if the scan runs out of rows before reaching max rows or size, the code `break`s out of the loop and i is not incremented for the final time.) To reproduce, create a test table and open its details page in the web UI. Insert a single row, then note the current request count, c. Scan the table, returning 1 row; the request count will still be c, whereas it should be c + 1. I have a patch against TRUNK I can submit. At Splice Machine we're running 0.94, I'd be happy to submit a patch against that as well. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
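The off-by-one can be reproduced with the loop shape the description gives: the index i is what gets reported, and the early break skips i's final increment. This is a schematic model of the accounting in HRegionServer.scan(), not the actual code:

```java
public class ScanCountSketch {
    // Models the buggy accounting: returns the count that would be reported
    // when `available` rows exist and the batch asks for nbRows.
    static int reported(int available, int nbRows) {
        int i = 0;
        int fetched = 0;
        for (; i < nbRows; i++) {
            fetched++;                      // row i is returned to the client
            if (fetched >= available) break; // scanner dry: this iteration's i++ is skipped
        }
        return i;                           // HBASE-10179: should be `fetched`
    }

    public static void main(String[] args) {
        assert reported(1, 10) == 0;  // 1 row returned, 0 counted
        assert reported(3, 10) == 2;  // n rows returned, n - 1 counted
        assert reported(10, 3) == 3;  // batch filled: loop exits normally, count correct
        System.out.println("undercount reproduced");
    }
}
```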
[jira] [Updated] (HBASE-10179) HRegionServer underreports readRequestCounts by 1 under certain conditions
[ https://issues.apache.org/jira/browse/HBASE-10179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Perry Trolard updated HBASE-10179: -- Fix Version/s: 0.99.0 Status: Patch Available (was: Open) HRegionServer underreports readRequestCounts by 1 under certain conditions -- Key: HBASE-10179 URL: https://issues.apache.org/jira/browse/HBASE-10179 Project: HBase Issue Type: Bug Components: metrics Affects Versions: 0.94.6, 0.99.0 Reporter: Perry Trolard Priority: Minor Fix For: 0.99.0 In HRegionServer.scan(), if (a) the number of results returned, n, is greater than zero (b) but less than the size of the batch (nbRows) (c) and the size in bytes is smaller than the max size (maxScannerResultSize) then the readRequestCount will be reported as n - 1 rather than n. (This is because the for-loop counter i is used to update the readRequestCount, and if the scan runs out of rows before reaching max rows or size, the code `break`s out of the loop and i is not incremented for the final time.) To reproduce, create a test table and open its details page in the web UI. Insert a single row, then note the current request count, c. Scan the table, returning 1 row; the request count will still be c, whereas it should be c + 1. I have a patch against TRUNK I can submit. At Splice Machine we're running 0.94, I'd be happy to submit a patch against that as well. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-10179) HRegionServer underreports readRequestCounts by 1 under certain conditions
[ https://issues.apache.org/jira/browse/HBASE-10179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Perry Trolard updated HBASE-10179: -- Attachment: 10179-against-trunk.diff HRegionServer underreports readRequestCounts by 1 under certain conditions -- Key: HBASE-10179 URL: https://issues.apache.org/jira/browse/HBASE-10179 Project: HBase Issue Type: Bug Components: metrics Affects Versions: 0.94.6, 0.99.0 Reporter: Perry Trolard Priority: Minor Fix For: 0.99.0 Attachments: 10179-against-trunk.diff In HRegionServer.scan(), if (a) the number of results returned, n, is greater than zero (b) but less than the size of the batch (nbRows) (c) and the size in bytes is smaller than the max size (maxScannerResultSize) then the readRequestCount will be reported as n - 1 rather than n. (This is because the for-loop counter i is used to update the readRequestCount, and if the scan runs out of rows before reaching max rows or size, the code `break`s out of the loop and i is not incremented for the final time.) To reproduce, create a test table and open its details page in the web UI. Insert a single row, then note the current request count, c. Scan the table, returning 1 row; the request count will still be c, whereas it should be c + 1. I have a patch against TRUNK I can submit. At Splice Machine we're running 0.94, I'd be happy to submit a patch against that as well. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HBASE-10174) Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
[ https://issues.apache.org/jira/browse/HBASE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849661#comment-13849661 ]

Lars Hofhansl commented on HBASE-10174:
---------------------------------------
It's certainly better to make HBase work with the various versions of Guava out there. I'm still confused, though. This would be an issue if the *client* used the Guava classes, right? Surely nobody would need to upgrade Guava on the HBase server (unless one has custom filters or coprocessors that use a newer Guava, in which case I'd say tough luck). If we use Guava on the client, I'd agree that we should not force a client application to a specific version of Guava just due to the way we're using it. The changed classes are server-only?

Back port HBASE-9667 'NullOutputStream removed from Guava 15' to 0.94
---------------------------------------------------------------------
Key: HBASE-10174
URL: https://issues.apache.org/jira/browse/HBASE-10174
Project: HBase
Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
Fix For: 0.94.15
Attachments: 10174-v2.txt, 9667-0.94.patch

On the user mailing list, under the thread 'Guava 15', Kristoffer Sjögren reported a NoClassDefFoundError when he used Guava 15. The issue has been fixed in 0.96+ by HBASE-9667. This JIRA ports the fix to the 0.94 branch.

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
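For context, the class Guava 15 removed is simply a write-discarding OutputStream. A dependency-free stand-in needs only java.io; the sketch below shows the general shape of such a fix, not the actual HBASE-9667 patch, and the class name `DiscardingOutputStream` is illustrative.

```java
import java.io.OutputStream;

// Dependency-free stand-in for the removed com.google.common.io.NullOutputStream:
// an OutputStream that discards everything written to it. The class name here
// is illustrative, not necessarily the one used by the actual patch.
public final class DiscardingOutputStream extends OutputStream {
    @Override
    public void write(int b) {
        // discard the byte
    }

    @Override
    public void write(byte[] b, int off, int len) {
        // discard without copying the buffer
    }
}
```

Because the code no longer references any Guava class, it works the same whether the client or server ships Guava 14 or 15.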
[jira] [Commented] (HBASE-8329) Limit compaction speed
[ https://issues.apache.org/jira/browse/HBASE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849660#comment-13849660 ]

Hadoop QA commented on HBASE-8329:
----------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12593121/HBASE-8329-8-trunk.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8178//console

This message is automatically generated.

Limit compaction speed
----------------------
Key: HBASE-8329
URL: https://issues.apache.org/jira/browse/HBASE-8329
Project: HBase
Issue Type: Improvement
Components: Compaction
Reporter: binlijin
Assignee: binlijin
Fix For: 0.99.0
Attachments: HBASE-8329-2-trunk.patch, HBASE-8329-3-trunk.patch, HBASE-8329-4-trunk.patch, HBASE-8329-5-trunk.patch, HBASE-8329-6-trunk.patch, HBASE-8329-7-trunk.patch, HBASE-8329-8-trunk.patch, HBASE-8329-trunk.patch

There is no speed or resource limit for compactions; I think we should add this feature, especially for request bursts.

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
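The feature being asked for can be sketched as a token-bucket byte-rate limiter that a compaction's write loop would call once per chunk. This is only an illustration of the idea, under assumed names, not the attached HBASE-8329 patch.

```java
// Minimal token-bucket limiter a compaction write path could call after each
// chunk of bytes written. A sketch of the idea only, not the attached patch.
public class ThroughputLimiter {
    private final long bytesPerSecond;
    private long allowance;       // bytes currently permitted
    private long lastRefillNanos; // last time the bucket was refilled

    public ThroughputLimiter(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
        this.allowance = bytesPerSecond; // start with one second's budget
        this.lastRefillNanos = System.nanoTime();
    }

    /** Blocks until `bytes` may be written under the configured rate. */
    public synchronized void acquire(long bytes) throws InterruptedException {
        while (true) {
            long now = System.nanoTime();
            long elapsed = now - lastRefillNanos;
            // Refill proportionally to elapsed time, capped at one second's budget.
            allowance = Math.min(bytesPerSecond,
                allowance + elapsed * bytesPerSecond / 1_000_000_000L);
            lastRefillNanos = now;
            if (allowance >= bytes) {
                allowance -= bytes;
                return;
            }
            // Sleep roughly long enough for the deficit to refill.
            long deficit = bytes - allowance;
            Thread.sleep(Math.max(1, deficit * 1000 / bytesPerSecond));
        }
    }
}
```

A compaction writing 1.5 MB through a 1 MB/s limiter would be held up for about half a second, which is exactly the back-pressure the issue asks for during request bursts.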
[jira] [Commented] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.
[ https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849665#comment-13849665 ]

Hadoop QA commented on HBASE-5617:
----------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12520451/HBASE-5617_2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8179//console

This message is automatically generated.

Provide coprocessor hooks in put flow while rollbackMemstore.
-------------------------------------------------------------
Key: HBASE-5617
URL: https://issues.apache.org/jira/browse/HBASE-5617
Project: HBase
Issue Type: Improvement
Components: Coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.99.0
Attachments: HBASE-5617_1.patch, HBASE-5617_2.patch

With coprocessor hooks, while a put happens we have the provision to create new puts to other tables or regions. These puts can be done with writeToWal as false. In 0.94 and above, puts are first written to the memstore and then to the WAL. On any failure in the WAL append or sync, the memstore is rolled back. The problem is that if the put in the main flow fails, there is no way to roll back the puts that happened in prePut. We can add coprocessor hooks like pre/postRollBackMemStore. Is any one hook enough here?

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
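The proposed hook pair can be sketched with a toy model of the 0.94+ put flow (memstore first, then WAL, memstore rollback on WAL failure). The hook names below mirror the proposal but are hypothetical, as is this entire tiny model; it is not HBase's actual RegionObserver API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the 0.94+ put flow: memstore first, then WAL, with a memstore
// rollback on WAL failure. Hook names mirror the proposal but are hypothetical.
public class PutFlowSketch {
    public interface RollbackObserver {
        void preRollBackMemStore(String put);
        void postRollBackMemStore(String put);
    }

    public final List<String> memstore = new ArrayList<>();
    private final RollbackObserver observer;
    private final boolean walFails; // simulates a WAL append/sync failure

    public PutFlowSketch(RollbackObserver observer, boolean walFails) {
        this.observer = observer;
        this.walFails = walFails;
    }

    /** Returns true if the put succeeded, false if it was rolled back. */
    public boolean put(String row) {
        memstore.add(row);                      // 1. write to memstore first
        if (walFails) {                         // 2. WAL append/sync fails...
            observer.preRollBackMemStore(row);  // 3. proposed pre-hook fires
            memstore.remove(row);               // 4. ...so undo the memstore write
            observer.postRollBackMemStore(row); // 5. proposed post-hook fires
            return false;
        }
        return true;                            // WAL sync succeeded
    }
}
```

A coprocessor that issued secondary puts from prePut could then undo them from either hook, which is exactly the gap the issue describes; whether one hook suffices depends on whether the cleanup must run before or after the memstore itself is restored.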