[jira] [Commented] (HBASE-13716) Stop using Hadoop's FSConstants
[ https://issues.apache.org/jira/browse/HBASE-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557927#comment-14557927 ]

zhangduo commented on HBASE-13716:
----------------------------------

Seems we updated the version of findbugs? 2.0.3 -> 3.0.0?

> Stop using Hadoop's FSConstants
> -------------------------------
>
> Key: HBASE-13716
> URL: https://issues.apache.org/jira/browse/HBASE-13716
> Project: HBase
> Issue Type: Task
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Priority: Blocker
> Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1
> Attachments: HBASE-13716.1.patch
>
> the FSConstants class was removed in HDFS-8135 (currently slated for Hadoop 2.8.0). I'm trying to have it reverted in branch-2, but we should migrate off of it sooner rather than later.
[jira] [Commented] (HBASE-12451) IncreasingToUpperBoundRegionSplitPolicy may cause unnecessary region splits in rolling update of cluster
[ https://issues.apache.org/jira/browse/HBASE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555374#comment-14555374 ]

zhangduo commented on HBASE-12451:
----------------------------------

Maybe we could make a new split policy? Make getCountOfCommonTableRegions an abstract protected method; the old IncreasingToUpperBoundRegionSplitPolicy keeps the current implementation, which only counts regions locally, and our new policy fetches the information from the master?

> IncreasingToUpperBoundRegionSplitPolicy may cause unnecessary region splits in rolling update of cluster
> ---------------------------------------------------------------------------------------------------------
>
> Key: HBASE-12451
> URL: https://issues.apache.org/jira/browse/HBASE-12451
> Project: HBase
> Issue Type: Bug
> Reporter: Liu Shaohui
> Assignee: Liu Shaohui
> Priority: Minor
> Fix For: 2.0.0
> Attachments: HBASE-12451-v1.diff, HBASE-12451-v2.diff
>
> Currently IncreasingToUpperBoundRegionSplitPolicy is the default region split policy. In this policy, the split size is the number of regions that are on this server and belong to the same table, cubed, times 2x the region flush size. But when unloading regions from a regionserver with region_mover.rb, the number of same-table regions on that server decreases, and the split size decreases with it, which may cause the remaining regions on the regionserver to split. Region splits also happen when loading regions onto a regionserver in a cluster. An improvement may be to set a minimum split size in IncreasingToUpperBoundRegionSplitPolicy. Suggestions are welcome. Thanks~
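To make the proposal concrete, here is a minimal, self-contained sketch of the suggested shape. The class and field names are hypothetical and the real HBase policy hierarchy differs; only the abstract getCountOfCommonTableRegions hook mirrors the proposal:

{code:title=SplitPolicySketch.java}
// Sketch: make the region count an abstract hook so one subclass counts
// regions locally (the current behaviour) while another asks the master.
public abstract class SplitPolicySketch {
  private final long flushSize;
  private final long desiredMaxFileSize;

  protected SplitPolicySketch(long flushSize, long desiredMaxFileSize) {
    this.flushSize = flushSize;
    this.desiredMaxFileSize = desiredMaxFileSize;
  }

  /** Hook: how many regions of this table to count (local or cluster-wide). */
  protected abstract int getCountOfCommonTableRegions();

  /** Split size = min(count^3 * 2 * flush size, desired max file size). */
  public long getSizeToCheck() {
    int count = getCountOfCommonTableRegions();
    return count == 0 ? desiredMaxFileSize
        : Math.min(desiredMaxFileSize, 2L * flushSize * count * count * count);
  }
}
{code}

A master-backed subclass would keep returning the cluster-wide count during a rolling restart, so the computed split size would not shrink as regions are moved off the server.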
[jira] [Commented] (HBASE-13716) Stop using Hadoop's FSConstants
[ https://issues.apache.org/jira/browse/HBASE-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551923#comment-14551923 ]

zhangduo commented on HBASE-13716:
----------------------------------

There is an {{HdfsUtils.isHealthy(URI)}} method in hdfs. It has been available since at least hadoop-2.2.0. Could we make use of this method instead of calling {{DistributedFileSystem.setSafeMode}}?

> Stop using Hadoop's FSConstants
> -------------------------------
>
> Key: HBASE-13716
> URL: https://issues.apache.org/jira/browse/HBASE-13716
> Project: HBase
> Issue Type: Task
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Priority: Blocker
> Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1
>
> the FSConstants class was removed in HDFS-8135 (currently slated for Hadoop 2.8.0). I'm trying to have it reverted in branch-2, but we should migrate off of it sooner rather than later.
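As a concrete illustration of the suggestion, a minimal sketch assuming Hadoop 2.2+ on the classpath; the namenode address is hypothetical:

{code:title=IsHealthyExample.java}
import java.net.URI;
import org.apache.hadoop.hdfs.client.HdfsUtils;

// HdfsUtils.isHealthy() reports true only when the NameNode is up and not
// in safe mode, without referencing the private HdfsConstants class.
public class IsHealthyExample {
  public static void main(String[] args) {
    URI nn = URI.create("hdfs://namenode:8020"); // hypothetical address
    if (!HdfsUtils.isHealthy(nn)) {
      System.err.println("HDFS is unavailable or still in safe mode");
    }
  }
}
{code}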
[jira] [Commented] (HBASE-13716) Stop using Hadoop's FSConstants
[ https://issues.apache.org/jira/browse/HBASE-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552264#comment-14552264 ]

zhangduo commented on HBASE-13716:
----------------------------------

{quote}
I also have an open request on the HDFS ticket for what we're supposed to use. It could use more details about what we're trying to check.
{quote}
Do you mean opening an HDFS issue that adds methods for HBase?

> Stop using Hadoop's FSConstants
> -------------------------------
>
> Key: HBASE-13716
> URL: https://issues.apache.org/jira/browse/HBASE-13716
> Project: HBase
> Issue Type: Task
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Priority: Blocker
> Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1
>
> the FSConstants class was removed in HDFS-8135 (currently slated for Hadoop 2.8.0). I'm trying to have it reverted in branch-2, but we should migrate off of it sooner rather than later.
[jira] [Commented] (HBASE-13716) Stop using Hadoop's FSConstants
[ https://issues.apache.org/jira/browse/HBASE-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553409#comment-14553409 ]

zhangduo commented on HBASE-13716:
----------------------------------

+1 for now. And I checked the code again: {{HdfsUtils.isHealthy(URI)}} calls {{DistributedFileSystem.setSafeMode(GET, false)}}, but in HBase we call {{DistributedFileSystem.setSafeMode(GET, true)}}. I think the difference is that when the second parameter is true, a BackupNN will throw a StandbyException that forces the client to connect to the ActiveNN. If we must connect to the ActiveNN in HBase, then {{HdfsUtils.isHealthy(URI)}} is not enough. So should we add new methods in {{HdfsUtils}}?

> Stop using Hadoop's FSConstants
> -------------------------------
>
> Key: HBASE-13716
> URL: https://issues.apache.org/jira/browse/HBASE-13716
> Project: HBase
> Issue Type: Task
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Priority: Blocker
> Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1
> Attachments: HBASE-13716.1.patch
>
> the FSConstants class was removed in HDFS-8135 (currently slated for Hadoop 2.8.0). I'm trying to have it reverted in branch-2, but we should migrate off of it sooner rather than later.
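To make the difference concrete, a small sketch of the two calls side by side, using Hadoop 2.x APIs; the namenode URI is hypothetical:

{code:title=SafeModeCheckExample.java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.client.HdfsUtils;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public class SafeModeCheckExample {
  public static void main(String[] args) throws Exception {
    URI nn = URI.create("hdfs://namenode:8020"); // hypothetical address

    // HdfsUtils.isHealthy() uses setSafeMode(GET, false) internally: the
    // request is not "checked", so a standby NN may answer it.
    boolean healthy = HdfsUtils.isHealthy(nn);

    // What HBase does today: isChecked = true, so a standby NN throws
    // StandbyException and the client retries against the active NN.
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(nn, new Configuration());
    boolean inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET, true);

    System.out.println("healthy=" + healthy + ", inSafeMode=" + inSafeMode);
  }
}
{code}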
[jira] [Commented] (HBASE-13716) Stop using Hadoop's FSConstants
[ https://issues.apache.org/jira/browse/HBASE-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551848#comment-14551848 ]

zhangduo commented on HBASE-13716:
----------------------------------

There are two FSConstants... One is {{org.apache.hadoop.fs.FsConstants}}, and the other is {{org.apache.hadoop.hdfs.protocol.FSConstants}}. The former is marked as public and the latter is what HDFS-8135 wants to remove.
There is only one place where we use {{org.apache.hadoop.hdfs.protocol.FSConstants}}: FSUtils calls {{DistributedFileSystem.setSafeMode}}. We could just replace it with {{HdfsConstants}}. But {{HdfsConstants}} is marked as private. Are there any other ways to check whether an HDFS is in safe mode?

> Stop using Hadoop's FSConstants
> -------------------------------
>
> Key: HBASE-13716
> URL: https://issues.apache.org/jira/browse/HBASE-13716
> Project: HBase
> Issue Type: Task
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Priority: Blocker
> Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1
>
> the FSConstants class was removed in HDFS-8135 (currently slated for Hadoop 2.8.0). I'm trying to have it reverted in branch-2, but we should migrate off of it sooner rather than later.
[jira] [Commented] (HBASE-13637) branch-1.1 does not build against hadoop-2.2.
[ https://issues.apache.org/jira/browse/HBASE-13637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538860#comment-14538860 ]

zhangduo commented on HBASE-13637:
----------------------------------

Just woke up... Thanks [~ndimiduk] for the quick fix. +1 on the new patch.

> branch-1.1 does not build against hadoop-2.2.
> ----------------------------------------------
>
> Key: HBASE-13637
> URL: https://issues.apache.org/jira/browse/HBASE-13637
> Project: HBase
> Issue Type: Bug
> Reporter: Nick Dimiduk
> Assignee: zhangduo
> Fix For: 1.1.0, 1.2.0
> Attachments: HBASE-13637-branch-1.1.01.patch, HBASE-13637-branch-1.1.patch
>
> From RC0 VOTE thread,
> {quote}
> The build is broken with Hadoop-2.2 because mini-kdc is not found:
> \[ERROR\] Failed to execute goal on project hbase-server: Could not resolve dependencies for project org.apache.hbase:hbase-server:jar:1.1.0: Could not find artifact org.apache.hadoop:hadoop-minikdc:jar:2.2
> {quote}
[jira] [Commented] (HBASE-13653) Uninitialized HRegionServer#walFactory may result in NullPointerException at region server startup
[ https://issues.apache.org/jira/browse/HBASE-13653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536110#comment-14536110 ]

zhangduo commented on HBASE-13653:
----------------------------------

There may be a race condition between calling reportForDuty and handleReportForDutyResponse. But there are lots of things which are not initialized if handleReportForDutyResponse has not finished, not only walFactory. So I'm not sure the fix in this patch is enough.

> Uninitialized HRegionServer#walFactory may result in NullPointerException at region server startup
> ---------------------------------------------------------------------------------------------------
>
> Key: HBASE-13653
> URL: https://issues.apache.org/jira/browse/HBASE-13653
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Reporter: Romil Choksi
> Assignee: Ted Yu
> Attachments: 13653-branch-1.txt
>
> hbase --config /tmp/hbaseConf org.apache.hadoop.hbase.IntegrationTestIngest --monkey unbalance causes NPE
> {code}
> 2015-05-08 08:44:20,885 ERROR [B.defaultRpcServer.handler=28,queue=1,port=16000] master.ServerManager: Received exception in RPC for warmup server: RegionServer1,16020,1431074656202 region: {ENCODED => 40133c823b6d9d9dece99db1aad62730, NAME => 'SYSTEM.SEQUENCE,2\x00\x00\x00,1431070054641.40133c823b6d9d9dece99db1aad62730.', STARTKEY => '2\x00\x00\x00', ENDKEY => '3\x00\x00\x00'} exception: java.io.IOException: java.io.IOException
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2154)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:1825)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.warmupRegion(RSRpcServices.java:1559)
>   at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:21997)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
>   ... 4 more
> {code}
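For illustration only, one general way to close this kind of window is to gate the RPC-visible getters on a "fully initialized" latch. This is a hypothetical sketch of the pattern, not the actual HBase patch:

{code:title=InitGuardSketch.java}
import java.util.concurrent.CountDownLatch;

public class InitGuardSketch {
  private final CountDownLatch initialized = new CountDownLatch(1);
  private volatile Object walFactory; // stand-in for the real field

  void handleReportForDutyResponse() {
    walFactory = new Object(); // ... initialize everything else too ...
    initialized.countDown();   // publish: startup handshake is complete
  }

  Object getWAL() throws InterruptedException {
    initialized.await();       // or await(timeout) and fail the RPC cleanly
    return walFactory;
  }
}
{code}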
[jira] [Updated] (HBASE-13628) Use AtomicLong as size in BoundedConcurrentLinkedQueue
[ https://issues.apache.org/jira/browse/HBASE-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-13628:
-----------------------------
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Pushed to all branches. Thanks [~apurtell] and [~stack].

> Use AtomicLong as size in BoundedConcurrentLinkedQueue
> -------------------------------------------------------
>
> Key: HBASE-13628
> URL: https://issues.apache.org/jira/browse/HBASE-13628
> Project: HBase
> Issue Type: Bug
> Reporter: zhangduo
> Assignee: zhangduo
> Fix For: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1
> Attachments: HBASE-13628.patch
>
> Remove the high priority findbugs warnings.
[jira] [Commented] (HBASE-13628) Use AtomicLong as size in BoundedConcurrentLinkedQueue
[ https://issues.apache.org/jira/browse/HBASE-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530030#comment-14530030 ]

zhangduo commented on HBASE-13628:
----------------------------------

{code}
for (T element; (element = super.poll()) != null;) {
{code}
This is reported by checkstyle as an 'InnerAssignment' issue. It is a common style when polling from a queue, so I think it is fine?

> Use AtomicLong as size in BoundedConcurrentLinkedQueue
> -------------------------------------------------------
>
> Key: HBASE-13628
> URL: https://issues.apache.org/jira/browse/HBASE-13628
> Project: HBase
> Issue Type: Bug
> Reporter: zhangduo
> Assignee: zhangduo
> Fix For: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1
> Attachments: HBASE-13628.patch
>
> Remove the high priority findbugs warnings.
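For context, the idiom checkstyle is flagging: the assignment inside the loop condition polls the queue exactly once per iteration until it is empty. A small self-contained sketch:

{code:title=DrainExample.java}
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class DrainExample {
  // Drain a queue with the assign-in-condition polling idiom.
  public static <T> List<T> drain(Queue<T> queue) {
    List<T> drained = new ArrayList<T>();
    for (T element; (element = queue.poll()) != null;) {
      drained.add(element);
    }
    return drained;
  }

  public static void main(String[] args) {
    Queue<Integer> q = new ConcurrentLinkedQueue<Integer>();
    q.add(1);
    q.add(2);
    q.add(3);
    System.out.println(drain(q)); // prints [1, 2, 3]
  }
}
{code}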
[jira] [Commented] (HBASE-13628) Use AtomicLong as size in BoundedConcurrentLinkedQueue
[ https://issues.apache.org/jira/browse/HBASE-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530039#comment-14530039 ]

zhangduo commented on HBASE-13628:
----------------------------------

OK, let me commit.

> Use AtomicLong as size in BoundedConcurrentLinkedQueue
> -------------------------------------------------------
>
> Key: HBASE-13628
> URL: https://issues.apache.org/jira/browse/HBASE-13628
> Project: HBase
> Issue Type: Bug
> Reporter: zhangduo
> Assignee: zhangduo
> Fix For: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1
> Attachments: HBASE-13628.patch
>
> Remove the high priority findbugs warnings.
[jira] [Commented] (HBASE-13637) branch-1.1 does not build against hadoop-2.2.
[ https://issues.apache.org/jira/browse/HBASE-13637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531741#comment-14531741 ]

zhangduo commented on HBASE-13637:
----------------------------------

{quote}
It seems that 2.2 does not contain mini KDC at all
{quote}
Yes, mini-kdc was first introduced in hadoop 2.3. But as [~apurtell] said before, mini-kdc does not depend on any other hadoop modules. See here: http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-minikdc/2.7.0
So we could give mini-kdc a separate version property instead of the common hadoop-two.version? I can prepare a patch for it. Thanks.

> branch-1.1 does not build against hadoop-2.2.
> ----------------------------------------------
>
> Key: HBASE-13637
> URL: https://issues.apache.org/jira/browse/HBASE-13637
> Project: HBase
> Issue Type: Bug
> Reporter: Nick Dimiduk
> Fix For: 1.1.0
>
> From RC0 VOTE thread,
> {quote}
> The build is broken with Hadoop-2.2 because mini-kdc is not found:
> \[ERROR\] Failed to execute goal on project hbase-server: Could not resolve dependencies for project org.apache.hbase:hbase-server:jar:1.1.0: Could not find artifact org.apache.hadoop:hadoop-minikdc:jar:2.2
> {quote}
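For illustration, a sketch of what the pom change might look like; the property name is hypothetical and the actual patch may differ:

{code:xml|title=pom.xml (sketch)}
<!-- Decouple mini-kdc from hadoop-two.version so a -Dhadoop-two.version=2.2.0
     build can still resolve the artifact. The property name is hypothetical. -->
<properties>
  <hadoop-two.version>2.2.0</hadoop-two.version>
  <minikdc.version>2.3.0</minikdc.version>
</properties>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-minikdc</artifactId>
  <version>${minikdc.version}</version>
  <scope>test</scope>
</dependency>
{code}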
[jira] [Updated] (HBASE-13637) branch-1.1 does not build against hadoop-2.2.
[ https://issues.apache.org/jira/browse/HBASE-13637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-13637:
-----------------------------
    Attachment: HBASE-13637-branch-1.1.patch

Tried locally with
{noformat}
mvn clean package -Dhadoop-two.version=2.2.0 -DskipTests
{noformat}
Passed.

> branch-1.1 does not build against hadoop-2.2.
> ----------------------------------------------
>
> Key: HBASE-13637
> URL: https://issues.apache.org/jira/browse/HBASE-13637
> Project: HBase
> Issue Type: Bug
> Reporter: Nick Dimiduk
> Fix For: 1.1.0
> Attachments: HBASE-13637-branch-1.1.patch
>
> From RC0 VOTE thread,
> {quote}
> The build is broken with Hadoop-2.2 because mini-kdc is not found:
> \[ERROR\] Failed to execute goal on project hbase-server: Could not resolve dependencies for project org.apache.hbase:hbase-server:jar:1.1.0: Could not find artifact org.apache.hadoop:hadoop-minikdc:jar:2.2
> {quote}
[jira] [Commented] (HBASE-13420) RegionEnvironment.offerExecutionLatency Blocks Threads under Heavy Load
[ https://issues.apache.org/jira/browse/HBASE-13420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529640#comment-14529640 ]

zhangduo commented on HBASE-13420:
----------------------------------

This patch introduces 3 high priority findbugs warnings (VO_VOLATILE_INCREMENT). Mind if I open an issue to change 'size' from volatile to AtomicLong? [~apurtell]. Thanks.

> RegionEnvironment.offerExecutionLatency Blocks Threads under Heavy Load
> -------------------------------------------------------------------------
>
> Key: HBASE-13420
> URL: https://issues.apache.org/jira/browse/HBASE-13420
> Project: HBase
> Issue Type: Improvement
> Reporter: John Leach
> Assignee: Andrew Purtell
> Fix For: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1
> Attachments: 1M-0.98.12.svg, 1M-0.98.13-SNAPSHOT.svg, HBASE-13420.patch, HBASE-13420.txt, hbase-13420.tar.gz, offerExecutionLatency.tiff
> Original Estimate: 3h
> Remaining Estimate: 3h
>
> The ArrayBlockingQueue blocks threads for 20s during a performance run focusing on creating numerous small scans. I see a buffer size of (100)
> private final BlockingQueue<Long> coprocessorTimeNanos = new ArrayBlockingQueue<Long>(LATENCY_BUFFER_SIZE);
> and then I see a drain coming from MetricsRegionWrapperImpl with a 45 second executor:
> HRegionMetricsWrapperRunable
> RegionCoprocessorHost#getCoprocessorExecutionStatistics()
> RegionCoprocessorHost#getExecutionLatenciesNanos()
> Am I missing something?
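For context, what VO_VOLATILE_INCREMENT means: 'volatile' only guarantees visibility, while '++' is a separate read-modify-write, so two threads can lose an update. A minimal sketch of the warning and the AtomicLong fix:

{code:title=VolatileIncrementExample.java}
import java.util.concurrent.atomic.AtomicLong;

public class VolatileIncrementExample {
  private volatile long size;                      // findbugs flags size++
  private final AtomicLong safeSize = new AtomicLong();

  void addUnsafe() { size++; }                     // racy: read, add, write
  void addSafe()   { safeSize.incrementAndGet(); } // atomic
}
{code}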
[jira] [Updated] (HBASE-13628) Use AtomicLong as size in BoundedConcurrentLinkedQueue
[ https://issues.apache.org/jira/browse/HBASE-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-13628:
-----------------------------
    Status: Patch Available (was: Open)

> Use AtomicLong as size in BoundedConcurrentLinkedQueue
> -------------------------------------------------------
>
> Key: HBASE-13628
> URL: https://issues.apache.org/jira/browse/HBASE-13628
> Project: HBase
> Issue Type: Bug
> Reporter: zhangduo
> Assignee: zhangduo
> Attachments: HBASE-13628.patch
>
> Remove the high priority findbugs warnings.
[jira] [Updated] (HBASE-13628) Use AtomicLong as size in BoundedConcurrentLinkedQueue
[ https://issues.apache.org/jira/browse/HBASE-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-13628:
-----------------------------
    Attachment: HBASE-13628.patch

> Use AtomicLong as size in BoundedConcurrentLinkedQueue
> -------------------------------------------------------
>
> Key: HBASE-13628
> URL: https://issues.apache.org/jira/browse/HBASE-13628
> Project: HBase
> Issue Type: Bug
> Reporter: zhangduo
> Assignee: zhangduo
> Attachments: HBASE-13628.patch
>
> Remove the high priority findbugs warnings.
[jira] [Commented] (HBASE-10800) Use CellComparator instead of KVComparator
[ https://issues.apache.org/jira/browse/HBASE-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529659#comment-14529659 ]

zhangduo commented on HBASE-10800:
----------------------------------

https://builds.apache.org/job/HBase-TRUNK/6456/findbugsResult/new/HIGH/

Should be a bug?
{code:title=BufferedDataBlockEncoder.java}
if (comparator != null) {
  ...
} else {
  Cell r = new KeyValue.KeyOnlyKeyValue(current.keyBuffer, 0, current.keyLength);
  comp = comparator.compareKeyIgnoresMvcc(seekCell, r); // NPE?
}
{code}

> Use CellComparator instead of KVComparator
> -------------------------------------------
>
> Key: HBASE-10800
> URL: https://issues.apache.org/jira/browse/HBASE-10800
> Project: HBase
> Issue Type: Sub-task
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
> Attachments: HBASE-10800_1.patch, HBASE-10800_11.patch, HBASE-10800_12.patch, HBASE-10800_13.patch, HBASE-10800_14.patch, HBASE-10800_15.patch, HBASE-10800_16.patch, HBASE-10800_18.patch, HBASE-10800_2.patch, HBASE-10800_23.patch, HBASE-10800_23.patch, HBASE-10800_24.patch, HBASE-10800_25.patch, HBASE-10800_26.patch, HBASE-10800_27.patch, HBASE-10800_28.patch, HBASE-10800_29.patch, HBASE-10800_3.patch, HBASE-10800_4.patch, HBASE-10800_4.patch, HBASE-10800_5.patch, HBASE-10800_6.patch, HBASE-10800_7.patch
[jira] [Created] (HBASE-13628) Use AtomicLong as size in BoundedConcurrentLinkedQueue
zhangduo created HBASE-13628:
---------------------------------

Summary: Use AtomicLong as size in BoundedConcurrentLinkedQueue
Key: HBASE-13628
URL: https://issues.apache.org/jira/browse/HBASE-13628
Project: HBase
Issue Type: Bug
Reporter: zhangduo
Assignee: zhangduo

Remove the high priority findbugs warnings.
[jira] [Commented] (HBASE-13609) TestFastFail is still failing
[ https://issues.apache.org/jira/browse/HBASE-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525246#comment-14525246 ]

zhangduo commented on HBASE-13609:
----------------------------------

Just remove the last assert for now? numBlockedWorkers increments if the requestTime > 1s; this is not a stable condition, I think.

> TestFastFail is still failing
> ------------------------------
>
> Key: HBASE-13609
> URL: https://issues.apache.org/jira/browse/HBASE-13609
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 1.1.0
> Reporter: Nick Dimiduk
>
> {noformat}
> testFastFail(org.apache.hadoop.hbase.client.TestFastFail)  Time elapsed: 13.106 sec  <<< FAILURE!
> java.lang.AssertionError: Only few thread should ideally be waiting for the dead regionserver to be coming back. numBlockedWorkers:15 threads that retried : 2
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.apache.hadoop.hbase.client.TestFastFail.testFastFail(TestFastFail.java:288)
> {noformat}
> This is failing consistently for me locally. Sometimes it's 15, sometimes it's 5, sometimes 26. We've seen this one before, HBASE-12771, HBASE-12881.
[jira] [Commented] (HBASE-13530) Add param for bulkload wait duration in HRegion.
[ https://issues.apache.org/jira/browse/HBASE-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512372#comment-14512372 ]

zhangduo commented on HBASE-13530:
----------------------------------

[~victorunique] So we need to find the real operation that locks the region for a long time? The ReentrantReadWriteLock in Java does have the problem you describe: if one thread is waiting to get the WriteLock, then all subsequent attempts to get the ReadLock are blocked, even in non-fair mode. But there are lots of operations other than bulkload that need to hold the WriteLock, and lots of them do not have a timeout. So this is a general problem, and we need to find a general way to solve it. Thanks.

> Add param for bulkload wait duration in HRegion.
> -------------------------------------------------
>
> Key: HBASE-13530
> URL: https://issues.apache.org/jira/browse/HBASE-13530
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.98.12
> Reporter: Victor Xu
> Priority: Minor
> Fix For: 2.0.0, 0.98.13
> Attachments: HBASE-13530-0.98-v2.patch, HBASE-13530-0.98.patch, HBASE-13530-master-v2.patch, HBASE-13530-master.patch, HBASE-13530-v1.patch
>
> In our scenario, incremental read/write operations and complete bulkload operations are mixed together. Bulkload needs the write lock while read/write and flush/compact need the read lock. When a region is compacting, the bulkload can hang at the writeLock.tryLock(waitDuration, TimeUnit) method. The original default waitDuration is 60sec (from 'hbase.busy.wait.duration'), and this could block all read/write operations from acquiring the read lock for 1 minute. The chances of this scenario become high when the compaction speed limit feature (HBASE-8329) is used. Maybe we need to decrease the wait duration ONLY for bulkload, and let read/write keep theirs. So I added this param ('hbase.bulkload.wait.duration') to tune the wait duration for bulkloading. Of course, it is a table level setting, and the default value comes from the original logic.
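The queueing behaviour described above is easy to demonstrate with a plain ReentrantReadWriteLock; a self-contained sketch (timings are illustrative):

{code:title=RwLockQueueingExample.java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwLockQueueingExample {
  public static void main(String[] args) throws Exception {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair
    lock.readLock().lock(); // main thread: long-running "compaction" reader

    Thread writer = new Thread(() -> {
      lock.writeLock().lock();   // "bulkload": parks behind the reader
      lock.writeLock().unlock();
    });
    writer.start();
    Thread.sleep(200); // let the writer enqueue itself

    Thread reader = new Thread(() -> {
      try {
        // A fresh "get/put" reader queues behind the waiting writer, so
        // it times out even though only readers currently hold the lock.
        boolean gotIt = lock.readLock().tryLock(1, TimeUnit.SECONDS);
        System.out.println("new reader got read lock: " + gotIt); // false
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    reader.start();
    reader.join();

    lock.readLock().unlock(); // release; the queued writer proceeds
    writer.join();
  }
}
{code}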
[jira] [Commented] (HBASE-13530) Add param for bulkload wait duration in HRegion.
[ https://issues.apache.org/jira/browse/HBASE-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510575#comment-14510575 ]

zhangduo commented on HBASE-13530:
----------------------------------

I think we do not hold the readLock during the whole flush or compaction lifetime?

> Add param for bulkload wait duration in HRegion.
> -------------------------------------------------
>
> Key: HBASE-13530
> URL: https://issues.apache.org/jira/browse/HBASE-13530
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.98.12
> Reporter: Victor Xu
> Priority: Minor
> Fix For: 2.0.0, 0.98.13
> Attachments: HBASE-13530-0.98-v2.patch, HBASE-13530-0.98.patch, HBASE-13530-master-v2.patch, HBASE-13530-master.patch, HBASE-13530-v1.patch
>
> In our scenario, incremental read/write operations and complete bulkload operations are mixed together. Bulkload needs the write lock while read/write and flush/compact need the read lock. When a region is compacting, the bulkload can hang at the writeLock.tryLock(waitDuration, TimeUnit) method. The original default waitDuration is 60sec (from 'hbase.busy.wait.duration'), and this could block all read/write operations from acquiring the read lock for 1 minute. The chances of this scenario become high when the compaction speed limit feature (HBASE-8329) is used. Maybe we need to decrease the wait duration ONLY for bulkload, and let read/write keep theirs. So I added this param ('hbase.bulkload.wait.duration') to tune the wait duration for bulkloading. Of course, it is a table level setting, and the default value comes from the original logic.
[jira] [Commented] (HBASE-13528) A bug on selecting compaction pool
[ https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510839#comment-14510839 ]

zhangduo commented on HBASE-13528:
----------------------------------

[~enis] hasn't replied yet... Is it safe to commit to branch-1.0? Seems an RC is on-going...

> A bug on selecting compaction pool
> -----------------------------------
>
> Key: HBASE-13528
> URL: https://issues.apache.org/jira/browse/HBASE-13528
> Project: HBase
> Issue Type: Bug
> Components: Compaction
> Affects Versions: 0.98.12
> Reporter: Shuaifeng Zhou
> Assignee: Shuaifeng Zhou
> Priority: Minor
> Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2, 1.2.0
> Attachments: HBASE-13528-0.98-1.patch, HBASE-13528-0.98.patch, HBASE-13528-1.0-1.patch, HBASE-13528-1.0.patch, HBASE-13528-master-1.patch, HBASE-13528-master.patch
>
> When selectNow == true, in requestCompactionInternal, the compaction pool selection is incorrect, as discussed in:
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E
[jira] [Commented] (HBASE-13529) Procedure v2 - WAL Improvements
[ https://issues.apache.org/jira/browse/HBASE-13529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507028#comment-14507028 ]

zhangduo commented on HBASE-13529:
----------------------------------

What about LinkedTransferQueue?

> Procedure v2 - WAL Improvements
> -------------------------------
>
> Key: HBASE-13529
> URL: https://issues.apache.org/jira/browse/HBASE-13529
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Affects Versions: 2.0.0, 1.1.0
> Reporter: Matteo Bertozzi
> Assignee: Matteo Bertozzi
> Priority: Minor
> Fix For: 2.0.0, 1.1.0
> Attachments: HBASE-13529-v0.patch, ProcedureStoreTest.java
>
> from the discussion in HBASE-12439, the wal was turning out to be slow.
> * there is an error around the awake of the slotCond.await(), causing more wait than necessary
> * ArrayBlockingQueue is dog slow, replace it with ConcurrentLinkedQueue
> * roll the wal only if it reaches a threshold (conf ops) to amortize the cost
> * hsync() is used by default, while the normal wal uses just hflush(); make it tunable via conf
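For reference, a minimal sketch of the suggested alternative: LinkedTransferQueue is an unbounded, lock-free queue whose producers can hand items directly to waiting consumers, avoiding ArrayBlockingQueue's single lock. The element value is illustrative:

{code:title=TransferQueueExample.java}
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.TransferQueue;

public class TransferQueueExample {
  public static void main(String[] args) throws InterruptedException {
    TransferQueue<String> queue = new LinkedTransferQueue<String>();

    Thread consumer = new Thread(() -> {
      try {
        System.out.println("got: " + queue.take());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();

    queue.transfer("wal-entry"); // blocks until a consumer receives it
    consumer.join();
  }
}
{code}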
[jira] [Commented] (HBASE-13528) A bug on selecting compaction pool
[ https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506524#comment-14506524 ]

zhangduo commented on HBASE-13528:
----------------------------------

+1. And can you use git format-patch to generate the patch file? Using git am with sign-off can retain the author of the patch.
[~tedyu] What do you think? You replied on the mailing list. Thanks.

> A bug on selecting compaction pool
> -----------------------------------
>
> Key: HBASE-13528
> URL: https://issues.apache.org/jira/browse/HBASE-13528
> Project: HBase
> Issue Type: Bug
> Components: Compaction
> Affects Versions: 0.98.12
> Reporter: Shuaifeng Zhou
> Assignee: Shuaifeng Zhou
> Priority: Minor
> Fix For: 1.0.1, 0.98.13
> Attachments: HBASE-13528-0.98-1.patch, HBASE-13528-0.98.patch, HBASE-13528-1.0-1.patch, HBASE-13528-1.0.patch, HBASE-13528-master-1.patch, HBASE-13528-master.patch
>
> When selectNow == true, in requestCompactionInternal, the compaction pool selection is incorrect, as discussed in:
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E
[jira] [Updated] (HBASE-13528) A bug on selecting compaction pool
[ https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-13528:
-----------------------------
    Fix Version/s: (was: 1.0.1)
                   1.2.0
                   1.0.2
                   1.1.0
                   2.0.0

> A bug on selecting compaction pool
> -----------------------------------
>
> Key: HBASE-13528
> URL: https://issues.apache.org/jira/browse/HBASE-13528
> Project: HBase
> Issue Type: Bug
> Components: Compaction
> Affects Versions: 0.98.12
> Reporter: Shuaifeng Zhou
> Assignee: Shuaifeng Zhou
> Priority: Minor
> Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2, 1.2.0
> Attachments: HBASE-13528-0.98-1.patch, HBASE-13528-0.98.patch, HBASE-13528-1.0-1.patch, HBASE-13528-1.0.patch, HBASE-13528-master-1.patch, HBASE-13528-master.patch
>
> When selectNow == true, in requestCompactionInternal, the compaction pool selection is incorrect, as discussed in:
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E
[jira] [Commented] (HBASE-13528) A bug on selecting compaction pool
[ https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508191#comment-14508191 ]

zhangduo commented on HBASE-13528:
----------------------------------

[~ndimiduk] [~enis] [~apurtell] Should this go into all branches? Thanks.

> A bug on selecting compaction pool
> -----------------------------------
>
> Key: HBASE-13528
> URL: https://issues.apache.org/jira/browse/HBASE-13528
> Project: HBase
> Issue Type: Bug
> Components: Compaction
> Affects Versions: 0.98.12
> Reporter: Shuaifeng Zhou
> Assignee: Shuaifeng Zhou
> Priority: Minor
> Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2, 1.2.0
> Attachments: HBASE-13528-0.98-1.patch, HBASE-13528-0.98.patch, HBASE-13528-1.0-1.patch, HBASE-13528-1.0.patch, HBASE-13528-master-1.patch, HBASE-13528-master.patch
>
> When selectNow == true, in requestCompactionInternal, the compaction pool selection is incorrect, as discussed in:
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E
[jira] [Commented] (HBASE-13499) AsyncRpcClient test cases failure in powerpc
[ https://issues.apache.org/jira/browse/HBASE-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504426#comment-14504426 ]

zhangduo commented on HBASE-13499:
----------------------------------

Good. Will commit it tomorrow if there is no objection.

> AsyncRpcClient test cases failure in powerpc
> ---------------------------------------------
>
> Key: HBASE-13499
> URL: https://issues.apache.org/jira/browse/HBASE-13499
> Project: HBase
> Issue Type: Bug
> Components: IPC/RPC
> Affects Versions: 2.0.0, 1.1.0, 1.2.0
> Reporter: sangamesh
> Assignee: zhangduo
> Fix For: 2.0.0, 1.1.0, 1.2.0
> Attachments: HBASE-13499.patch
>
> The new AsyncRpcClient feature added through the jira defect HBASE-12684 is causing some test case failures in a powerpc64 environment. I am testing it on the master branch.
> Looks like this version of netty (4.0.23) doesn't provide support for non-amd64 platforms, and it is suggested to use pure-java netty. Here is the discussion on that: https://github.com/aphyr/riemann/pull/508
> So the new Async test cases will fail on ppc64 and other non-amd64 platforms too. Here is the output of the error:
> Running org.apache.hadoop.hbase.ipc.TestAsyncIPC
> Tests run: 24, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 2.802 sec <<< FAILURE! - in org.apache.hadoop.hbase.ipc.TestAsyncIPC
> testRTEDuringAsyncConnectionSetup[3](org.apache.hadoop.hbase.ipc.TestAsyncIPC)  Time elapsed: 0.048 sec  <<< ERROR!
> java.lang.UnsatisfiedLinkError: /tmp/libnetty-transport-native-epoll4286512618055650929.so: /tmp/libnetty-transport-native-epoll4286512618055650929.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a Power PC 64-bit platform)
>   at java.lang.ClassLoader$NativeLibrary.load(Native Method)
>   at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965)
[jira] [Updated] (HBASE-13499) AsyncRpcClient test cases failure in powerpc
[ https://issues.apache.org/jira/browse/HBASE-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-13499:
-----------------------------
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Pushed to branch-1.1+. Thanks [~sangameshs] [~stack].

> AsyncRpcClient test cases failure in powerpc
> ---------------------------------------------
>
> Key: HBASE-13499
> URL: https://issues.apache.org/jira/browse/HBASE-13499
> Project: HBase
> Issue Type: Bug
> Components: IPC/RPC
> Affects Versions: 2.0.0, 1.1.0, 1.2.0
> Reporter: sangamesh
> Assignee: zhangduo
> Fix For: 2.0.0, 1.1.0, 1.2.0
> Attachments: HBASE-13499.patch
>
> The new AsyncRpcClient feature added through the jira defect HBASE-12684 is causing some test case failures in a powerpc64 environment. I am testing it on the master branch.
> Looks like this version of netty (4.0.23) doesn't provide support for non-amd64 platforms, and it is suggested to use pure-java netty. Here is the discussion on that: https://github.com/aphyr/riemann/pull/508
> So the new Async test cases will fail on ppc64 and other non-amd64 platforms too. Here is the output of the error:
> Running org.apache.hadoop.hbase.ipc.TestAsyncIPC
> Tests run: 24, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 2.802 sec <<< FAILURE! - in org.apache.hadoop.hbase.ipc.TestAsyncIPC
> testRTEDuringAsyncConnectionSetup[3](org.apache.hadoop.hbase.ipc.TestAsyncIPC)  Time elapsed: 0.048 sec  <<< ERROR!
> java.lang.UnsatisfiedLinkError: /tmp/libnetty-transport-native-epoll4286512618055650929.so: /tmp/libnetty-transport-native-epoll4286512618055650929.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a Power PC 64-bit platform)
>   at java.lang.ClassLoader$NativeLibrary.load(Native Method)
>   at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965)
[jira] [Commented] (HBASE-13528) A bug on selecting compaction pool
[ https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506263#comment-14506263 ]

zhangduo commented on HBASE-13528:
----------------------------------

I think this line is also redundant?
{code}
long size = selectNow ? compaction.getRequest().getSize() : 0;
{code}
If selectNow is false, then we will not execute throttleCompaction, so 'size' is useless? Thanks.

> A bug on selecting compaction pool
> -----------------------------------
>
> Key: HBASE-13528
> URL: https://issues.apache.org/jira/browse/HBASE-13528
> Project: HBase
> Issue Type: Bug
> Components: Compaction
> Affects Versions: 0.98.12
> Reporter: Shuaifeng Zhou
> Assignee: Shuaifeng Zhou
> Priority: Minor
> Fix For: 1.0.1, 0.98.13
> Attachments: HBASE-13528-0.98.patch, HBASE-13528-1.0.patch, HBASE-13528-master.patch
>
> When selectNow == true, in requestCompactionInternal, the compaction pool selection is incorrect, as discussed in:
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E
[jira] [Commented] (HBASE-13528) A bug on selecting compaction pool
[ https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506317#comment-14506317 ]

zhangduo commented on HBASE-13528:
----------------------------------

This will cause NPE... compaction will be null if selectNow == false. Try this?
{code}
ThreadPoolExecutor pool = (selectNow && s.throttleCompaction(compaction.getRequest().getSize()))
    ? largeCompactions : smallCompactions;
{code}

> A bug on selecting compaction pool
> -----------------------------------
>
> Key: HBASE-13528
> URL: https://issues.apache.org/jira/browse/HBASE-13528
> Project: HBase
> Issue Type: Bug
> Components: Compaction
> Affects Versions: 0.98.12
> Reporter: Shuaifeng Zhou
> Assignee: Shuaifeng Zhou
> Priority: Minor
> Fix For: 1.0.1, 0.98.13
> Attachments: HBASE-13528-0.98.patch, HBASE-13528-1.0.patch, HBASE-13528-master.patch
>
> When selectNow == true, in requestCompactionInternal, the compaction pool selection is incorrect, as discussed in:
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E
[jira] [Commented] (HBASE-13259) mmap() based BucketCache IOEngine
[ https://issues.apache.org/jira/browse/HBASE-13259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503941#comment-14503941 ]

zhangduo commented on HBASE-13259:
----------------------------------

I can pick this up and address the 'ugly ByteBufferArray'. But I think we do not have enough time to test it on a large dataset if we want to catch up with the first RC of 1.1. It is tuning work, and the time we need is unpredictable. We can file a new issue to hold the tuning work and resolve this issue before the first RC of 1.1. What do you think? [~ndimiduk] Thanks.

> mmap() based BucketCache IOEngine
> ---------------------------------
>
> Key: HBASE-13259
> URL: https://issues.apache.org/jira/browse/HBASE-13259
> Project: HBase
> Issue Type: New Feature
> Components: BlockCache
> Affects Versions: 0.98.10
> Reporter: Zee Chen
> Fix For: 2.0.0, 1.1.0
> Attachments: HBASE-13259-v2.patch, HBASE-13259.patch, ioread-1.svg, mmap-0.98-v1.patch, mmap-1.svg, mmap-trunk-v1.patch
>
> Of the existing BucketCache IOEngines, FileIOEngine uses pread() to copy data from kernel space to user space. This is a good choice when the total working set size is much bigger than the available RAM and the latency is dominated by IO access. However, when the entire working set is small enough to fit in the RAM, using mmap() (and subsequent memcpy()) to move data from kernel space to user space is faster. I have run some short keyval get tests and the results indicate a reduction of 2%-7% of kernel CPU on my system, depending on the load. On the gets, the latency histograms from mmap() are identical to those from pread(), but peak throughput is close to 40% higher.
> This patch modifies ByteBufferArray to allow it to specify a backing file. Example for using this feature: set hbase.bucketcache.ioengine to mmap:/dev/shm/bucketcache.0 in hbase-site.xml.
> Attached perf-measured CPU usage breakdown in flame graphs.
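For readers unfamiliar with the technique, a rough self-contained sketch of mmap-backed reads in Java; the path and sizes are illustrative, and the actual patch wires this into ByteBufferArray rather than a standalone class:

{code:title=MmapExample.java}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapExample {
  public static void main(String[] args) throws Exception {
    try (RandomAccessFile raf =
        new RandomAccessFile("/dev/shm/bucketcache.0", "rw")) {
      raf.setLength(64 * 1024 * 1024); // 64 MB cache file (illustrative)
      MappedByteBuffer buf = raf.getChannel()
          .map(FileChannel.MapMode.READ_WRITE, 0, raf.length());

      // Serve a "block" with a plain buffer copy from the page cache,
      // instead of a pread() syscall per read.
      byte[] block = new byte[4096];
      buf.position(0);
      buf.get(block);
    }
  }
}
{code}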
[jira] [Commented] (HBASE-13499) AsyncRpcClient test cases failure in powerpc
[ https://issues.apache.org/jira/browse/HBASE-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502799#comment-14502799 ]

zhangduo commented on HBASE-13499:
----------------------------------

Seems we should also add an 'amd64' check, the same as in the issue you mentioned. Let me prepare a patch. Thanks.

> AsyncRpcClient test cases failure in powerpc
> ---------------------------------------------
>
> Key: HBASE-13499
> URL: https://issues.apache.org/jira/browse/HBASE-13499
> Project: HBase
> Issue Type: Bug
> Components: IPC/RPC
> Affects Versions: 1.1.0
> Reporter: sangamesh
>
> The new AsyncRpcClient feature added through the jira defect HBASE-12684 is causing some test case failures in a powerpc64 environment. I am testing it on the master branch.
> Looks like this version of netty (4.0.23) doesn't provide support for non-amd64 platforms, and it is suggested to use pure-java netty. Here is the discussion on that: https://github.com/aphyr/riemann/pull/508
> So the new Async test cases will fail on ppc64 and other non-amd64 platforms too. Here is the output of the error:
> Running org.apache.hadoop.hbase.ipc.TestAsyncIPC
> Tests run: 24, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 2.802 sec <<< FAILURE! - in org.apache.hadoop.hbase.ipc.TestAsyncIPC
> testRTEDuringAsyncConnectionSetup[3](org.apache.hadoop.hbase.ipc.TestAsyncIPC)  Time elapsed: 0.048 sec  <<< ERROR!
> java.lang.UnsatisfiedLinkError: /tmp/libnetty-transport-native-epoll4286512618055650929.so: /tmp/libnetty-transport-native-epoll4286512618055650929.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a Power PC 64-bit platform)
>   at java.lang.ClassLoader$NativeLibrary.load(Native Method)
>   at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965)
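For illustration, the shape of the guard being discussed; the class and method names here are a sketch, not the patch itself:

{code:title=EpollCheckExample.java}
public class EpollCheckExample {
  // Only pick netty's native epoll transport on Linux *and* amd64,
  // since the bundled .so is an amd64-only binary.
  public static boolean useNativeEpoll() {
    return "Linux".equals(System.getProperty("os.name"))
        && "amd64".equals(System.getProperty("os.arch"));
  }

  public static void main(String[] args) {
    System.out.println(useNativeEpoll()
        ? "using EpollEventLoopGroup"
        : "falling back to NioEventLoopGroup");
  }
}
{code}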
[jira] [Updated] (HBASE-13499) AsyncRpcClient test cases failure in powerpc
[ https://issues.apache.org/jira/browse/HBASE-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-13499:
-----------------------------
    Fix Version/s: 1.2.0
                   1.1.0
                   2.0.0
    Assignee: zhangduo
    Affects Version/s: 2.0.0
                       1.2.0
    Status: Patch Available (was: Open)

> AsyncRpcClient test cases failure in powerpc
> ---------------------------------------------
>
> Key: HBASE-13499
> URL: https://issues.apache.org/jira/browse/HBASE-13499
> Project: HBase
> Issue Type: Bug
> Components: IPC/RPC
> Affects Versions: 2.0.0, 1.1.0, 1.2.0
> Reporter: sangamesh
> Assignee: zhangduo
> Fix For: 2.0.0, 1.1.0, 1.2.0
> Attachments: HBASE-13499.patch
>
> The new AsyncRpcClient feature added through the jira defect HBASE-12684 is causing some test case failures in a powerpc64 environment. I am testing it on the master branch.
> Looks like this version of netty (4.0.23) doesn't provide support for non-amd64 platforms, and it is suggested to use pure-java netty. Here is the discussion on that: https://github.com/aphyr/riemann/pull/508
> So the new Async test cases will fail on ppc64 and other non-amd64 platforms too. Here is the output of the error:
> Running org.apache.hadoop.hbase.ipc.TestAsyncIPC
> Tests run: 24, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 2.802 sec <<< FAILURE! - in org.apache.hadoop.hbase.ipc.TestAsyncIPC
> testRTEDuringAsyncConnectionSetup[3](org.apache.hadoop.hbase.ipc.TestAsyncIPC)  Time elapsed: 0.048 sec  <<< ERROR!
> java.lang.UnsatisfiedLinkError: /tmp/libnetty-transport-native-epoll4286512618055650929.so: /tmp/libnetty-transport-native-epoll4286512618055650929.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a Power PC 64-bit platform)
>   at java.lang.ClassLoader$NativeLibrary.load(Native Method)
>   at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965)
[jira] [Updated] (HBASE-13499) AsyncRpcClient test cases failure in powerpc
[ https://issues.apache.org/jira/browse/HBASE-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-13499:
-----------------------------
    Attachment: HBASE-13499.patch

Added an 'amd64' check. And there is a typo in the testcase... [~sangameshs] Could you please help test the patch on ppc? Thanks.

> AsyncRpcClient test cases failure in powerpc
> ---------------------------------------------
>
> Key: HBASE-13499
> URL: https://issues.apache.org/jira/browse/HBASE-13499
> Project: HBase
> Issue Type: Bug
> Components: IPC/RPC
> Affects Versions: 1.1.0
> Reporter: sangamesh
> Attachments: HBASE-13499.patch
>
> The new AsyncRpcClient feature added through the jira defect HBASE-12684 is causing some test case failures in a powerpc64 environment. I am testing it on the master branch.
> Looks like this version of netty (4.0.23) doesn't provide support for non-amd64 platforms, and it is suggested to use pure-java netty. Here is the discussion on that: https://github.com/aphyr/riemann/pull/508
> So the new Async test cases will fail on ppc64 and other non-amd64 platforms too. Here is the output of the error:
> Running org.apache.hadoop.hbase.ipc.TestAsyncIPC
> Tests run: 24, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 2.802 sec <<< FAILURE! - in org.apache.hadoop.hbase.ipc.TestAsyncIPC
> testRTEDuringAsyncConnectionSetup[3](org.apache.hadoop.hbase.ipc.TestAsyncIPC)  Time elapsed: 0.048 sec  <<< ERROR!
> java.lang.UnsatisfiedLinkError: /tmp/libnetty-transport-native-epoll4286512618055650929.so: /tmp/libnetty-transport-native-epoll4286512618055650929.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a Power PC 64-bit platform)
>   at java.lang.ClassLoader$NativeLibrary.load(Native Method)
>   at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965)
[jira] [Commented] (HBASE-13259) mmap() based BucketCache IOEngine
[ https://issues.apache.org/jira/browse/HBASE-13259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497347#comment-14497347 ]

zhangduo commented on HBASE-13259:
----------------------------------

No, I haven't tested the patch...

> mmap() based BucketCache IOEngine
> ---------------------------------
>
> Key: HBASE-13259
> URL: https://issues.apache.org/jira/browse/HBASE-13259
> Project: HBase
> Issue Type: New Feature
> Components: BlockCache
> Affects Versions: 0.98.10
> Reporter: Zee Chen
> Fix For: 2.0.0, 1.1.0
> Attachments: HBASE-13259-v2.patch, HBASE-13259.patch, ioread-1.svg, mmap-0.98-v1.patch, mmap-1.svg, mmap-trunk-v1.patch
>
> Of the existing BucketCache IOEngines, FileIOEngine uses pread() to copy data from kernel space to user space. This is a good choice when the total working set size is much bigger than the available RAM and the latency is dominated by IO access. However, when the entire working set is small enough to fit in the RAM, using mmap() (and subsequent memcpy()) to move data from kernel space to user space is faster. I have run some short keyval get tests and the results indicate a reduction of 2%-7% of kernel CPU on my system, depending on the load. On the gets, the latency histograms from mmap() are identical to those from pread(), but peak throughput is close to 40% higher.
> This patch modifies ByteBufferArray to allow it to specify a backing file. Example for using this feature: set hbase.bucketcache.ioengine to mmap:/dev/shm/bucketcache.0 in hbase-site.xml.
> Attached perf-measured CPU usage breakdown in flame graphs.
[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493843#comment-14493843 ]

zhangduo commented on HBASE-13301:
----------------------------------

Seems the second time is fine. Let me commit.

> Possible memory leak in BucketCache
> ------------------------------------
>
> Key: HBASE-13301
> URL: https://issues.apache.org/jira/browse/HBASE-13301
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Reporter: zhangduo
> Assignee: zhangduo
> Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2
> Attachments: HBASE-13301-0.98.patch, HBASE-13301-0.98_v1.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0_v1.patch, HBASE-13301-branch-1.patch, HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch
>
> {code:title=BucketCache.java}
> public boolean evictBlock(BlockCacheKey cacheKey) {
>   ...
>   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
>     bucketAllocator.freeBlock(bucketEntry.offset());
>     realCacheSize.addAndGet(-1 * bucketEntry.getLength());
>     blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
>     if (removedBlock == null) {
>       this.blockNumber.decrementAndGet();
>     }
>   } else {
>     return false;
>   }
>   ...
> {code}
> I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up.
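One standard idiom relevant to the description above: ConcurrentMap.remove(key, value) only removes the entry when it is still mapped to the expected value, so an evictor never unmaps a block another thread has already replaced. A general sketch of the idiom, not necessarily the exact change the patch makes:

{code:title=RemoveIfMappedExample.java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class RemoveIfMappedExample {
  public static void main(String[] args) {
    ConcurrentMap<String, String> backingMap =
        new ConcurrentHashMap<String, String>();
    backingMap.put("block-1", "entry-A");

    // Atomic compare-and-remove: succeeds only for the expected entry.
    boolean removed = backingMap.remove("block-1", "entry-A");
    System.out.println(removed); // true; safe to free the bucket now

    // A stale evictor loses the race cleanly instead of evicting entry-B.
    backingMap.put("block-1", "entry-B");
    System.out.println(backingMap.remove("block-1", "entry-A")); // false
  }
}
{code}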
[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493851#comment-14493851 ]

zhangduo commented on HBASE-13301:
----------------------------------

Integrated to all branches. Thanks to all you guys who helped me finish this.

> Possible memory leak in BucketCache
> ------------------------------------
>
> Key: HBASE-13301
> URL: https://issues.apache.org/jira/browse/HBASE-13301
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Reporter: zhangduo
> Assignee: zhangduo
> Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2
> Attachments: HBASE-13301-0.98.patch, HBASE-13301-0.98_v1.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0_v1.patch, HBASE-13301-branch-1.patch, HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch
>
> {code:title=BucketCache.java}
> public boolean evictBlock(BlockCacheKey cacheKey) {
>   ...
>   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
>     bucketAllocator.freeBlock(bucketEntry.offset());
>     realCacheSize.addAndGet(-1 * bucketEntry.getLength());
>     blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
>     if (removedBlock == null) {
>       this.blockNumber.decrementAndGet();
>     }
>   } else {
>     return false;
>   }
>   ...
> {code}
> I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up.
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-13301:
-----------------------------
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

> Possible memory leak in BucketCache
> ------------------------------------
>
> Key: HBASE-13301
> URL: https://issues.apache.org/jira/browse/HBASE-13301
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Reporter: zhangduo
> Assignee: zhangduo
> Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2
> Attachments: HBASE-13301-0.98.patch, HBASE-13301-0.98_v1.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0_v1.patch, HBASE-13301-branch-1.patch, HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch
>
> {code:title=BucketCache.java}
> public boolean evictBlock(BlockCacheKey cacheKey) {
>   ...
>   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
>     bucketAllocator.freeBlock(bucketEntry.offset());
>     realCacheSize.addAndGet(-1 * bucketEntry.getLength());
>     blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
>     if (removedBlock == null) {
>       this.blockNumber.decrementAndGet();
>     }
>   } else {
>     return false;
>   }
>   ...
> {code}
> I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up.
[jira] [Commented] (HBASE-13467) Prototype using GRPC as IPC mechanism
[ https://issues.apache.org/jira/browse/HBASE-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495447#comment-14495447 ]

zhangduo commented on HBASE-13467:
----------------------------------

Nice try! Two things.
1. Wire compatibility. gRPC is based on HTTP/2, and the old rpc is based on raw TCP. If we cannot keep compatibility at the protocol level, then we should find other ways to let an old client communicate with a new server.
2. Secure HBase. gRPC is based on HTTP/2, so I'm not worried about the kerberos authentication part. But security is a big system; a little change here may require a large change there. It may be a big project.
Thanks.

> Prototype using GRPC as IPC mechanism
> --------------------------------------
>
> Key: HBASE-13467
> URL: https://issues.apache.org/jira/browse/HBASE-13467
> Project: HBase
> Issue Type: Improvement
> Components: API
> Affects Versions: 2.0.0
> Reporter: Louis Ryan
> Priority: Minor
>
> GRPC provides an RPC layer for protocol buffers on top of Netty 4/5. This could be used to replace the current internal implementation. GRPC supports some advanced features like streaming, async, flow-control, cancellation and timeout which might be useful.
> Will prototype on GitHub here if folks are interested: https://github.com/louiscryan/hbase
[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493271#comment-14493271 ]

zhangduo commented on HBASE-13301:
----------------------------------

Let me port the new patch to the branches other than master.

> Possible memory leak in BucketCache
> ------------------------------------
>
> Key: HBASE-13301
> URL: https://issues.apache.org/jira/browse/HBASE-13301
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Reporter: zhangduo
> Assignee: zhangduo
> Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2
> Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.patch, HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch
>
> {code:title=BucketCache.java}
> public boolean evictBlock(BlockCacheKey cacheKey) {
>   ...
>   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
>     bucketAllocator.freeBlock(bucketEntry.offset());
>     realCacheSize.addAndGet(-1 * bucketEntry.getLength());
>     blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
>     if (removedBlock == null) {
>       this.blockNumber.decrementAndGet();
>     }
>   } else {
>     return false;
>   }
>   ...
> {code}
> I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up.
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-13301:
-----------------------------
    Attachment: HBASE-13301-branch-1.0_v1.patch

> Possible memory leak in BucketCache
> ------------------------------------
>
> Key: HBASE-13301
> URL: https://issues.apache.org/jira/browse/HBASE-13301
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Reporter: zhangduo
> Assignee: zhangduo
> Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2
> Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0_v1.patch, HBASE-13301-branch-1.patch, HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch
>
> {code:title=BucketCache.java}
> public boolean evictBlock(BlockCacheKey cacheKey) {
>   ...
>   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
>     bucketAllocator.freeBlock(bucketEntry.offset());
>     realCacheSize.addAndGet(-1 * bucketEntry.getLength());
>     blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
>     if (removedBlock == null) {
>       this.blockNumber.decrementAndGet();
>     }
>   } else {
>     return false;
>   }
>   ...
> {code}
> I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up.
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-13301:
-----------------------------
    Attachment: HBASE-13301-0.98_v1.patch

> Possible memory leak in BucketCache
> ------------------------------------
>
> Key: HBASE-13301
> URL: https://issues.apache.org/jira/browse/HBASE-13301
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Reporter: zhangduo
> Assignee: zhangduo
> Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2
> Attachments: HBASE-13301-0.98.patch, HBASE-13301-0.98_v1.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0_v1.patch, HBASE-13301-branch-1.patch, HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch
>
> {code:title=BucketCache.java}
> public boolean evictBlock(BlockCacheKey cacheKey) {
>   ...
>   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
>     bucketAllocator.freeBlock(bucketEntry.offset());
>     realCacheSize.addAndGet(-1 * bucketEntry.getLength());
>     blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
>     if (removedBlock == null) {
>       this.blockNumber.decrementAndGet();
>     }
>   } else {
>     return false;
>   }
>   ...
> {code}
> I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up.
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-13301:
-----------------------------
    Attachment: HBASE-13301-branch-1_v1.patch

> Possible memory leak in BucketCache
> ------------------------------------
>
> Key: HBASE-13301
> URL: https://issues.apache.org/jira/browse/HBASE-13301
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Reporter: zhangduo
> Assignee: zhangduo
> Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2
> Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.patch, HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch
>
> {code:title=BucketCache.java}
> public boolean evictBlock(BlockCacheKey cacheKey) {
>   ...
>   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
>     bucketAllocator.freeBlock(bucketEntry.offset());
>     realCacheSize.addAndGet(-1 * bucketEntry.getLength());
>     blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
>     if (removedBlock == null) {
>       this.blockNumber.decrementAndGet();
>     }
>   } else {
>     return false;
>   }
>   ...
> {code}
> I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up.
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13301: - Attachment: HBASE-13301-0.98_v1.patch Retry for 0.98 Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13301-0.98.patch, HBASE-13301-0.98_v1.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0_v1.patch, HBASE-13301-branch-1.patch, HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13301: - Attachment: (was: HBASE-13301-0.98_v1.patch) Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0_v1.patch, HBASE-13301-branch-1.patch, HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14490819#comment-14490819 ] zhangduo commented on HBASE-13301: -- Any other questions? [~ndimiduk] Thanks. Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.patch, HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14490505#comment-14490505 ] zhangduo commented on HBASE-13301: -- Seems to pass on branch-1.0. Since the RCs of 0.98.12 and 1.0.1 have been cut, can we push to all branches now? [~enis] [~apurtell] Thanks. Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.patch, HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13301: - Attachment: HBASE-13301_v3.patch [~ndimiduk] Yes, I tried it on every branch. If you change 'backingMap.remove(cacheKey, bucketEntry)' back to 'bucketEntry.equals(backingMap.remove(cacheKey))' in BucketCache.evictBlock, the test fails every time. As for the sleep in the testcase: for the evictThread, it is not easy to add a countdown latch since we expect the thread to be blocked on the IdLock. And BucketCache.cacheBlock is a simple queue-based async operation; I do not think it is worth adding more logic than a simple sleep-and-wait, since it is fast... I extracted the cacheAndWait operation into a method and added some comments to explain the reasoning. I also added a method to IdLock that checks the number of waiters on a given id, and used it to confirm the evictThread is blocked on the IdLock. Thanks. Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.patch, HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
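To make the waiter-check idea above concrete, a minimal sketch of how the testcase could poll it, assuming a hypothetical IdLock accessor named getNumWaiters(long); the method name and signature in the actual patch may differ:
{code:title=Sketch.java}
// Spin until the evict thread is blocked on the IdLock for this block's offset.
// getNumWaiters(long) is a hypothetical name for the accessor described above.
long offset = bucketEntry.offset();
long deadline = System.currentTimeMillis() + 10000;
while (offsetLock.getNumWaiters(offset) == 0) {
  if (System.currentTimeMillis() > deadline) {
    throw new AssertionError("evict thread never blocked on the IdLock");
  }
  Thread.sleep(10);
}
{code}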
[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486301#comment-14486301 ] zhangduo commented on HBASE-13301: -- Will commit later if no objections. [~ndimiduk] [~enis] [~apurtell] OK to commit to branches other than master? Thanks. Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13301: - Attachment: HBASE-13301_v1.patch Check thread state in testcase. Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
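For reference, a thread-state check like the one mentioned above can be done with plain JDK APIs; a minimal sketch, where the thread handle and the 10s timeout are assumptions rather than details from the patch:
{code:title=Sketch.java}
// Wait until the target thread parks; WAITING covers Object.wait and LockSupport.park.
// Depending on how the lock waits, TIMED_WAITING may need to be accepted as well.
long deadline = System.currentTimeMillis() + 10000;
while (evictThread.getState() != Thread.State.WAITING) {
  if (System.currentTimeMillis() > deadline) {
    throw new AssertionError("thread never reached WAITING state");
  }
  Thread.sleep(10);
}
{code}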
[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484362#comment-14484362 ] zhangduo commented on HBASE-13301: -- [~apurtell] A little problem about the compareTo and equals methods in BucketEntry. Will prepare a new patch soon. Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13301: - Attachment: HBASE-13301_v2.patch Removed the compareTo and equals methods and use a COMPARATOR instead. Changed 'accessTime' to 'accessCounter' since it is always assigned from 'accessCount.incrementAndGet', which is not an actual time. 'accessTime' makes people wonder why equal access times should mean object equality in the compare methods. Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13301: - Attachment: (was: HBASE-13301_v2.patch) Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13301: - Attachment: HBASE-13301_v2.patch Sorry, I should have my head smacked... Fixed a wrong comment: the comparator is in descending order, not ascending order... Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
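A minimal sketch of what such a descending comparator over the renamed counter could look like; the class and accessor names here are assumed for illustration, not copied from the patch:
{code:title=Sketch.java}
// Sorts entries by accessCounter in descending order: most recently used first.
static final Comparator<BucketEntry> COMPARATOR = new Comparator<BucketEntry>() {
  @Override
  public int compare(BucketEntry a, BucketEntry b) {
    // Note the reversed operands: comparing b against a gives descending order.
    return Long.compare(b.getAccessCounter(), a.getAccessCounter());
  }
};
{code}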
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13301: - Status: Patch Available (was: Open) Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, HBASE-13301_v2.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14396259#comment-14396259 ] zhangduo commented on HBASE-13408: -- Looks good. And a little hint: log truncation is also an important purpose of flushing. If you keep some data in the memstore for a long time, there will be lots of WALs that cannot be truncated, which increases MTTR. So if the flush request comes from the LogRoller, you should enter panic mode and flush the memstore (maybe you already know this, but I haven't seen log truncation mentioned in your design doc, so I am just putting it here :) ). A sketch of this check follows below. And I remember Xiaomi said they have an 'HLog reform' feature in their private version of HBase that can solve this problem, but it seems they have not donated it to the community yet. HBase In-Memory Memstore Compaction --- Key: HBASE-13408 URL: https://issues.apache.org/jira/browse/HBASE-13408 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel Attachments: HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf A store unit holds a column family in a region, where the memstore is its in-memory component. The memstore absorbs all updates to the store; from time to time these updates are flushed to a file on disk, where they are compacted. Unlike disk components, the memstore is not compacted until it is written to the filesystem and optionally to block-cache. This may result in underutilization of the memory due to duplicate entries per row, for example, when hot data is continuously updated. Generally, the faster the data is accumulated in memory, the more flushes are triggered, and the data sinks to disk more frequently, slowing down retrieval of data, even if very recent. In high-churn workloads, compacting the memstore can help maintain the data in memory, and thereby speed up data retrieval. We suggest a new compacted memstore with the following principles: 1. The data is kept in memory for as long as possible. 2. Memstore data is either compacted or in the process of being compacted. 3. Allow a panic mode, which may interrupt an in-progress compaction and force a flush of part of the memstore. We suggest applying this optimization only to in-memory column families. A design document is attached. This feature was previously discussed in HBASE-5311. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
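A rough illustration of the panic-mode check suggested in the comment above; everything here is hypothetical (the request-source flag and method names are assumptions, not HBase APIs):
{code:title=Sketch.java}
// If the flush was requested by the log roller, flush to disk immediately so the
// old WALs can be archived; otherwise keep the data in memory and compact it there.
void onFlushRequest(FlushRequest request) {   // hypothetical types and names
  if (request.isFromLogRoller()) {
    forceFlushToDisk();   // panic mode: unblocks WAL truncation, bounds MTTR
  } else {
    compactInMemory();    // normal path for in-memory column families
  }
}
{code}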
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13301: - Attachment: HBASE-13301.patch Changed the declaration of backingMap from Map to ConcurrentMap and use remove(key, value) to prevent removing the wrong entry. Also did some other cleanups and fixes. Used the first testcase, since the second one was only meant to prove this can happen in a real scenario. Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
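The fix described above relies on ConcurrentMap's atomic two-argument remove. A minimal sketch of the changed check in evictBlock, simplified from the snippet quoted in the issue (backingMap is now declared as a ConcurrentMap):
{code:title=Sketch.java}
// Atomic: removes the mapping only if cacheKey still maps to this exact bucketEntry,
// so a concurrent evict-then-recache of the same key can no longer be clobbered.
if (backingMap.remove(cacheKey, bucketEntry)) {
  bucketAllocator.freeBlock(bucketEntry.offset());
  realCacheSize.addAndGet(-1 * bucketEntry.getLength());
  blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
  if (removedBlock == null) {   // removedBlock comes from the earlier ramCache removal
    this.blockNumber.decrementAndGet();
  }
} else {
  return false; // someone else already evicted (or replaced) this entry
}
{code}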
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13301: - Status: Patch Available (was: Open) Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13301: - Attachment: HBASE-13301-testcase_v1.patch A new testcase shows that it is possible to evict and cache a block again in the real world: moving a region to another RS and moving it back can make this happen. Of course, this is the rarest of cases; it is almost impossible for a thread to stall for such a long time. But it is a time bomb, and I do not think it is a good idea to leave it there and wait for the bang... I will try to fix it. Thanks. Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
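To make the failure mode concrete, the interleaving the testcase simulates looks roughly like this (thread names are illustrative):
{code:title=Sketch.java}
// T1 (evictor):  BucketEntry a = backingMap.get(key);   // sees entry A
//                -- T1 stalls here for a long time --
// T2:            evicts A normally and frees its bucket,
//                then caches the same block again -> new entry B in backingMap
// T1 (resumes):  BucketEntry removed = backingMap.remove(key);  // removes B, not A!
//                a.equals(removed) is false, so evictBlock returns false,
//                but B is already unmapped and its bucket is never freed: a leak.
{code}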
[jira] [Updated] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0
[ https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13385: - Attachment: HBASE-13385_v1.patch fix compile error with hadoop 2.4.1 TestGenerateDelegationToken is broken with hadoop 2.8.0 --- Key: HBASE-13385 URL: https://issues.apache.org/jira/browse/HBASE-13385 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13385.patch, HBASE-13385_v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0
[ https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13385: - Status: Patch Available (was: Open) TestGenerateDelegationToken is broken with hadoop 2.8.0 --- Key: HBASE-13385 URL: https://issues.apache.org/jira/browse/HBASE-13385 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13385.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0
[ https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13385: - Resolution: Fixed Fix Version/s: 1.1.0 2.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to master and branch-1. Thanks [~tedyu] for reviewing. TestGenerateDelegationToken is broken with hadoop 2.8.0 --- Key: HBASE-13385 URL: https://issues.apache.org/jira/browse/HBASE-13385 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13385.patch, HBASE-13385_v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0
[ https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392640#comment-14392640 ] zhangduo commented on HBASE-13385: -- Test with {noformat} mvn clean test -Dtest=TestGenerateDelegationToken -Dhadoop-two.version=2.8.0-SNAPSHOT {noformat} Passed. TestGenerateDelegationToken is broken with hadoop 2.8.0 --- Key: HBASE-13385 URL: https://issues.apache.org/jira/browse/HBASE-13385 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13385.patch, HBASE-13385_v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0
[ https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392607#comment-14392607 ] zhangduo commented on HBASE-13385: -- What's this? {noformat} * Printing headers for files without AL header... === ==/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransactionImpl.java.rej === --- hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransactionImpl.java +++ hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransactionImpl.java @@ -41,6 +41,7 @@ import org.apache.hadoop.hbase.coordination.BaseCoordinatedStateManager; import org.apache.hadoop.hbase.coordination.RegionMergeCoordination.RegionMergeDetails; import org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos.RegionStateTransition.TransitionCode; +import org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.JournalEntryImpl; import org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.LoggingProgressable; import org.apache.hadoop.hbase.util.Bytes; import org.apache.hadoop.hbase.util.ConfigUtil; * {noformat} TestGenerateDelegationToken is broken with hadoop 2.8.0 --- Key: HBASE-13385 URL: https://issues.apache.org/jira/browse/HBASE-13385 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13385.patch, HBASE-13385_v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0
[ https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392615#comment-14392615 ] zhangduo commented on HBASE-13385: -- Oh, it is caused by HBASE-12975. Other people have already reported it. TestGenerateDelegationToken is broken with hadoop 2.8.0 --- Key: HBASE-13385 URL: https://issues.apache.org/jira/browse/HBASE-13385 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13385.patch, HBASE-13385_v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12259) Bring quorum based write ahead log into HBase
[ https://issues.apache.org/jira/browse/HBASE-12259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392764#comment-14392764 ] zhangduo commented on HBASE-12259: -- Any progress here? Thanks. Bring quorum based write ahead log into HBase - Key: HBASE-12259 URL: https://issues.apache.org/jira/browse/HBASE-12259 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 2.0.0 Reporter: Elliott Clark Attachments: Architecture for HydraBase (5).pdf, RaftProtocolImplementationDesignDoc.pdf HydraBase ( https://code.facebook.com/posts/32638043166/hydrabase-the-evolution-of-hbase-facebook/ ) Facebook's implementation of HBase with Raft for consensus will be going open source shortly. We should pull in the parts of that fb-0.89 based implementation, and offer it as a feature in whatever major release is next up. Right now the Hydrabase code base isn't ready to be released into the wild; it should be ready soon (for some definition of soon). Since Hydrabase is based upon 0.89, most of the code is not directly applicable, so lots of work will probably need to be done in a feature branch before a merge vote. Is this something that's wanted? Is there any cleanup that needs to be done before the log implementation is able to be replaced like this? What's our story with upgrading to this? Are we OK with requiring downtime? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13187) Add ITBLL that exercises per CF flush
[ https://issues.apache.org/jira/browse/HBASE-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13187: - Attachment: HBASE-13187_v1.patch Add existence check. Run it locally on master using command {noformat} mvn -Dit.test=IntegrationTestBigLinkedList -Dgenerator.multiple.columnfamilies=true verify {noformat} Passed. [~stack] Add ITBLL that exercises per CF flush - Key: HBASE-13187 URL: https://issues.apache.org/jira/browse/HBASE-13187 Project: HBase Issue Type: Task Components: integration tests Reporter: stack Assignee: stack Priority: Critical Fix For: 2.0.0, 1.1.0 Attachments: 13187.txt, HBASE-13187_v1.patch Let me work on this. It would be excellent if we could have confidence to turn this on earlier rather than later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0
zhangduo created HBASE-13385: Summary: TestGenerateDelegationToken is broken with hadoop 2.8.0 Key: HBASE-13385 URL: https://issues.apache.org/jira/browse/HBASE-13385 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0
[ https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13385: - Attachment: HBASE-13385.patch Also start dfs cluster in secure mode. Copied some code from hdfs testcase. TestGenerateDelegationToken is broken with hadoop 2.8.0 --- Key: HBASE-13385 URL: https://issues.apache.org/jira/browse/HBASE-13385 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13385.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
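For context, starting a MiniDFSCluster in secure mode mostly comes down to kerberos-related configuration before the cluster is built. A rough sketch under the assumption of a test KDC providing the principal and keytab; the config keys are standard Hadoop ones, the values are placeholders, and the actual patch may differ:
{code:title=SecureMiniDfsSketch.java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureMiniDfsSketch {
  static MiniDFSCluster start(String principal, String keytab) throws Exception {
    Configuration conf = new Configuration();
    // Standard Hadoop security keys; principal/keytab come from the test KDC.
    conf.set("hadoop.security.authentication", "kerberos");
    conf.set("dfs.namenode.kerberos.principal", principal);
    conf.set("dfs.namenode.keytab.file", keytab);
    conf.set("dfs.datanode.kerberos.principal", principal);
    conf.set("dfs.datanode.keytab.file", keytab);
    conf.setBoolean("dfs.block.access.token.enable", true);
    UserGroupInformation.setConfiguration(conf);
    return new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
  }
}
{code}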
[jira] [Updated] (HBASE-13371) Fix typo in TestAsyncIPC
[ https://issues.apache.org/jira/browse/HBASE-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13371: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Fix typo in TestAsyncIPC Key: HBASE-13371 URL: https://issues.apache.org/jira/browse/HBASE-13371 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13371.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13371) Fix typo in TestAsyncIPC
[ https://issues.apache.org/jira/browse/HBASE-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389966#comment-14389966 ] zhangduo commented on HBASE-13371: -- Pushed to master and branch-1. Thanks [~tedyu] for reviewing. Fix typo in TestAsyncIPC Key: HBASE-13371 URL: https://issues.apache.org/jira/browse/HBASE-13371 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13371.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13371) Fix typo in TestAsyncIPC
zhangduo created HBASE-13371: Summary: Fix typo in TestAsyncIPC Key: HBASE-13371 URL: https://issues.apache.org/jira/browse/HBASE-13371 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13371) Fix typo in TestAsyncIPC
[ https://issues.apache.org/jira/browse/HBASE-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13371: - Attachment: HBASE-13371.patch Forgot to modify the auto-generated code. One-line patch. Fix typo in TestAsyncIPC Key: HBASE-13371 URL: https://issues.apache.org/jira/browse/HBASE-13371 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13371.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13371) Fix typo in TestAsyncIPC
[ https://issues.apache.org/jira/browse/HBASE-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13371: - Status: Patch Available (was: Open) Fix typo in TestAsyncIPC Key: HBASE-13371 URL: https://issues.apache.org/jira/browse/HBASE-13371 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13371.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14385682#comment-14385682 ] zhangduo commented on HBASE-13301: -- {quote} In btw a context switch t1 completed the caching and done evict and again cached same block.. This seems rarest of rare case. {quote} Agreed. But HBase is a long-running service; small-probability events always occur if we keep it running long enough... Let me first revisit the whole read/write path in the regionserver that relates to the BlockCache and work out a clear locking schema. Then it will be easier to say whether the situation in this testcase could happen. Will come back later. Thanks. Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13301-testcase.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13295) TestInfoServers hang
[ https://issues.apache.org/jira/browse/HBASE-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13295: - Resolution: Fixed Fix Version/s: 1.1.0 2.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) TestInfoServers hang Key: HBASE-13295 URL: https://issues.apache.org/jira/browse/HBASE-13295 Project: HBase Issue Type: Bug Components: test Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13295.patch https://builds.apache.org/job/HBase-TRUNK-jacoco/16/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.TestInfoServers-output.txt No progress after this line {noformat} 2015-03-19 22:46:06,809 INFO [main] hbase.TestInfoServers(127): Testing http://localhost:44749/table.jsp?name=testMasterServerReadOnly&action=split&key= has Table action request accepted {noformat} I think the problem may be that we do not wait for the master to finish becoming active, and there is no timeout on the http request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
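Putting a timeout on the test's HTTP requests is straightforward with the JDK's URLConnection; a minimal sketch, where the URL and timeout values are illustrative only:
{code:title=HttpTimeoutSketch.java}
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class HttpTimeoutSketch {
  static void fetchWithTimeout(String url) throws Exception {
    // Fail after 30s instead of hanging forever on connect or read.
    URLConnection conn = new URL(url).openConnection();
    conn.setConnectTimeout(30 * 1000);
    conn.setReadTimeout(30 * 1000);
    try (InputStream in = conn.getInputStream()) {
      // read and assert on the page contents ...
    }
  }
}
{code}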
[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14385133#comment-14385133 ] zhangduo commented on HBASE-13301: -- Thanks [~anoopsamjohn], and could you explain why this won't happen? And if this won't happen, then maybe we just need a null check to confirm that the block has not been evicted by others yet? A 'get and check' still makes people think that we could evict and cache again, as I did in the testcase... Thanks. Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13301-testcase.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13295) TestInfoServers hang
[ https://issues.apache.org/jira/browse/HBASE-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383303#comment-14383303 ] zhangduo commented on HBASE-13295: -- Pushed to master and branch-1. TestInfoServers hang Key: HBASE-13295 URL: https://issues.apache.org/jira/browse/HBASE-13295 Project: HBase Issue Type: Bug Components: test Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13295.patch https://builds.apache.org/job/HBase-TRUNK-jacoco/16/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.TestInfoServers-output.txt No progress after this line {noformat} 2015-03-19 22:46:06,809 INFO [main] hbase.TestInfoServers(127): Testing http://localhost:44749/table.jsp?name=testMasterServerReadOnly&action=split&key= has Table action request accepted {noformat} I think the problem may be that we do not wait for the master to finish becoming active, and there is no timeout on the http request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13295) TestInfoServers hang
[ https://issues.apache.org/jira/browse/HBASE-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383267#comment-14383267 ] zhangduo commented on HBASE-13295: -- Let me pick this up. At least fix it for master and branch-1. TestInfoServers hang Key: HBASE-13295 URL: https://issues.apache.org/jira/browse/HBASE-13295 Project: HBase Issue Type: Bug Components: test Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13295.patch https://builds.apache.org/job/HBase-TRUNK-jacoco/16/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.TestInfoServers-output.txt No progress after this line {noformat} 2015-03-19 22:46:06,809 INFO [main] hbase.TestInfoServers(127): Testing http://localhost:44749/table.jsp?name=testMasterServerReadOnly&action=split&key= has Table action request accepted {noformat} I think the problem may be that we do not wait for the master to finish becoming active, and there is no timeout on the http request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381237#comment-14381237 ] zhangduo commented on HBASE-13301: -- [~ram_krish] Yes, get and compare then remove would be perfect, but it is not straightforward in this case. We do have an IdLock, but it is only used in get and evict, and the lock key is the offset, not the BlockCacheKey. And my question is whether the error in this testcase could happen in the real world. Maybe the access pattern we use can avoid this error? I do not know... Anyway, remove first then compare is not a good idea, I think. If we can enter the 'not equals' branch then, no doubt, it is a bug. And if we can never enter the 'not equals' branch, then why do we need the compare... Thanks. Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13301-testcase.patch {code:title=BucketCache.java} public boolean evictBlock(BlockCacheKey cacheKey) { ... if (bucketEntry.equals(backingMap.remove(cacheKey))) { bucketAllocator.freeBlock(bucketEntry.offset()); realCacheSize.addAndGet(-1 * bucketEntry.getLength()); blocksByHFile.remove(cacheKey.getHfileName(), cacheKey); if (removedBlock == null) { this.blockNumber.decrementAndGet(); } } else { return false; } ... {code} I think the problem is here. We remove a BucketEntry that should not be removed by us, but we do not put it back and also do not do any clean up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
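For reference, a 'get, compare, then remove' guarded by the offset-keyed IdLock could look roughly like this. This is a fragment only, with far less bookkeeping than the real evictBlock; IdLock#getLockEntry and #releaseLockEntry are existing HBase utility methods, while the surrounding structure is assumed:
{code:title=Sketch.java}
BucketEntry bucketEntry = backingMap.get(cacheKey);
if (bucketEntry == null) {
  return false;
}
IdLock.Entry lockEntry = null;
try {
  // Lock on the entry's offset, matching how the rest of BucketCache locks.
  lockEntry = offsetLock.getLockEntry(bucketEntry.offset());
  // Atomic compare-and-remove: succeeds only if the mapping is unchanged.
  if (!backingMap.remove(cacheKey, bucketEntry)) {
    return false;
  }
  bucketAllocator.freeBlock(bucketEntry.offset());
} finally {
  if (lockEntry != null) {
    offsetLock.releaseLockEntry(lockEntry);
  }
}
{code}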
[jira] [Commented] (HBASE-13310) Fix high priority findbugs warnings
[ https://issues.apache.org/jira/browse/HBASE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375864#comment-14375864 ] zhangduo commented on HBASE-13310: -- Put the patch on ReviewBoard. Will commit it if there is no objection when I come back. Thanks. Fix high priority findbugs warnings --- Key: HBASE-13310 URL: https://issues.apache.org/jira/browse/HBASE-13310 Project: HBase Issue Type: Task Affects Versions: 2.0.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0 Attachments: HBASE-13310.patch, HBASE-13310_v1.patch, HBASE-13310_v1.patch See here. https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/ High-priority warnings usually introduce bugs or have a very bad impact on performance. Let's fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-13257) Show coverage report on jenkins
[ https://issues.apache.org/jira/browse/HBASE-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo resolved HBASE-13257. -- Resolution: Fixed Fix Version/s: 2.0.0 Show coverage report on jenkins --- Key: HBASE-13257 URL: https://issues.apache.org/jira/browse/HBASE-13257 Project: HBase Issue Type: Task Reporter: zhangduo Assignee: zhangduo Priority: Minor Fix For: 2.0.0 Thinking of showing the jacoco coverage report on https://builds.apache.org . An advantage of showing it on jenkins is that the jenkins jacoco plugin can handle cross-module coverage. Cannot do it locally since https://github.com/jacoco/jacoco/pull/97 is still pending. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13310) Fix high priority findbugs warnings
[ https://issues.apache.org/jira/browse/HBASE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13310: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to master. Thanks [~eclark] and [~tedyu]. Fix high priority findbugs warnings --- Key: HBASE-13310 URL: https://issues.apache.org/jira/browse/HBASE-13310 Project: HBase Issue Type: Task Affects Versions: 2.0.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0 Attachments: HBASE-13310.patch, HBASE-13310_v1.patch, HBASE-13310_v1.patch See here. https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/ High-priority warnings usually introduce bugs or have a very bad impact on performance. Let's fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13310) Fix high priority findbugs warnings
[ https://issues.apache.org/jira/browse/HBASE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13310: - Attachment: HBASE-13310_v1.patch Fix the stupid NPE... Fix high priority findbugs warnings --- Key: HBASE-13310 URL: https://issues.apache.org/jira/browse/HBASE-13310 Project: HBase Issue Type: Task Affects Versions: 2.0.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0 Attachments: HBASE-13310.patch, HBASE-13310_v1.patch See here. https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/ High-priority warnings usually introduce bugs or have a very bad impact on performance. Let's fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13310) Fix high priority findbugs warnings
[ https://issues.apache.org/jira/browse/HBASE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13310: - Attachment: HBASE-13310.patch Fix high priority findbugs warnings --- Key: HBASE-13310 URL: https://issues.apache.org/jira/browse/HBASE-13310 Project: HBase Issue Type: Task Affects Versions: 2.0.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0 Attachments: HBASE-13310.patch See here. https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/ High-priority warnings usually introduce bugs or have a very bad impact on performance. Let's fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13310) Fix high priority findbugs warnings
[ https://issues.apache.org/jira/browse/HBASE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13310: - Fix Version/s: 2.0.0 Affects Version/s: 2.0.0 Status: Patch Available (was: Open) Fix high priority findbugs warnings --- Key: HBASE-13310 URL: https://issues.apache.org/jira/browse/HBASE-13310 Project: HBase Issue Type: Task Affects Versions: 2.0.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0 Attachments: HBASE-13310.patch See here. https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/ High-priority warnings usually introduce bugs or have a very bad impact on performance. Let's fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13310) Fix high priority findbugs warnings
[ https://issues.apache.org/jira/browse/HBASE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13310: - Attachment: HBASE-13310_v1.patch Do not know... I ran it several times locally and it didn't hang... Trying again. Fix high priority findbugs warnings --- Key: HBASE-13310 URL: https://issues.apache.org/jira/browse/HBASE-13310 Project: HBase Issue Type: Task Affects Versions: 2.0.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0 Attachments: HBASE-13310.patch, HBASE-13310_v1.patch, HBASE-13310_v1.patch See here. https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/ High-priority warnings usually introduce bugs or have a very bad impact on performance. Let's fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13295) TestInfoServers hang
[ https://issues.apache.org/jira/browse/HBASE-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13295: - Summary: TestInfoServers hang (was: TestInfoServers hung) TestInfoServers hang Key: HBASE-13295 URL: https://issues.apache.org/jira/browse/HBASE-13295 Project: HBase Issue Type: Bug Components: test Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13295.patch https://builds.apache.org/job/HBase-TRUNK-jacoco/16/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.TestInfoServers-output.txt No progress after this line {noformat} 2015-03-19 22:46:06,809 INFO [main] hbase.TestInfoServers(127): Testing http://localhost:44749/table.jsp?name=testMasterServerReadOnly&action=split&key= has Table action request accepted {noformat} I think the problem may be that we do not wait for the master to finish becoming active, and there is no timeout on the http request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction
[ https://issues.apache.org/jira/browse/HBASE-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13308: - Description: https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/ First, we split 'e9eb97847340ea7c6b9616d63d62a784' to 'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'. And then, we try to split 'abe1973ea732066b12d8e33fce12a951'. {noformat} 2015-03-21 03:58:46,970 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): Initiating region split for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. 2015-03-21 03:58:46,976 INFO [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.RSRpcServices(1596): Splitting testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. 2015-03-21 03:58:46,977 DEBUG [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.CompactSplitThread(259): Split requested for testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.. compaction_queue=(0:0), split_queue=1, merge_queue=0 2015-03-21 03:58:46,978 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): blocking until region is split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. 2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702 2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(328): Released /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702 2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use global event loop group NioEventLoopGroup 2015-03-21 03:58:46,988 INFO [RS:0;priapus:54177-splits-1426910324832] regionserver.SplitRequest(142): Split transaction journal: STARTED at 1426910326977 {noformat} We can see that it failed without any error message. I think this can only happen when the parent is not splittable or we cannot find a splitrow. {noformat} 2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.HStore(1334): Completed major compaction of 2 (all) file(s) in family of testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 12.6 K. This selection was in queue for 0sec, and took 0sec to execute. 2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction: Request = regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951., storeName=family, fileCount=2, fileSize=25.5 K, priority=1, time=14542808784655186; duration=0sec 2015-03-21 03:58:47,020 DEBUG [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(546): CompactSplitThread Status: compaction_queue=(0:0), split_queue=0, merge_queue=0 {noformat} We can see that the compaction was completed at 03:58:47,019, but the split was started at 03:58:46,970, which is earlier. So we have a reference file and the region is not splittable. I think the problem is that 'compactAndBlockUntilDone' is not reliable; it may return before the compaction completes. Will try to prepare a patch.
[jira] [Created] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction
zhangduo created HBASE-13308: Summary: Fix flaky TestEndToEndSplitTransaction Key: HBASE-13308 URL: https://issues.apache.org/jira/browse/HBASE-13308 Project: HBase Issue Type: Bug Components: test Reporter: zhangduo Assignee: zhangduo https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/ First, we split 'e9eb97847340ea7c6b9616d63d62a784' to 'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'. And then, we try to split 'abe1973ea732066b12d8e33fce12a951'. {noformat} 2015-03-21 03:58:46,970 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): Initiating region split for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. 2015-03-21 03:58:46,976 INFO [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.RSRpcServices(1596): Splitting testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. 2015-03-21 03:58:46,977 DEBUG [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.CompactSplitThread(259): Split requested for testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.. compaction_queue=(0:0), split_queue=1, merge_queue=0 2015-03-21 03:58:46,978 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): blocking until region is split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. 2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702 2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(328): Released /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702 2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use global event loop group NioEventLoopGroup 2015-03-21 03:58:46,988 INFO [RS:0;priapus:54177-splits-1426910324832] regionserver.SplitRequest(142): Split transaction journal: STARTED at 1426910326977 {noformat} We can see that it failed without any error message. I think this can only happen when the parent is not splittable or we cannot find a splitrow. {noformat} 2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.HStore(1334): Completed major compaction of 2 (all) file(s) in family of testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 12.6 K. This selection was in queue for 0sec, and took 0sec to execute. 2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction: Request = regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951., storeName=family, fileCount=2, fileSize=25.5 K, priority=1, time=14542808784655186; duration=0sec 2015-03-21 03:58:47,020 DEBUG [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(546): CompactSplitThread Status: compaction_queue=(0:0), split_queue=0, merge_queue=0 {noformat} We can see that the compaction was completed at 03:58:47,019, but the split was started at 03:58:46,970, which is earlier. So we have a reference file and the region is not splittable.
I think the problem is 'compactAndBlockUntilDone' is not reliable, it may return before the compaction complete. Will try to prepare a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction
[ https://issues.apache.org/jira/browse/HBASE-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372611#comment-14372611 ] zhangduo commented on HBASE-13308: -- This is our 'compactAndBlockUntilDone' method.
{code:title=TestEndToEndSplitTransaction.java}
public static void compactAndBlockUntilDone(Admin admin, HRegionServer rs, byte[] regionName)
    throws IOException, InterruptedException {
  log("Compacting region: " + Bytes.toStringBinary(regionName));
  admin.majorCompactRegion(regionName);
  log("blocking until compaction is complete: " + Bytes.toStringBinary(regionName));
  Threads.sleepWithoutInterrupt(500);
  while (rs.compactSplitThread.getCompactionQueueSize() > 0) {
    Threads.sleep(50);
  }
}
{code}
It uses the thread pool's workQueue size as the completion condition. But consider:
{code}
public static void main(String[] args) throws InterruptedException {
  ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 60, TimeUnit.SECONDS,
      new LinkedBlockingQueue<Runnable>());
  pool.execute(new Runnable() {
    @Override
    public void run() {
      try {
        Thread.currentThread().join();
      } catch (InterruptedException e) {}
    }
  });
  Thread.sleep(2000);
  System.out.println(pool.getActiveCount());
  System.out.println(pool.getQueue().size());
  pool.shutdownNow();
}
{code}
The output is
{noformat}
1
0
{noformat}
A thread pool's queue size does not include the running tasks, so if there is only one running compaction, the compaction queue size will already be zero... It is therefore not safe to use the compaction queue size as the condition.
Fix flaky TestEndToEndSplitTransaction -- Key: HBASE-13308 URL: https://issues.apache.org/jira/browse/HBASE-13308 Project: HBase Issue Type: Bug Components: test Reporter: zhangduo Assignee: zhangduo https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/ First, we split 'e9eb97847340ea7c6b9616d63d62a784' into 'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'. Then we try to split 'abe1973ea732066b12d8e33fce12a951'.
{noformat}
2015-03-21 03:58:46,970 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): Initiating region split for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,976 INFO [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.RSRpcServices(1596): Splitting testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,977 DEBUG [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.CompactSplitThread(259): Split requested for testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.. compaction_queue=(0:0), split_queue=1, merge_queue=0
2015-03-21 03:58:46,978 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): blocking until region is split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(328): Released /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use global event loop group NioEventLoopGroup
2015-03-21 03:58:46,988 INFO [RS:0;priapus:54177-splits-1426910324832] regionserver.SplitRequest(142): Split transaction journal: STARTED at 1426910326977
{noformat}
We can see that it failed without any error message. I think this can only happen when the parent is not splittable or we cannot find a split row.
{noformat}
2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.HStore(1334): Completed major compaction of 2 (all) file(s) in family of testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 12.6 K. This selection was in queue for 0sec, and took 0sec to execute.
2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction: Request = regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951., storeName=family, fileCount=2, fileSize=25.5 K, priority=1, time=14542808784655186; duration=0sec 2015-03-21
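Even if a test insists on watching the executor, queue size alone can never be a complete signal; at minimum the in-flight task must be counted too. Below is a minimal sketch of that idea against a plain ThreadPoolExecutor (illustrative only, not the fix attached on this issue; note getActiveCount() is itself documented as an approximation, which is one more argument for waiting on store state instead):
{code}
import java.util.concurrent.ThreadPoolExecutor;

// Illustrative sketch: "idle" for a ThreadPoolExecutor means the work queue
// is empty AND no worker is currently running a task. Polling only
// getQueue().size() misses the task that has already been dequeued.
static void blockUntilIdle(ThreadPoolExecutor pool) throws InterruptedException {
  while (pool.getActiveCount() > 0 || !pool.getQueue().isEmpty()) {
    Thread.sleep(50);
  }
}
{code}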
[jira] [Updated] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction
[ https://issues.apache.org/jira/browse/HBASE-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13308: - Attachment: HBASE-13308.patch Use the memstore size and store file count as the condition variables. Also clean up old APIs.
Fix flaky TestEndToEndSplitTransaction -- Key: HBASE-13308 URL: https://issues.apache.org/jira/browse/HBASE-13308 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13308.patch https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/ First, we split 'e9eb97847340ea7c6b9616d63d62a784' into 'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'. Then we try to split 'abe1973ea732066b12d8e33fce12a951'.
{noformat}
2015-03-21 03:58:46,970 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): Initiating region split for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,976 INFO [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.RSRpcServices(1596): Splitting testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,977 DEBUG [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.CompactSplitThread(259): Split requested for testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.. compaction_queue=(0:0), split_queue=1, merge_queue=0
2015-03-21 03:58:46,978 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): blocking until region is split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(328): Released /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use global event loop group NioEventLoopGroup
2015-03-21 03:58:46,988 INFO [RS:0;priapus:54177-splits-1426910324832] regionserver.SplitRequest(142): Split transaction journal: STARTED at 1426910326977
{noformat}
We can see that it failed without any error message. I think this can only happen when the parent is not splittable or we cannot find a split row.
{noformat}
2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.HStore(1334): Completed major compaction of 2 (all) file(s) in family of testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 12.6 K. This selection was in queue for 0sec, and took 0sec to execute.
2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction: Request = regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951., storeName=family, fileCount=2, fileSize=25.5 K, priority=1, time=14542808784655186; duration=0sec
2015-03-21 03:58:47,020 DEBUG [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(546): CompactSplitThread Status: compaction_queue=(0:0), split_queue=0, merge_queue=0
{noformat}
We can see that the compaction completed at 03:58:47,019, but the split started at 03:58:46,970, which is earlier. So the region still has a reference file and is not splittable. I think the problem is that 'compactAndBlockUntilDone' is not reliable; it may return before the compaction completes. Will try to prepare a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
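To make the direction of the patch summary above ("memstore size and store file count") concrete: instead of asking the executor, poll the state that the flush and compaction actually change. The sketch below only illustrates that shape; it is not the attached patch, and the accessor names (getOnlineRegion, getStore, getStorefilesCount) plus the flush-first step are assumptions for the sketch:
{code}
// Hedged sketch, not HBASE-13308.patch: a major compaction of a flushed store
// should leave exactly one store file, so wait until the store reports that,
// rather than watching the compaction queue size.
public static void compactAndBlockUntilDone(Admin admin, HRegionServer rs,
    byte[] regionName, byte[] family) throws IOException, InterruptedException {
  admin.flushRegion(regionName);        // drain the memstore so the file count is stable
  admin.majorCompactRegion(regionName);
  HRegion region = rs.getOnlineRegion(regionName); // accessor names assumed for illustration
  while (region.getStore(family).getStorefilesCount() > 1) {
    Threads.sleep(50);
  }
}
{code}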
[jira] [Updated] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction
[ https://issues.apache.org/jira/browse/HBASE-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13308: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available)
Fix flaky TestEndToEndSplitTransaction -- Key: HBASE-13308 URL: https://issues.apache.org/jira/browse/HBASE-13308 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13308.patch https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/ First, we split 'e9eb97847340ea7c6b9616d63d62a784' into 'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'. Then we try to split 'abe1973ea732066b12d8e33fce12a951'.
{noformat}
2015-03-21 03:58:46,970 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): Initiating region split for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,976 INFO [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.RSRpcServices(1596): Splitting testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,977 DEBUG [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.CompactSplitThread(259): Split requested for testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.. compaction_queue=(0:0), split_queue=1, merge_queue=0
2015-03-21 03:58:46,978 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): blocking until region is split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(328): Released /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use global event loop group NioEventLoopGroup
2015-03-21 03:58:46,988 INFO [RS:0;priapus:54177-splits-1426910324832] regionserver.SplitRequest(142): Split transaction journal: STARTED at 1426910326977
{noformat}
We can see that it failed without any error message. I think this can only happen when the parent is not splittable or we cannot find a split row.
{noformat}
2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.HStore(1334): Completed major compaction of 2 (all) file(s) in family of testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 12.6 K. This selection was in queue for 0sec, and took 0sec to execute.
2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction: Request = regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951., storeName=family, fileCount=2, fileSize=25.5 K, priority=1, time=14542808784655186; duration=0sec
2015-03-21 03:58:47,020 DEBUG [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(546): CompactSplitThread Status: compaction_queue=(0:0), split_queue=0, merge_queue=0
{noformat}
We can see that the compaction completed at 03:58:47,019, but the split started at 03:58:46,970, which is earlier. So the region still has a reference file and is not splittable. I think the problem is that 'compactAndBlockUntilDone' is not reliable; it may return before the compaction completes. Will try to prepare a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction
[ https://issues.apache.org/jira/browse/HBASE-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372723#comment-14372723 ] zhangduo commented on HBASE-13308: -- Pushed to master and branch-1. Thanks [~tedyu] for reviewing.
Fix flaky TestEndToEndSplitTransaction -- Key: HBASE-13308 URL: https://issues.apache.org/jira/browse/HBASE-13308 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13308.patch https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/ First, we split 'e9eb97847340ea7c6b9616d63d62a784' into 'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'. Then we try to split 'abe1973ea732066b12d8e33fce12a951'.
{noformat}
2015-03-21 03:58:46,970 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): Initiating region split for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,976 INFO [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.RSRpcServices(1596): Splitting testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,977 DEBUG [PriorityRpcServer.handler=7,queue=1,port=54177] regionserver.CompactSplitThread(259): Split requested for testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.. compaction_queue=(0:0), split_queue=1, merge_queue=0
2015-03-21 03:58:46,978 INFO [Thread-191] regionserver.TestEndToEndSplitTransaction(399): blocking until region is split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] lock.ZKInterProcessLockBase(328): Released /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use global event loop group NioEventLoopGroup
2015-03-21 03:58:46,988 INFO [RS:0;priapus:54177-splits-1426910324832] regionserver.SplitRequest(142): Split transaction journal: STARTED at 1426910326977
{noformat}
We can see that it failed without any error message. I think this can only happen when the parent is not splittable or we cannot find a split row.
{noformat}
2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.HStore(1334): Completed major compaction of 2 (all) file(s) in family of testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951. into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 12.6 K. This selection was in queue for 0sec, and took 0sec to execute.
2015-03-21 03:58:47,019 INFO [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction: Request = regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951., storeName=family, fileCount=2, fileSize=25.5 K, priority=1, time=14542808784655186; duration=0sec
2015-03-21 03:58:47,020 DEBUG [RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.CompactSplitThread$CompactionRunner(546): CompactSplitThread Status: compaction_queue=(0:0), split_queue=0, merge_queue=0
{noformat}
We can see that the compaction completed at 03:58:47,019, but the split started at 03:58:46,970, which is earlier. So the region still has a reference file and is not splittable. I think the problem is that 'compactAndBlockUntilDone' is not reliable; it may return before the compaction completes. Will try to prepare a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13310) Fix high priority findbugs warnings
zhangduo created HBASE-13310: Summary: Fix high priority findbugs warnings Key: HBASE-13310 URL: https://issues.apache.org/jira/browse/HBASE-13310 Project: HBase Issue Type: Task Reporter: zhangduo Assignee: zhangduo See here: https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/ High priority warnings usually indicate real bugs or code with a very bad impact on performance. Let's fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13257) Show coverage report on jenkins
[ https://issues.apache.org/jira/browse/HBASE-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372376#comment-14372376 ] zhangduo commented on HBASE-13257: -- Let me try a few more times this weekend. The build results are red most of the time now... So when I finish, I can just change the config of HBase-TRUNK and remove HBase-TRUNK-jacoco? Thanks. Show coverage report on jenkins --- Key: HBASE-13257 URL: https://issues.apache.org/jira/browse/HBASE-13257 Project: HBase Issue Type: Task Reporter: zhangduo Assignee: zhangduo Priority: Minor Think of showing the jacoco coverage report on https://builds.apache.org . An advantage of showing it on jenkins is that the jenkins jacoco plugin can handle cross-module coverage. We cannot do it locally since https://github.com/jacoco/jacoco/pull/97 is still pending. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13295) TestInfoServers hung
[ https://issues.apache.org/jira/browse/HBASE-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372403#comment-14372403 ] zhangduo commented on HBASE-13295: -- I think this patch could be applied to all branches? [~apurtell] [~enis] Thanks. TestInfoServers hung Key: HBASE-13295 URL: https://issues.apache.org/jira/browse/HBASE-13295 Project: HBase Issue Type: Bug Components: test Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13295.patch https://builds.apache.org/job/HBase-TRUNK-jacoco/16/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.TestInfoServers-output.txt No progress after this line:
{noformat}
2015-03-19 22:46:06,809 INFO [main] hbase.TestInfoServers(127): Testing http://localhost:44749/table.jsp?name=testMasterServerReadOnly&action=split&key= has Table action request accepted
{noformat}
I think the problem may be that we do not wait for the master to finish becoming active, and that there is no timeout on the HTTP request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
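On the second half of that theory, the fetch in the test can at least bound its own waiting. A minimal sketch with plain java.net client-side timeouts (the helper name and the limits are illustrative, not what the attached patch does):
{code}
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import com.google.common.io.ByteStreams;

// Hedged sketch: bound both connect and read time so a hung info server makes
// the test fail fast instead of hanging the whole run. Limits are illustrative.
public class TimedFetch {
  static String getUrlContent(URL url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setConnectTimeout(5000);  // fail if we cannot connect within 5s
    conn.setReadTimeout(30000);    // fail if the server stops responding
    try (InputStream in = conn.getInputStream()) {
      return new String(ByteStreams.toByteArray(in), StandardCharsets.UTF_8);
    } finally {
      conn.disconnect();
    }
  }
}
{code}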
[jira] [Updated] (HBASE-13258) Promote TestHRegion to LargeTests
[ https://issues.apache.org/jira/browse/HBASE-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13258: - Resolution: Fixed Fix Version/s: 2.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Resolving this since it was already pushed to master several days ago. We can open a backport issue if we want to integrate the jacoco report into other branches. Promote TestHRegion to LargeTests - Key: HBASE-13258 URL: https://issues.apache.org/jira/browse/HBASE-13258 Project: HBase Issue Type: Sub-task Components: test Reporter: zhangduo Assignee: zhangduo Fix For: 2.0.0 Attachments: HBASE-13258-addendum.patch, HBASE-13258.patch, HBASE-13258.patch It always timed out when I tried to get a coverage report locally. The problem is testWritesWhileGetting; it runs extremely slowly when the jacoco agent is enabled (not a bug, there is progress). Since it has a VerySlowRegionServerTests annotation on it, I think it is OK to promote it to LargeTests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-13301: - Attachment: HBASE-13301-testcase.patch Only a testcase. It is a little tricky, so I really need others to help confirm the problem. The flow is:
1. t1 caches a block.
2. t2 starts evicting the block but stops before acquiring the offsetLock (in the testcase this is done by having t1 hold the offsetLock).
3. t1 evicts the block, then caches the block again.
4. t2 continues evicting the block, finds that it is not the block it should deal with, so it just gives up and returns false.
Then we have blockCount=1 and some used space in the BucketAllocator, but no block in the BucketCache, so we never get a chance to free the used space.
Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13301-testcase.patch
{code:title=BucketCache.java}
public boolean evictBlock(BlockCacheKey cacheKey) {
  ...
  if (bucketEntry.equals(backingMap.remove(cacheKey))) {
    bucketAllocator.freeBlock(bucketEntry.offset());
    realCacheSize.addAndGet(-1 * bucketEntry.getLength());
    blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
    if (removedBlock == null) {
      this.blockNumber.decrementAndGet();
    }
  } else {
    return false;
  }
  ...
{code}
I think the problem is here. We remove a BucketEntry that should not be removed by us, but we neither put it back nor do any cleanup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
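One way to avoid the bad interleaving at its root would be an atomic compare-and-remove on the backing map, so t2 can never pull out an entry that t1 has re-cached. A sketch of that shape, assuming backingMap is a ConcurrentMap; again a sketch of the idea, not a reviewed fix for this issue:
{code}
// Sketch only: ConcurrentMap.remove(key, value) removes the mapping only when
// the key is still bound to the exact BucketEntry we resolved earlier. A block
// re-cached by another thread keeps its new entry, so nothing is removed
// without the matching cleanup below.
if (backingMap.remove(cacheKey, bucketEntry)) {
  bucketAllocator.freeBlock(bucketEntry.offset());
  realCacheSize.addAndGet(-1 * bucketEntry.getLength());
  blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
  if (removedBlock == null) {
    this.blockNumber.decrementAndGet();
  }
} else {
  // The current entry belongs to someone else; nothing was removed, nothing leaks.
  return false;
}
{code}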
[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache
[ https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14371352#comment-14371352 ] zhangduo commented on HBASE-13301: -- Not submitting the patch since it is not a fix. Experts needed. [~stack] (I do not know who the right person is, since [~zjushch] seems to have been inactive for a long time, so...)
Possible memory leak in BucketCache --- Key: HBASE-13301 URL: https://issues.apache.org/jira/browse/HBASE-13301 Project: HBase Issue Type: Bug Components: BlockCache Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13301-testcase.patch
{code:title=BucketCache.java}
public boolean evictBlock(BlockCacheKey cacheKey) {
  ...
  if (bucketEntry.equals(backingMap.remove(cacheKey))) {
    bucketAllocator.freeBlock(bucketEntry.offset());
    realCacheSize.addAndGet(-1 * bucketEntry.getLength());
    blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
    if (removedBlock == null) {
      this.blockNumber.decrementAndGet();
    }
  } else {
    return false;
  }
  ...
{code}
I think the problem is here. We remove a BucketEntry that should not be removed by us, but we neither put it back nor do any cleanup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)