[jira] [Commented] (HBASE-13716) Stop using Hadoop's FSConstants

2015-05-24 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557927#comment-14557927
 ] 

zhangduo commented on HBASE-13716:
--

Seems we updated the version of findbugs? From 2.0.3 to 3.0.0?

 Stop using Hadoop's FSConstants
 ---

 Key: HBASE-13716
 URL: https://issues.apache.org/jira/browse/HBASE-13716
 Project: HBase
  Issue Type: Task
Affects Versions: 1.0.0, 1.1.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1

 Attachments: HBASE-13716.1.patch


 the FSConstants class was removed in HDFS-8135 (currently slated for Hadoop 
 2.8.0). I'm trying to have it reverted in branch-2, but we should migrate off 
 of it sooner rather than later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12451) IncreasingToUpperBoundRegionSplitPolicy may cause unnecessary region splits in rolling update of cluster

2015-05-21 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555374#comment-14555374
 ] 

zhangduo commented on HBASE-12451:
--

Maybe we could make a new split policy?
Make getCountOfCommonTableRegions an abstract protected method: the old 
IncreasingToUpperBoundRegionSplitPolicy just keeps the old implementation, which 
only counts regions locally, and our new policy would fetch the information from 
the master?
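
Roughly something like this (just a sketch with simplified stand-in classes, not 
the real RegionSplitPolicy hierarchy):

{code:title=SplitPolicySketch.java}
// Illustration only: the split-size rule stays in the base class and the
// region count becomes an overridable hook.
abstract class CountBasedSplitPolicy {
  private final long flushSize;
  private final long maxFileSize;

  CountBasedSplitPolicy(long flushSize, long maxFileSize) {
    this.flushSize = flushSize;
    this.maxFileSize = maxFileSize;
  }

  /** Subclasses decide how to count the same-table regions. */
  protected abstract int getCountOfCommonTableRegions();

  /** Same shape as the existing rule: min(maxFileSize, count^3 * 2 * flushSize). */
  long getSizeToCheck() {
    int count = getCountOfCommonTableRegions();
    return count == 0 ? maxFileSize
        : Math.min(maxFileSize, flushSize * 2L * count * count * count);
  }
}

/** Old behaviour: only count the regions hosted locally on this regionserver. */
class LocalCountSplitPolicy extends CountBasedSplitPolicy {
  private final int localCount;

  LocalCountSplitPolicy(long flushSize, long maxFileSize, int localCount) {
    super(flushSize, maxFileSize);
    this.localCount = localCount;
  }

  @Override
  protected int getCountOfCommonTableRegions() {
    return localCount;
  }
}

// A master-backed policy would override getCountOfCommonTableRegions() to
// return the table-wide region count fetched from the master instead.
{code}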

 IncreasingToUpperBoundRegionSplitPolicy may cause unnecessary region splits 
 in rolling update of cluster
 

 Key: HBASE-12451
 URL: https://issues.apache.org/jira/browse/HBASE-12451
 Project: HBase
  Issue Type: Bug
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Fix For: 2.0.0

 Attachments: HBASE-12451-v1.diff, HBASE-12451-v2.diff


 Currently IncreasingToUpperBoundRegionSplitPolicy is the default region split 
 policy. In this policy, split size is the number of regions that are on this 
 server that all are of the same table, cubed, times 2x the region flush size.
 But when unloading regions of a regionserver in a cluster using 
 region_mover.rb, the number of regions that are on this server that all are 
 of the same table will decrease, and the split size will decrease too, which 
 may cause the remaining regions on the regionserver to split. Region splits also 
 happen when loading regions onto a regionserver in a cluster. 
 An improvement may be to set a minimum split size in 
 IncreasingToUpperBoundRegionSplitPolicy.
 Suggestions are welcome. Thanks~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13716) Stop using Hadoop's FSConstants

2015-05-20 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551923#comment-14551923
 ] 

zhangduo commented on HBASE-13716:
--

There is an {{HdfsUtils.isHealthy(URI)}} method in HDFS; it has been available 
since at least hadoop-2.2.0. Could we make use of this method instead of calling 
{{DistributedFileSystem.setSafeMode}}?
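
For reference, a minimal sketch of how that call could look (assuming the 
{{org.apache.hadoop.hdfs.client.HdfsUtils}} class shipped with hadoop 2.2+; the 
namenode URI is just an example):

{code:title=IsHealthySketch.java}
import java.net.URI;

import org.apache.hadoop.hdfs.client.HdfsUtils;

public class IsHealthySketch {
  public static void main(String[] args) {
    // isHealthy() should return true only when the NameNode is reachable
    // and the cluster is not in safe mode.
    boolean healthy = HdfsUtils.isHealthy(URI.create("hdfs://namenode-host:8020"));
    System.out.println("HDFS healthy: " + healthy);
  }
}
{code}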

 Stop using Hadoop's FSConstants
 ---

 Key: HBASE-13716
 URL: https://issues.apache.org/jira/browse/HBASE-13716
 Project: HBase
  Issue Type: Task
Affects Versions: 1.0.0, 1.1.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1


 the FSConstants class was removed in HDFS-8135 (currently slated for Hadoop 
 2.8.0). I'm trying to have it reverted in branch-2, but we should migrate off 
 of it sooner rather than later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13716) Stop using Hadoop's FSConstants

2015-05-20 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552264#comment-14552264
 ] 

zhangduo commented on HBASE-13716:
--

{quote}
 I also have an open request on the HDFs ticket for what we're supposed to use. 
It could use more details about what we're trying to check.
{quote}
Do you mean opening an HDFS issue to add methods for HBase?

 Stop using Hadoop's FSConstants
 ---

 Key: HBASE-13716
 URL: https://issues.apache.org/jira/browse/HBASE-13716
 Project: HBase
  Issue Type: Task
Affects Versions: 1.0.0, 1.1.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1


 the FSConstants class was removed in HDFS-8135 (currently slated for Hadoop 
 2.8.0). I'm trying to have it reverted in branch-2, but we should migrate off 
 of it sooner rather than later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13716) Stop using Hadoop's FSConstants

2015-05-20 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553409#comment-14553409
 ] 

zhangduo commented on HBASE-13716:
--

+1 for now.

And I checked the code again: {{HdfsUtils.isHealthy(URI)}} calls 
{{DistributedFileSystem.setSafeMode(GET, false)}}, but in HBase we call 
{{DistributedFileSystem.setSafeMode(GET, true)}}. I think the difference is that 
when the second parameter is true, the BackupNN will throw a StandbyException 
that forces the client to connect to the ActiveNN.

If we must connect to the ActiveNN in HBase, then {{HdfsUtils.isHealthy(URI)}} is 
not enough. So should we add new methods to {{HdfsUtils}}?
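
To make the contrast concrete, a small sketch of the two calls (Hadoop 2.x 
{{DistributedFileSystem}} API, semantics as described above):

{code:title=SafeModeSketch.java}
import java.io.IOException;

import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public final class SafeModeSketch {

  // What HdfsUtils.isHealthy() effectively does: isChecked=false, so the
  // query may be answered without forcing a connection to the ActiveNN.
  static boolean isInSafeMode(DistributedFileSystem dfs) throws IOException {
    return dfs.setSafeMode(SafeModeAction.SAFEMODE_GET, false);
  }

  // What HBase does today: isChecked=true, so a standby NN throws
  // StandbyException and the client retries against the ActiveNN.
  static boolean isInSafeModeOnActiveNN(DistributedFileSystem dfs) throws IOException {
    return dfs.setSafeMode(SafeModeAction.SAFEMODE_GET, true);
  }
}
{code}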

 Stop using Hadoop's FSConstants
 ---

 Key: HBASE-13716
 URL: https://issues.apache.org/jira/browse/HBASE-13716
 Project: HBase
  Issue Type: Task
Affects Versions: 1.0.0, 1.1.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1

 Attachments: HBASE-13716.1.patch


 the FSConstants class was removed in HDFS-8135 (currently slated for Hadoop 
 2.8.0). I'm trying to have it reverted in branch-2, but we should migrate off 
 of it sooner rather than later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13716) Stop using Hadoop's FSConstants

2015-05-19 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551848#comment-14551848
 ] 

zhangduo commented on HBASE-13716:
--

There are two FSConstants...
One is {{org.apache.hadoop.fs.FsConstants}}, and the other is 
{{org.apache.hadoop.hdfs.protocol.FSConstants}}.
The former is marked as public, and the latter is what HDFS-8135 wants 
to remove.

There is only one place where we use 
{{org.apache.hadoop.hdfs.protocol.FSConstants}}: FSUtils calls 
{{DistributedFileSystem.setSafeMode}}. We could just replace it with 
{{HdfsConstants}}, but {{HdfsConstants}} is marked as private. Is there any 
other way to check whether an HDFS cluster is in safe mode?

 Stop using Hadoop's FSConstants
 ---

 Key: HBASE-13716
 URL: https://issues.apache.org/jira/browse/HBASE-13716
 Project: HBase
  Issue Type: Task
Affects Versions: 1.0.0, 1.1.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1


 the FSConstants class was removed in HDFS-8135 (currently slated for Hadoop 
 2.8.0). I'm trying to have it reverted in branch-2, but we should migrate off 
 of it sooner rather than later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13637) branch-1.1 does not build against hadoop-2.2.

2015-05-11 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538860#comment-14538860
 ] 

zhangduo commented on HBASE-13637:
--

Just woke up...
Thanks [~ndimiduk] for the quick fix.
+1 on the new patch.

 branch-1.1 does not build against hadoop-2.2.
 -

 Key: HBASE-13637
 URL: https://issues.apache.org/jira/browse/HBASE-13637
 Project: HBase
  Issue Type: Bug
Reporter: Nick Dimiduk
Assignee: zhangduo
 Fix For: 1.1.0, 1.2.0

 Attachments: HBASE-13637-branch-1.1.01.patch, 
 HBASE-13637-branch-1.1.patch


 From RC0 VOTE thread,
 {quote}
 The build is broken with Hadoop-2.2 because mini-kdc is not found:
 \[ERROR\] Failed to execute goal on project hbase-server: Could not resolve 
 dependencies for project org.apache.hbase:hbase-server:jar:1.1.0: Could not 
 find artifact org.apache.hadoop:hadoop-minikdc:jar:2.2
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13653) Uninitialized HRegionServer#walFactory may result in NullPointerException at region server startup​

2015-05-08 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536110#comment-14536110
 ] 

zhangduo commented on HBASE-13653:
--

There may be a race condition between calling reportForDuty and 
handleReportForDutyResponse.
But there are lots of things that are not initialized until 
handleReportForDutyResponse finishes, not only walFactory. So I'm not 
sure the fix in this patch is enough.

 Uninitialized HRegionServer#walFactory may result in NullPointerException at 
 region server startup​
 ---

 Key: HBASE-13653
 URL: https://issues.apache.org/jira/browse/HBASE-13653
 Project: HBase
  Issue Type: Bug
  Components: hbase
Reporter: Romil Choksi
Assignee: Ted Yu
 Attachments: 13653-branch-1.txt


 hbase --config /tmp/hbaseConf org.apache.hadoop.hbase.IntegrationTestIngest 
 --monkey unbalance
 causes NPE
 {code}
 2015-05-08 08:44:20,885 ERROR 
 [B.defaultRpcServer.handler=28,queue=1,port=16000] master.ServerManager: 
 Received exception in RPC for warmup 
 server:RegionServer1,16020,1431074656202region: {ENCODED = 
 40133c823b6d9d9dece99db1aad62730, NAME = 
 'SYSTEM.SEQUENCE,2\x00\x00\x00,1431070054641.40133c823b6d9d9dece99db1aad62730.',
  STARTKEY = '2\x00\x00\x00', ENDKEY = '3\x00\x00\x00'}exception: 
 java.io.IOException: java.io.IOException
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2154)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
   at 
 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:1825)
   at 
 org.apache.hadoop.hbase.regionserver.RSRpcServices.warmupRegion(RSRpcServices.java:1559)
   at 
 org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:21997)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
   ... 4 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13628) Use AtomicLong as size in BoundedConcurrentLinkedQueue

2015-05-06 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13628:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to all branches.

Thanks [~apurtell] and [~stack].

 Use AtomicLong as size in BoundedConcurrentLinkedQueue
 --

 Key: HBASE-13628
 URL: https://issues.apache.org/jira/browse/HBASE-13628
 Project: HBase
  Issue Type: Bug
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1

 Attachments: HBASE-13628.patch


 Remove the high priority findbugs warnings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13628) Use AtomicLong as size in BoundedConcurrentLinkedQueue

2015-05-06 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14530030#comment-14530030
 ] 

zhangduo commented on HBASE-13628:
--

{code}
for (T element; (element = super.poll()) != null;) {
{code}
This is reported by checkstyle as an 'InnerAssignment' issue.

This is a common style when polling from a queue, so I think it is fine?
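
For comparison, the same drain written without the inner assignment would look 
roughly like this (illustrative helper, not part of the patch):

{code:title=DrainSketch.java}
import java.util.Queue;

final class DrainSketch {
  // Behaves the same as the flagged loop, just without assigning inside
  // the loop condition, which is what checkstyle complains about.
  static <T> int drainAndCount(Queue<T> queue) {
    int count = 0;
    T element = queue.poll();
    while (element != null) {
      count++;
      element = queue.poll();
    }
    return count;
  }
}
{code}

Either form behaves the same; the rule is purely stylistic.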

 Use AtomicLong as size in BoundedConcurrentLinkedQueue
 --

 Key: HBASE-13628
 URL: https://issues.apache.org/jira/browse/HBASE-13628
 Project: HBase
  Issue Type: Bug
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1

 Attachments: HBASE-13628.patch


 Remove the high priority findbugs warnings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13628) Use AtomicLong as size in BoundedConcurrentLinkedQueue

2015-05-06 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14530039#comment-14530039
 ] 

zhangduo commented on HBASE-13628:
--

OK, let me commit.

 Use AtomicLong as size in BoundedConcurrentLinkedQueue
 --

 Key: HBASE-13628
 URL: https://issues.apache.org/jira/browse/HBASE-13628
 Project: HBase
  Issue Type: Bug
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1

 Attachments: HBASE-13628.patch


 Remove the high priority findbugs warnings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13637) branch-1.1 does not build against hadoop-2.2.

2015-05-06 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14531741#comment-14531741
 ] 

zhangduo commented on HBASE-13637:
--

{quote}
It seems that 2.2 does not contain mini KDC at all
{quote}

Yes, mini-kdc was first introduced in hadoop 2.3.
But as [~apurtell] said before, mini-kdc does not depend on any other hadoop 
modules.
See here
http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-minikdc/2.7.0

So we could give mini-kdc a separate version instead of the common 
hadoop-two.version? I can prepare a patch for it.

Thanks.

 branch-1.1 does not build against hadoop-2.2.
 -

 Key: HBASE-13637
 URL: https://issues.apache.org/jira/browse/HBASE-13637
 Project: HBase
  Issue Type: Bug
Reporter: Nick Dimiduk
 Fix For: 1.1.0


 From RC0 VOTE thread,
 {quote}
 The build is broken with Hadoop-2.2 because mini-kdc is not found:
 \[ERROR\] Failed to execute goal on project hbase-server: Could not resolve 
 dependencies for project org.apache.hbase:hbase-server:jar:1.1.0: Could not 
 find artifact org.apache.hadoop:hadoop-minikdc:jar:2.2
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13637) branch-1.1 does not build against hadoop-2.2.

2015-05-06 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13637:
-
Attachment: HBASE-13637-branch-1.1.patch

Tried locally with

{noformat}
mvn clean package -Dhadoop-two.version=2.2.0 -DskipTests
{noformat}

Passed.

 branch-1.1 does not build against hadoop-2.2.
 -

 Key: HBASE-13637
 URL: https://issues.apache.org/jira/browse/HBASE-13637
 Project: HBase
  Issue Type: Bug
Reporter: Nick Dimiduk
 Fix For: 1.1.0

 Attachments: HBASE-13637-branch-1.1.patch


 From RC0 VOTE thread,
 {quote}
 The build is broken with Hadoop-2.2 because mini-kdc is not found:
 \[ERROR\] Failed to execute goal on project hbase-server: Could not resolve 
 dependencies for project org.apache.hbase:hbase-server:jar:1.1.0: Could not 
 find artifact org.apache.hadoop:hadoop-minikdc:jar:2.2
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13420) RegionEnvironment.offerExecutionLatency Blocks Threads under Heavy Load

2015-05-05 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529640#comment-14529640
 ] 

zhangduo commented on HBASE-13420:
--

This patch introduces 3 high priority findbugs warnings (VO_VOLATILE_INCREMENT).
Mind if I open an issue to change 'size' from a volatile long to an AtomicLong? [~apurtell]
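
A minimal sketch of the change I have in mind (simplified, not the actual 
BoundedConcurrentLinkedQueue code):

{code:title=AtomicSizeSketch.java}
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// findbugs flags "volatile long size; ... size++" (VO_VOLATILE_INCREMENT)
// because the read-modify-write is not atomic; AtomicLong makes it atomic.
class AtomicSizeSketch<E> {
  private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<E>();
  private final AtomicLong size = new AtomicLong(0);
  private final long maxSize;

  AtomicSizeSketch(long maxSize) {
    this.maxSize = maxSize;
  }

  boolean offer(E e) {
    // best-effort bound check before enqueueing
    if (size.get() >= maxSize) {
      return false;
    }
    if (queue.offer(e)) {
      size.incrementAndGet();
      return true;
    }
    return false;
  }

  E poll() {
    E e = queue.poll();
    if (e != null) {
      size.decrementAndGet();
    }
    return e;
  }
}
{code}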
Thanks.

 RegionEnvironment.offerExecutionLatency Blocks Threads under Heavy Load
 ---

 Key: HBASE-13420
 URL: https://issues.apache.org/jira/browse/HBASE-13420
 Project: HBase
  Issue Type: Improvement
Reporter: John Leach
Assignee: Andrew Purtell
 Fix For: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1

 Attachments: 1M-0.98.12.svg, 1M-0.98.13-SNAPSHOT.svg, 
 HBASE-13420.patch, HBASE-13420.txt, hbase-13420.tar.gz, 
 offerExecutionLatency.tiff

   Original Estimate: 3h
  Remaining Estimate: 3h

 The ArrayBlockingQueue blocks threads for 20s during a performance run 
 focusing on creating numerous small scans.  
 I see a buffer size of (100)
 private final BlockingQueue<Long> coprocessorTimeNanos = new ArrayBlockingQueue<Long>(
 LATENCY_BUFFER_SIZE);
 and then I see a drain coming from
  MetricsRegionWrapperImpl with 45 second executor
  HRegionMetricsWrapperRunable
  RegionCoprocessorHost#getCoprocessorExecutionStatistics()   
  RegionCoprocessorHost#getExecutionLatenciesNanos()
 Am I missing something?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13628) Use AtomicLong as size in BoundedConcurrentLinkedQueue

2015-05-05 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13628:
-
Status: Patch Available  (was: Open)

 Use AtomicLong as size in BoundedConcurrentLinkedQueue
 --

 Key: HBASE-13628
 URL: https://issues.apache.org/jira/browse/HBASE-13628
 Project: HBase
  Issue Type: Bug
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13628.patch


 Remove the high priority findbugs warnings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13628) Use AtomicLong as size in BoundedConcurrentLinkedQueue

2015-05-05 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13628:
-
Attachment: HBASE-13628.patch

 Use AtomicLong as size in BoundedConcurrentLinkedQueue
 --

 Key: HBASE-13628
 URL: https://issues.apache.org/jira/browse/HBASE-13628
 Project: HBase
  Issue Type: Bug
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13628.patch


 Remove the high priority findbugs warnings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10800) Use CellComparator instead of KVComparator

2015-05-05 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529659#comment-14529659
 ] 

zhangduo commented on HBASE-10800:
--

https://builds.apache.org/job/HBase-TRUNK/6456/findbugsResult/new/HIGH/

Should be a bug?

{code:title=BufferedDataBlockEncoder.java}
if (comparator != null) {
  ...
} else {
  Cell r = new KeyValue.KeyOnlyKeyValue(current.keyBuffer, 0, 
current.keyLength);
  comp = comparator.compareKeyIgnoresMvcc(seekCell, r);  //  NPE?
}
{code}

 Use CellComparator instead of KVComparator
 --

 Key: HBASE-10800
 URL: https://issues.apache.org/jira/browse/HBASE-10800
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0

 Attachments: HBASE-10800_1.patch, HBASE-10800_11.patch, 
 HBASE-10800_12.patch, HBASE-10800_13.patch, HBASE-10800_14.patch, 
 HBASE-10800_15.patch, HBASE-10800_16.patch, HBASE-10800_18.patch, 
 HBASE-10800_2.patch, HBASE-10800_23.patch, HBASE-10800_23.patch, 
 HBASE-10800_24.patch, HBASE-10800_25.patch, HBASE-10800_26.patch, 
 HBASE-10800_27.patch, HBASE-10800_28.patch, HBASE-10800_29.patch, 
 HBASE-10800_3.patch, HBASE-10800_4.patch, HBASE-10800_4.patch, 
 HBASE-10800_5.patch, HBASE-10800_6.patch, HBASE-10800_7.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13628) Use AtomicLong as size in BoundedConcurrentLinkedQueue

2015-05-05 Thread zhangduo (JIRA)
zhangduo created HBASE-13628:


 Summary: Use AtomicLong as size in BoundedConcurrentLinkedQueue
 Key: HBASE-13628
 URL: https://issues.apache.org/jira/browse/HBASE-13628
 Project: HBase
  Issue Type: Bug
Reporter: zhangduo
Assignee: zhangduo


Remove the high priority findbugs warnings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13609) TestFastFail is still failing

2015-05-02 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525246#comment-14525246
 ] 

zhangduo commented on HBASE-13609:
--

Just remove the last assert for now?

numBlockedWorkers increments if the requestTime > 1s; this is not a stable 
condition, I think.

 TestFastFail is still failing
 -

 Key: HBASE-13609
 URL: https://issues.apache.org/jira/browse/HBASE-13609
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 1.1.0
Reporter: Nick Dimiduk

 {noformat}
 testFastFail(org.apache.hadoop.hbase.client.TestFastFail)  Time elapsed: 
 13.106 sec   FAILURE!
 java.lang.AssertionError: Only few thread should ideally be waiting for the 
 dead regionserver to be coming back. numBlockedWorkers:15 threads that 
 retried : 2
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.assertTrue(Assert.java:41)
 at 
 org.apache.hadoop.hbase.client.TestFastFail.testFastFail(TestFastFail.java:288)
 {noformat}
 This is failing consistently for me locally. Sometimes it's 15, sometimes 
 it's 5, sometimes 26.
 We've seen this one before, HBASE-12771, HBASE-12881.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13530) Add param for bulkload wait duration in HRegion.

2015-04-25 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512372#comment-14512372
 ] 

zhangduo commented on HBASE-13530:
--

[~victorunique] So we need to find the real operation that locks the region for a 
long time?

The ReentrantReadWriteLock in Java does have the problem you described: if one 
thread is waiting to acquire the WriteLock, then all following attempts to acquire 
the ReadLock will be blocked, even in non-fair mode.

But there are lots of operations other than bulkload that need to hold the 
WriteLock, and lots of them do not have a timeout. So this is a general problem, 
and we need to find a general way to solve it.
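
A tiny standalone demo of that lock behaviour (timing-based sketch, so 
'typically' rather than 'always'):

{code:title=RWLockDemo.java}
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RWLockDemo {
  public static void main(String[] args) throws InterruptedException {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair mode
    final AtomicBoolean laterReaderGotLock = new AtomicBoolean(false);

    lock.readLock().lock(); // reader #1, e.g. a long-running compaction

    Thread writer = new Thread(new Runnable() { // e.g. a bulkload wanting the write lock
      public void run() {
        lock.writeLock().lock();
        lock.writeLock().unlock();
      }
    });
    writer.start();
    Thread.sleep(200); // let the writer queue up behind reader #1

    Thread laterReader = new Thread(new Runnable() { // a read request arriving later
      public void run() {
        lock.readLock().lock();
        laterReaderGotLock.set(true);
        lock.readLock().unlock();
      }
    });
    laterReader.start();
    Thread.sleep(200);

    // Typically prints false: the later reader parks behind the queued writer
    // even though the lock is non-fair.
    System.out.println("later reader acquired read lock: " + laterReaderGotLock.get());

    lock.readLock().unlock(); // release reader #1 so everyone can finish
    writer.join();
    laterReader.join();
  }
}
{code}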

Thanks.

 Add param for bulkload wait duration in HRegion.
 

 Key: HBASE-13530
 URL: https://issues.apache.org/jira/browse/HBASE-13530
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Victor Xu
Priority: Minor
 Fix For: 2.0.0, 0.98.13

 Attachments: HBASE-13530-0.98-v2.patch, HBASE-13530-0.98.patch, 
 HBASE-13530-master-v2.patch, HBASE-13530-master.patch, HBASE-13530-v1.patch


 In our scenario, incremental read/write operations and complete bulkload 
 operations are mixed together. Bulkload needs write lock while read/write and 
 flush/compact need read lock. When a region is compacting, the bulkload could 
 hang at writeLock.tryLock(waitDuration, TimeUnit) method. The original 
 default waitDuration is 60sec (from 'hbase.busy.wait.duration'), and this 
 could block all read/write operations from acquiring the read lock for 1 minute. 
 The chances of this scenario become high when compaction speed limit 
 feature(HBASE-8329) is used.
 Maybe we need to decrease the wait duration ONLY for bulkload, and let 
 read/write keep theirs. So I add this param('hbase.bulkload.wait.duration') 
 to tune wait duration for bulkloading. Of course, it is a table level 
 setting, and the default value is from original logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13530) Add param for bulkload wait duration in HRegion.

2015-04-24 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510575#comment-14510575
 ] 

zhangduo commented on HBASE-13530:
--

I think we do not hold the readLock during the whole flush or compaction lifetime?

 Add param for bulkload wait duration in HRegion.
 

 Key: HBASE-13530
 URL: https://issues.apache.org/jira/browse/HBASE-13530
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Victor Xu
Priority: Minor
 Fix For: 2.0.0, 0.98.13

 Attachments: HBASE-13530-0.98-v2.patch, HBASE-13530-0.98.patch, 
 HBASE-13530-master-v2.patch, HBASE-13530-master.patch, HBASE-13530-v1.patch


 In our scenario, incremental read/write operations and complete bulkload 
 operations are mixed together. Bulkload needs write lock while read/write and 
 flush/compact need read lock. When a region is compacting, the bulkload could 
 hang at writeLock.tryLock(waitDuration, TimeUnit) method. The original 
 default waitDuration is 60sec (from 'hbase.busy.wait.duration'), and this 
 could block all read/write operations from acquiring the read lock for 1 minute. 
 The chances of this scenario become high when compaction speed limit 
 feature(HBASE-8329) is used.
 Maybe we need to decrease the wait duration ONLY for bulkload, and let 
 read/write keep theirs. So I add this param('hbase.bulkload.wait.duration') 
 to tune wait duration for bulkloading. Of course, it is a table level 
 setting, and the default value is from original logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13528) A bug on selecting compaction pool

2015-04-24 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510839#comment-14510839
 ] 

zhangduo commented on HBASE-13528:
--

[~enis] hasn't replied yet... Is it safe to commit to branch-1.0? Seems an RC 
is ongoing...

 A bug on selecting compaction pool
 --

 Key: HBASE-13528
 URL: https://issues.apache.org/jira/browse/HBASE-13528
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Affects Versions: 0.98.12
Reporter: Shuaifeng Zhou
Assignee: Shuaifeng Zhou
Priority: Minor
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2, 1.2.0

 Attachments: HBASE-13528-0.98-1.patch, HBASE-13528-0.98.patch, 
 HBASE-13528-1.0-1.patch, HBASE-13528-1.0.patch, HBASE-13528-master-1.patch, 
 HBASE-13528-master.patch


 When the selectNow == true, in requestCompactionInternal, the compaction pool 
 section is incorrect.
 as discussed in:
 http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13529) Procedure v2 - WAL Improvements

2015-04-22 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507028#comment-14507028
 ] 

zhangduo commented on HBASE-13529:
--

What about LinkedTransferQueue?

 Procedure v2 - WAL Improvements
 ---

 Key: HBASE-13529
 URL: https://issues.apache.org/jira/browse/HBASE-13529
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Affects Versions: 2.0.0, 1.1.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13529-v0.patch, ProcedureStoreTest.java


 from the discussion in HBASE-12439 the wal was turning out to be slow.
  * there is an error around the awakening of the slotCond.await(), causing more 
 wait than necessary
  * ArrayBlockingQueue is dog slow, replace it with ConcurrentLinkedQueue
  * roll the wal only if reaches a threshold (conf ops) to amortize the cost
  * hsync() is used by default, when the normal wal is using just hflush() 
 make it tunable via conf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13528) A bug on selecting compaction pool

2015-04-22 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506524#comment-14506524
 ] 

zhangduo commented on HBASE-13528:
--

+1.
And can you use git format-patch to generate the patch file? Using git am with 
sign off can retain the author of the patch.

[~tedyu] What do you think? You replied on mailing list.

Thanks.

 A bug on selecting compaction pool
 --

 Key: HBASE-13528
 URL: https://issues.apache.org/jira/browse/HBASE-13528
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Affects Versions: 0.98.12
Reporter: Shuaifeng Zhou
Assignee: Shuaifeng Zhou
Priority: Minor
 Fix For: 1.0.1, 0.98.13

 Attachments: HBASE-13528-0.98-1.patch, HBASE-13528-0.98.patch, 
 HBASE-13528-1.0-1.patch, HBASE-13528-1.0.patch, HBASE-13528-master-1.patch, 
 HBASE-13528-master.patch


 When the selectNow == true, in requestCompactionInternal, the compaction pool 
 section is incorrect.
 as discussed in:
 http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13528) A bug on selecting compaction pool

2015-04-22 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13528:
-
Fix Version/s: (was: 1.0.1)
   1.2.0
   1.0.2
   1.1.0
   2.0.0

 A bug on selecting compaction pool
 --

 Key: HBASE-13528
 URL: https://issues.apache.org/jira/browse/HBASE-13528
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Affects Versions: 0.98.12
Reporter: Shuaifeng Zhou
Assignee: Shuaifeng Zhou
Priority: Minor
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2, 1.2.0

 Attachments: HBASE-13528-0.98-1.patch, HBASE-13528-0.98.patch, 
 HBASE-13528-1.0-1.patch, HBASE-13528-1.0.patch, HBASE-13528-master-1.patch, 
 HBASE-13528-master.patch


 When the selectNow == true, in requestCompactionInternal, the compaction pool 
 section is incorrect.
 as discussed in:
 http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13528) A bug on selecting compaction pool

2015-04-22 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508191#comment-14508191
 ] 

zhangduo commented on HBASE-13528:
--

[~ndimiduk] [~enis] [~apurtell]
Should go into all branches?
Thanks.

 A bug on selecting compaction pool
 --

 Key: HBASE-13528
 URL: https://issues.apache.org/jira/browse/HBASE-13528
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Affects Versions: 0.98.12
Reporter: Shuaifeng Zhou
Assignee: Shuaifeng Zhou
Priority: Minor
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2, 1.2.0

 Attachments: HBASE-13528-0.98-1.patch, HBASE-13528-0.98.patch, 
 HBASE-13528-1.0-1.patch, HBASE-13528-1.0.patch, HBASE-13528-master-1.patch, 
 HBASE-13528-master.patch


 When the selectNow == true, in requestCompactionInternal, the compaction pool 
 section is incorrect.
 as discussed in:
 http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13499) AsyncRpcClient test cases failure in powerpc

2015-04-21 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504426#comment-14504426
 ] 

zhangduo commented on HBASE-13499:
--

Good. Will commit it tomorrow if no objection.

 AsyncRpcClient test cases failure in powerpc
 

 Key: HBASE-13499
 URL: https://issues.apache.org/jira/browse/HBASE-13499
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: sangamesh
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 1.2.0

 Attachments: HBASE-13499.patch


 The new AsyncRpcClient feature added through the jira defect HBASE-12684 
 causing some test cases failures in powerpc64 environment.
 I am testing it in master branch.
 Looks like the version of netty (4.0.23) doesn't provide a support for non 
 amd64 platforms and suggested to use pure java netty 
 Here is the discussion on that https://github.com/aphyr/riemann/pull/508
 So new Async test cases will fail in ppc64 and other non amd64 platforms too.
 Here is the output of the error.
 Running org.apache.hadoop.hbase.ipc.TestAsyncIPC
 Tests run: 24, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 2.802 sec 
  FAILURE! - in org.apache.hadoop.hbase.ipc.TestAsyncIPC
 testRTEDuringAsyncConnectionSetup[3](org.apache.hadoop.hbase.ipc.TestAsyncIPC)
   Time elapsed: 0.048 sec   ERROR!
 java.lang.UnsatisfiedLinkError: 
 /tmp/libnetty-transport-native-epoll4286512618055650929.so: 
 /tmp/libnetty-transport-native-epoll4286512618055650929.so: cannot open 
 shared object file: No such file or directory (Possible cause: can't load AMD 
 64-bit .so on a Power PC 64-bit platform)
   at java.lang.ClassLoader$NativeLibrary.load(Native Method)
   at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13499) AsyncRpcClient test cases failure in powerpc

2015-04-21 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13499:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to branch-1.1+.
Thanks [~sangameshs] [~stack].

 AsyncRpcClient test cases failure in powerpc
 

 Key: HBASE-13499
 URL: https://issues.apache.org/jira/browse/HBASE-13499
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: sangamesh
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 1.2.0

 Attachments: HBASE-13499.patch


 The new AsyncRpcClient feature added through the jira defect HBASE-12684 
 causing some test cases failures in powerpc64 environment.
 I am testing it in master branch.
 Looks like the version of netty (4.0.23) doesn't provide a support for non 
 amd64 platforms and suggested to use pure java netty 
 Here is the discussion on that https://github.com/aphyr/riemann/pull/508
 So new Async test cases will fail in ppc64 and other non amd64 platforms too.
 Here is the output of the error.
 Running org.apache.hadoop.hbase.ipc.TestAsyncIPC
 Tests run: 24, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 2.802 sec 
  FAILURE! - in org.apache.hadoop.hbase.ipc.TestAsyncIPC
 testRTEDuringAsyncConnectionSetup[3](org.apache.hadoop.hbase.ipc.TestAsyncIPC)
   Time elapsed: 0.048 sec   ERROR!
 java.lang.UnsatisfiedLinkError: 
 /tmp/libnetty-transport-native-epoll4286512618055650929.so: 
 /tmp/libnetty-transport-native-epoll4286512618055650929.so: cannot open 
 shared object file: No such file or directory (Possible cause: can't load AMD 
 64-bit .so on a Power PC 64-bit platform)
   at java.lang.ClassLoader$NativeLibrary.load(Native Method)
   at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13528) A bug on selecting compaction pool

2015-04-21 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506263#comment-14506263
 ] 

zhangduo commented on HBASE-13528:
--

I think this line is also redundant?

{code}
long size = selectNow ? compaction.getRequest().getSize() : 0;
{code}

If selectNow is false, then we will not execute throttleCompaction, so 'size' 
is useless?

Thanks.

 A bug on selecting compaction pool
 --

 Key: HBASE-13528
 URL: https://issues.apache.org/jira/browse/HBASE-13528
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Affects Versions: 0.98.12
Reporter: Shuaifeng Zhou
Assignee: Shuaifeng Zhou
Priority: Minor
 Fix For: 1.0.1, 0.98.13

 Attachments: HBASE-13528-0.98.patch, HBASE-13528-1.0.patch, 
 HBASE-13528-master.patch


 When the selectNow == true, in requestCompactionInternal, the compaction pool 
 section is incorrect.
 as discussed in:
 http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13528) A bug on selecting compaction pool

2015-04-21 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506317#comment-14506317
 ] 

zhangduo commented on HBASE-13528:
--

This will cause an NPE... compaction will be null if selectNow == false.
Try this?
{code}
ThreadPoolExecutor pool = (selectNow && s.throttleCompaction(compaction.getRequest().getSize()))
  ? largeCompactions : smallCompactions;
{code}

 A bug on selecting compaction pool
 --

 Key: HBASE-13528
 URL: https://issues.apache.org/jira/browse/HBASE-13528
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Affects Versions: 0.98.12
Reporter: Shuaifeng Zhou
Assignee: Shuaifeng Zhou
Priority: Minor
 Fix For: 1.0.1, 0.98.13

 Attachments: HBASE-13528-0.98.patch, HBASE-13528-1.0.patch, 
 HBASE-13528-master.patch


 When the selectNow == true, in requestCompactionInternal, the compaction pool 
 section is incorrect.
 as discussed in:
 http://mail-archives.apache.org/mod_mbox/hbase-dev/201504.mbox/%3CCAAAYAnNC06E-pUG_Fhu9-7x5z--tm_apnFqpuqfn%3DLdNESE3mA%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13259) mmap() based BucketCache IOEngine

2015-04-20 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503941#comment-14503941
 ] 

zhangduo commented on HBASE-13259:
--

I can pick this up and address the 'ugly ByteBufferArray'.
But we do not have enough time to test it on a large dataset if we want to catch 
up with the first RC of 1.1, I think. It is tuning work, and the time we need is 
unpredictable. We could file a new issue to hold the tuning work and resolve this 
issue before the first RC of 1.1.

What do you think? [~ndimiduk] 
Thanks.

 mmap() based BucketCache IOEngine
 -

 Key: HBASE-13259
 URL: https://issues.apache.org/jira/browse/HBASE-13259
 Project: HBase
  Issue Type: New Feature
  Components: BlockCache
Affects Versions: 0.98.10
Reporter: Zee Chen
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13259-v2.patch, HBASE-13259.patch, ioread-1.svg, 
 mmap-0.98-v1.patch, mmap-1.svg, mmap-trunk-v1.patch


 Of the existing BucketCache IOEngines, FileIOEngine uses pread() to copy data 
 from kernel space to user space. This is a good choice when the total working 
 set size is much bigger than the available RAM and the latency is dominated 
 by IO access. However, when the entire working set is small enough to fit in 
 the RAM, using mmap() (and subsequent memcpy()) to move data from kernel 
 space to user space is faster. I have run some short keyval gets tests and 
 the results indicate a reduction of 2%-7% of kernel CPU on my system, 
 depending on the load. On the gets, the latency histograms from mmap() are 
 identical to those from pread(), but peak throughput is close to 40% higher.
 This patch modifies ByteBufferArray to allow it to specify a backing file.
 Example for using this feature: set  hbase.bucketcache.ioengine to 
 mmap:/dev/shm/bucketcache.0 in hbase-site.xml.
 Attached perf measured CPU usage breakdown in flames graph.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13499) AsyncRpcClient test cases failure in powerpc

2015-04-20 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14502799#comment-14502799
 ] 

zhangduo commented on HBASE-13499:
--

Seems we should also add an 'amd64' check, the same as in the issue you 
mentioned.

Let me prepare a patch.
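
A minimal sketch of the kind of guard I mean (names are illustrative, not the 
actual patch):

{code:title=EpollGuardSketch.java}
final class EpollGuardSketch {
  // Only consider netty's native epoll transport on linux/amd64; everything
  // else (e.g. ppc64) should fall back to the pure-Java NIO transport.
  static boolean useNativeEpoll() {
    String os = System.getProperty("os.name", "").toLowerCase();
    String arch = System.getProperty("os.arch", "");
    return os.contains("linux") && "amd64".equals(arch);
  }
}
{code}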

Thanks.

 AsyncRpcClient test cases failure in powerpc
 

 Key: HBASE-13499
 URL: https://issues.apache.org/jira/browse/HBASE-13499
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 1.1.0
Reporter: sangamesh

 The new AsyncRpcClient feature added through the jira defect HBASE-12684 
 causing some test cases failures in powerpc64 environment.
 I am testing it in master branch.
 Looks like the version of netty (4.0.23) doesn't provide a support for non 
 amd64 platforms and suggested to use pure java netty 
 Here is the discussion on that https://github.com/aphyr/riemann/pull/508
 So new Async test cases will fail in ppc64 and other non amd64 platforms too.
 Here is the output of the error.
 Running org.apache.hadoop.hbase.ipc.TestAsyncIPC
 Tests run: 24, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 2.802 sec 
  FAILURE! - in org.apache.hadoop.hbase.ipc.TestAsyncIPC
 testRTEDuringAsyncConnectionSetup[3](org.apache.hadoop.hbase.ipc.TestAsyncIPC)
   Time elapsed: 0.048 sec   ERROR!
 java.lang.UnsatisfiedLinkError: 
 /tmp/libnetty-transport-native-epoll4286512618055650929.so: 
 /tmp/libnetty-transport-native-epoll4286512618055650929.so: cannot open 
 shared object file: No such file or directory (Possible cause: can't load AMD 
 64-bit .so on a Power PC 64-bit platform)
   at java.lang.ClassLoader$NativeLibrary.load(Native Method)
   at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13499) AsyncRpcClient test cases failure in powerpc

2015-04-20 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13499:
-
Fix Version/s: 1.2.0
   1.1.0
   2.0.0
 Assignee: zhangduo
Affects Version/s: 1.2.0
   2.0.0
   Status: Patch Available  (was: Open)

 AsyncRpcClient test cases failure in powerpc
 

 Key: HBASE-13499
 URL: https://issues.apache.org/jira/browse/HBASE-13499
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: sangamesh
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 1.2.0

 Attachments: HBASE-13499.patch


 The new AsyncRpcClient feature added through the jira defect HBASE-12684 
 causing some test cases failures in powerpc64 environment.
 I am testing it in master branch.
 Looks like the version of netty (4.0.23) doesn't provide a support for non 
 amd64 platforms and suggested to use pure java netty 
 Here is the discussion on that https://github.com/aphyr/riemann/pull/508
 So new Async test cases will fail in ppc64 and other non amd64 platforms too.
 Here is the output of the error.
 Running org.apache.hadoop.hbase.ipc.TestAsyncIPC
 Tests run: 24, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 2.802 sec 
  FAILURE! - in org.apache.hadoop.hbase.ipc.TestAsyncIPC
 testRTEDuringAsyncConnectionSetup[3](org.apache.hadoop.hbase.ipc.TestAsyncIPC)
   Time elapsed: 0.048 sec   ERROR!
 java.lang.UnsatisfiedLinkError: 
 /tmp/libnetty-transport-native-epoll4286512618055650929.so: 
 /tmp/libnetty-transport-native-epoll4286512618055650929.so: cannot open 
 shared object file: No such file or directory (Possible cause: can't load AMD 
 64-bit .so on a Power PC 64-bit platform)
   at java.lang.ClassLoader$NativeLibrary.load(Native Method)
   at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13499) AsyncRpcClient test cases failure in powerpc

2015-04-20 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13499:
-
Attachment: HBASE-13499.patch

Add an 'amd64' check. And there is a typo in the testcase...

[~sangameshs] Could you please help testing the patch on ppc?

Thanks.

 AsyncRpcClient test cases failure in powerpc
 

 Key: HBASE-13499
 URL: https://issues.apache.org/jira/browse/HBASE-13499
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 1.1.0
Reporter: sangamesh
 Attachments: HBASE-13499.patch


 The new AsyncRpcClient feature added through the jira defect HBASE-12684 
 causing some test cases failures in powerpc64 environment.
 I am testing it in master branch.
 Looks like the version of netty (4.0.23) doesn't provide a support for non 
 amd64 platforms and suggested to use pure java netty 
 Here is the discussion on that https://github.com/aphyr/riemann/pull/508
 So new Async test cases will fail in ppc64 and other non amd64 platforms too.
 Here is the output of the error.
 Running org.apache.hadoop.hbase.ipc.TestAsyncIPC
 Tests run: 24, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 2.802 sec 
  FAILURE! - in org.apache.hadoop.hbase.ipc.TestAsyncIPC
 testRTEDuringAsyncConnectionSetup[3](org.apache.hadoop.hbase.ipc.TestAsyncIPC)
   Time elapsed: 0.048 sec   ERROR!
 java.lang.UnsatisfiedLinkError: 
 /tmp/libnetty-transport-native-epoll4286512618055650929.so: 
 /tmp/libnetty-transport-native-epoll4286512618055650929.so: cannot open 
 shared object file: No such file or directory (Possible cause: can't load AMD 
 64-bit .so on a Power PC 64-bit platform)
   at java.lang.ClassLoader$NativeLibrary.load(Native Method)
   at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13259) mmap() based BucketCache IOEngine

2015-04-15 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497347#comment-14497347
 ] 

zhangduo commented on HBASE-13259:
--

No, I haven't tested the patch...

 mmap() based BucketCache IOEngine
 -

 Key: HBASE-13259
 URL: https://issues.apache.org/jira/browse/HBASE-13259
 Project: HBase
  Issue Type: New Feature
  Components: BlockCache
Affects Versions: 0.98.10
Reporter: Zee Chen
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13259-v2.patch, HBASE-13259.patch, ioread-1.svg, 
 mmap-0.98-v1.patch, mmap-1.svg, mmap-trunk-v1.patch


 Of the existing BucketCache IOEngines, FileIOEngine uses pread() to copy data 
 from kernel space to user space. This is a good choice when the total working 
 set size is much bigger than the available RAM and the latency is dominated 
 by IO access. However, when the entire working set is small enough to fit in 
 the RAM, using mmap() (and subsequent memcpy()) to move data from kernel 
 space to user space is faster. I have run some short keyval gets tests and 
 the results indicate a reduction of 2%-7% of kernel CPU on my system, 
 depending on the load. On the gets, the latency histograms from mmap() are 
 identical to those from pread(), but peak throughput is close to 40% higher.
 This patch modifies ByteBufferArray to allow it to specify a backing file.
 Example for using this feature: set  hbase.bucketcache.ioengine to 
 mmap:/dev/shm/bucketcache.0 in hbase-site.xml.
 Attached perf measured CPU usage breakdown in flames graph.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache

2015-04-14 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493843#comment-14493843
 ] 

zhangduo commented on HBASE-13301:
--

Seems the second time is fine. Let me commit.

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-0.98.patch, HBASE-13301-0.98_v1.patch, 
 HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, 
 HBASE-13301-branch-1.0_v1.patch, HBASE-13301-branch-1.patch, 
 HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, 
 HBASE-13301_v2.patch, HBASE-13301_v3.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache

2015-04-14 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493851#comment-14493851
 ] 

zhangduo commented on HBASE-13301:
--

Integrated to all branches.
Thanks to all of you who helped me finish this.

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-0.98.patch, HBASE-13301-0.98_v1.patch, 
 HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, 
 HBASE-13301-branch-1.0_v1.patch, HBASE-13301-branch-1.patch, 
 HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, 
 HBASE-13301_v2.patch, HBASE-13301_v3.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-14 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-0.98.patch, HBASE-13301-0.98_v1.patch, 
 HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, 
 HBASE-13301-branch-1.0_v1.patch, HBASE-13301-branch-1.patch, 
 HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, 
 HBASE-13301_v2.patch, HBASE-13301_v3.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13467) Prototype using GRPC as IPC mechanism

2015-04-14 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495447#comment-14495447
 ] 

zhangduo commented on HBASE-13467:
--

Nice try! 
Two things.
1. Wire compatibility. gRPC is based on HTTP/2, and the old RPC is based on raw 
TCP. If we cannot keep compatibility at the protocol level, then we should find 
other ways to let an old client communicate with a new server.
2. Secure HBase. gRPC is based on HTTP/2, so I'm not worried about the Kerberos 
authentication part. But security is a big system; a little change here may 
require a large change there. It may be a big project.
Thanks.

 Prototype using GRPC as IPC mechanism
 -

 Key: HBASE-13467
 URL: https://issues.apache.org/jira/browse/HBASE-13467
 Project: HBase
  Issue Type: Improvement
  Components: API
Affects Versions: 2.0.0
Reporter: Louis Ryan
Priority: Minor

 GRPC provides an RPC layer for protocol buffers on top of Netty 4/5. This 
 could be used to replace the current internal implementation.
 GRPC supports some advanced features like streaming, async, flow-control, 
 cancellation and timeout which might be useful
 Will prototype on GitHub here if folks are interested 
 https://github.com/louiscryan/hbase



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache

2015-04-13 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493271#comment-14493271
 ] 

zhangduo commented on HBASE-13301:
--

Let me port the new patch to the branches other than master.

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, 
 HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.patch, 
 HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, 
 HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-13 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Attachment: HBASE-13301-branch-1.0_v1.patch

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, 
 HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0_v1.patch, 
 HBASE-13301-branch-1.patch, HBASE-13301-branch-1_v1.patch, 
 HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, 
 HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-13 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Attachment: HBASE-13301-0.98_v1.patch

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-0.98.patch, HBASE-13301-0.98_v1.patch, 
 HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, 
 HBASE-13301-branch-1.0_v1.patch, HBASE-13301-branch-1.patch, 
 HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, 
 HBASE-13301_v2.patch, HBASE-13301_v3.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-13 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Attachment: HBASE-13301-branch-1_v1.patch

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, 
 HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.patch, 
 HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, 
 HBASE-13301_v2.patch, HBASE-13301_v3.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-13 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Attachment: HBASE-13301-0.98_v1.patch

Retry for 0.98

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-0.98.patch, HBASE-13301-0.98_v1.patch, 
 HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0.patch, 
 HBASE-13301-branch-1.0_v1.patch, HBASE-13301-branch-1.patch, 
 HBASE-13301-branch-1_v1.patch, HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, 
 HBASE-13301_v2.patch, HBASE-13301_v3.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-13 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Attachment: (was: HBASE-13301-0.98_v1.patch)

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, 
 HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.0_v1.patch, 
 HBASE-13301-branch-1.patch, HBASE-13301-branch-1_v1.patch, 
 HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, 
 HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache

2015-04-11 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14490819#comment-14490819
 ] 

zhangduo commented on HBASE-13301:
--

Any other questions? [~ndimiduk]
Thanks.

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, 
 HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.patch, 
 HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, 
 HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache

2015-04-10 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14490505#comment-14490505
 ] 

zhangduo commented on HBASE-13301:
--

Seems to pass on branch-1.0.
Since the RCs of 0.98.12 and 1.0.1 have been cut, can we push to all branches 
now?
[~enis] [~apurtell] Thanks.

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, 
 HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.patch, 
 HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, 
 HBASE-13301_v1.patch, HBASE-13301_v2.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-10 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Attachment: HBASE-13301_v3.patch

[~ndimiduk] Yes, I tried it on every branch. If you change 
'backingMap.remove(cacheKey, bucketEntry)' back to 
'bucketEntry.equals(backingMap.remove(cacheKey))' in BucketCache.evictBlock, 
the test fails every time.

About the sleep in the testcase...
For the evictThread, it is not easy to add a countdown latch since we expect 
the thread to be blocked on the IdLock. And BucketCache.cacheBlock is a simple 
queue-based async operation, so I think it is not worth adding more logic than 
a simple sleep wait; it is fast...

I extracted the cacheAndWait operation into a method and added some comments to 
explain the reason. I also added a method to IdLock that checks the number of 
waiters waiting on a given id, and use it to confirm the evictThread is blocked 
on the IdLock.
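
For illustration only, the waiting scheme is roughly like the sketch below; the 
field and accessor names here are assumptions, not necessarily what the patch 
actually adds to IdLock.
{code:title=Waiter-count wait, rough sketch (names are assumptions)}
// Instead of relying only on a fixed sleep, spin until the evict thread is
// actually parked on the IdLock entry for this block's offset, then cache the
// same key again to reproduce the evict-then-cache interleaving reliably.
long offset = bucketEntry.offset();
while (offsetLock.waiterCount(offset) == 0) { // waiterCount(long) is an assumed accessor
  Thread.sleep(10);
}
// The evict thread is now known to be blocked on the IdLock.
cache.cacheBlock(cacheKey, block);
{code}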

Thanks.

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-0.98.patch, HBASE-13301-branch-1.0.patch, 
 HBASE-13301-branch-1.0.patch, HBASE-13301-branch-1.patch, 
 HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch, HBASE-13301.patch, 
 HBASE-13301_v1.patch, HBASE-13301_v2.patch, HBASE-13301_v3.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache

2015-04-08 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486301#comment-14486301
 ] 

zhangduo commented on HBASE-13301:
--

Will commit later if no objections.
[~ndimiduk] [~enis] [~apurtell] OK to commit to branches other than master?
Thanks.

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, 
 HBASE-13301_v2.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-07 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Attachment: HBASE-13301_v1.patch

Check thread state in the testcase.

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache

2015-04-07 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484362#comment-14484362
 ] 

zhangduo commented on HBASE-13301:
--

[~apurtell] There is a small problem with the compareTo and equals methods in 
BucketEntry.
Will prepare a new patch soon.

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-07 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Attachment: HBASE-13301_v2.patch

Removed the compareTo and equals methods and use a COMPARATOR instead.
Renamed 'accessTime' to 'accessCounter' since it is always assigned from 
'accessCount.incrementAndGet', which is not an actual time. The name 
'accessTime' makes people wonder why equal access times should imply object 
equality in the compare methods.
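
Roughly, the idea is something like this sketch (illustrative only, not the 
exact patch; it assumes a long accessCounter field on BucketEntry):
{code:title=Comparator sketch (illustrative)}
// Compare by accessCounter in descending order (larger counters first),
// without overriding equals/hashCode on BucketEntry itself.
static final Comparator<BucketEntry> COMPARATOR = new Comparator<BucketEntry>() {
  @Override
  public int compare(BucketEntry l, BucketEntry r) {
    return Long.compare(r.accessCounter, l.accessCounter);
  }
};
{code}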

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, 
 HBASE-13301_v2.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-07 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Attachment: (was: HBASE-13301_v2.patch)

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-07 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Attachment: HBASE-13301_v2.patch

Sorry, I deserve a knock on the head...
Fixed a wrong comment: the comparator is in descending order, not ascending 
order...

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, 
 HBASE-13301_v2.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-07 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Status: Patch Available  (was: Open)

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2

 Attachments: HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch, HBASE-13301_v1.patch, 
 HBASE-13301_v2.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-04-05 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14396259#comment-14396259
 ] 

zhangduo commented on HBASE-13408:
--

Looks good.

And a little hint: log truncation is also an important purpose of flushing. If 
you keep data in the memstore for a long time, there will be lots of WALs that 
cannot be truncated, which increases MTTR. So if the flush request comes from 
the LogRoller, you should enter panic mode and flush the memstore. (Maybe you 
already know this, but I haven't seen anything about log truncation in your 
design doc, so I'm just putting it here :) )
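
As a rough, purely illustrative sketch of that panic-mode branch (all names 
below are assumptions, not actual HBase APIs):
{code:title=Panic-mode sketch (hypothetical names)}
// If the flush was requested by the WAL roller because too many WAL files
// cannot be archived yet, stop keeping the data in memory and flush it to
// disk so the old WALs can be truncated; otherwise keep compacting in memory.
void onFlushRequest(FlushRequest request) {
  if (request.isFromLogRoller()) {                  // assumed predicate
    compactingMemstore.abortInMemoryCompaction();   // assumed: interrupt compaction
    compactingMemstore.flushToDisk();               // panic mode: unblock WAL truncation
  } else {
    compactingMemstore.compactInMemory();           // normal path: keep data in memory
  }
}
{code}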

And I remember that Xiaomi said they have an 'HLog reform' feature which can 
solve this problem in their private version of HBase, but it seems they have 
not donated it to the community yet.


 HBase In-Memory Memstore Compaction
 ---

 Key: HBASE-13408
 URL: https://issues.apache.org/jira/browse/HBASE-13408
 Project: HBase
  Issue Type: New Feature
Reporter: Eshcar Hillel
 Attachments: HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf


 A store unit holds a column family in a region, where the memstore is its 
 in-memory component. The memstore absorbs all updates to the store; from time 
 to time these updates are flushed to a file on disk, where they are 
 compacted. Unlike disk components, the memstore is not compacted until it is 
 written to the filesystem and optionally to block-cache. This may result in 
 underutilization of the memory due to duplicate entries per row, for example, 
 when hot data is continuously updated. 
 Generally, the faster data accumulates in memory, the more flushes are 
 triggered and the more frequently data sinks to disk, slowing down retrieval 
 of data, even very recent data.
 In high-churn workloads, compacting the memstore can help maintain the data 
 in memory, and thereby speed up data retrieval. 
 We suggest a new compacted memstore with the following principles:
 1.The data is kept in memory for as long as possible
 2.Memstore data is either compacted or in process of being compacted 
 3.Allow a panic mode, which may interrupt an in-progress compaction and 
 force a flush of part of the memstore.
 We suggest applying this optimization only to in-memory column families.
 A design document is attached.
 This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-04 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Attachment: HBASE-13301.patch

Changed the declaration of backingMap from Map to ConcurrentMap and use 
remove(key, value) to prevent removing the wrong entry.

Also did some other cleanups and fixes. Kept the first testcase since the 
second one is only used to prove this can happen in a real scenario.
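
For illustration, the conditional removal looks roughly like this (a sketch of 
the idea, not the exact patch):
{code:title=Conditional removal sketch (illustrative)}
// With backingMap declared as a ConcurrentMap, remove(key, value) is a single
// atomic check-and-remove, so we never free a BucketEntry that another thread
// has already replaced under the same key.
private ConcurrentMap<BlockCacheKey, BucketEntry> backingMap;

public boolean evictBlock(BlockCacheKey cacheKey) {
  BucketEntry bucketEntry = backingMap.get(cacheKey);
  if (bucketEntry == null) {
    return false;
  }
  // Remove only if the mapping still points at the entry we looked up.
  if (!backingMap.remove(cacheKey, bucketEntry)) {
    return false;
  }
  bucketAllocator.freeBlock(bucketEntry.offset());
  realCacheSize.addAndGet(-1 * bucketEntry.getLength());
  ...
}
{code}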

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-04 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Status: Patch Available  (was: Open)

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13301-testcase.patch, 
 HBASE-13301-testcase_v1.patch, HBASE-13301.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-04-03 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Attachment: HBASE-13301-testcase_v1.patch

A new testcase shows that it is possible to evict and cache a block again in 
the real world. Moving a region to another RS and back can make this happen.

Of course, this is an extremely rare case. It is almost impossible for a thread 
to halt for such a long time. But it is a time bomb, and I do not think it is a 
good idea to leave it there and wait for the bang...

I will try to fix it.
Thanks.

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13301-testcase.patch, HBASE-13301-testcase_v1.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0

2015-04-02 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13385:
-
Attachment: HBASE-13385_v1.patch

Fix compile error with Hadoop 2.4.1.

 TestGenerateDelegationToken is broken with hadoop 2.8.0
 ---

 Key: HBASE-13385
 URL: https://issues.apache.org/jira/browse/HBASE-13385
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13385.patch, HBASE-13385_v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0

2015-04-02 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13385:
-
Status: Patch Available  (was: Open)

 TestGenerateDelegationToken is broken with hadoop 2.8.0
 ---

 Key: HBASE-13385
 URL: https://issues.apache.org/jira/browse/HBASE-13385
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13385.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0

2015-04-02 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13385:
-
   Resolution: Fixed
Fix Version/s: 1.1.0
   2.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Pushed to master and branch-1.
Thanks [~tedyu] for reviewing.

 TestGenerateDelegationToken is broken with hadoop 2.8.0
 ---

 Key: HBASE-13385
 URL: https://issues.apache.org/jira/browse/HBASE-13385
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13385.patch, HBASE-13385_v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0

2015-04-02 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392640#comment-14392640
 ] 

zhangduo commented on HBASE-13385:
--

Test with
{noformat}
mvn clean test -Dtest=TestGenerateDelegationToken 
-Dhadoop-two.version=2.8.0-SNAPSHOT
{noformat}
Passed.

 TestGenerateDelegationToken is broken with hadoop 2.8.0
 ---

 Key: HBASE-13385
 URL: https://issues.apache.org/jira/browse/HBASE-13385
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13385.patch, HBASE-13385_v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0

2015-04-02 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392607#comment-14392607
 ] 

zhangduo commented on HBASE-13385:
--

What's this?
{noformat}
*
 Printing headers for files without AL header...
 
 
===
==/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransactionImpl.java.rej
===
--- 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransactionImpl.java
+++ 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransactionImpl.java
@@ -41,6 +41,7 @@
 import org.apache.hadoop.hbase.coordination.BaseCoordinatedStateManager;
 import 
org.apache.hadoop.hbase.coordination.RegionMergeCoordination.RegionMergeDetails;
 import 
org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos.RegionStateTransition.TransitionCode;
+import 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.JournalEntryImpl;
 import 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.LoggingProgressable;
 import org.apache.hadoop.hbase.util.Bytes;
 import org.apache.hadoop.hbase.util.ConfigUtil;

*
{noformat}

 TestGenerateDelegationToken is broken with hadoop 2.8.0
 ---

 Key: HBASE-13385
 URL: https://issues.apache.org/jira/browse/HBASE-13385
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13385.patch, HBASE-13385_v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0

2015-04-02 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392615#comment-14392615
 ] 

zhangduo commented on HBASE-13385:
--

Oh, it is caused by HBASE-12975. Other people have already reported it.

 TestGenerateDelegationToken is broken with hadoop 2.8.0
 ---

 Key: HBASE-13385
 URL: https://issues.apache.org/jira/browse/HBASE-13385
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13385.patch, HBASE-13385_v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12259) Bring quorum based write ahead log into HBase

2015-04-02 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392764#comment-14392764
 ] 

zhangduo commented on HBASE-12259:
--

Any progress here?
Thanks.

 Bring quorum based write ahead log into HBase
 -

 Key: HBASE-12259
 URL: https://issues.apache.org/jira/browse/HBASE-12259
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 2.0.0
Reporter: Elliott Clark
 Attachments: Architecture for HydraBase (5).pdf, 
 RaftProtocolImplementationDesignDoc.pdf


 HydraBase ( 
 https://code.facebook.com/posts/32638043166/hydrabase-the-evolution-of-hbase-facebook/
  ) Facebook's implementation of HBase with Raft for consensus will be going 
 open source shortly. We should pull in the parts of that fb-0.89 based 
 implementation, and offer it as a feature in whatever next major release is 
 next up. Right now the Hydrabase code base isn't ready to be released into 
 the wild; it should be ready soon ( for some definition of soon).
 Since Hydrabase is based upon 0.89 most of the code is not directly 
 applicable. So lots of work will probably need to be done in a feature branch 
 before a merge vote.
 Is this something that's wanted?
 Is there any cleanup that needs to be done before the log 
 implementation is able to be replaced like this?
 What's our story with upgrading to this? Are we OK with requiring downtime?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13187) Add ITBLL that exercises per CF flush

2015-04-01 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13187:
-
Attachment: HBASE-13187_v1.patch

Added an existence check.
Ran it locally on master using the command
{noformat}
mvn -Dit.test=IntegrationTestBigLinkedList 
-Dgenerator.multiple.columnfamilies=true verify
{noformat}

Passed. [~stack]

 Add ITBLL that exercises per CF flush
 -

 Key: HBASE-13187
 URL: https://issues.apache.org/jira/browse/HBASE-13187
 Project: HBase
  Issue Type: Task
  Components: integration tests
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 2.0.0, 1.1.0

 Attachments: 13187.txt, HBASE-13187_v1.patch


 Let me work on this. It would be excellent if we could have confidence to 
 turn this on earlier rather than later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0

2015-04-01 Thread zhangduo (JIRA)
zhangduo created HBASE-13385:


 Summary: TestGenerateDelegationToken is broken with hadoop 2.8.0
 Key: HBASE-13385
 URL: https://issues.apache.org/jira/browse/HBASE-13385
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13385) TestGenerateDelegationToken is broken with hadoop 2.8.0

2015-04-01 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13385:
-
Attachment: HBASE-13385.patch

Also start the DFS cluster in secure mode.
Copied some code from an HDFS testcase.

 TestGenerateDelegationToken is broken with hadoop 2.8.0
 ---

 Key: HBASE-13385
 URL: https://issues.apache.org/jira/browse/HBASE-13385
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13385.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13371) Fix typo in TestAsyncIPC

2015-03-31 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13371:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Fix typo in TestAsyncIPC
 

 Key: HBASE-13371
 URL: https://issues.apache.org/jira/browse/HBASE-13371
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13371.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13371) Fix typo in TestAsyncIPC

2015-03-31 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389966#comment-14389966
 ] 

zhangduo commented on HBASE-13371:
--

Pushed to master and branch-1. Thanks [~tedyu] for reviewing.

 Fix typo in TestAsyncIPC
 

 Key: HBASE-13371
 URL: https://issues.apache.org/jira/browse/HBASE-13371
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13371.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13371) Fix typo in TestAsyncIPC

2015-03-31 Thread zhangduo (JIRA)
zhangduo created HBASE-13371:


 Summary: Fix typo in TestAsyncIPC
 Key: HBASE-13371
 URL: https://issues.apache.org/jira/browse/HBASE-13371
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13371) Fix typo in TestAsyncIPC

2015-03-31 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13371:
-
Attachment: HBASE-13371.patch

Forgot to modify the auto-generated code.
One-line patch.

 Fix typo in TestAsyncIPC
 

 Key: HBASE-13371
 URL: https://issues.apache.org/jira/browse/HBASE-13371
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13371.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13371) Fix typo in TestAsyncIPC

2015-03-31 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13371:
-
Status: Patch Available  (was: Open)

 Fix typo in TestAsyncIPC
 

 Key: HBASE-13371
 URL: https://issues.apache.org/jira/browse/HBASE-13371
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13371.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache

2015-03-29 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14385682#comment-14385682
 ] 

zhangduo commented on HBASE-13301:
--

{quote}
In btw a context switch t1 completed the caching and done evict and again 
cached same block.. This seems rarest of rare case. 
{quote}
Agreed. But HBase is a long-running service; low-probability events always 
occur if we keep it running long enough...

Let me first revisit the whole read/write path in the regionserver that touches 
the BlockCache and lay out a clear locking schema. Then it will be easier to 
say whether the situation in this testcase could happen.

Will come back later. Thanks.



 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13301-testcase.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13295) TestInfoServers hang

2015-03-27 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13295:
-
   Resolution: Fixed
Fix Version/s: 1.1.0
   2.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 TestInfoServers hang
 

 Key: HBASE-13295
 URL: https://issues.apache.org/jira/browse/HBASE-13295
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13295.patch


 https://builds.apache.org/job/HBase-TRUNK-jacoco/16/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.TestInfoServers-output.txt
 No progress after this line
 {noformat}
 2015-03-19 22:46:06,809 INFO  [main] hbase.TestInfoServers(127): Testing 
 http://localhost:44749/table.jsp?name=testMasterServerReadOnly&action=split&key=
  has Table action request accepted
 {noformat}
 I think the problem may be that we do not wait for the master to finish 
 becoming active, and there is no timeout on the HTTP request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache

2015-03-27 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14385133#comment-14385133
 ] 

zhangduo commented on HBASE-13301:
--

Thanks [~anoopsamjohn], and could you explain why this won't happen?

And if this won't happen, then maybe we just need a null check to confirm that 
the block has not been evicted by others yet? A 'get and check' still makes 
people think that we could evict and cache again, as I did in the 
testcase...

Thanks.

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13301-testcase.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13295) TestInfoServers hang

2015-03-26 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383303#comment-14383303
 ] 

zhangduo commented on HBASE-13295:
--

Pushed to master and branch-1.

 TestInfoServers hang
 

 Key: HBASE-13295
 URL: https://issues.apache.org/jira/browse/HBASE-13295
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13295.patch


 https://builds.apache.org/job/HBase-TRUNK-jacoco/16/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.TestInfoServers-output.txt
 No progress after this line
 {noformat}
 2015-03-19 22:46:06,809 INFO  [main] hbase.TestInfoServers(127): Testing 
 http://localhost:44749/table.jsp?name=testMasterServerReadOnly&action=split&key=
  has Table action request accepted
 {noformat}
 I think the problem may be that we do not wait for the master to finish 
 becoming active, and there is no timeout on the HTTP request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13295) TestInfoServers hang

2015-03-26 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383267#comment-14383267
 ] 

zhangduo commented on HBASE-13295:
--

Let me pick this up. At least fix it for master and branch-1.

 TestInfoServers hang
 

 Key: HBASE-13295
 URL: https://issues.apache.org/jira/browse/HBASE-13295
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13295.patch


 https://builds.apache.org/job/HBase-TRUNK-jacoco/16/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.TestInfoServers-output.txt
 No progress after this line
 {noformat}
 2015-03-19 22:46:06,809 INFO  [main] hbase.TestInfoServers(127): Testing 
 http://localhost:44749/table.jsp?name=testMasterServerReadOnly&action=split&key=
  has Table action request accepted
 {noformat}
 I think the problem may be that we do not wait for the master to finish 
 becoming active, and there is no timeout on the HTTP request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache

2015-03-25 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381237#comment-14381237
 ] 

zhangduo commented on HBASE-13301:
--

[~ram_krish]
Yes, get and compare then remove would be perfect. But it is not 
straightforward in this case. We do have an IdLock, but it is only used in get 
and evict, and the lock key is the offset, not the BlockCacheKey.

And what I wonder is whether the error in this testcase could happen in the 
real world. Maybe the access pattern we use avoids this error? I do not know...

Anyway, I think remove first then compare is not a good idea. If we can enter 
the 'not equals' branch then, no doubt, it is a bug. And if we never enter the 
'not equals' branch, then why do we need the compare...
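
To make the concern concrete, the problematic interleaving with 
remove-then-compare looks roughly like this (illustrative only):
{code:title=Remove-then-compare race (illustrative)}
// Thread A (evicting) reads the entry, then stalls:
BucketEntry oldEntry = backingMap.get(cacheKey);
// Meanwhile thread B evicts the same key and caches the block again,
// so backingMap now maps cacheKey -> newEntry.
// Thread A resumes:
BucketEntry removed = backingMap.remove(cacheKey);  // removes newEntry, not oldEntry
if (!oldEntry.equals(removed)) {
  // newEntry is gone from the map, but its bucket space is never freed,
  // so the allocation leaks.
  return false;
}
{code}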

Thanks.

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13301-testcase.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13310) Fix high priority findbugs warnings

2015-03-23 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375864#comment-14375864
 ] 

zhangduo commented on HBASE-13310:
--

Put the patch on reviewboard.
Will commit it if there is no objection when I come back.
Thanks.

 Fix high priority findbugs warnings
 ---

 Key: HBASE-13310
 URL: https://issues.apache.org/jira/browse/HBASE-13310
 Project: HBase
  Issue Type: Task
Affects Versions: 2.0.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0

 Attachments: HBASE-13310.patch, HBASE-13310_v1.patch, 
 HBASE-13310_v1.patch


 See here.
 https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/
 High priority warnings usually introduce bugs or have a very bad impact on 
 performance. Let's fix them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-13257) Show coverage report on jenkins

2015-03-23 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo resolved HBASE-13257.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

 Show coverage report on jenkins
 ---

 Key: HBASE-13257
 URL: https://issues.apache.org/jira/browse/HBASE-13257
 Project: HBase
  Issue Type: Task
Reporter: zhangduo
Assignee: zhangduo
Priority: Minor
 Fix For: 2.0.0


 Thinking of showing the JaCoCo coverage report on https://builds.apache.org .
 An advantage of showing it on Jenkins is that the Jenkins JaCoCo plugin can 
 handle cross-module coverage.
 We cannot do it locally since https://github.com/jacoco/jacoco/pull/97 is still 
 pending.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13310) Fix high priority findbugs warnings

2015-03-23 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13310:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to master.

Thanks [~eclark] and [~tedyu].

 Fix high priority findbugs warnings
 ---

 Key: HBASE-13310
 URL: https://issues.apache.org/jira/browse/HBASE-13310
 Project: HBase
  Issue Type: Task
Affects Versions: 2.0.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0

 Attachments: HBASE-13310.patch, HBASE-13310_v1.patch, 
 HBASE-13310_v1.patch


 See here.
 https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/
 High priority warnings usually introduce bugs or have a very bad impact on 
 performance. Let's fix them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13310) Fix high priority findbugs warnings

2015-03-22 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13310:
-
Attachment: HBASE-13310_v1.patch

Fix the stupid NPE...

 Fix high priority findbugs warnings
 ---

 Key: HBASE-13310
 URL: https://issues.apache.org/jira/browse/HBASE-13310
 Project: HBase
  Issue Type: Task
Affects Versions: 2.0.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0

 Attachments: HBASE-13310.patch, HBASE-13310_v1.patch


 See here.
 https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/
 High priority warnings usually introduce bugs or have a very bad impact on 
 performance. Let's fix them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13310) Fix high priority findbugs warnings

2015-03-22 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13310:
-
Attachment: HBASE-13310.patch

 Fix high priority findbugs warnings
 ---

 Key: HBASE-13310
 URL: https://issues.apache.org/jira/browse/HBASE-13310
 Project: HBase
  Issue Type: Task
Affects Versions: 2.0.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0

 Attachments: HBASE-13310.patch


 See here.
 https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/
 High priority warnings usually introduce bugs or have a very bad impact on 
 performance. Let's fix them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13310) Fix high priority findbugs warnings

2015-03-22 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13310:
-
Fix Version/s: 2.0.0
Affects Version/s: 2.0.0
   Status: Patch Available  (was: Open)

 Fix high priority findbugs warnings
 ---

 Key: HBASE-13310
 URL: https://issues.apache.org/jira/browse/HBASE-13310
 Project: HBase
  Issue Type: Task
Affects Versions: 2.0.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0

 Attachments: HBASE-13310.patch


 See here.
 https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/
 High priority warnings usually introduce bugs or have a very bad impact on 
 performance. Let's fix them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13310) Fix high priority findbugs warnings

2015-03-22 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13310:
-
Attachment: HBASE-13310_v1.patch

I do not know...
I ran it several times locally and it didn't hang...
Trying again.

 Fix high priority findbugs warnings
 ---

 Key: HBASE-13310
 URL: https://issues.apache.org/jira/browse/HBASE-13310
 Project: HBase
  Issue Type: Task
Affects Versions: 2.0.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0

 Attachments: HBASE-13310.patch, HBASE-13310_v1.patch, 
 HBASE-13310_v1.patch


 See here.
 https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/
 High priority warnings usually introduce bugs or have a very bad impact on 
 performance. Let's fix them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13295) TestInfoServers hang

2015-03-22 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13295:
-
Summary: TestInfoServers hang  (was: TestInfoServers hung)

 TestInfoServers hang
 

 Key: HBASE-13295
 URL: https://issues.apache.org/jira/browse/HBASE-13295
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13295.patch


 https://builds.apache.org/job/HBase-TRUNK-jacoco/16/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.TestInfoServers-output.txt
 No progress after this line
 {noformat}
 2015-03-19 22:46:06,809 INFO  [main] hbase.TestInfoServers(127): Testing 
 http://localhost:44749/table.jsp?name=testMasterServerReadOnly&action=split&key=
  has Table action request accepted
 {noformat}
 I think the problem may be that we do not wait for the master to finish becoming 
 active, and there is no timeout when doing the HTTP request.
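 As a minimal, self-contained sketch of the timeout half of this (the class name 
 TimedHttpGet and the timeout value are illustrative, not the actual test code), 
 the test could fetch pages with explicit connect/read timeouts so a request 
 against a not-yet-ready info server fails fast instead of hanging forever:
{code:title=TimedHttpGet.java}
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public final class TimedHttpGet {

  // Fetch a URL with explicit timeouts so a hung info server fails the test
  // quickly instead of blocking it indefinitely.
  public static String get(String urlString, int timeoutMs) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
    conn.setConnectTimeout(timeoutMs);
    conn.setReadTimeout(timeoutMs);
    InputStream in = conn.getInputStream();
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toString("UTF-8");
    } finally {
      in.close();
      conn.disconnect();
    }
  }
}
{code}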



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction

2015-03-21 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13308:
-
Description: 
https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/

First, we split 'e9eb97847340ea7c6b9616d63d62a784' to  
'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'.

And then, we try to split 'abe1973ea732066b12d8e33fce12a951'.
{noformat}
2015-03-21 03:58:46,970 INFO  [Thread-191] 
regionserver.TestEndToEndSplitTransaction(399): Initiating region split 
for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,976 INFO  [PriorityRpcServer.handler=7,queue=1,port=54177] 
regionserver.RSRpcServices(1596): Splitting 
testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,977 DEBUG [PriorityRpcServer.handler=7,queue=1,port=54177] 
regionserver.CompactSplitThread(259): Split requested for 
testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951..
  compaction_queue=(0:0), split_queue=1, merge_queue=0
2015-03-21 03:58:46,978 INFO  [Thread-191] 
regionserver.TestEndToEndSplitTransaction(399): blocking until region is 
split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] 
lock.ZKInterProcessLockBase(226): Acquired a lock for 
/hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] 
lock.ZKInterProcessLockBase(328): Released 
/hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use global 
event loop group NioEventLoopGroup
2015-03-21 03:58:46,988 INFO  [RS:0;priapus:54177-splits-1426910324832] 
regionserver.SplitRequest(142): Split transaction journal:
STARTED at 1426910326977
{noformat}

We can see that it failed without any error message.
I think this can only happen when the parent is not splittable or we cannot find a 
split row.
{noformat}
2015-03-21 03:58:47,019 INFO  
[RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.HStore(1334): 
Completed major compaction of 2 (all) file(s) in family of 
testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 
12.6 K. This selection was in queue for 0sec, and took 0sec to execute.
2015-03-21 03:58:47,019 INFO  
[RS:0;priapus:54177-shortCompactions-1426910324308] 
regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction: 
Request = 
regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.,
 storeName=family, fileCount=2, fileSize=25.5 K, priority=1, 
time=14542808784655186; duration=0sec
2015-03-21 03:58:47,020 DEBUG 
[RS:0;priapus:54177-shortCompactions-1426910324308] 
regionserver.CompactSplitThread$CompactionRunner(546): CompactSplitThread 
Status: compaction_queue=(0:0), split_queue=0, merge_queue=0
{noformat}

We can see that the compaction completed at 03:58:47,019, but the split 
started at 03:58:46,970, which is earlier.
So we still have a reference file and the region is not splittable.

I think the problem is that 'compactAndBlockUntilDone' is not reliable; it may 
return before the compaction completes.

Will try to prepare a patch.

  was:
https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/

First, we split 'e9eb97847340ea7c6b9616d63d62a784' to  
'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'.

And then, we try to split 'abe1973ea732066b12d8e33fce12a951'.
{noformat}
2015-03-21 03:58:46,970 INFO  [Thread-191] 
regionserver.TestEndToEndSplitTransaction(399): Initiating region split 
for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,976 INFO  [PriorityRpcServer.handler=7,queue=1,port=54177] 
regionserver.RSRpcServices(1596): Splitting 
testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,977 DEBUG [PriorityRpcServer.handler=7,queue=1,port=54177] 
regionserver.CompactSplitThread(259): Split requested for 
testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951..
  compaction_queue=(0:0), split_queue=1, merge_queue=0
2015-03-21 03:58:46,978 INFO  [Thread-191] 
regionserver.TestEndToEndSplitTransaction(399): blocking until region is 
split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,985 DEBUG 

[jira] [Created] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction

2015-03-21 Thread zhangduo (JIRA)
zhangduo created HBASE-13308:


 Summary: Fix flaky TestEndToEndSplitTransaction
 Key: HBASE-13308
 URL: https://issues.apache.org/jira/browse/HBASE-13308
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: zhangduo
Assignee: zhangduo


https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/

First, we split 'e9eb97847340ea7c6b9616d63d62a784' to  
'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'.

And then, we try to split 'abe1973ea732066b12d8e33fce12a951'.
{noformat}
2015-03-21 03:58:46,970 INFO  [Thread-191] 
regionserver.TestEndToEndSplitTransaction(399): Initiating region split 
for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,976 INFO  [PriorityRpcServer.handler=7,queue=1,port=54177] 
regionserver.RSRpcServices(1596): Splitting 
testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,977 DEBUG [PriorityRpcServer.handler=7,queue=1,port=54177] 
regionserver.CompactSplitThread(259): Split requested for 
testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951..
  compaction_queue=(0:0), split_queue=1, merge_queue=0
2015-03-21 03:58:46,978 INFO  [Thread-191] 
regionserver.TestEndToEndSplitTransaction(399): blocking until region is 
split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] 
lock.ZKInterProcessLockBase(226): Acquired a lock for 
/hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] 
lock.ZKInterProcessLockBase(328): Released 
/hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use global 
event loop group NioEventLoopGroup
2015-03-21 03:58:46,988 INFO  [RS:0;priapus:54177-splits-1426910324832] 
regionserver.SplitRequest(142): Split transaction journal:
STARTED at 1426910326977
{noformat}

We can see that it failed without any error message.
I think this can only happen when the parent is not splittable or we cannot find a 
split row.
{noformat}
2015-03-21 03:58:47,019 INFO  
[RS:0;priapus:54177-shortCompactions-1426910324308] regionserver.HStore(1334): 
Completed major compaction of 2 (all) file(s) in family of 
testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 
12.6 K. This selection was in queue for 0sec, and took 0sec to execute.
2015-03-21 03:58:47,019 INFO  
[RS:0;priapus:54177-shortCompactions-1426910324308] 
regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction: 
Request = 
regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.,
 storeName=family, fileCount=2, fileSize=25.5 K, priority=1, 
time=14542808784655186; duration=0sec
2015-03-21 03:58:47,020 DEBUG 
[RS:0;priapus:54177-shortCompactions-1426910324308] 
regionserver.CompactSplitThread$CompactionRunner(546): CompactSplitThread 
Status: compaction_queue=(0:0), split_queue=0, merge_queue=0
{noformat}

We can see that the compaction completed at 03:58:47,019, but the split 
started at 03:58:46,970, which is earlier.
So we still have a reference file and the region is not splittable.

I think the problem is that 'compactAndBlockUntilDone' is not reliable; it may 
return before the compaction completes.

Will try to prepare a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction

2015-03-21 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372611#comment-14372611
 ] 

zhangduo commented on HBASE-13308:
--

This is our 'compactAndBlockUntilDone' method.
{code:title=TestEndToEndSplitTransaction.java}
  public static void compactAndBlockUntilDone(Admin admin, HRegionServer rs,
      byte[] regionName) throws IOException, InterruptedException {
    log("Compacting region: " + Bytes.toStringBinary(regionName));
    admin.majorCompactRegion(regionName);
    log("blocking until compaction is complete: " + Bytes.toStringBinary(regionName));
    Threads.sleepWithoutInterrupt(500);
    while (rs.compactSplitThread.getCompactionQueueSize() > 0) {
      Threads.sleep(50);
    }
  }
{code}

It uses the thread pool's workQueue size as condition. But
{code}
  public static void main(String[] args) throws InterruptedException {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    pool.execute(new Runnable() {

      @Override
      public void run() {
        try {
          Thread.currentThread().join();
        } catch (InterruptedException e) {}
      }
    });
    Thread.sleep(2000);
    System.out.println(pool.getActiveCount());
    System.out.println(pool.getQueue().size());
    pool.shutdownNow();
  }
{code}
The output is 
{noformat}
1
0
{noformat}
A thread pool's queue size does not include the running tasks. So if there is 
only one running compaction, then the compaction queue size will be zero...

So, it is not safe to use the compaction queue size as the wait condition.
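
A minimal, runnable sketch of a safer wait (the class name is hypothetical, not 
the actual test code): include getActiveCount() as well, since the queue alone 
never sees the task that is currently running.
{code:title=WaitForPoolIdle.java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class WaitForPoolIdle {
  public static void main(String[] args) throws InterruptedException {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    pool.execute(new Runnable() {
      @Override
      public void run() {
        try {
          Thread.sleep(2000); // stands in for a running compaction
        } catch (InterruptedException e) {
        }
      }
    });
    // Give the worker a moment to pick up the task, like the original 500ms sleep.
    Thread.sleep(100);
    // getQueue().size() alone would already be 0 here; adding getActiveCount()
    // makes the loop wait until the running task has actually finished.
    while (pool.getActiveCount() + pool.getQueue().size() > 0) {
      Thread.sleep(50);
    }
    System.out.println("pool is idle");
    pool.shutdown();
  }
}
{code}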

 Fix flaky TestEndToEndSplitTransaction
 --

 Key: HBASE-13308
 URL: https://issues.apache.org/jira/browse/HBASE-13308
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: zhangduo
Assignee: zhangduo

 https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/
 First, we split 'e9eb97847340ea7c6b9616d63d62a784' to  
 'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'.
 And then, we try to split 'abe1973ea732066b12d8e33fce12a951'.
 {noformat}
 2015-03-21 03:58:46,970 INFO  [Thread-191] 
 regionserver.TestEndToEndSplitTransaction(399): Initiating region split 
 for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 2015-03-21 03:58:46,976 INFO  
 [PriorityRpcServer.handler=7,queue=1,port=54177] 
 regionserver.RSRpcServices(1596): Splitting 
 testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 2015-03-21 03:58:46,977 DEBUG 
 [PriorityRpcServer.handler=7,queue=1,port=54177] 
 regionserver.CompactSplitThread(259): Split requested for 
 testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951..
   compaction_queue=(0:0), split_queue=1, merge_queue=0
 2015-03-21 03:58:46,978 INFO  [Thread-191] 
 regionserver.TestEndToEndSplitTransaction(399): blocking until region is 
 split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] 
 lock.ZKInterProcessLockBase(226): Acquired a lock for 
 /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
 2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] 
 lock.ZKInterProcessLockBase(328): Released 
 /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
 2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use 
 global event loop group NioEventLoopGroup
 2015-03-21 03:58:46,988 INFO  [RS:0;priapus:54177-splits-1426910324832] 
 regionserver.SplitRequest(142): Split transaction journal:
   STARTED at 1426910326977
 {noformat}
 We can see that it failed without any error message.
 I think this can only happen when the parent is not splittable or we cannot find 
 a split row.
 {noformat}
 2015-03-21 03:58:47,019 INFO  
 [RS:0;priapus:54177-shortCompactions-1426910324308] 
 regionserver.HStore(1334): Completed major compaction of 2 (all) file(s) in 
 family of 
 testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
  into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 
 12.6 K. This selection was in queue for 0sec, and took 0sec to execute.
 2015-03-21 03:58:47,019 INFO  
 [RS:0;priapus:54177-shortCompactions-1426910324308] 
 regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction: 
 Request = 
 regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.,
  storeName=family, fileCount=2, fileSize=25.5 K, priority=1, 
 time=14542808784655186; duration=0sec
 2015-03-21 

[jira] [Updated] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction

2015-03-21 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13308:
-
Attachment: HBASE-13308.patch

Use the memstore size and store file count as the condition variable.

Also clean up old APIs.
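
A rough sketch of the idea (not the committed patch; the accessor names 
getStores(), getMemStoreSize() and getStorefilesCount() are assumptions about 
the region-server API of this era): wait for the observable effect of the major 
compaction, i.e. an empty memstore and a single store file per store, instead 
of polling the queue size.
{code:title=CompactAndWaitSketch.java}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.regionserver.Store;

public final class CompactAndWaitSketch {

  // Block until the requested major compaction has visibly finished: every
  // store has an empty memstore and has been rewritten into one store file.
  public static void compactAndWait(Admin admin, HRegion region, byte[] regionName)
      throws IOException, InterruptedException {
    admin.majorCompactRegion(regionName);
    while (true) {
      boolean done = true;
      List<Store> stores = region.getStores();
      for (Store store : stores) {
        if (store.getMemStoreSize() > 0 || store.getStorefilesCount() > 1) {
          done = false;
          break;
        }
      }
      if (done) {
        return;
      }
      Thread.sleep(50);
    }
  }
}
{code}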

 Fix flaky TestEndToEndSplitTransaction
 --

 Key: HBASE-13308
 URL: https://issues.apache.org/jira/browse/HBASE-13308
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13308.patch


 https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/
 First, we split 'e9eb97847340ea7c6b9616d63d62a784' to  
 'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'.
 And then, we try to split 'abe1973ea732066b12d8e33fce12a951'.
 {noformat}
 2015-03-21 03:58:46,970 INFO  [Thread-191] 
 regionserver.TestEndToEndSplitTransaction(399): Initiating region split 
 for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 2015-03-21 03:58:46,976 INFO  
 [PriorityRpcServer.handler=7,queue=1,port=54177] 
 regionserver.RSRpcServices(1596): Splitting 
 testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 2015-03-21 03:58:46,977 DEBUG 
 [PriorityRpcServer.handler=7,queue=1,port=54177] 
 regionserver.CompactSplitThread(259): Split requested for 
 testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951..
   compaction_queue=(0:0), split_queue=1, merge_queue=0
 2015-03-21 03:58:46,978 INFO  [Thread-191] 
 regionserver.TestEndToEndSplitTransaction(399): blocking until region is 
 split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] 
 lock.ZKInterProcessLockBase(226): Acquired a lock for 
 /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
 2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] 
 lock.ZKInterProcessLockBase(328): Released 
 /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
 2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use 
 global event loop group NioEventLoopGroup
 2015-03-21 03:58:46,988 INFO  [RS:0;priapus:54177-splits-1426910324832] 
 regionserver.SplitRequest(142): Split transaction journal:
   STARTED at 1426910326977
 {noformat}
 We can see that it failed without any error message.
 I think this can only happen when the parent is not splittable or we cannot find 
 a split row.
 {noformat}
 2015-03-21 03:58:47,019 INFO  
 [RS:0;priapus:54177-shortCompactions-1426910324308] 
 regionserver.HStore(1334): Completed major compaction of 2 (all) file(s) in 
 family of 
 testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
  into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 
 12.6 K. This selection was in queue for 0sec, and took 0sec to execute.
 2015-03-21 03:58:47,019 INFO  
 [RS:0;priapus:54177-shortCompactions-1426910324308] 
 regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction: 
 Request = 
 regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.,
  storeName=family, fileCount=2, fileSize=25.5 K, priority=1, 
 time=14542808784655186; duration=0sec
 2015-03-21 03:58:47,020 DEBUG 
 [RS:0;priapus:54177-shortCompactions-1426910324308] 
 regionserver.CompactSplitThread$CompactionRunner(546): CompactSplitThread 
 Status: compaction_queue=(0:0), split_queue=0, merge_queue=0
 {noformat}
 We can see that the compaction completed at 03:58:47,019, but the split 
 started at 03:58:46,970, which is earlier.
 So we still have a reference file and the region is not splittable.
 I think the problem is that 'compactAndBlockUntilDone' is not reliable; it may 
 return before the compaction completes.
 Will try to prepare a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction

2015-03-21 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13308:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Fix flaky TestEndToEndSplitTransaction
 --

 Key: HBASE-13308
 URL: https://issues.apache.org/jira/browse/HBASE-13308
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13308.patch


 https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/
 First, we split 'e9eb97847340ea7c6b9616d63d62a784' to  
 'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'.
 And then, we try to split 'abe1973ea732066b12d8e33fce12a951'.
 {noformat}
 2015-03-21 03:58:46,970 INFO  [Thread-191] 
 regionserver.TestEndToEndSplitTransaction(399): Initiating region split 
 for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 2015-03-21 03:58:46,976 INFO  
 [PriorityRpcServer.handler=7,queue=1,port=54177] 
 regionserver.RSRpcServices(1596): Splitting 
 testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 2015-03-21 03:58:46,977 DEBUG 
 [PriorityRpcServer.handler=7,queue=1,port=54177] 
 regionserver.CompactSplitThread(259): Split requested for 
 testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951..
   compaction_queue=(0:0), split_queue=1, merge_queue=0
 2015-03-21 03:58:46,978 INFO  [Thread-191] 
 regionserver.TestEndToEndSplitTransaction(399): blocking until region is 
 split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] 
 lock.ZKInterProcessLockBase(226): Acquired a lock for 
 /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
 2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] 
 lock.ZKInterProcessLockBase(328): Released 
 /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
 2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use 
 global event loop group NioEventLoopGroup
 2015-03-21 03:58:46,988 INFO  [RS:0;priapus:54177-splits-1426910324832] 
 regionserver.SplitRequest(142): Split transaction journal:
   STARTED at 1426910326977
 {noformat}
 We can see that it failed without any error message.
 I think this can only happen when the parent is not splittable or we cannot find 
 a split row.
 {noformat}
 2015-03-21 03:58:47,019 INFO  
 [RS:0;priapus:54177-shortCompactions-1426910324308] 
 regionserver.HStore(1334): Completed major compaction of 2 (all) file(s) in 
 family of 
 testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
  into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 
 12.6 K. This selection was in queue for 0sec, and took 0sec to execute.
 2015-03-21 03:58:47,019 INFO  
 [RS:0;priapus:54177-shortCompactions-1426910324308] 
 regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction: 
 Request = 
 regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.,
  storeName=family, fileCount=2, fileSize=25.5 K, priority=1, 
 time=14542808784655186; duration=0sec
 2015-03-21 03:58:47,020 DEBUG 
 [RS:0;priapus:54177-shortCompactions-1426910324308] 
 regionserver.CompactSplitThread$CompactionRunner(546): CompactSplitThread 
 Status: compaction_queue=(0:0), split_queue=0, merge_queue=0
 {noformat}
 We can see that the compaction completed at 03:58:47,019, but the split 
 started at 03:58:46,970, which is earlier.
 So we still have a reference file and the region is not splittable.
 I think the problem is that 'compactAndBlockUntilDone' is not reliable; it may 
 return before the compaction completes.
 Will try to prepare a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13308) Fix flaky TestEndToEndSplitTransaction

2015-03-21 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372723#comment-14372723
 ] 

zhangduo commented on HBASE-13308:
--

Pushed to master and branch-1.

Thanks [~tedyu] for reviewing.

 Fix flaky TestEndToEndSplitTransaction
 --

 Key: HBASE-13308
 URL: https://issues.apache.org/jira/browse/HBASE-13308
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0, 1.1.0
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13308.patch


 https://builds.apache.org/job/HBase-TRUNK-jacoco/24/testReport/junit/org.apache.hadoop.hbase.regionserver/TestEndToEndSplitTransaction/testFromClientSideWhileSplitting/
 First, we split 'e9eb97847340ea7c6b9616d63d62a784' to  
 'abe1973ea732066b12d8e33fce12a951' and '4940dad7ef9b4b699fd13eede5740d9d'.
 And then, we try to split 'abe1973ea732066b12d8e33fce12a951'.
 {noformat}
 2015-03-21 03:58:46,970 INFO  [Thread-191] 
 regionserver.TestEndToEndSplitTransaction(399): Initiating region split 
 for:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 2015-03-21 03:58:46,976 INFO  
 [PriorityRpcServer.handler=7,queue=1,port=54177] 
 regionserver.RSRpcServices(1596): Splitting 
 testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 2015-03-21 03:58:46,977 DEBUG 
 [PriorityRpcServer.handler=7,queue=1,port=54177] 
 regionserver.CompactSplitThread(259): Split requested for 
 testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951..
   compaction_queue=(0:0), split_queue=1, merge_queue=0
 2015-03-21 03:58:46,978 INFO  [Thread-191] 
 regionserver.TestEndToEndSplitTransaction(399): blocking until region is 
 split:testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
 2015-03-21 03:58:46,985 DEBUG [RS:0;priapus:54177-splits-1426910324832] 
 lock.ZKInterProcessLockBase(226): Acquired a lock for 
 /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
 2015-03-21 03:58:46,988 DEBUG [RS:0;priapus:54177-splits-1426910324832] 
 lock.ZKInterProcessLockBase(328): Released 
 /hbase/table-lock/testFromClientSideWhileSplitting/read-regionserver:5417702
 2015-03-21 03:58:46,988 DEBUG [Thread-191] ipc.AsyncRpcClient(163): Use 
 global event loop group NioEventLoopGroup
 2015-03-21 03:58:46,988 INFO  [RS:0;priapus:54177-splits-1426910324832] 
 regionserver.SplitRequest(142): Split transaction journal:
   STARTED at 1426910326977
 {noformat}
 We can see that it failed without any error message.
 I think this can only happen when the parent is not splittable or we cannot find 
 a split row.
 {noformat}
 2015-03-21 03:58:47,019 INFO  
 [RS:0;priapus:54177-shortCompactions-1426910324308] 
 regionserver.HStore(1334): Completed major compaction of 2 (all) file(s) in 
 family of 
 testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.
  into e97bccdc4b014c15a52a579cd49ebb31(size=12.6 K), total size for store is 
 12.6 K. This selection was in queue for 0sec, and took 0sec to execute.
 2015-03-21 03:58:47,019 INFO  
 [RS:0;priapus:54177-shortCompactions-1426910324308] 
 regionserver.CompactSplitThread$CompactionRunner(523): Completed compaction: 
 Request = 
 regionName=testFromClientSideWhileSplitting,,1426910324847.abe1973ea732066b12d8e33fce12a951.,
  storeName=family, fileCount=2, fileSize=25.5 K, priority=1, 
 time=14542808784655186; duration=0sec
 2015-03-21 03:58:47,020 DEBUG 
 [RS:0;priapus:54177-shortCompactions-1426910324308] 
 regionserver.CompactSplitThread$CompactionRunner(546): CompactSplitThread 
 Status: compaction_queue=(0:0), split_queue=0, merge_queue=0
 {noformat}
 We can see that the compaction completed at 03:58:47,019, but the split 
 started at 03:58:46,970, which is earlier.
 So we still have a reference file and the region is not splittable.
 I think the problem is that 'compactAndBlockUntilDone' is not reliable; it may 
 return before the compaction completes.
 Will try to prepare a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13310) Fix high priority findbugs warnings

2015-03-21 Thread zhangduo (JIRA)
zhangduo created HBASE-13310:


 Summary: Fix high priority findbugs warnings
 Key: HBASE-13310
 URL: https://issues.apache.org/jira/browse/HBASE-13310
 Project: HBase
  Issue Type: Task
Reporter: zhangduo
Assignee: zhangduo


See here.

https://builds.apache.org/job/HBase-TRUNK-jacoco/25/findbugsResult/HIGH/

High priority warnings usually introduce bugs or have a very bad impact on 
performance. Let's fix them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13257) Show coverage report on jenkins

2015-03-20 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372376#comment-14372376
 ] 

zhangduo commented on HBASE-13257:
--

Let me try a few more times this weekend.
The build results are red most of the time now...
So when I finish, I could just change the config of HBase-TRUNK and remove 
HBase-TRUNK-jacoco? Thanks.

 Show coverage report on jenkins
 ---

 Key: HBASE-13257
 URL: https://issues.apache.org/jira/browse/HBASE-13257
 Project: HBase
  Issue Type: Task
Reporter: zhangduo
Assignee: zhangduo
Priority: Minor

 Think about showing the jacoco coverage report on https://builds.apache.org .
 An advantage of showing it on jenkins is that the jenkins jacoco plugin can 
 handle cross-module coverage.
 We cannot do it locally since https://github.com/jacoco/jacoco/pull/97 is still 
 pending.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13295) TestInfoServers hung

2015-03-20 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372403#comment-14372403
 ] 

zhangduo commented on HBASE-13295:
--

I think this patch could be applied to all branches?
[~apurtell] [~enis]

Thanks.

 TestInfoServers hung
 

 Key: HBASE-13295
 URL: https://issues.apache.org/jira/browse/HBASE-13295
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13295.patch


 https://builds.apache.org/job/HBase-TRUNK-jacoco/16/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.TestInfoServers-output.txt
 No progress after this line
 {noformat}
 2015-03-19 22:46:06,809 INFO  [main] hbase.TestInfoServers(127): Testing 
 http://localhost:44749/table.jsp?name=testMasterServerReadOnly&action=split&key=
  has Table action request accepted
 {noformat}
 I think the problem may be that we do not wait for the master to finish becoming 
 active, and there is no timeout when doing the HTTP request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13258) Promote TestHRegion to LargeTests

2015-03-20 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13258:
-
   Resolution: Fixed
Fix Version/s: 2.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Resolving this since it had already been pushed to master several days ago.
We can open a backport issue if we want to integrate the jacoco report into other 
branches.

 Promote TestHRegion to LargeTests
 -

 Key: HBASE-13258
 URL: https://issues.apache.org/jira/browse/HBASE-13258
 Project: HBase
  Issue Type: Sub-task
  Components: test
Reporter: zhangduo
Assignee: zhangduo
 Fix For: 2.0.0

 Attachments: HBASE-13258-addendum.patch, HBASE-13258.patch, 
 HBASE-13258.patch


 It always times out when I try to get a coverage report locally. The problem is 
 testWritesWhileGetting; it runs extremely slowly when the jacoco agent is enabled 
 (not a bug, there is progress).
 Since it has a VerySlowRegionServerTests annotation on it, I think it is OK 
 to promote it to LargeTests.
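 For illustration, the kind of change this describes is just a swap of the JUnit 
 size category on the test class (a sketch assuming the usual testclassification 
 package names, not the exact patch):
{code:title=TestHRegion.java}
import org.apache.hadoop.hbase.testclassification.LargeTests;
import org.apache.hadoop.hbase.testclassification.VerySlowRegionServerTests;
import org.junit.experimental.categories.Category;

// Promote the suite from MediumTests to LargeTests while keeping the
// VerySlowRegionServerTests marker, so the large-test timeout applies.
@Category({ VerySlowRegionServerTests.class, LargeTests.class })
public class TestHRegion {
  // ... existing test methods unchanged ...
}
{code}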



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13301) Possible memory leak in BucketCache

2015-03-20 Thread zhangduo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangduo updated HBASE-13301:
-
Attachment: HBASE-13301-testcase.patch

Only a testcase.
It is a little tricky, so I really need someone else to help confirm the problem.

The flow is:

t1 caches a block.
t2 evicts the block but is stopped before acquiring the offsetLock (this is done 
by holding the offsetLock with t1 in this testcase).
t1 evicts the block, and then caches the block again.
t2 continues evicting the block, finds that it is not the block it should deal 
with, and just gives up and returns false.
Then we have blockCount=1 and some used space in BucketAllocator, but no block 
in BucketCache, so we have no chance to free the used space.
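
A minimal, self-contained demo (hypothetical names, not the actual fix) of the 
kind of conditional removal that avoids this: ConcurrentMap.remove(key, value) 
only removes the mapping when the current value is the expected one, so an 
evictor that lost the race fails cleanly instead of silently dropping the newer 
entry.
{code:title=ConditionalRemoveDemo.java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConditionalRemoveDemo {
  public static void main(String[] args) {
    ConcurrentMap<String, String> backingMap = new ConcurrentHashMap<String, String>();
    String staleEntry = "entry-t2-wants-to-evict"; // what t2 still thinks is cached
    String freshEntry = "entry-t1-recached";       // what t1 cached again under the same key

    backingMap.put("blockKey", freshEntry);

    // Plain remove(key) would drop freshEntry and leak its bucket space.
    // remove(key, value) fails when the mapped value is not the expected one.
    boolean evicted = backingMap.remove("blockKey", staleEntry);
    System.out.println("evicted by t2: " + evicted);                    // false
    System.out.println("still cached:  " + backingMap.get("blockKey")); // entry-t1-recached
  }
}
{code}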

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13301-testcase.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13301) Possible memory leak in BucketCache

2015-03-20 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14371352#comment-14371352
 ] 

zhangduo commented on HBASE-13301:
--

Not submitting the patch since it is not a fix.

Experts needed. [~stack] (I do not know who is the right person, since 
[~zjushch] has not seemed active for a long time, so...)

 Possible memory leak in BucketCache
 ---

 Key: HBASE-13301
 URL: https://issues.apache.org/jira/browse/HBASE-13301
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Reporter: zhangduo
Assignee: zhangduo
 Attachments: HBASE-13301-testcase.patch


 {code:title=BucketCache.java}
 public boolean evictBlock(BlockCacheKey cacheKey) {
   ...
   if (bucketEntry.equals(backingMap.remove(cacheKey))) {
 bucketAllocator.freeBlock(bucketEntry.offset());
 realCacheSize.addAndGet(-1 * bucketEntry.getLength());
 blocksByHFile.remove(cacheKey.getHfileName(), cacheKey);
 if (removedBlock == null) {
   this.blockNumber.decrementAndGet();
 }
   } else {
 return false;
   }
   ...
 {code}
 I think the problem is here. We remove a BucketEntry that should not be 
 removed by us, but we do not put it back and also do not do any clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   3   4   5   6   >