[jira] [Commented] (HBASE-9874) Append and Increment operation drops Tags

2013-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812277#comment-13812277
 ] 

Hadoop QA commented on HBASE-9874:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611806/HBASE-9874_V3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7719//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7719//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7719//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7719//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7719//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7719//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7719//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7719//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7719//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7719//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7719//console

This message is automatically generated.

 Append and Increment operation drops Tags
 -

 Key: HBASE-9874
 URL: https://issues.apache.org/jira/browse/HBASE-9874
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.98.0

 Attachments: AccessController.postMutationBeforeWAL.txt, 
 HBASE-9874.patch, HBASE-9874_V2.patch, HBASE-9874_V3.patch


 We should consider the tags in the existing cells as well as the tags coming 
 in on the cells within an Increment/Append.
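 For context, here is a minimal, self-contained sketch of the behavior being 
 asked for: the cell that Append/Increment produces should carry the union of 
 the tags on the existing cell and the tags on the incoming cell, instead of 
 dropping them. The types and names below are illustrative stand-ins, not the 
 actual 0.98 Tag API:
 {code}
 import java.util.ArrayList;
 import java.util.List;

 public class TagMergeSketch {
   // Hypothetical helper: tags modeled as raw byte[] values for illustration.
   static List<byte[]> mergeTags(List<byte[]> existingCellTags,
                                 List<byte[]> mutationCellTags) {
     List<byte[]> merged = new ArrayList<byte[]>();
     if (existingCellTags != null) {
       merged.addAll(existingCellTags);  // tags already stored on the cell
     }
     if (mutationCellTags != null) {
       merged.addAll(mutationCellTags);  // tags carried by the Increment/Append
     }
     return merged;                      // attach these to the newly built cell
   }
 }
 {code}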



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers

2013-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812292#comment-13812292
 ] 

Hudson commented on HBASE-8942:
---

FAILURE: Integrated in hbase-0.96 #178 (See 
[https://builds.apache.org/job/hbase-0.96/178/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write 
outliers (stack: rev 1538318)
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java


 DFS errors during a read operation (get/scan), may cause write outliers
 ---

 Key: HBASE-8942
 URL: https://issues.apache.org/jira/browse/HBASE-8942
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb, 0.98.0, 0.96.1

 Attachments: 8942.096.txt, HBase-8942.txt


 This is a similar issue to the one discussed in HBASE-8228:
 1) A scanner holds the Store.ReadLock() while opening the store files ... 
 encounters errors, and thus takes a long time to finish.
 2) A flush completes in the meanwhile. It needs the write lock to commit() 
 and update scanners, hence it ends up waiting.
 3+) All Puts (and also Gets) to the CF, which need a read lock, have to 
 wait for 1) and 2) to complete, thus blocking updates to the system for the 
 DFS timeout.
 Fix:
  Open store files outside the read lock. getScanners() already tries to do 
 this optimisation. However, Store.getScanner(), which calls this function 
 through the StoreScanner constructor, redundantly tries to grab the readLock, 
 causing the readLock to be held while the storeFiles are being opened and 
 seeked.
  We should get rid of the readLock() in Store.getScanner(). It is not 
 required: the constructor for StoreScanner calls getScanners(xxx, xxx, xxx), 
 which already has the required locking.
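 To make the locking idea concrete, here is a small self-contained Java 
 illustration of the principle behind the fix (it is not the HStore source): 
 take the read lock only long enough to snapshot the file list, then do the 
 expensive open/seek work outside the lock, so a slow DFS read cannot hold up 
 a flush waiting on the write lock.
 {code}
 import java.util.ArrayList;
 import java.util.List;
 import java.util.concurrent.locks.ReentrantReadWriteLock;

 public class OpenOutsideLockSketch {
   private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
   private final List<String> storeFiles = new ArrayList<String>();

   List<String> snapshotThenOpen() {
     List<String> snapshot;
     lock.readLock().lock();
     try {
       // Cheap work under the lock: just copy the current file list.
       snapshot = new ArrayList<String>(storeFiles);
     } finally {
       lock.readLock().unlock();
     }
     // Expensive open/seek happens unlocked; errors here no longer block
     // writers that need the write lock.
     List<String> opened = new ArrayList<String>();
     for (String f : snapshot) {
       opened.add("opened:" + f);
     }
     return opened;
   }
 }
 {code}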



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9855) evictBlocksByHfileName improvement for bucket cache

2013-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812291#comment-13812291
 ] 

Hudson commented on HBASE-9855:
---

FAILURE: Integrated in hbase-0.96 #178 (See 
[https://builds.apache.org/job/hbase-0.96/178/])
HBASE-9855 evictBlocksByHfileName improvement for bucket cache (stack: rev 
1538319)
* 
/hbase/branches/0.96/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ConcurrentIndex.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCacheKey.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java


 evictBlocksByHfileName improvement for bucket cache
 ---

 Key: HBASE-9855
 URL: https://issues.apache.org/jira/browse/HBASE-9855
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.98.0
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 0.98.0, 0.96.1

 Attachments: HBase-9855-v4.txt


 Indeed, it comes from FB's L2 cache, [~avf]'s nice work; I just did a 
 simple backport here. It turns a linear-time search through the whole 
 cache map into a log-access-time map search.
 I did a small bench, which showed it brings a bit of GC overhead, but 
 considering the evict-on-close triggered by frequent compaction activity, 
 that seems reasonable.
 I also thought about bringing an evictOnClose config into the BucketCache 
 ctor and only putting to/removing from the new index map while evictOnClose 
 is true. That value can be set per family schema, but BucketCache is a 
 global instance, not one per family, so let's just ignore it right now...
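 For readers following along, here is a small self-contained sketch of the 
 indexing idea (it is not the backported ConcurrentIndex itself): keep a 
 secondary map from hfile name to that file's cached block keys, so 
 evictBlocksByHfileName() does one index lookup instead of scanning the whole 
 cache map.
 {code}
 import java.util.Collections;
 import java.util.Map;
 import java.util.Set;
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ConcurrentSkipListMap;

 public class EvictIndexSketch {
   private final Map<String, byte[]> cache = new ConcurrentHashMap<String, byte[]>();
   // Secondary index: hfileName -> keys of the blocks cached for that file.
   private final ConcurrentSkipListMap<String, Set<String>> blocksByFile =
       new ConcurrentSkipListMap<String, Set<String>>();

   void cacheBlock(String hfileName, long offset, byte[] block) {
     String key = hfileName + "_" + offset;
     cache.put(key, block);
     Set<String> keys = blocksByFile.get(hfileName);
     if (keys == null) {
       keys = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
       Set<String> prev = blocksByFile.putIfAbsent(hfileName, keys);
       if (prev != null) {
         keys = prev;  // another thread created the set first
       }
     }
     keys.add(key);
   }

   int evictBlocksByHfileName(String hfileName) {
     // One log-time index lookup replaces a linear scan of the cache map.
     Set<String> keys = blocksByFile.remove(hfileName);
     if (keys == null) {
       return 0;
     }
     int evicted = 0;
     for (String key : keys) {
       if (cache.remove(key) != null) {
         evicted++;
       }
     }
     return evicted;
   }
 }
 {code}
 The GC overhead mentioned above comes from maintaining this second map on 
 every cache insert.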



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers

2013-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812319#comment-13812319
 ] 

Hudson commented on HBASE-8942:
---

SUCCESS: Integrated in HBase-TRUNK #4665 (See 
[https://builds.apache.org/job/HBase-TRUNK/4665/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write 
outliers (stack: rev 1538317)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java


 DFS errors during a read operation (get/scan), may cause write outliers
 ---

 Key: HBASE-8942
 URL: https://issues.apache.org/jira/browse/HBASE-8942
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb, 0.98.0, 0.96.1

 Attachments: 8942.096.txt, HBase-8942.txt


 This is a similar issue to the one discussed in HBASE-8228:
 1) A scanner holds the Store.ReadLock() while opening the store files ... 
 encounters errors, and thus takes a long time to finish.
 2) A flush completes in the meanwhile. It needs the write lock to commit() 
 and update scanners, hence it ends up waiting.
 3+) All Puts (and also Gets) to the CF, which need a read lock, have to 
 wait for 1) and 2) to complete, thus blocking updates to the system for the 
 DFS timeout.
 Fix:
  Open store files outside the read lock. getScanners() already tries to do 
 this optimisation. However, Store.getScanner(), which calls this function 
 through the StoreScanner constructor, redundantly tries to grab the readLock, 
 causing the readLock to be held while the storeFiles are being opened and 
 seeked.
  We should get rid of the readLock() in Store.getScanner(). It is not 
 required: the constructor for StoreScanner calls getScanners(xxx, xxx, xxx), 
 which already has the required locking.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9855) evictBlocksByHfileName improvement for bucket cache

2013-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812318#comment-13812318
 ] 

Hudson commented on HBASE-9855:
---

SUCCESS: Integrated in HBase-TRUNK #4665 (See 
[https://builds.apache.org/job/HBase-TRUNK/4665/])
HBASE-9855 evictBlocksByHfileName improvement for bucket cache (stack: rev 
1538320)
* 
/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ConcurrentIndex.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCacheKey.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java


 evictBlocksByHfileName improvement for bucket cache
 ---

 Key: HBASE-9855
 URL: https://issues.apache.org/jira/browse/HBASE-9855
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.98.0
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 0.98.0, 0.96.1

 Attachments: HBase-9855-v4.txt


 Indeed, it comes from FB's L2 cache, [~avf]'s nice work; I just did a 
 simple backport here. It turns a linear-time search through the whole 
 cache map into a log-access-time map search.
 I did a small bench, which showed it brings a bit of GC overhead, but 
 considering the evict-on-close triggered by frequent compaction activity, 
 that seems reasonable.
 I also thought about bringing an evictOnClose config into the BucketCache 
 ctor and only putting to/removing from the new index map while evictOnClose 
 is true. That value can be set per family schema, but BucketCache is a 
 global instance, not one per family, so let's just ignore it right now...



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9855) evictBlocksByHfileName improvement for bucket cache

2013-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812339#comment-13812339
 ] 

Hudson commented on HBASE-9855:
---

FAILURE: Integrated in hbase-0.96-hadoop2 #112 (See 
[https://builds.apache.org/job/hbase-0.96-hadoop2/112/])
HBASE-9855 evictBlocksByHfileName improvement for bucket cache (stack: rev 
1538319)
* 
/hbase/branches/0.96/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ConcurrentIndex.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCacheKey.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java


 evictBlocksByHfileName improvement for bucket cache
 ---

 Key: HBASE-9855
 URL: https://issues.apache.org/jira/browse/HBASE-9855
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.98.0
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 0.98.0, 0.96.1

 Attachments: HBase-9855-v4.txt


 Indeed, it comes from FB's L2 cache, [~avf]'s nice work; I just did a 
 simple backport here. It turns a linear-time search through the whole 
 cache map into a log-access-time map search.
 I did a small bench, which showed it brings a bit of GC overhead, but 
 considering the evict-on-close triggered by frequent compaction activity, 
 that seems reasonable.
 I also thought about bringing an evictOnClose config into the BucketCache 
 ctor and only putting to/removing from the new index map while evictOnClose 
 is true. That value can be set per family schema, but BucketCache is a 
 global instance, not one per family, so let's just ignore it right now...



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers

2013-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812340#comment-13812340
 ] 

Hudson commented on HBASE-8942:
---

FAILURE: Integrated in hbase-0.96-hadoop2 #112 (See 
[https://builds.apache.org/job/hbase-0.96-hadoop2/112/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write 
outliers (stack: rev 1538318)
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java


 DFS errors during a read operation (get/scan), may cause write outliers
 ---

 Key: HBASE-8942
 URL: https://issues.apache.org/jira/browse/HBASE-8942
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb, 0.98.0, 0.96.1

 Attachments: 8942.096.txt, HBase-8942.txt


 This is a similar issue to the one discussed in HBASE-8228:
 1) A scanner holds the Store.ReadLock() while opening the store files ... 
 encounters errors, and thus takes a long time to finish.
 2) A flush completes in the meanwhile. It needs the write lock to commit() 
 and update scanners, hence it ends up waiting.
 3+) All Puts (and also Gets) to the CF, which need a read lock, have to 
 wait for 1) and 2) to complete, thus blocking updates to the system for the 
 DFS timeout.
 Fix:
  Open store files outside the read lock. getScanners() already tries to do 
 this optimisation. However, Store.getScanner(), which calls this function 
 through the StoreScanner constructor, redundantly tries to grab the readLock, 
 causing the readLock to be held while the storeFiles are being opened and 
 seeked.
  We should get rid of the readLock() in Store.getScanner(). It is not 
 required: the constructor for StoreScanner calls getScanners(xxx, xxx, xxx), 
 which already has the required locking.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9855) evictBlocksByHfileName improvement for bucket cache

2013-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812354#comment-13812354
 ] 

Hudson commented on HBASE-9855:
---

SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #824 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/824/])
HBASE-9855 evictBlocksByHfileName improvement for bucket cache (stack: rev 
1538320)
* 
/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ConcurrentIndex.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCacheKey.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java


 evictBlocksByHfileName improvement for bucket cache
 ---

 Key: HBASE-9855
 URL: https://issues.apache.org/jira/browse/HBASE-9855
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.98.0
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 0.98.0, 0.96.1

 Attachments: HBase-9855-v4.txt


 Indeed, it comes from FB's L2 cache, [~avf]'s nice work; I just did a 
 simple backport here. It turns a linear-time search through the whole 
 cache map into a log-access-time map search.
 I did a small bench, which showed it brings a bit of GC overhead, but 
 considering the evict-on-close triggered by frequent compaction activity, 
 that seems reasonable.
 I also thought about bringing an evictOnClose config into the BucketCache 
 ctor and only putting to/removing from the new index map while evictOnClose 
 is true. That value can be set per family schema, but BucketCache is a 
 global instance, not one per family, so let's just ignore it right now...



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers

2013-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812355#comment-13812355
 ] 

Hudson commented on HBASE-8942:
---

SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #824 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/824/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write 
outliers (stack: rev 1538317)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java


 DFS errors during a read operation (get/scan), may cause write outliers
 ---

 Key: HBASE-8942
 URL: https://issues.apache.org/jira/browse/HBASE-8942
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb, 0.98.0, 0.96.1

 Attachments: 8942.096.txt, HBase-8942.txt


 This is a similar issue to the one discussed in HBASE-8228:
 1) A scanner holds the Store.ReadLock() while opening the store files ... 
 encounters errors, and thus takes a long time to finish.
 2) A flush completes in the meanwhile. It needs the write lock to commit() 
 and update scanners, hence it ends up waiting.
 3+) All Puts (and also Gets) to the CF, which need a read lock, have to 
 wait for 1) and 2) to complete, thus blocking updates to the system for the 
 DFS timeout.
 Fix:
  Open store files outside the read lock. getScanners() already tries to do 
 this optimisation. However, Store.getScanner(), which calls this function 
 through the StoreScanner constructor, redundantly tries to grab the readLock, 
 causing the readLock to be held while the storeFiles are being opened and 
 seeked.
  We should get rid of the readLock() in Store.getScanner(). It is not 
 required: the constructor for StoreScanner calls getScanners(xxx, xxx, xxx), 
 which already has the required locking.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-6581) Build with hadoop.profile=3.0

2013-11-03 Thread Eric Charles (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812388#comment-13812388
 ] 

Eric Charles commented on HBASE-6581:
-

Any further progress on this one? The patch risks becoming obsolete over time. Thx.

 Build with hadoop.profile=3.0
 -

 Key: HBASE-6581
 URL: https://issues.apache.org/jira/browse/HBASE-6581
 Project: HBase
  Issue Type: Bug
Reporter: Eric Charles
Assignee: Eric Charles
Priority: Critical
 Fix For: 0.98.0

 Attachments: HBASE-6581-1.patch, HBASE-6581-2.patch, 
 HBASE-6581-20130821.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, 
 HBASE-6581-5.patch, HBASE-6581.diff, HBASE-6581.diff


 Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to 
 changes in the hadoop maven module naming (and also the usage of 3.0-SNAPSHOT 
 instead of 3.0.0-SNAPSHOT in hbase-common).
 I can provide a patch that would move most of the hadoop dependencies into 
 their respective profiles and define the correct hadoop deps in the 3.0 
 profile.
 Please tell me if it's ok to go this way.
 Thx, Eric
 [1]
 $ mvn clean install -Dhadoop.profile=3.0
 [INFO] Scanning for projects...
 [ERROR] The build could not read 3 projects - [Help 1]
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-server:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-server/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 663, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-common:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-common/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 170, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-it:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-it/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21
 [ERROR] 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-9880) client.TestAsyncProcess.testWithNoClearOnFail broke on 0.96 by HBASE-9867

2013-11-03 Thread stack (JIRA)
stack created HBASE-9880:


 Summary: client.TestAsyncProcess.testWithNoClearOnFail broke on 
0.96 by HBASE-9867 
 Key: HBASE-9880
 URL: https://issues.apache.org/jira/browse/HBASE-9880
 Project: HBase
  Issue Type: Test
Reporter: stack
Assignee: stack


It looks like the backport of HBASE-9867 broke the 0.96 build (it is fine on 
trunk).  This was my patch.  Let me fix it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9870) HFileDataBlockEncoderImpl#diskToCacheFormat uses wrong format

2013-11-03 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812421#comment-13812421
 ] 

Jimmy Xiang commented on HBASE-9870:


I tried to make sure the onDisk encoding is always the same as the inCache one, 
but that still gave me data loss. Perhaps I missed something. I will get rid of 
the inCache one and give it another try.

 HFileDataBlockEncoderImpl#diskToCacheFormat uses wrong format
 -

 Key: HBASE-9870
 URL: https://issues.apache.org/jira/browse/HBASE-9870
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang

 In this method, we have
 {code}
 if (block.getBlockType() == BlockType.ENCODED_DATA) {
   if (block.getDataBlockEncodingId() == onDisk.getId()) {
 // The block is already in the desired in-cache encoding.
 return block;
   }
 {code}
 This assumes the onDisk encoding is the same as the inCache one, which is not 
 true when we change the encoding of a CF. Could this be one of the reasons I 
 got data loss with online encoding change?
 If I make sure onDisk == inCache all the time, my ITBLL run with online 
 encoding change worked once for me.
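 One way to read the report is that the early return should compare the 
 block's encoding against the in-cache encoding rather than the on-disk one. 
 The fragment below is an illustrative rewrite of the quoted snippet in that 
 direction, not the actual HFileDataBlockEncoderImpl fix:
 {code}
 if (block.getBlockType() == BlockType.ENCODED_DATA) {
   // Compare against the *in-cache* encoding; comparing against onDisk is
   // only correct while the two happen to match, which stops being true
   // once the CF's encoding is changed online.
   if (block.getDataBlockEncodingId() == inCache.getId()) {
     // The block is already in the desired in-cache encoding.
     return block;
   }
 }
 {code}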



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9863) Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs

2013-11-03 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812425#comment-13812425
 ] 

Jimmy Xiang commented on HBASE-9863:


For those private methods such as isTableAvailableAndInitialized() and 
getNamespaceTable(), can we remove synchronized and instead make sure the 
callers have proper synchronization?  For this one:

{code}
   public synchronized NamespaceDescriptor get(String name) throws IOException {
-    return get(getNamespaceTable(), name);
+    return zkNamespaceManager.get(name);
   }
{code}
The change is good. We still need to check if the manager is initialized, right?


 Intermittently 
 TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs
 ---

 Key: HBASE-9863
 URL: https://issues.apache.org/jira/browse/HBASE-9863
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 9863-v1.txt, 9863-v2.txt, 9863-v3.txt


 TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry sometimes 
 hung.
 Here were two recent occurrences:
 https://builds.apache.org/job/PreCommit-HBASE-Build/7676/console
 https://builds.apache.org/job/PreCommit-HBASE-Build/7671/console
 There were 9 occurrences of the following in both stack traces:
 {code}
 FifoRpcScheduler.handler1-thread-5 daemon prio=10 tid=0x09df8800 nid=0xc17 
 waiting for monitor entry [0x6fdf8000]
java.lang.Thread.State: BLOCKED (on object monitor)
   at 
 org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250)
   - waiting to lock 0x7f69b5f0 (a 
 org.apache.hadoop.hbase.master.TableNamespaceManager)
   at 
 org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146)
   at 
 org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105)
   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1743)
   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1782)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1983)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
 {code}
 The test hung here:
 {code}
 pool-1-thread-1 prio=10 tid=0x74f7b800 nid=0x5aa5 in Object.wait() 
 [0x74efe000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1436)
   - locked 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.createTable(MasterProtos.java:40372)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.createTable(HConnectionManager.java:1931)
   at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:598)
   at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:594)
   at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
   - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
   at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
   - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3124)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:594)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:485)
   at 
 org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9863) Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs

2013-11-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812433#comment-13812433
 ] 

Ted Yu commented on HBASE-9863:
---

bq. can we remove synchronized and instead make sure the callers have proper 
synchronization?
isTableAvailableAndInitialized() is called by start(), which is not 
synchronized. Doing the above would equate to adding synchronized(this) in 
start() before calling isTableAvailableAndInitialized().
Is that what you meant?

For 'NamespaceDescriptor get(String name)', the semantics are the same as in 
the original method.

 Intermittently 
 TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs
 ---

 Key: HBASE-9863
 URL: https://issues.apache.org/jira/browse/HBASE-9863
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 9863-v1.txt, 9863-v2.txt, 9863-v3.txt


 TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry sometimes 
 hung.
 Here were two recent occurrences:
 https://builds.apache.org/job/PreCommit-HBASE-Build/7676/console
 https://builds.apache.org/job/PreCommit-HBASE-Build/7671/console
 There were 9 occurrences of the following in both stack traces:
 {code}
 FifoRpcScheduler.handler1-thread-5 daemon prio=10 tid=0x09df8800 nid=0xc17 
 waiting for monitor entry [0x6fdf8000]
java.lang.Thread.State: BLOCKED (on object monitor)
   at 
 org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250)
   - waiting to lock 0x7f69b5f0 (a 
 org.apache.hadoop.hbase.master.TableNamespaceManager)
   at 
 org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146)
   at 
 org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105)
   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1743)
   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1782)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1983)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
 {code}
 The test hung here:
 {code}
 pool-1-thread-1 prio=10 tid=0x74f7b800 nid=0x5aa5 in Object.wait() 
 [0x74efe000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1436)
   - locked 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.createTable(MasterProtos.java:40372)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.createTable(HConnectionManager.java:1931)
   at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:598)
   at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:594)
   at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
   - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
   at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
   - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3124)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:594)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:485)
   at 
 org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (HBASE-9420) Math.max() on syncedTillHere lacks synchronization

2013-11-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-9420.
---

Resolution: Later

It is better to pursue a solution in HBASE-8755.

 Math.max() on syncedTillHere lacks synchronization
 --

 Key: HBASE-9420
 URL: https://issues.apache.org/jira/browse/HBASE-9420
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Trivial
 Fix For: 0.98.0

 Attachments: 9420-v1.txt, 9420-v2.txt


 In FSHLog#syncer(), around line 1080:
 {code}
   this.syncedTillHere = Math.max(this.syncedTillHere, doneUpto);
 {code}
 The assignment to syncedTillHere after computing the max value is not 
 protected by proper synchronization.
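 For illustration, a standard lock-free way to keep such a watermark monotonic 
 is a CAS loop on an AtomicLong. This is only a self-contained sketch of the 
 pattern, not the FSHLog code (and HBASE-8755, referenced above, pursues a 
 different overall design):
 {code}
 import java.util.concurrent.atomic.AtomicLong;

 public class MonotonicMaxSketch {
   private final AtomicLong syncedTillHere = new AtomicLong(Long.MIN_VALUE);

   void advanceTo(long doneUpto) {
     long current;
     do {
       current = syncedTillHere.get();
       if (doneUpto <= current) {
         return;  // another syncer already advanced past us; never go backwards
       }
     } while (!syncedTillHere.compareAndSet(current, doneUpto));
   }
 }
 {code}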



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9188) TestHBaseFsck#testNotInMetaOrDeployedHole occasionally fails

2013-11-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812451#comment-13812451
 ] 

Ted Yu commented on HBASE-9188:
---

There have been some fixes w.r.t. TestHBaseFsck, and this test hasn't failed 
for a while.

 TestHBaseFsck#testNotInMetaOrDeployedHole occasionally fails
 

 Key: HBASE-9188
 URL: https://issues.apache.org/jira/browse/HBASE-9188
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu

 From 
 https://builds.apache.org/job/hbase-0.95-on-hadoop2/231/testReport/org.apache.hadoop.hbase.util/TestHBaseFsck/testNotInMetaOrDeployedHole/
  (region 
 tableNotInMetaOrDeployedHole,B,1376135595424.3ec6178a369a899c007fd89807b37153):
 expected:<[NOT_IN_META_OR_DEPLOYED, HOLE_IN_REGION_CHAIN]> but 
 was:<[NOT_IN_META_OR_DEPLOYED, NOT_DEPLOYED, HOLE_IN_REGION_CHAIN]>
 Here is snippet of test output:
 {code}
 2013-08-10 11:53:16,941 DEBUG [RS_CLOSE_REGION-vesta:38578-1] 
 handler.CloseRegionHandler(168): set region closed state in zk successfully 
 for region 
 tableNotInMetaOrDeployedHole,B,1376135595424.3ec6178a369a899c007fd89807b37153.
  sn name: vesta.apache.org,38578,1376135290018
 2013-08-10 11:53:16,941 DEBUG [RS_CLOSE_REGION-vesta:38578-1] 
 handler.CloseRegionHandler(177): Closed region 
 tableNotInMetaOrDeployedHole,B,1376135595424.3ec6178a369a899c007fd89807b37153.
 2013-08-10 11:53:16,942 DEBUG [AM.ZK.Worker-pool-2-thread-13] 
 master.AssignmentManager(782): Handling transition=RS_ZK_REGION_CLOSED, 
 server=vesta.apache.org,38578,1376135290018, 
 region=3ec6178a369a899c007fd89807b37153, current state from region state map 
 ={3ec6178a369a899c007fd89807b37153 state=PENDING_CLOSE, ts=1376135596730, 
 server=vesta.apache.org,38578,1376135290018}
 2013-08-10 11:53:16,942 WARN  [AM.ZK.Worker-pool-2-thread-13] 
 master.RegionStates(245): Closed region 3ec6178a369a899c007fd89807b37153 
 still on vesta.apache.org,38578,1376135290018? Ignored, reset it to null
 2013-08-10 11:53:16,942 INFO  [AM.ZK.Worker-pool-2-thread-13] 
 master.RegionStates(260): Transitioned from {3ec6178a369a899c007fd89807b37153 
 state=PENDING_CLOSE, ts=1376135596730, 
 server=vesta.apache.org,38578,1376135290018} to 
 {3ec6178a369a899c007fd89807b37153 state=CLOSED, ts=1376135596942, server=null}
 2013-08-10 11:53:16,942 DEBUG [AM.ZK.Worker-pool-2-thread-13] 
 handler.ClosedRegionHandler(92): Handling CLOSED event for 
 3ec6178a369a899c007fd89807b37153
 2013-08-10 11:53:16,942 DEBUG [AM.ZK.Worker-pool-2-thread-13] 
 master.AssignmentManager(1462): Table being disabled so deleting ZK node and 
 removing from regions in transition, skipping assignment of region 
 tableNotInMetaOrDeployedHole,B,1376135595424.3ec6178a369a899c007fd89807b37153.
 ...
 2013-08-10 11:53:17,319 INFO  [pool-1-thread-1] 
 hbase.HBaseTestingUtility(1815): getMetaTableRows: row - 
 tableNotInMetaOrDeployedHole,B,1376135595424.3ec6178a369a899c007fd89807b37153.{ENCODED
  => 3ec6178a369a899c007fd89807b37153, NAME => 
 'tableNotInMetaOrDeployedHole,B,1376135595424.3ec6178a369a899c007fd89807b37153.',
  STARTKEY => 'B', ENDKEY => 'C'}
 2013-08-10 11:53:17,320 INFO  [pool-1-thread-1] 
 hbase.HBaseTestingUtility(1815): getMetaTableRows: row - 
 tableNotInMetaOrDeployedHole,C,1376135595424.c2ae2bddbe9302c4344c13936248ac9d.{ENCODED
  => c2ae2bddbe9302c4344c13936248ac9d, NAME => 
 'tableNotInMetaOrDeployedHole,C,1376135595424.c2ae2bddbe9302c4344c13936248ac9d.',
  STARTKEY => 'C', ENDKEY => ''}
 2013-08-10 11:53:17,320 INFO  [pool-1-thread-1] util.TestHBaseFsck(231): 
 tableNotInMetaOrDeployedHole,,1376135595423.9df585f7f666e1cd55d7b875aae22ece.
 2013-08-10 11:53:17,320 INFO  [pool-1-thread-1] util.TestHBaseFsck(231): 
 tableNotInMetaOrDeployedHole,A,1376135595424.90a7d5f2211951d321c9f29f4059671f.
 2013-08-10 11:53:17,320 INFO  [pool-1-thread-1] util.TestHBaseFsck(231): 
 tableNotInMetaOrDeployedHole,B,1376135595424.3ec6178a369a899c007fd89807b37153.
 2013-08-10 11:53:17,320 INFO  [pool-1-thread-1] util.TestHBaseFsck(231): 
 tableNotInMetaOrDeployedHole,C,1376135595424.c2ae2bddbe9302c4344c13936248ac9d.
 2013-08-10 11:53:17,326 DEBUG [pool-1-thread-1] client.ClientScanner(218): 
 Finished region={ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => 
 '', ENDKEY => ''}
 2013-08-10 11:53:17,327 INFO  [pool-1-thread-1] util.TestHBaseFsck(319): 
 {ENCODED => 9df585f7f666e1cd55d7b875aae22ece, NAME => 
 'tableNotInMetaOrDeployedHole,,1376135595423.9df585f7f666e1cd55d7b875aae22ece.',
  STARTKEY => '', ENDKEY => 'A'}vesta.apache.org,41438,1376135289941
 2013-08-10 11:53:17,328 INFO  [pool-1-thread-1] util.TestHBaseFsck(319): 
 {ENCODED => 90a7d5f2211951d321c9f29f4059671f, NAME => 
 'tableNotInMetaOrDeployedHole,A,1376135595424.90a7d5f2211951d321c9f29f4059671f.',
  STARTKEY => 'A', ENDKEY => 'B'}vesta.apache.org,38578,1376135290018
 2013-08-10 11:53:17,328 INFO  

[jira] [Commented] (HBASE-8741) Scope sequenceid to the region rather than regionserver (WAS: Mutations on Regions in recovery mode might have same sequenceIDs)

2013-11-03 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812477#comment-13812477
 ] 

Himanshu Vashishtha commented on HBASE-8741:


Below are some results I got while testing HLogPE, with the thread count 
varying from 1 to 5, on a 5-node (1 NN, 4 DN) cluster. 
IMO, the results are mixed and the perf hit is almost negligible.
{code}
for i in `seq 1 5` ; do for j in 1 2 3; do 
/home/himanshu/hbase-0.97.0-SNAPSHOT/bin/hbase 
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation  -verify 
-threads ${i} -iterations 100 -nocleanup  -keySize 50 -valueSize 100; 
done; done
{code}

||Threads||w/o patch time||w/o patch ops||w/ patch time||w/ patch ops||
|1|530.334s|1885.604ops/s|519.382s|1925.365ops/s|
|1|531.314s|1882.126ops/s|524.750s|1905.669ops/s|
|1|529.636s|1888.089ops/s|537.218s|1861.442ops/s|
|2|796.771s|2510.132ops/s|786.245s|2543.736ops/s|
|2|811.930s|2463.267ops/s|818.789s|2442.632ops/s|
|2|805.139s|2484.043ops/s|792.434s|2523.869ops/s|
|3|948.641s|3162.419ops/s|938.286s|3197.319ops/s|
|3|968.503s|3097.564ops/s|955.333s|3140.266ops/s|
|3|970.692s|3090.579ops/s|949.411s|3159.854ops/s|
|4|648.943s|6163.870ops/s|646.279s|6189.277ops/s|
|4|658.654s|6072.991ops/s|656.277s|6094.987ops/s|
|4|634.568s|6303.501ops/s|669.986s|5970.274ops/s|
|5|722.867s|6916.902ops/s|730.954s|6840.376ops/s|
|5|731.401s|6836.195ops/s|725.907s|6887.935ops/s|
|5|723.812s|6907.871ops/s|718.261s|6961.258ops/s|



 Scope sequenceid to the region rather than regionserver (WAS: Mutations on 
 Regions in recovery mode might have same sequenceIDs)
 

 Key: HBASE-8741
 URL: https://issues.apache.org/jira/browse/HBASE-8741
 Project: HBase
  Issue Type: Bug
  Components: MTTR
Affects Versions: 0.95.1
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.98.0

 Attachments: HBASE-8741-trunk-v6.1-rebased.patch, 
 HBASE-8741-trunk-v6.2.1.patch, HBASE-8741-trunk-v6.2.2.patch, 
 HBASE-8741-trunk-v6.2.2.patch, HBASE-8741-trunk-v6.3.patch, 
 HBASE-8741-trunk-v6.patch, HBASE-8741-v0.patch, HBASE-8741-v2.patch, 
 HBASE-8741-v3.patch, HBASE-8741-v4-again.patch, HBASE-8741-v4-again.patch, 
 HBASE-8741-v4.patch, HBASE-8741-v5-again.patch, HBASE-8741-v5.patch


 Currently, when opening a region, we find the maximum sequence ID from all 
 its HFiles and then set the LogSequenceId of the log (in case the latter is at 
 a smaller value). This works well in the recovered.edits case, as we are not 
 writing to the region until we have replayed all of its previous edits. 
 With distributed log replay, if we want to enable writes while a region is 
 under recovery, we need to make sure that the logSequenceId > maximum 
 logSequenceId of the old regionserver. Otherwise, we might have a situation 
 where new edits have the same (or smaller) sequenceIds. 
 If we store region level information in the WALTrailer, this scenario 
 could be avoided by:
 a) reading the trailer of the last completed file, i.e., the last wal file 
 which has a trailer, and
 b) completely reading the last wal file (this file would not have the 
 trailer, so it needs to be read completely).
 In the future, if we switch to multiple wal files, we could read the trailer 
 for all completed WAL files and read the remaining incomplete files in full.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812509#comment-13812509
 ] 

Lars Hofhansl commented on HBASE-9865:
--

This is not quite right in the partial read failure case yet (a log was 
partially read and is then found to be corrupted).


 WALEdit.heapSize() is incorrect in certain replication scenarios which may 
 cause RegionServers to go OOM
 

 Key: HBASE-9865
 URL: https://issues.apache.org/jira/browse/HBASE-9865
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.5, 0.95.0
Reporter: churro morales
Assignee: Lars Hofhansl
 Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 
 9865-trunk-v2.txt, 9865-trunk.txt


 WALEdit.heapSize() is incorrect in certain replication scenarios which may 
 cause RegionServers to go OOM.
 A little background on this issue.  We noticed that our source replication 
 regionservers would get into GC storms and sometimes even OOM. 
 We noticed a case where there were around 25k WALEdits to 
 replicate, each one with an ArrayList of KeyValues.  The array list had a 
 capacity of around 90k (using 350KB of heap memory) but only around 6 non-null 
 entries.
 When ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a 
 WALEdit, it removes all kv's that are scoped other than local.  
 But in doing so we don't account for the capacity of the ArrayList when 
 determining the heapSize of a WALEdit.  The logic for shipping a batch is 
 whether you have hit a size capacity or a number-of-entries capacity.  
 Therefore if you have a WALEdit with 25k entries and suppose all are removed: 
 the size of the ArrayList is 0 (we don't even count the collection's heap 
 size currently) but the capacity is ignored.
 This will yield a heapSize() of 0 bytes while in the best case it would be at 
 least 10 bytes (provided you pass an initialCapacity and you have a 32-bit 
 JVM).
 I have some ideas on how to address this problem and want to know everyone's 
 thoughts:
 1. We use a probabilistic counter such as HyperLogLog and create something 
 like:
   * class CapacityEstimateArrayList implements ArrayList
   ** this class overrides all additive methods to update the 
 probabilistic counts
   ** it includes one additional method called estimateCapacity 
 (we would take estimateCapacity - size() and fill in sizes for all references)
   * Then we can do something like this in WALEdit.heapSize:
   
 {code}
 public long heapSize() {
   long ret = ClassSize.ARRAYLIST;
   for (KeyValue kv : kvs) {
     ret += kv.heapSize();
   }
   long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size();
   ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
   if (scopes != null) {
     ret += ClassSize.TREEMAP;
     ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
     // TODO this isn't quite right, need help here
   }
   return ret;
 }
 {code}
 2. In ReplicationSource.removeNonReplicableEdits() we know the original size 
 of the array, and we provide some percentage threshold.  When that 
 threshold is met (say 50% of the entries have been removed) we can call 
 kvs.trimToSize() (see the sketch after this description).
 3. In the heapSize() method for WALEdit we could use reflection (please don't 
 shoot me for this) to grab the actual capacity of the list, doing something 
 like this:
 {code}
 public int getArrayListCapacity() {
   try {
     Field f = ArrayList.class.getDeclaredField("elementData");
     f.setAccessible(true);
     return ((Object[]) f.get(kvs)).length;
   } catch (Exception e) {
     log.warn("Exception in trying to get capacity on ArrayList", e);
     return kvs.size();
   }
 }
 {code}
 I am partial to (1), using HyperLogLog and creating a 
 CapacityEstimateArrayList; this is reusable throughout the code for other 
 classes that implement HeapSize and contain ArrayLists.  The memory 
 footprint is very small and it is very fast.  The issue is that this is an 
 estimate; although we can configure the precision, we will most likely always 
 be conservative.  The estimateCapacity will always be less than the 
 actualCapacity, but it will be close.  I think that putting the logic in 
 removeNonReplicableEdits will work, but it only solves the heapSize problem 
 in this particular scenario.  Solution 3 is slow and horrible but gives 
 us the exact answer.
 I would love to hear if anyone else has other ideas on how to remedy this 
 problem.  I have code for trunk and 0.94 for all 3 ideas and can provide a 
 patch if the community thinks any of these approaches is viable.
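 To make idea (2) concrete, here is a minimal self-contained sketch of the 
 trim-on-threshold approach; the names and the scope check are illustrative 
 stand-ins, not the actual ReplicationSource code:
 {code}
 import java.util.ArrayList;
 import java.util.Iterator;

 public class TrimOnRemoveSketch {
   // Remove entries that should not be replicated, then shrink the backing
   // array once the removal ratio crosses the given threshold (e.g. 0.5).
   static void removeAndMaybeTrim(ArrayList<byte[]> kvs, double trimThreshold) {
     int originalSize = kvs.size();
     for (Iterator<byte[]> it = kvs.iterator(); it.hasNext();) {
       if (!isLocallyScoped(it.next())) {
         it.remove();
       }
     }
     int removed = originalSize - kvs.size();
     if (originalSize > 0 && removed >= originalSize * trimThreshold) {
       kvs.trimToSize();  // stop a near-empty list from pinning a huge array
     }
   }

   // Stand-in for the real replication-scope check on a KeyValue.
   private static boolean isLocallyScoped(byte[] kv) {
     return kv.length % 2 == 0;
   }
 }
 {code}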



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9808) org.apache.hadoop.hbase.rest.PerformanceEvaluation is out of sync with org.apache.hadoop.hbase.PerformanceEvaluation

2013-11-03 Thread Gustavo Anatoly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gustavo Anatoly updated HBASE-9808:
---

Attachment: HBASE-9808-v1.patch

 org.apache.hadoop.hbase.rest.PerformanceEvaluation is out of sync with 
 org.apache.hadoop.hbase.PerformanceEvaluation
 

 Key: HBASE-9808
 URL: https://issues.apache.org/jira/browse/HBASE-9808
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9808-v1.patch, HBASE-9808.patch


 Here is a list of JIRAs whose fixes might have gone into 
 rest.PerformanceEvaluation:
 {code}
 
 r1527817 | mbertozzi | 2013-09-30 15:57:44 -0700 (Mon, 30 Sep 2013) | 1 line
 HBASE-9663 PerformanceEvaluation does not properly honor specified table name 
 parameter
 
 r1526452 | mbertozzi | 2013-09-26 04:58:50 -0700 (Thu, 26 Sep 2013) | 1 line
 HBASE-9662 PerformanceEvaluation input do not handle tags properties
 
 r1525269 | ramkrishna | 2013-09-21 11:01:32 -0700 (Sat, 21 Sep 2013) | 3 lines
 HBASE-8496 - Implement tags and the internals of how a tag should look like 
 (Ram)
 
 r1524985 | nkeywal | 2013-09-20 06:02:54 -0700 (Fri, 20 Sep 2013) | 1 line
 HBASE-9558  PerformanceEvaluation is in hbase-server, and creates a 
 dependency to MiniDFSCluster
 
 r1523782 | nkeywal | 2013-09-16 13:07:13 -0700 (Mon, 16 Sep 2013) | 1 line
 HBASE-9521  clean clearBufferOnFail behavior and deprecate it
 
 r1518341 | jdcryans | 2013-08-28 12:46:55 -0700 (Wed, 28 Aug 2013) | 2 lines
 HBASE-9330 Refactor PE to create HTable the correct way
 {code}
 Long term, we may consider consolidating the two PerformanceEvaluation 
 classes so that such maintenance work can be reduced.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9808) org.apache.hadoop.hbase.rest.PerformanceEvaluation is out of sync with org.apache.hadoop.hbase.PerformanceEvaluation

2013-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812531#comment-13812531
 ] 

Hadoop QA commented on HBASE-9808:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611840/HBASE-9808-v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestHRegion
   org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7720//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7720//console

This message is automatically generated.

 org.apache.hadoop.hbase.rest.PerformanceEvaluation is out of sync with 
 org.apache.hadoop.hbase.PerformanceEvaluation
 

 Key: HBASE-9808
 URL: https://issues.apache.org/jira/browse/HBASE-9808
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9808-v1.patch, HBASE-9808.patch


 Here is a list of JIRAs whose fixes might have gone into 
 rest.PerformanceEvaluation:
 {code}
 
 r1527817 | mbertozzi | 2013-09-30 15:57:44 -0700 (Mon, 30 Sep 2013) | 1 line
 HBASE-9663 PerformanceEvaluation does not properly honor specified table name 
 parameter
 
 r1526452 | mbertozzi | 2013-09-26 04:58:50 -0700 (Thu, 26 Sep 2013) | 1 line
 HBASE-9662 PerformanceEvaluation input do not handle tags properties
 
 r1525269 | ramkrishna | 2013-09-21 11:01:32 -0700 (Sat, 21 Sep 2013) | 3 lines
 HBASE-8496 - Implement tags and the internals of how a tag should look like 
 (Ram)
 
 r1524985 | nkeywal | 2013-09-20 06:02:54 -0700 (Fri, 20 Sep 2013) | 1 line
 HBASE-9558  PerformanceEvaluation is in hbase-server, and creates a 
 dependency to MiniDFSCluster
 
 r1523782 | nkeywal | 2013-09-16 13:07:13 -0700 (Mon, 16 Sep 2013) | 1 line
 HBASE-9521  clean clearBufferOnFail behavior and 

[jira] [Commented] (HBASE-9863) Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs

2013-11-03 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812539#comment-13812539
 ] 

Jimmy Xiang commented on HBASE-9863:


I think start() should be synchronized too. It's better to make sure it won't 
be called more than once as well. For 'NamespaceDescriptor get(String name)', 
yes, the semantics are the same. After the change, we don't check 
isTableAvailableAndInitialized, so zkNamespaceManager could be uninitialized.
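Sketching the shape being discussed (this is illustrative, not the actual 
TableNamespaceManager): make start() itself synchronized and idempotent, so 
the private helpers it calls do not need their own synchronized keyword:
{code}
public class ManagerSketch {
  private boolean started = false;

  public synchronized void start() {
    if (started) {
      return;  // guard: start() may be invoked more than once
    }
    // ... initialize the state the private helpers rely on ...
    started = true;
  }

  // Formerly synchronized; now only called while the monitor is already held
  // (or by callers that provide their own synchronization).
  private boolean isAvailableAndInitialized() {
    return started;
  }
}
{code}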

 Intermittently 
 TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs
 ---

 Key: HBASE-9863
 URL: https://issues.apache.org/jira/browse/HBASE-9863
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 9863-v1.txt, 9863-v2.txt, 9863-v3.txt


 TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry sometimes 
 hung.
 Here were two recent occurrences:
 https://builds.apache.org/job/PreCommit-HBASE-Build/7676/console
 https://builds.apache.org/job/PreCommit-HBASE-Build/7671/console
 There were 9 occurrences of the following in both stack traces:
 {code}
 FifoRpcScheduler.handler1-thread-5 daemon prio=10 tid=0x09df8800 nid=0xc17 
 waiting for monitor entry [0x6fdf8000]
java.lang.Thread.State: BLOCKED (on object monitor)
   at 
 org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250)
   - waiting to lock 0x7f69b5f0 (a 
 org.apache.hadoop.hbase.master.TableNamespaceManager)
   at 
 org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146)
   at 
 org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105)
   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1743)
   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1782)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1983)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
 {code}
 The test hung here:
 {code}
 pool-1-thread-1 prio=10 tid=0x74f7b800 nid=0x5aa5 in Object.wait() 
 [0x74efe000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1436)
   - locked 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.createTable(MasterProtos.java:40372)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.createTable(HConnectionManager.java:1931)
   at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:598)
   at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:594)
   at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
   - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
   at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
   - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3124)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:594)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:485)
   at 
 org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9863) Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs

2013-11-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-9863:
--

Attachment: 9863-v4.txt

Patch v4 incorporates Jimmy's comments above.

 Intermittently 
 TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs
 ---

 Key: HBASE-9863
 URL: https://issues.apache.org/jira/browse/HBASE-9863
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 9863-v1.txt, 9863-v2.txt, 9863-v3.txt, 9863-v4.txt


 TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry sometimes 
 hung.
 Here were two recent occurrences:
 https://builds.apache.org/job/PreCommit-HBASE-Build/7676/console
 https://builds.apache.org/job/PreCommit-HBASE-Build/7671/console
 There were 9 occurrences of the following in both stack traces:
 {code}
 FifoRpcScheduler.handler1-thread-5 daemon prio=10 tid=0x09df8800 nid=0xc17 
 waiting for monitor entry [0x6fdf8000]
java.lang.Thread.State: BLOCKED (on object monitor)
   at 
 org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250)
   - waiting to lock 0x7f69b5f0 (a 
 org.apache.hadoop.hbase.master.TableNamespaceManager)
   at 
 org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146)
   at 
 org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105)
   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1743)
   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1782)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1983)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
 {code}
 The test hung here:
 {code}
 pool-1-thread-1 prio=10 tid=0x74f7b800 nid=0x5aa5 in Object.wait() 
 [0x74efe000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1436)
   - locked 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.createTable(MasterProtos.java:40372)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.createTable(HConnectionManager.java:1931)
   at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:598)
   at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:594)
   at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
   - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
   at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
   - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3124)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:594)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:485)
   at 
 org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers

2013-11-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812556#comment-13812556
 ] 

Lars Hofhansl commented on HBASE-8942:
--

Checked the 0.94 code. Should be safe there as well. Good find.

 DFS errors during a read operation (get/scan), may cause write outliers
 ---

 Key: HBASE-8942
 URL: https://issues.apache.org/jira/browse/HBASE-8942
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb, 0.98.0, 0.96.1

 Attachments: 8942.096.txt, HBase-8942.txt


 This is a similar issue to the one discussed in HBASE-8228:
 1) A scanner holds the Store.ReadLock() while opening the store files and 
 encounters errors, and thus takes a long time to finish.
 2) A flush completes in the meanwhile. It needs the write lock to commit() 
 and update scanners, and hence ends up waiting.
 3+) All Puts (and also Gets) to the CF, which need a read lock, have to wait 
 for 1) and 2) to complete, blocking updates to the system for the duration 
 of the DFS timeout.
 Fix:
  Open Store files outside the read lock. getScanners() already tries to do 
 this optimisation. However, Store.getScanner(), which calls this function 
 through the StoreScanner constructor, redundantly tries to grab the 
 readLock, causing the readLock to be held while the storeFiles are being 
 opened and seeked.
  We should get rid of the readLock() in Store.getScanner(). It is not 
 required: the constructor for StoreScanner calls getScanners(xxx, xxx, xxx), 
 which already has the required locking.
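A rough before/after sketch of that fix (method bodies abbreviated and 
paraphrased, not the literal patch):
{code}
// Before: Store.getScanner() holds the read lock while the StoreScanner
// constructor opens and seeks store files.
public KeyValueScanner getScanner(Scan scan, NavigableSet<byte[]> targetCols)
    throws IOException {
  lock.readLock().lock();
  try {
    return new StoreScanner(this, getScanInfo(), scan, targetCols);
  } finally {
    lock.readLock().unlock();
  }
}

// After: drop the outer lock; getScanners(), invoked from the StoreScanner
// constructor, already takes the read lock around the critical section.
public KeyValueScanner getScanner(Scan scan, NavigableSet<byte[]> targetCols)
    throws IOException {
  return new StoreScanner(this, getScanInfo(), scan, targetCols);
}
{code}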



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers

2013-11-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812557#comment-13812557
 ] 

Lars Hofhansl commented on HBASE-8942:
--

Committed to 0.94 as well.

 DFS errors during a read operation (get/scan), may cause write outliers
 ---

 Key: HBASE-8942
 URL: https://issues.apache.org/jira/browse/HBASE-8942
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14

 Attachments: 8942.096.txt, HBase-8942.txt


 This is a similar issue to the one discussed in HBASE-8228:
 1) A scanner holds the Store.ReadLock() while opening the store files and 
 encounters errors, and thus takes a long time to finish.
 2) A flush completes in the meanwhile. It needs the write lock to commit() 
 and update scanners, and hence ends up waiting.
 3+) All Puts (and also Gets) to the CF, which need a read lock, have to wait 
 for 1) and 2) to complete, blocking updates to the system for the duration 
 of the DFS timeout.
 Fix:
  Open Store files outside the read lock. getScanners() already tries to do 
 this optimisation. However, Store.getScanner(), which calls this function 
 through the StoreScanner constructor, redundantly tries to grab the 
 readLock, causing the readLock to be held while the storeFiles are being 
 opened and seeked.
  We should get rid of the readLock() in Store.getScanner(). It is not 
 required: the constructor for StoreScanner calls getScanners(xxx, xxx, xxx), 
 which already has the required locking.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers

2013-11-03 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-8942:
-

Fix Version/s: 0.94.14

 DFS errors during a read operation (get/scan), may cause write outliers
 ---

 Key: HBASE-8942
 URL: https://issues.apache.org/jira/browse/HBASE-8942
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14

 Attachments: 8942.096.txt, HBase-8942.txt


 This is a similar issue to the one discussed in HBASE-8228:
 1) A scanner holds the Store.ReadLock() while opening the store files and 
 encounters errors, and thus takes a long time to finish.
 2) A flush completes in the meanwhile. It needs the write lock to commit() 
 and update scanners, and hence ends up waiting.
 3+) All Puts (and also Gets) to the CF, which need a read lock, have to wait 
 for 1) and 2) to complete, blocking updates to the system for the duration 
 of the DFS timeout.
 Fix:
  Open Store files outside the read lock. getScanners() already tries to do 
 this optimisation. However, Store.getScanner(), which calls this function 
 through the StoreScanner constructor, redundantly tries to grab the 
 readLock, causing the readLock to be held while the storeFiles are being 
 opened and seeked.
  We should get rid of the readLock() in Store.getScanner(). It is not 
 required: the constructor for StoreScanner calls getScanners(xxx, xxx, xxx), 
 which already has the required locking.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9808) org.apache.hadoop.hbase.rest.PerformanceEvaluation is out of sync with org.apache.hadoop.hbase.PerformanceEvaluation

2013-11-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812585#comment-13812585
 ] 

Ted Yu commented on HBASE-9808:
---

Test failures were not related to the patch.

 org.apache.hadoop.hbase.rest.PerformanceEvaluation is out of sync with 
 org.apache.hadoop.hbase.PerformanceEvaluation
 

 Key: HBASE-9808
 URL: https://issues.apache.org/jira/browse/HBASE-9808
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9808-v1.patch, HBASE-9808.patch


 Here is list of JIRAs whose fixes might have gone into 
 rest.PerformanceEvaluation :
 {code}
 
 r1527817 | mbertozzi | 2013-09-30 15:57:44 -0700 (Mon, 30 Sep 2013) | 1 line
 HBASE-9663 PerformanceEvaluation does not properly honor specified table name 
 parameter
 
 r1526452 | mbertozzi | 2013-09-26 04:58:50 -0700 (Thu, 26 Sep 2013) | 1 line
 HBASE-9662 PerformanceEvaluation input do not handle tags properties
 
 r1525269 | ramkrishna | 2013-09-21 11:01:32 -0700 (Sat, 21 Sep 2013) | 3 lines
 HBASE-8496 - Implement tags and the internals of how a tag should look like 
 (Ram)
 
 r1524985 | nkeywal | 2013-09-20 06:02:54 -0700 (Fri, 20 Sep 2013) | 1 line
 HBASE-9558  PerformanceEvaluation is in hbase-server, and creates a 
 dependency to MiniDFSCluster
 
 r1523782 | nkeywal | 2013-09-16 13:07:13 -0700 (Mon, 16 Sep 2013) | 1 line
 HBASE-9521  clean clearBufferOnFail behavior and deprecate it
 
 r1518341 | jdcryans | 2013-08-28 12:46:55 -0700 (Wed, 28 Aug 2013) | 2 lines
 HBASE-9330 Refactor PE to create HTable the correct way
 {code}
 Long term, we may consider consolidating the two PerformanceEvaluation 
 classes so that such maintenance work can be reduced.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9863) Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs

2013-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812587#comment-13812587
 ] 

Hadoop QA commented on HBASE-9863:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611850/9863-v4.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestHRegion
  org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:488)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7721//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7721//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7721//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7721//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7721//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7721//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7721//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7721//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7721//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7721//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7721//console

This message is automatically generated.

 Intermittently 
 TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs
 ---

 Key: HBASE-9863
 URL: https://issues.apache.org/jira/browse/HBASE-9863
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 9863-v1.txt, 9863-v2.txt, 9863-v3.txt, 9863-v4.txt


 TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry sometimes 
 hung.
 Here were two recent occurrences:
 https://builds.apache.org/job/PreCommit-HBASE-Build/7676/console
 https://builds.apache.org/job/PreCommit-HBASE-Build/7671/console
 There were 9 occurrences of the following in both stack traces:
 {code}
 FifoRpcScheduler.handler1-thread-5 daemon prio=10 tid=0x09df8800 nid=0xc17 
 waiting for monitor entry [0x6fdf8000]
java.lang.Thread.State: BLOCKED (on object monitor)
   at 
 org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250)
   - waiting to lock 0x7f69b5f0 (a 
 org.apache.hadoop.hbase.master.TableNamespaceManager)
   at 
 org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146)
   at 
 org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105)
   at 

[jira] [Commented] (HBASE-9863) Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs

2013-11-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812588#comment-13812588
 ] 

Ted Yu commented on HBASE-9863:
---

The same test failure appeared in the QA run for HBASE-9808; I don't think it 
was caused by my patch.

 Intermittently 
 TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs
 ---

 Key: HBASE-9863
 URL: https://issues.apache.org/jira/browse/HBASE-9863
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 9863-v1.txt, 9863-v2.txt, 9863-v3.txt, 9863-v4.txt


 TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry sometimes 
 hung.
 Here were two recent occurrences:
 https://builds.apache.org/job/PreCommit-HBASE-Build/7676/console
 https://builds.apache.org/job/PreCommit-HBASE-Build/7671/console
 There were 9 occurrences of the following in both stack traces:
 {code}
 FifoRpcScheduler.handler1-thread-5 daemon prio=10 tid=0x09df8800 nid=0xc17 
 waiting for monitor entry [0x6fdf8000]
java.lang.Thread.State: BLOCKED (on object monitor)
   at 
 org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250)
   - waiting to lock 0x7f69b5f0 (a 
 org.apache.hadoop.hbase.master.TableNamespaceManager)
   at 
 org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146)
   at 
 org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105)
   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1743)
   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1782)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1983)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
 {code}
 The test hung here:
 {code}
 pool-1-thread-1 prio=10 tid=0x74f7b800 nid=0x5aa5 in Object.wait() 
 [0x74efe000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1436)
   - locked 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
   at 
 org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.createTable(MasterProtos.java:40372)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.createTable(HConnectionManager.java:1931)
   at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:598)
   at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:594)
   at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
   - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
   at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
   - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3124)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:594)
   at 
 org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:485)
   at 
 org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9681) Basic codec negotiation

2013-11-03 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812602#comment-13812602
 ] 

Anoop Sam John commented on HBASE-9681:
---

Ram
   So will it be that the server always does NOT write back the cell tags? Or 
is that also based on some context information?

 Basic codec negotiation
 ---

 Key: HBASE-9681
 URL: https://issues.apache.org/jira/browse/HBASE-9681
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.98.0
Reporter: Andrew Purtell

 Basic codec negotiation:
 There should be a default codec used for cell encoding over the RPC 
 connection. This should be configurable in the site file. 
 The client can optionally send a message, a manufactured call that would 
 otherwise be invalid in some way, to the server asking for a list of 
 supported cell codecs. An older server should simply send back an error 
 because the request is invalid except to servers supporting this feature. A 
 server supporting this feature should send back the requested information or 
 an error indication if something went wrong.
 The client can optionally send a message, a manufactured call that would 
 otherwise be invalid in some way, to the server asking for it to use a given 
 codec for all further communication. Otherwise the server will continue to 
 use the default codec. The server will send back a call response 
 acknowledging the change or an error indication if the request cannot be 
 honored.
 Server configuration should support mappings from one codec type to another. 
 We need to handle the case where the server has a codec available that 
 extends the requested type but overrides some behavior in the base class, and 
 this is what should be used in lieu of the base type. It must also be 
 possible to choose an alternate default codec which stands in for the default 
 codec, is compatible with client expectations, but changes the server side 
 behavior as needed in the absence of negotiation. 
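For illustration only, a hypothetical sketch of the client-side surface this 
proposal describes; none of these names exist in HBase, they simply restate 
the handshake above:
{code}
import java.io.IOException;
import java.util.List;

// Hypothetical interface, not an HBase API.
interface CodecNegotiation {
  // Manufactured call; an older server replies with an error, which the
  // client treats as "unsupported" and keeps the configured default codec.
  List<String> listSupportedCellCodecs() throws IOException;

  // Ask the server to use the given codec for all further communication.
  // The server acks the change or returns an error if it cannot honor it.
  void useCellCodec(String codecClassName) throws IOException;
}
{code}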



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (HBASE-9681) Basic codec negotiation

2013-11-03 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812602#comment-13812602
 ] 

Anoop Sam John edited comment on HBASE-9681 at 11/4/13 4:13 AM:


Ram
   So will it be that the server always does NOT write back the cell tags? Or 
is that also based on some context information?
There is the Export tool, which uses an MR-based scan. In that case, should 
the server serialize the tags back as well?


was (Author: anoop.hbase):
Ram
   So will it be that the server always does NOT write back the cell tags? Or 
is that also based on some context information?

 Basic codec negotiation
 ---

 Key: HBASE-9681
 URL: https://issues.apache.org/jira/browse/HBASE-9681
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.98.0
Reporter: Andrew Purtell

 Basic codec negotiation:
 There should be a default codec used for cell encoding over the RPC 
 connection. This should be configurable in the site file. 
 The client can optionally send a message, a manufactured call that would 
 otherwise be invalid in some way, to the server asking for a list of 
 supported cell codecs. An older server should simply send back an error 
 because the request is invalid except to servers supporting this feature. A 
 server supporting this feature should send back the requested information or 
 an error indication if something went wrong.
 The client can optionally send a message, a manufactured call that would 
 otherwise be invalid in some way, to the server asking for it to use a given 
 codec for all further communication. Otherwise the server will continue to 
 use the default codec. The server will send back a call response 
 acknowledging the change or an error indication if the request cannot be 
 honored.
 Server configuration should support mappings from one codec type to another. 
 We need to handle the case where the server has a codec available that 
 extends the requested type but overrides some behavior in the base class, and 
 this is what should be used in lieu of the base type. It must also be 
 possible to choose an alternate default codec which stands in for the default 
 codec, is compatible with client expectations, but changes the server side 
 behavior as needed in the absence of negotiation. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9872) ModifyTable does not modify the attributes of a newly modified/changed ColumnDescriptor

2013-11-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812609#comment-13812609
 ] 

ramkrishna.s.vasudevan commented on HBASE-9872:
---

Found that modifyColumn and modifyTable both work fine even if the HCD is 
changed via modifyTable. We have some internal code in which this does not 
seem to work. Let me check on that and then decide what to do with this.

 ModifyTable does not  modify the attributes of a newly modified/changed 
 ColumnDescriptor
 

 Key: HBASE-9872
 URL: https://issues.apache.org/jira/browse/HBASE-9872
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0, 0.96.1, 0.94.14


 This issue (if it is expected behaviour, I can close this) exists in all 
 versions.
 If I do modifyColumn and change an HCD's parameter, I am able to get back 
 the modified HCD with the latest data.
 But when I do modifyTable and within it modify an HCD parameter, say for 
 example its SCOPE, then, since we don't persist the HCD information as in 
 the TableModifyFamilyHandler used for modifyColumn
 {code}
 HTableDescriptor htd =
     this.masterServices.getMasterFileSystem().modifyColumn(tableName,
         familyDesc);
 {code}
 we are not able to get the updated HCD information on the RegionServer. So 
 in cases of replication, where I need to modify the HCD's scope, we are not 
 able to make the replication happen.
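For illustration, a minimal repro of the scenario being described (0.94-style 
client API; the table and family names are made up, and conf is assumed to be 
an existing Configuration):
{code}
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes("t1"));
HColumnDescriptor hcd = htd.getFamily(Bytes.toBytes("f1"));
hcd.setScope(HConstants.REPLICATION_SCOPE_GLOBAL); // change only the HCD
admin.modifyTable(Bytes.toBytes("t1"), htd);       // via modifyTable
// The report here: the RegionServer does not see the new SCOPE when the
// change goes through modifyTable, while admin.modifyColumn(...) persists it.
{code}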



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers

2013-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812610#comment-13812610
 ] 

Hudson commented on HBASE-8942:
---

FAILURE: Integrated in HBase-0.94-security #328 (See 
[https://builds.apache.org/job/HBase-0.94-security/328/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write 
outliers (Amitanand Aiyer) (larsh: rev 1538484)
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


 DFS errors during a read operation (get/scan), may cause write outliers
 ---

 Key: HBASE-8942
 URL: https://issues.apache.org/jira/browse/HBASE-8942
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14

 Attachments: 8942.096.txt, HBase-8942.txt


 This is a similar issue to the one discussed in HBASE-8228:
 1) A scanner holds the Store.ReadLock() while opening the store files and 
 encounters errors, and thus takes a long time to finish.
 2) A flush completes in the meanwhile. It needs the write lock to commit() 
 and update scanners, and hence ends up waiting.
 3+) All Puts (and also Gets) to the CF, which need a read lock, have to wait 
 for 1) and 2) to complete, blocking updates to the system for the duration 
 of the DFS timeout.
 Fix:
  Open Store files outside the read lock. getScanners() already tries to do 
 this optimisation. However, Store.getScanner(), which calls this function 
 through the StoreScanner constructor, redundantly tries to grab the 
 readLock, causing the readLock to be held while the storeFiles are being 
 opened and seeked.
  We should get rid of the readLock() in Store.getScanner(). It is not 
 required: the constructor for StoreScanner calls getScanners(xxx, xxx, xxx), 
 which already has the required locking.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9816) Address review comments in HBASE-8496

2013-11-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812611#comment-13812611
 ] 

ramkrishna.s.vasudevan commented on HBASE-9816:
---

Thanks Stack for the reviews. I am still not able to open a new RB request. 
Will fix the comments.

 Address review comments in HBASE-8496
 -

 Key: HBASE-9816
 URL: https://issues.apache.org/jira/browse/HBASE-9816
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0

 Attachments: HBASE-9816.patch, HBASE-9816_1.patch, HBASE-9816_1.patch


 This JIRA would be used to address the review comments in HBASE-8496.  Any 
 more comments would be addressed and committed as part of this.  There are 
 already few comments from Stack on the RB.
 https://reviews.apache.org/r/13311/



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9874) Append and Increment operation drops Tags

2013-11-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812619#comment-13812619
 ] 

ramkrishna.s.vasudevan commented on HBASE-9874:
---

I had a similar impl in my old patches. Patch looks good. Yes, if the CP 
needs control over what is to be done then we need the hook, but we cannot 
always depend on the CP here.
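For context, a sketch of what such a hook could look like in a RegionObserver 
(the signature follows the attachment name above as best I can tell; the 
tag-merging helpers are stand-ins, not necessarily what the patch uses):
{code}
@Override
public Cell postMutationBeforeWAL(ObserverContext<RegionCoprocessorEnvironment> ctx,
    MutationType opType, Mutation mutation, Cell oldCell, Cell newCell)
    throws IOException {
  if (oldCell == null) {
    return newCell; // first write: nothing to carry forward
  }
  // Merge tags from the existing cell with those on the new cell so that
  // Append/Increment does not silently drop them.
  List<Tag> tags = new ArrayList<Tag>();
  tags.addAll(Tag.asList(oldCell.getTagsArray(), oldCell.getTagsOffset(),
      oldCell.getTagsLength()));
  tags.addAll(Tag.asList(newCell.getTagsArray(), newCell.getTagsOffset(),
      newCell.getTagsLength()));
  return rebuildCellWithTags(newCell, tags); // hypothetical helper
}
{code}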


 Append and Increment operation drops Tags
 -

 Key: HBASE-9874
 URL: https://issues.apache.org/jira/browse/HBASE-9874
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.98.0

 Attachments: AccessController.postMutationBeforeWAL.txt, 
 HBASE-9874.patch, HBASE-9874_V2.patch, HBASE-9874_V3.patch


 We should consider tags in the existing cells as well as tags coming in the 
 cells within Increment/Append



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers

2013-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812621#comment-13812621
 ] 

Hudson commented on HBASE-8942:
---

SUCCESS: Integrated in HBase-0.94 #1194 (See 
[https://builds.apache.org/job/HBase-0.94/1194/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write 
outliers (Amitanand Aiyer) (larsh: rev 1538484)
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


 DFS errors during a read operation (get/scan), may cause write outliers
 ---

 Key: HBASE-8942
 URL: https://issues.apache.org/jira/browse/HBASE-8942
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14

 Attachments: 8942.096.txt, HBase-8942.txt


 This is a similar issue to the one discussed in HBASE-8228:
 1) A scanner holds the Store.ReadLock() while opening the store files and 
 encounters errors, and thus takes a long time to finish.
 2) A flush completes in the meanwhile. It needs the write lock to commit() 
 and update scanners, and hence ends up waiting.
 3+) All Puts (and also Gets) to the CF, which need a read lock, have to wait 
 for 1) and 2) to complete, blocking updates to the system for the duration 
 of the DFS timeout.
 Fix:
  Open Store files outside the read lock. getScanners() already tries to do 
 this optimisation. However, Store.getScanner(), which calls this function 
 through the StoreScanner constructor, redundantly tries to grab the 
 readLock, causing the readLock to be held while the storeFiles are being 
 opened and seeked.
  We should get rid of the readLock() in Store.getScanner(). It is not 
 required: the constructor for StoreScanner calls getScanners(xxx, xxx, xxx), 
 which already has the required locking.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812626#comment-13812626
 ] 

Lars Hofhansl commented on HBASE-9865:
--

I'm trying to grok the details of the failure logic; it has gotten pretty 
convoluted over time.
Specifically, this part in ReplicationSource.run():

{code}
  try {
if (readAllEntriesToReplicateOrNextFile(currentWALisBeingWrittenTo)) {
  continue;
}
  } catch (IOException ioe) {
 ...
  if (this.replicationQueueInfo.isQueueRecovered()) {
  ...
  considerDumping = true;
  ...
  } else if (currentNbEntries != 0) {
...
considerDumping = true;
currentNbEntries = 0;
  }
  ...
  } finally {
{code}

So when we find a corrupt log file we won't replicate any of it 
({{currentNbEntries = 0}}), unless the queue was recovered, in which case we 
*do* want to replicate the partial set of edits we managed to read?


 WALEdit.heapSize() is incorrect in certain replication scenarios which may 
 cause RegionServers to go OOM
 

 Key: HBASE-9865
 URL: https://issues.apache.org/jira/browse/HBASE-9865
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.5, 0.95.0
Reporter: churro morales
Assignee: Lars Hofhansl
 Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 
 9865-trunk-v2.txt, 9865-trunk.txt


 WALEdit.heapSize() is incorrect in certain replication scenarios which may 
 cause RegionServers to go OOM.
 A little background on this issue.  We noticed that our source replication 
 regionservers would get into gc storms and sometimes even OOM. 
 We noticed a case where there were around 25k WALEdits to replicate, each 
 one with an ArrayList of KeyValues. The ArrayList had a capacity of around 
 90k (using 350KB of heap memory) but only around 6 non-null entries.
 When the ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a 
 WALEdit it removes all kv's that are scoped other than local.  
 But in doing so we don't account for the capacity of the ArrayList when 
 determining heapSize for a WALEdit.  The logic for shipping a batch is 
 whether you have hit a size capacity or number of entries capacity.  
 Therefore, if you have a WALEdit with 25k entries and suppose all are 
 removed: the size of the ArrayList is 0 (we don't even count the 
 collection's heap size currently) but the capacity is ignored.
 This will yield a heapSize() of 0 bytes, while in the best case it would be 
 at least 10 bytes (provided you pass initialCapacity and you have a 32-bit 
 JVM)
 I have some ideas on how to address this problem and want to know everyone's 
 thoughts:
 1. We use a probabilistic counter such as HyperLogLog and create something 
 like:
   * class CapacityEstimateArrayList extends ArrayList
   ** this class overrides all additive methods to update the probabilistic 
 counts
   ** it includes one additional method called estimateCapacity (we would 
 take estimateCapacity - size() and fill in sizes for all references)
   * Then we can do something like this in WALEdit.heapSize:
   
 {code}
 public long heapSize() {
   long ret = ClassSize.ARRAYLIST;
   for (KeyValue kv : kvs) {
     ret += kv.heapSize();
   }
   long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size();
   ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
   if (scopes != null) {
     ret += ClassSize.TREEMAP;
     ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
     // TODO this isn't quite right, need help here
   }
   return ret;
 }
 {code}
 2. In ReplicationSource.removeNonReplicableEdits() we know the original size 
 of the array, and we provide some percentage threshold. When that threshold 
 is met (e.g. 50% of the entries have been removed) we can call 
 kvs.trimToSize(); see the sketch after this list.
 3. In the heapSize() method for WALEdit we could use reflection (please 
 don't shoot me for this) to grab the actual capacity of the list, doing 
 something like this:
 {code}
 public int getArrayListCapacity() {
   try {
     Field f = ArrayList.class.getDeclaredField("elementData");
     f.setAccessible(true);
     return ((Object[]) f.get(kvs)).length;
   } catch (Exception e) {
     log.warn("Exception in trying to get capacity on ArrayList", e);
     return kvs.size();
   }
 }
 {code}
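A minimal, self-contained sketch of option (2); isLocallyScoped() and the use 
of plain Object for the KV type are placeholders, not the real 
ReplicationSource code:
{code}
import java.util.ArrayList;
import java.util.Iterator;

public class TrimOnRemoveSketch {
  // Placeholder for the real replication-scope check.
  static boolean isLocallyScoped(Object kv) {
    return false;
  }

  static void removeNonReplicableEdits(ArrayList<Object> kvs) {
    int originalSize = kvs.size();
    Iterator<Object> it = kvs.iterator();
    while (it.hasNext()) {
      if (!isLocallyScoped(it.next())) {
        it.remove();
      }
    }
    // Once half of the entries are gone, release the unused backing
    // capacity so WALEdit.heapSize() stops under-accounting for it.
    if (originalSize > 0 && kvs.size() <= originalSize / 2) {
      kvs.trimToSize();
    }
  }
}
{code}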
 I am partial to (1), using HyperLogLog and creating a 
 CapacityEstimateArrayList; this is reusable throughout the code for other 
 classes that implement HeapSize and contain ArrayLists. The memory footprint 
 is very small and it is very fast. The issue is that 

[jira] [Resolved] (HBASE-9872) ModifyTable does not modify the attributes of a newly modified/changed ColumnDescriptor

2013-11-03 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-9872.
---

Resolution: Won't Fix

Even modifyTable is able to change the column descriptor. Not an issue; it 
was an issue with our internal code.

 ModifyTable does not  modify the attributes of a newly modified/changed 
 ColumnDescriptor
 

 Key: HBASE-9872
 URL: https://issues.apache.org/jira/browse/HBASE-9872
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0, 0.96.1, 0.94.14


 This issue (if it is expected behaviour, I can close this) exists in all 
 versions.
 If I do modifyColumn and change an HCD's parameter, I am able to get back 
 the modified HCD with the latest data.
 But when I do modifyTable and within it modify an HCD parameter, say for 
 example its SCOPE, then, since we don't persist the HCD information as in 
 the TableModifyFamilyHandler used for modifyColumn
 {code}
 HTableDescriptor htd =
     this.masterServices.getMasterFileSystem().modifyColumn(tableName,
         familyDesc);
 {code}
 we are not able to get the updated HCD information on the RegionServer. So 
 in cases of replication, where I need to modify the HCD's scope, we are not 
 able to make the replication happen.



--
This message was sent by Atlassian JIRA
(v6.1#6144)