[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813737#comment-13813737 ] Hadoop QA commented on HBASE-9818:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612102/9818-v1.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//console

This message is automatically generated.

NPE in HFileBlock#AbstractFSReader#readAtOffset
-----------------------------------------------

Key: HBASE-9818
URL: https://issues.apache.org/jira/browse/HBASE-9818
Project: HBase
Issue Type: Bug
Reporter: Jimmy Xiang
Attachments: 9818-v1.txt

HFileBlock#istream seems to be null. I was wondering whether we should hide FSDataInputStreamWrapper#useHBaseChecksum.
By the way, this happened when online schema change is enabled (encoding):
{noformat}
2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359)
	at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166)
	at
{noformat}
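One defensive shape for the fix, tied to the question above about hiding FSDataInputStreamWrapper#useHBaseChecksum, would be to resolve the stream through the wrapper at the point of use instead of caching a possibly-null istream. A minimal sketch, assuming the wrapper exposes getStream/shouldUseHBaseChecksum accessors; this is not the attached 9818-v1.txt patch:
{code}
// Hypothetical sketch (not the attached patch): fetch the stream via the
// wrapper right before the positional read, so callers never juggle a stale
// null istream together with the useHBaseChecksum flag.
FSDataInputStream istream = streamWrapper.getStream(streamWrapper.shouldUseHBaseChecksum());
if (istream == null) {
  throw new IOException("No input stream available for " + path
      + "; was the reader closed underneath us?");
}
istream.readFully(fileOffset, dest, destOffset, size);
{code}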
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813739#comment-13813739 ] stack commented on HBASE-9892:
------------------------------
bq. Now, there is no data in regionserver's ephemeral node. It's a good idea to write static attributes like info port there.

It is a suggestion. Could be tricky setting this value then triggering watches. Will have to reset them. Maybe znode is not the right place? It is too awkward, and if only this one attribute, it's a bit of work adding it there.

It could be added to the server JMX bean, but you'd have to do RMI to find it, which requires a port (IIRC).

There is the RS heartbeat. Currently we send load. Seems a bit silly sending over constant attributes on each heartbeat, but might be easy to do.

Add info port to ServerName to support multi instances in a node
-----------------------------------------------------------------

Key: HBASE-9892
URL: https://issues.apache.org/jira/browse/HBASE-9892
Project: HBase
Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
Attachments: HBASE-9892-0.94-v1.diff

The full GC time of a regionserver with a big heap (30G) usually cannot be kept under 30s, while servers with 64G of memory are the norm. So we try to deploy multiple RS instances (2-3) on a single node, with each RS heap at about 20G~24G. Most things work fine, except the hbase web UI: the master gets the RS info port from conf, which is not suitable for this situation of multiple RS instances on a node. So we add the info port to ServerName:
a. At startup, the RS reports its info port to HMaster.
b. For the root region, the RS writes the servername with info port to the zookeeper root-region-server node.
c. For meta regions, the RS writes the servername with info port to the root region.
d. For user regions, the RS writes the servername with info port to meta regions.
So HMaster and clients can get the info port from the servername.
To test this feature, I changed the RS num from 1 to 3 in standalone mode, so we can test it in standalone mode. I think Hoya (hbase on yarn) will encounter the same problem. Anyone know how Hoya handles this problem?
PS: There are different formats for servername in the zk node and the meta table; I think we need to unify them and refactor the code.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9886) Optimize ServerName#compareTo
[ https://issues.apache.org/jira/browse/HBASE-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813743#comment-13813743 ] Hudson commented on HBASE-9886: --- SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #826 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/826/]) HBASE-9886 Optimize ServerName#compareTo (nkeywal: rev 1538679) * /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ServerName.java Optimize ServerName#compareTo - Key: HBASE-9886 URL: https://issues.apache.org/jira/browse/HBASE-9886 Project: HBase Issue Type: Bug Components: Client, regionserver Affects Versions: 0.98.0, 0.96.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Trivial Fix For: 0.98.0, 0.96.1 Attachments: 9886.v1.patch It shows up in the profiling... -- This message was sent by Atlassian JIRA (v6.1#6144)
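The committed change is a one-liner in ServerName.java. The general idea behind this kind of compareTo optimization, sketched with a simplified stand-in class (not the actual ServerName source), is to compare the components directly instead of materializing a "host,port,startcode" string on every call:
{code}
// Simplified field-wise compareTo (hypothetical stand-in class, not the
// HBase source): no string concatenation or parsing per comparison.
public final class ServerId implements Comparable<ServerId> {
  private final String hostname;
  private final int port;
  private final long startcode;

  public ServerId(String hostname, int port, long startcode) {
    this.hostname = hostname;
    this.port = port;
    this.startcode = startcode;
  }

  @Override
  public int compareTo(ServerId other) {
    int c = hostname.compareToIgnoreCase(other.hostname);
    if (c != 0) return c;
    c = Integer.compare(port, other.port);     // cheap primitive comparisons
    if (c != 0) return c;
    return Long.compare(startcode, other.startcode);
  }
}
{code}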
[jira] [Commented] (HBASE-9859) Canary Shouldn't go off if the table being read from is disabled
[ https://issues.apache.org/jira/browse/HBASE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813744#comment-13813744 ] Hudson commented on HBASE-9859:
-------------------------------
SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #826 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/826/])
HBASE-9859 Canary Shouldn't go off if the table being read from is disabled (eclark: rev 1538842)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java

Canary Shouldn't go off if the table being read from is disabled
----------------------------------------------------------------

Key: HBASE-9859
URL: https://issues.apache.org/jira/browse/HBASE-9859
Project: HBase
Issue Type: Bug
Components: util
Affects Versions: 0.96.1
Reporter: Elliott Clark
Assignee: Elliott Clark
Fix For: 0.98.0, 0.96.1
Attachments: HBASE-9859-0.patch, HBASE-9859-1.patch

Disabling a table causes the Canary to go off with an error message. We should make it so that it doesn't cause an error.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
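The patch touches HConnectionManager and Canary; a plausible shape of the check, hedged and simplified (the method below is illustrative, not the committed Canary code, though HBaseAdmin#isTableDisabled is existing client API):
{code}
// Hypothetical sketch: skip, rather than alarm on, tables that are disabled.
import java.io.IOException;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CanaryTableCheck {
  public static boolean shouldSniff(HBaseAdmin admin, String tableName) throws IOException {
    if (admin.isTableDisabled(tableName)) {
      System.out.println("Table " + tableName + " is disabled; skipping canary read.");
      return false;  // not an error, just nothing to probe
    }
    return true;
  }
}
{code}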
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813745#comment-13813745 ] Hudson commented on HBASE-8942:
-------------------------------
SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #826 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/826/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538867)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

DFS errors during a read operation (get/scan), may cause write outliers
------------------------------------------------------------------------

Key: HBASE-8942
URL: https://issues.apache.org/jira/browse/HBASE-8942
Project: HBase
Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14
Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt

This is a similar issue as discussed in HBASE-8228:
1) A scanner holds the Store.ReadLock() while opening the store files, encounters errors, and thus takes a long time to finish.
2) A flush completes in the meanwhile. It needs the write lock to commit() and update scanners, hence ends up waiting.
3+) All Puts (and also Gets) to the CF, which need a read lock, have to wait for 1) and 2) to complete, blocking updates to the system for the DFS timeout.

Fix: Open store files outside the read lock. getScanners() already tries to do this optimisation. However, Store.getScanner(), which calls this function through the StoreScanner constructor, redundantly tries to grab the readLock, causing the readLock to be held while the storeFiles are being opened and seeked. We should get rid of the readLock() in Store.getScanner(). This is not required: the constructor for StoreScanner calls getScanners(xxx, xxx, xxx), which has the required locking already.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
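The fix lands in HStore.java only; the underlying pattern, shown here as a generic illustration rather than the HBase code, is to do the slow DFS open/seek work without holding the shared lock and take the lock only briefly to publish:
{code}
// Generic illustration of the locking pattern behind the fix (not HStore):
// slow I/O outside the lock, short publish step under it.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class OpenOutsideLock {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // the list is synchronized on its own; the read lock only fences scanner
  // setup against a concurrent flush/close that holds the write lock
  private final List<String> scanners =
      Collections.synchronizedList(new ArrayList<String>());

  public void openScanner(String file) throws InterruptedException {
    String scanner = openAndSeek(file);  // slow part: NOT under the lock
    lock.readLock().lock();
    try {
      scanners.add(scanner);             // fast part: publish under the lock
    } finally {
      lock.readLock().unlock();
    }
  }

  private String openAndSeek(String file) throws InterruptedException {
    Thread.sleep(10);  // stand-in for a DFS open/seek that may stall on errors
    return "scanner:" + file;
  }
}
{code}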
[jira] [Updated] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated HBASE-9892:
-------------------------------
Attachment: HBASE-9892-0.94-v2.diff

New patch for hbase 0.94:
a. Write the RS info port to its ephemeral node.
b. RegionServerTracker in HMaster watches the regionservers node and keeps a map: servername -> infoport.
c. The web UI in HMaster gets an RS's info port from RegionServerTracker through HMaster.

Add info port to ServerName to support multi instances in a node
-----------------------------------------------------------------

Key: HBASE-9892
URL: https://issues.apache.org/jira/browse/HBASE-9892
Project: HBase
Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813753#comment-13813753 ] Liu Shaohui commented on HBASE-9892:
------------------------------------
{quote}
Could be tricky setting this value then triggering watches. Will have to reset them.
{quote}
No need to reset them. RegionServerTracker only gets the data from zk once.
{quote}
It could be added to the server JMX bean but you'd have to do rmi to find it which requires a port (IIRC). There is the RS heartbeat. Currently we send load. Seems a bit silly sending over constant attributes on each heartbeat but might be easy to do.
{quote}
I think the info port of an RS will not change after it starts up, so there is no need to send constant attributes over on each heartbeat.

Add info port to ServerName to support multi instances in a node
-----------------------------------------------------------------

Key: HBASE-9892
URL: https://issues.apache.org/jira/browse/HBASE-9892
Project: HBase
Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Created] (HBASE-9893) Incorrect assert condition in OrderedBytes decoding
He Liangliang created HBASE-9893:
------------------------------------
Summary: Incorrect assert condition in OrderedBytes decoding
Key: HBASE-9893
URL: https://issues.apache.org/jira/browse/HBASE-9893
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.96.0
Reporter: He Liangliang
Assignee: He Liangliang
Priority: Minor

The following assert condition is incorrect when decoding a blob var byte array.
assert t == 0 : "Unexpected bits remaining after decoding blob.";
When the number of bytes to decode is a multiple of 8 (i.e. the original number of bytes is a multiple of 7), this assert may fail.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (HBASE-9893) Incorrect assert condition in OrderedBytes decoding
[ https://issues.apache.org/jira/browse/HBASE-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Liangliang updated HBASE-9893:
-------------------------------------
Description:
The following assert condition is incorrect when decoding a blob var byte array.
code
assert t == 0 : "Unexpected bits remaining after decoding blob.";
/code
When the number of bytes to decode is a multiple of 8 (i.e. the original number of bytes is a multiple of 7), this assert may fail.

was:
The following assert condition is incorrect when decoding a blob var byte array.
assert t == 0 : "Unexpected bits remaining after decoding blob.";
When the number of bytes to decode is a multiple of 8 (i.e. the original number of bytes is a multiple of 7), this assert may fail.

Incorrect assert condition in OrderedBytes decoding
---------------------------------------------------

Key: HBASE-9893
URL: https://issues.apache.org/jira/browse/HBASE-9893
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.96.0
Reporter: He Liangliang
Assignee: He Liangliang
Priority: Minor

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (HBASE-9893) Incorrect assert condition in OrderedBytes decoding
[ https://issues.apache.org/jira/browse/HBASE-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Liangliang updated HBASE-9893:
-------------------------------------
Description:
The following assert condition is incorrect when decoding a blob var byte array.
{code}
assert t == 0 : "Unexpected bits remaining after decoding blob.";
{code}
When the number of bytes to decode is a multiple of 8 (i.e. the original number of bytes is a multiple of 7), this assert may fail.

was:
The following assert condition is incorrect when decoding a blob var byte array.
code
assert t == 0 : "Unexpected bits remaining after decoding blob.";
/code
When the number of bytes to decode is a multiple of 8 (i.e. the original number of bytes is a multiple of 7), this assert may fail.

Incorrect assert condition in OrderedBytes decoding
---------------------------------------------------

Key: HBASE-9893
URL: https://issues.apache.org/jira/browse/HBASE-9893
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.96.0
Reporter: He Liangliang
Assignee: He Liangliang
Priority: Minor

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9886) Optimize ServerName#compareTo
[ https://issues.apache.org/jira/browse/HBASE-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813754#comment-13813754 ] Hudson commented on HBASE-9886: --- FAILURE: Integrated in hbase-0.96-hadoop2 #113 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/113/]) HBASE-9886 Optimize ServerName#compareTo (nkeywal: rev 1538678) * /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/ServerName.java Optimize ServerName#compareTo - Key: HBASE-9886 URL: https://issues.apache.org/jira/browse/HBASE-9886 Project: HBase Issue Type: Bug Components: Client, regionserver Affects Versions: 0.98.0, 0.96.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Trivial Fix For: 0.98.0, 0.96.1 Attachments: 9886.v1.patch It shows up in the profiling... -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9859) Canary Shouldn't go off if the table being read from is disabled
[ https://issues.apache.org/jira/browse/HBASE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813755#comment-13813755 ] Hudson commented on HBASE-9859:
-------------------------------
FAILURE: Integrated in hbase-0.96-hadoop2 #113 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/113/])
HBASE-9859 Canary Shouldn't go off if the table being read from is disabled (eclark: rev 1538843)
* /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java

Canary Shouldn't go off if the table being read from is disabled
----------------------------------------------------------------

Key: HBASE-9859
URL: https://issues.apache.org/jira/browse/HBASE-9859
Project: HBase
Issue Type: Bug
Components: util
Affects Versions: 0.96.1
Reporter: Elliott Clark
Assignee: Elliott Clark
Fix For: 0.98.0, 0.96.1
Attachments: HBASE-9859-0.patch, HBASE-9859-1.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813757#comment-13813757 ] Hudson commented on HBASE-8942:
-------------------------------
FAILURE: Integrated in hbase-0.96-hadoop2 #113 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/113/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538868)
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

DFS errors during a read operation (get/scan), may cause write outliers
------------------------------------------------------------------------

Key: HBASE-8942
URL: https://issues.apache.org/jira/browse/HBASE-8942
Project: HBase
Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14
Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9880) client.TestAsyncProcess.testWithNoClearOnFail broke on 0.96 by HBASE-9867
[ https://issues.apache.org/jira/browse/HBASE-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813758#comment-13813758 ] Hudson commented on HBASE-9880: --- FAILURE: Integrated in hbase-0.96-hadoop2 #113 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/113/]) HBASE-9880 client.TestAsyncProcess.testWithNoClearOnFail broke on 0.96 by HBASE-9867 (nkeywal: rev 1538676) * /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java client.TestAsyncProcess.testWithNoClearOnFail broke on 0.96 by HBASE-9867 -- Key: HBASE-9880 URL: https://issues.apache.org/jira/browse/HBASE-9880 Project: HBase Issue Type: Test Reporter: stack Assignee: Nicolas Liochon Attachments: 9880.v1.patch It looks like the backport of HBASE-9867 broke 0.96 build (fine on trunk). This was my patch. Let me fix. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9893) Incorrect assert condition in OrderedBytes decoding
[ https://issues.apache.org/jira/browse/HBASE-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Liangliang updated HBASE-9893:
-------------------------------------
Attachment: HBASE-9893.patch

Incorrect assert condition in OrderedBytes decoding
---------------------------------------------------

Key: HBASE-9893
URL: https://issues.apache.org/jira/browse/HBASE-9893
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.96.0
Reporter: He Liangliang
Assignee: He Liangliang
Priority: Minor
Attachments: HBASE-9893.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9867) Save on array copies with a subclass of LiteralByteString
[ https://issues.apache.org/jira/browse/HBASE-9867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813756#comment-13813756 ] Hudson commented on HBASE-9867:
-------------------------------
FAILURE: Integrated in hbase-0.96-hadoop2 #113 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/113/])
HBASE-9880 client.TestAsyncProcess.testWithNoClearOnFail broke on 0.96 by HBASE-9867 (nkeywal: rev 1538676)
* /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java

Save on array copies with a subclass of LiteralByteString
---------------------------------------------------------

Key: HBASE-9867
URL: https://issues.apache.org/jira/browse/HBASE-9867
Project: HBase
Issue Type: Improvement
Components: Protobufs
Affects Versions: 0.96.0
Reporter: stack
Assignee: stack
Fix For: 0.98.0, 0.96.1
Attachments: 9867.096.txt, 9867.txt, 9867.txt, 9867v2.txt

Any time we add a byte array to a protobuf, it'll copy the byte array. I was playing with the client and noticed how a bunch of CPU and copying was being done just to copy basic arrays doing pb construction. I started to look at ByteString and then remembered a class Benoit sent me a while back that I did not understand from his new AsyncHBase. After looking in ByteString, it now made sense. So, rather than copy byte arrays everywhere, do a version of a ByteString that instead wraps the array.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
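For reference, a minimal sketch of the wrapping trick described above (class and method names are illustrative; the class committed under this issue may differ). Because LiteralByteString is package-private in protobuf 2.x, the subclass has to be declared inside the com.google.protobuf package:
{code}
// Illustrative zero-copy wrapper (assumed shape, not necessarily the committed
// class). LiteralByteString stores the array it is given without copying, so a
// wrap() factory avoids the array copy that ByteString.copyFrom() performs.
package com.google.protobuf;

public final class ZeroCopyByteStringExample extends LiteralByteString {
  private ZeroCopyByteStringExample(byte[] array) {
    super(array);  // wraps the array; no defensive copy
  }

  /** Wrap an existing array. The caller must not mutate it afterwards. */
  public static ByteString wrap(byte[] array) {
    return new ZeroCopyByteStringExample(array);
  }
}
{code}
The trade-off is the usual one for zero-copy wrappers: the caller gives up ownership of the array, since mutating it afterwards would silently corrupt the "immutable" ByteString.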
[jira] [Commented] (HBASE-9893) Incorrect assert condition in OrderedBytes decoding
[ https://issues.apache.org/jira/browse/HBASE-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813760#comment-13813760 ] He Liangliang commented on HBASE-9893:
--------------------------------------
[~ndimiduk] minor issue, a quick fix.

Incorrect assert condition in OrderedBytes decoding
---------------------------------------------------

Key: HBASE-9893
URL: https://issues.apache.org/jira/browse/HBASE-9893
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.96.0
Reporter: He Liangliang
Assignee: He Liangliang
Priority: Minor
Attachments: HBASE-9893.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-8369) MapReduce over snapshot files
[ https://issues.apache.org/jira/browse/HBASE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813808#comment-13813808 ] Hadoop QA commented on HBASE-8369:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612097/hbase-8369_v7.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 16 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:red}-1 core tests{color}. The patch failed these unit tests:
{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//console

This message is automatically generated.
MapReduce over snapshot files
-----------------------------

Key: HBASE-8369
URL: https://issues.apache.org/jira/browse/HBASE-8369
Project: HBase
Issue Type: New Feature
Components: mapreduce, snapshots
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 0.98.0
Attachments: HBASE-8369-0.94.patch, HBASE-8369-0.94_v2.patch, HBASE-8369-0.94_v3.patch, HBASE-8369-0.94_v4.patch, HBASE-8369-0.94_v5.patch, HBASE-8369-trunk_v1.patch, HBASE-8369-trunk_v2.patch, HBASE-8369-trunk_v3.patch, hbase-8369_v0.patch, hbase-8369_v5.patch, hbase-8369_v6.patch, hbase-8369_v7.patch

The idea is to add an InputFormat which can run the mapreduce job over snapshot files directly, bypassing the hbase server layer. The IF is similar in usage to TableInputFormat, taking a Scan object from the user, but instead of running from an online table, it runs from a table snapshot. We do one split per region in the snapshot, and open an HRegion inside the RecordReader. A RegionScanner is used internally for doing the scan without any HRegionServer bits.

Users have been asking and searching for ways to run MR jobs by reading directly from hfiles, so this allows new use cases if reading from stale data is ok:
- Take snapshots periodically, and run MR jobs only on snapshots.
- Export snapshots to a remote hdfs cluster, and run the MR jobs at that cluster without an HBase cluster.
- (Future use case) Combine snapshot data with online hbase data: scan from yesterday's snapshot, but read today's data from the online hbase cluster.

--
This message was sent by Atlassian JIRA
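A usage sketch of how a job would drive the proposed input format, assuming the TableMapReduceUtil#initTableSnapshotMapperJob helper this patch introduces (the snapshot name, mapper class, and restore directory below are placeholders, and details of the in-flight v7 patch may differ):
{code}
// Sketch: run a scan-based MR job over a snapshot instead of a live table.
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "count-over-snapshot");
Scan scan = new Scan();            // same Scan API as TableInputFormat
scan.setCacheBlocks(false);        // there is no region server cache to warm
TableMapReduceUtil.initTableSnapshotMapperJob(
    "my_snapshot",                 // read the snapshot, not the online table
    scan,
    MyCountingMapper.class,        // a TableMapper implementation (assumed)
    Text.class,
    LongWritable.class,
    job,
    true,                          // ship dependency jars with the job
    new Path("/tmp/snapshot-restore"));  // scratch dir for restored region refs
job.waitForCompletion(true);
{code}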
[jira] [Created] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
Liang Xie created HBASE-9894:
--------------------------------
Summary: remove the inappropriate assert statement in Store.getSplitPoint()
Key: HBASE-9894
URL: https://issues.apache.org/jira/browse/HBASE-9894
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.94.12, 0.94.6
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor

One of my friends encountered an RS abort issue frequently during data loading. Here is the log stack:
{noformat}
FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server gdc-dn49-formal.i.nease.net,60020,1383203883151: Uncaught exception in service thread regionserver60020.cacheFlusher
java.lang.AssertionError: getSplitPoint() called on a region that can't split!
	at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1926)
	at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:79)
	at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:5603)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:415)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:387)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:250)
	at java.lang.Thread.run(Thread.java:662)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-9894:
-----------------------------
Attachment: HBase-9894-0.94.txt

remove the inappropriate assert statement in Store.getSplitPoint()
------------------------------------------------------------------

Key: HBASE-9894
URL: https://issues.apache.org/jira/browse/HBASE-9894
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.94.6, 0.94.12
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
Attachments: HBase-9894-0.94.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813824#comment-13813824 ] Liang Xie commented on HBASE-9894:
----------------------------------
HBase version: 0.94.6-cdh4.3.0
java -version: java version "1.6.0_26", Java(TM) SE Runtime Environment (build 1.6.0_26-b03), Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
export HBASE_OPTS="-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"

To me, that assert statement in getSplitPoint() is inappropriate, and in the trunk code it has been removed already. Let's just make a one-line removal here, is that OK? [~lhofhansl]

remove the inappropriate assert statement in Store.getSplitPoint()
------------------------------------------------------------------

Key: HBASE-9894
URL: https://issues.apache.org/jira/browse/HBASE-9894
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.94.6, 0.94.12
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
Attachments: HBase-9894-0.94.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-9894:
-----------------------------
Status: Patch Available (was: Open)

remove the inappropriate assert statement in Store.getSplitPoint()
------------------------------------------------------------------

Key: HBASE-9894
URL: https://issues.apache.org/jira/browse/HBASE-9894
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.94.12, 0.94.6
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
Attachments: HBase-9894-0.94.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813843#comment-13813843 ] Hudson commented on HBASE-8942:
-------------------------------
SUCCESS: Integrated in hbase-0.96 #180 (See [https://builds.apache.org/job/hbase-0.96/180/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538868)
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

DFS errors during a read operation (get/scan), may cause write outliers
------------------------------------------------------------------------

Key: HBASE-8942
URL: https://issues.apache.org/jira/browse/HBASE-8942
Project: HBase
Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14
Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9859) Canary Shouldn't go off if the table being read from is disabled
[ https://issues.apache.org/jira/browse/HBASE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813842#comment-13813842 ] Hudson commented on HBASE-9859:
-------------------------------
SUCCESS: Integrated in hbase-0.96 #180 (See [https://builds.apache.org/job/hbase-0.96/180/])
HBASE-9859 Canary Shouldn't go off if the table being read from is disabled (eclark: rev 1538843)
* /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java

Canary Shouldn't go off if the table being read from is disabled
----------------------------------------------------------------

Key: HBASE-9859
URL: https://issues.apache.org/jira/browse/HBASE-9859
Project: HBase
Issue Type: Bug
Components: util
Affects Versions: 0.96.1
Reporter: Elliott Clark
Assignee: Elliott Clark
Fix For: 0.98.0, 0.96.1
Attachments: HBASE-9859-0.patch, HBASE-9859-1.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9859) Canary Shouldn't go off if the table being read from is disabled
[ https://issues.apache.org/jira/browse/HBASE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813851#comment-13813851 ] Hudson commented on HBASE-9859:
-------------------------------
SUCCESS: Integrated in HBase-TRUNK #4668 (See [https://builds.apache.org/job/HBase-TRUNK/4668/])
HBASE-9859 Canary Shouldn't go off if the table being read from is disabled (eclark: rev 1538842)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java

Canary Shouldn't go off if the table being read from is disabled
----------------------------------------------------------------

Key: HBASE-9859
URL: https://issues.apache.org/jira/browse/HBASE-9859
Project: HBase
Issue Type: Bug
Components: util
Affects Versions: 0.96.1
Reporter: Elliott Clark
Assignee: Elliott Clark
Fix For: 0.98.0, 0.96.1
Attachments: HBASE-9859-0.patch, HBASE-9859-1.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813852#comment-13813852 ] Hudson commented on HBASE-8942:
-------------------------------
SUCCESS: Integrated in HBase-TRUNK #4668 (See [https://builds.apache.org/job/HBase-TRUNK/4668/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538867)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

DFS errors during a read operation (get/scan), may cause write outliers
------------------------------------------------------------------------

Key: HBASE-8942
URL: https://issues.apache.org/jira/browse/HBASE-8942
Project: HBase
Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14
Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813853#comment-13813853 ] Hadoop QA commented on HBASE-9890:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612094/HBASE-9890-v1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//console

This message is automatically generated.
MR jobs are not working if started by a delegated user
-------------------------------------------------------

Key: HBASE-9890
URL: https://issues.apache.org/jira/browse/HBASE-9890
Project: HBase
Issue Type: Bug
Components: mapreduce, security
Affects Versions: 0.98.0, 0.94.12, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Fix For: 0.98.0, 0.94.13, 0.96.1
Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch

If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception on obtaining the token, since the proxy user doesn't have the kerberos auth. For example:
* If we use oozie to execute RowCounter, oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception.
* If we use oozie to execute LoadIncrementalHFiles, oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception.
{code}
org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients
	at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87)
{code}
{code}
org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
	at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783)
	at
{code}
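The shape of the fix is to check whether the current user's credentials already carry a token before asking the cluster for a new one. A sketch using the public UGI/token APIs; the token-kind string is the real HBase auth token kind, but the surrounding method is an assumption, not the attached patch:
{code}
// Hypothetical guard (not HBASE-9890-v1.patch): reuse an existing HBase auth
// token from the user's credentials instead of requesting a fresh one, which
// would fail for a non-kerberos proxy user such as oozie.
import org.apache.hadoop.hbase.security.User;
import org.apache.hadoop.security.token.Token;

public class TokenGuard {
  public static Token<?> findHBaseAuthToken(User user) {
    for (Token<?> token : user.getUGI().getTokens()) {
      if ("HBASE_AUTH_TOKEN".equals(token.getKind().toString())) {
        return token;  // already delegated: nothing to obtain
      }
    }
    return null;  // caller should obtain a new token via kerberos auth
  }
}
{code}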
[jira] [Commented] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813858#comment-13813858 ] Hadoop QA commented on HBASE-9894:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612145/HBase-9894-0.94.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7738//console

This message is automatically generated.

remove the inappropriate assert statement in Store.getSplitPoint()
------------------------------------------------------------------

Key: HBASE-9894
URL: https://issues.apache.org/jira/browse/HBASE-9894
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.94.6, 0.94.12
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
Attachments: HBase-9894-0.94.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9889) Make sure we clean up scannerReadPoints upon any exceptions
[ https://issues.apache.org/jira/browse/HBASE-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813868#comment-13813868 ] Hadoop QA commented on HBASE-9889:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612003/hbase-9889.diff
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.security.access.TestNamespaceCommands
{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//console

This message is automatically generated.
Make sure we clean up scannerReadPoints upon any exceptions
-----------------------------------------------------------

Key: HBASE-9889
URL: https://issues.apache.org/jira/browse/HBASE-9889
Project: HBase
Issue Type: Sub-task
Affects Versions: 0.89-fb, 0.94.12, 0.96.0
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
Fix For: 0.96.1
Attachments: hbase-9889.diff

If there is an exception during the creation of a RegionScanner (for example, an exception while opening store files), the entry added to scannerReadPoints is not cleaned up. Having an unused old entry in scannerReadPoints means that flushes and compactions cannot garbage-collect older versions.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
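A generic illustration of the cleanup pattern the fix needs (not the attached hbase-9889.diff): register first, then roll the registration back if the rest of construction throws.
{code}
// Register/rollback pattern: an entry must not outlive a failed construction,
// or it will pin old cell versions against flush/compaction GC.
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ReadPointRegistry {
  private final ConcurrentMap<Object, Long> scannerReadPoints =
      new ConcurrentHashMap<Object, Long>();

  public void register(Object scanner, long readPoint) throws IOException {
    scannerReadPoints.put(scanner, readPoint);
    try {
      openStoreScanners(scanner);  // may throw, e.g. on a corrupt store file
    } catch (IOException e) {
      scannerReadPoints.remove(scanner);  // undo: don't pin versions forever
      throw e;
    }
  }

  private void openStoreScanners(Object scanner) throws IOException {
    // stand-in for the real work of RegionScanner construction
  }
}
{code}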
[jira] [Commented] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813985#comment-13813985 ] stack commented on HBASE-9894:
------------------------------
The assert looks a little silly. Without it, we return null and just do not split the region?

remove the inappropriate assert statement in Store.getSplitPoint()
------------------------------------------------------------------

Key: HBASE-9894
URL: https://issues.apache.org/jira/browse/HBASE-9894
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.94.6, 0.94.12
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
Attachments: HBase-9894-0.94.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
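That is the behavior without the assert, sketched below in simplified form (not the actual Store.getSplitPoint source; the midkey helper is a hypothetical stand-in):
{code}
// Hypothetical shape of the method once the assert is gone: a region that
// cannot split simply yields no split point, and the caller skips the split.
public byte[] getSplitPoint() {
  if (!canSplit()) {     // e.g. store still has references from a prior split
    return null;         // caller treats null as "do not split"
  }
  return computeMidKeyOfLargestStoreFile();  // stand-in for the real logic
}
{code}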
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813991#comment-13813991 ] stack commented on HBASE-9892: -- Patch looks fine. Did you intend to include this in the patch? Index: src/main/java/org/apache/hadoop/hbase/regionserver/RSDumpServlet.java ... and this? Index: src/main/java/org/apache/hadoop/hbase/regionserver/RSStatusServlet.java Should this be public? Can it be package protected? getRegionServerInfoPort Can we write the znode content as protobuf? That will make it easier to add new attributes, and in trunk all znodes are pb: +String nodePath = ZKUtil.joinZNode(watcher.rsZNode, n); +infoPort = Bytes.toInt(ZKUtil.getData(watcher, nodePath)); If you need help, I can help do the trunk patch, no problem. Good stuff. Add info port to ServerName to support multi instances in a node Key: HBASE-9892 URL: https://issues.apache.org/jira/browse/HBASE-9892 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff The full GC time of a regionserver with a big heap (30G) usually cannot be kept under 30s. At the same time, servers with 64G of memory are common. So we try to deploy multiple rs instances (2-3) on a single node, with the heap of each rs at about 20G~24G. Most things work fine, except the hbase web ui. The master gets the RS info port from conf, which is not suitable for this situation of multiple rs instances on a node. So we add the info port to ServerName. a. At startup, the rs reports its info port to HMaster. b. For the root region, the rs writes the servername with info port to the zookeeper root-region-server node. c. For meta regions, the rs writes the servername with info port to the root region. d. For user regions, the rs writes the servername with info port to the meta regions. So the hmaster and clients can get the info port from the servername. To test this feature, I changed the rs num from 1 to 3 in standalone mode, so we can test it in standalone mode. I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know how Hoya handles this? PS: There are different formats for the servername in the zk node and the meta table; I think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1#6144)
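For reference, the protobuf suggestion above would look roughly like the following. The RegionServerInfo message is hypothetical (it sketches the idea, not the actual trunk schema), while ZKUtil.joinZNode/ZKUtil.getData are the calls already quoted from the patch:
{code}
// Hypothetical .proto for the RS znode payload:
//   message RegionServerInfo {
//     optional int32 info_port = 1;
//   }
//
// Write side (regionserver), replacing the raw Bytes.toBytes(int) payload:
//   byte[] data = RegionServerInfo.newBuilder().setInfoPort(infoPort).build().toByteArray();
//   ZKUtil.createSetData(watcher, ZKUtil.joinZNode(watcher.rsZNode, n), data);
//
// Read side (master), replacing Bytes.toInt(...):
//   String nodePath = ZKUtil.joinZNode(watcher.rsZNode, n);
//   int infoPort = RegionServerInfo.parseFrom(ZKUtil.getData(watcher, nodePath)).getInfoPort();
{code}
The payoff stack points at: a new optional field can be added to the message later without breaking readers of the old payload, which a raw 4-byte int cannot do.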
[jira] [Commented] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814011#comment-13814011 ] Lars Hofhansl commented on HBASE-9894: -- Nobody should run in production with asserts enabled. remove the inappropriate assert statement in Store.getSplitPoint() -- Key: HBASE-9894 URL: https://issues.apache.org/jira/browse/HBASE-9894 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.94.6, 0.94.12 Reporter: Liang Xie Assignee: Liang Xie Priority: Minor Attachments: HBase-9894-0.94.txt One of my friend encountered a RS abort issue frequently during loading data. Here is the log stack: FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server gdc-dn49-formal.i.nease.net,60020,138320 3883151: Uncaught exception in service thread regionserver60020.cacheFlusher java.lang.AssertionError: getSplitPoint() called on a region that can't split! at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1926) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:79) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:5603) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:415) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:387) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:250) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
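Context for Lars's point: Java assert statements are no-ops unless the JVM is started with assertions enabled (java -ea), which is why the abort above only bites on clusters running with that flag. A trivial, self-contained demonstration:
{code}
public class AssertDemo {
  public static void main(String[] args) {
    // Throws java.lang.AssertionError only when run as: java -ea AssertDemo
    assert false : "assertions are enabled";
    System.out.println("assertions disabled (the JVM default), so we got here");
  }
}
{code}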
[jira] [Commented] (HBASE-9866) Support the mode where REST server authorizes proxy users
[ https://issues.apache.org/jira/browse/HBASE-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814046#comment-13814046 ] Francis Liu commented on HBASE-9866: This will make auditing a bit hard, since the real user is lost by the time the request hits the RS. Can we log a doAs message so we can trace it back? Given that we're adding doAs support in REST, it's probably a good idea to provide a way to refresh the ProxyUsers config without restarting the server. BTW, do the other web services support doAs (hdfs's proxy, webhcat, etc.)? Support the mode where REST server authorizes proxy users - Key: HBASE-9866 URL: https://issues.apache.org/jira/browse/HBASE-9866 Project: HBase Issue Type: Improvement Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.1 Attachments: 9866-1.txt In one use case, someone was trying to authorize with the REST server as a proxy user. That mode is not supported today. The curl request would be something like (assuming SPNEGO auth) - {noformat} curl -i --negotiate -u : http://HOST:PORT/version/cluster?doas=USER {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9866) Support the mode where REST server authorizes proxy users
[ https://issues.apache.org/jira/browse/HBASE-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814050#comment-13814050 ] Jimmy Xiang commented on HBASE-9866: In case the REST server shares the same configuration with rs/master, can we have a config and turn this feature off by default? Support the mode where REST server authorizes proxy users - Key: HBASE-9866 URL: https://issues.apache.org/jira/browse/HBASE-9866 Project: HBase Issue Type: Improvement Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.1 Attachments: 9866-1.txt In one use case, someone was trying to authorize with the REST server as a proxy user. That mode is not supported today. The curl request would be something like (assuming SPNEGO auth) - {noformat} curl -i --negotiate -u : http://HOST:PORT/version/cluster?doas=USER {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
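For context, Hadoop-style proxy-user authorization is driven by configuration of this shape. The hadoop.proxyuser.* keys are the standard Hadoop ones (restserver stands in for whatever user the REST server runs as); the final hbase.rest.support.proxyuser toggle is purely hypothetical, a stand-in for the off-by-default switch requested above:
{code}
<!-- hbase-site.xml sketch -->
<property>
  <name>hadoop.proxyuser.restserver.hosts</name>
  <value>rest-host.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.restserver.groups</name>
  <value>etl,analytics</value>
</property>
<!-- hypothetical on/off switch, default off as requested above -->
<property>
  <name>hbase.rest.support.proxyuser</name>
  <value>false</value>
</property>
{code}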
[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9818: -- Attachment: (was: 9818-v1.txt) NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 
53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9818: -- Attachment: 9818-v2.txt NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814071#comment-13814071 ] Ted Yu commented on HBASE-9818: --- I am looping TestHRegion and TestAtomicOperation 200 times, respectively. Previously TestAtomicOperation failed at iteration #7. Now the tests reach iteration #17 and are running. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: 
org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This
[jira] [Assigned] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-9818: - Assignee: Ted Yu NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false 
next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814101#comment-13814101 ] Nick Dimiduk commented on HBASE-9890: - It's also possible to go the other way, i.e. secured HBase but not secured HDFS: HBASE-9482. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception on obtaining a token, since the proxy user doesn't have the kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception. {code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
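One plausible shape for the guard such a fix needs (simplified, not the actual patch): only request a new token when the submitting user does not already carry one, so a proxy user that was handed tokens by oozie is left alone. UserGroupInformation.getTokens() is the real Hadoop API; the kind string matches the HBASE_AUTH_TOKEN mentioned above:
{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

class ObtainTokenGuardSketch {
  // Only ask the cluster for a token if the UGI does not already hold one of this kind.
  static boolean needsHBaseToken(UserGroupInformation ugi) {
    for (Token<?> token : ugi.getTokens()) {
      if ("HBASE_AUTH_TOKEN".equals(token.getKind().toString())) {
        return false; // already delegated; requesting another would require Kerberos auth
      }
    }
    return true;
  }
}
{code}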
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814111#comment-13814111 ] Sergey Shelukhin commented on HBASE-9818: - From the logs and code, it seems like compactions are closing the stream via the wrapper (through a long chain of classes). Beforehand, they are supposed to notify all scanners to rebuild the heap, which is a synchronized method (on StoreScanner), and only then close (next/etc. are also synchronized, so it should all be properly sequenced). But I suspect that somehow it's not happening. Also, in some cases the stacks are from some initialization, not next(), so I haven't looked at / am not sure how that was supposed to be synched. I didn't have a lot of time, so I just looked at the code. I looped testWritesWhileGetting 100 times and it never failed locally, which is sad. Is there any chance/tool to find out approximately when these failures started? NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at
org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814118#comment-13814118 ] Sergey Shelukhin commented on HBASE-9818: - But yeah maybe wrapper method needs to be encapsulated with getting stream together. Maybe there's no real sync issue, and it just needs to not throw, and then on next read it will rebuild the heap. Although it does seem pretty suspect, it has a stream and someone closes it in parallel. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at 
java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814126#comment-13814126 ] Hadoop QA commented on HBASE-9818: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612207/9818-v2.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.wal.TestHLog {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//console This message is automatically generated. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt HFileBlock#istream seems to be null. 
I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553)
[jira] [Created] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94
Jeffrey Zhong created HBASE-9895: Summary: 0.96 Import utility can't import an exported file from 0.94 Key: HBASE-9895 URL: https://issues.apache.org/jira/browse/HBASE-9895 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.96.0 Reporter: Jeffrey Zhong Basically we moved org.apache.hadoop.hbase.client.Result to protobuf serialization, so a 0.96 cluster cannot import files exported from 0.94. This issue is annoying because a user can't import old archive files after an upgrade, or archives from others who are still using 0.94. The ideal fix is to catch the deserialization error and then fall back to the 0.94 format for importing. -- This message was sent by Atlassian JIRA (v6.1#6144)
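The fallback Jeffrey describes would be shaped like this sketch; parseProtobufResult/parseWritableResult are hypothetical helpers standing in for the 0.96 and 0.94 deserializers:
{code}
import java.io.IOException;

class ImportFallbackSketch {
  Object parseResult(byte[] bytes) throws IOException {
    try {
      return parseProtobufResult(bytes);   // 0.96+ serialization, tried first
    } catch (IOException e) {
      return parseWritableResult(bytes);   // fall back to the 0.94 Writable format
    }
  }

  // Toy stand-ins so the sketch compiles; the real helpers would wrap the
  // protobuf parser and the legacy Writable reader respectively.
  private Object parseProtobufResult(byte[] bytes) throws IOException {
    throw new IOException("not a protobuf-encoded Result");
  }

  private Object parseWritableResult(byte[] bytes) {
    return new Object();
  }
}
{code}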
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814142#comment-13814142 ] Ted Yu commented on HBASE-9818: --- From https://builds.apache.org/job/PreCommit-HBASE-Build/7739/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestHLog/testAppendClose/ : {code} Stacktrace java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hdfs.util.LightWeightGSet.<init>(LightWeightGSet.java:81) at org.apache.hadoop.hdfs.server.namenode.BlocksMap.<init>(BlocksMap.java:320) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:223) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:299) at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:569) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1479) at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:278) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSClusterForTestHLog(HBaseTestingUtility.java:563) at org.apache.hadoop.hbase.regionserver.wal.TestHLog.testAppendClose(TestHLog.java:434) {code} Looks like an environment issue. The 'stream closed' exception added in the patch didn't show up. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer:
[jira] [Commented] (HBASE-9866) Support the mode where REST server authorizes proxy users
[ https://issues.apache.org/jira/browse/HBASE-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814149#comment-13814149 ] Devaraj Das commented on HBASE-9866: [~toffer], yes, services like webhcat support doAs, but they have different config knobs for configuring the groups/ip-addresses. Maybe they map these configurations to the underlying Hadoop configurations internally. [~jxiang], okay, will add a configuration for turning this feature on/off... Support the mode where REST server authorizes proxy users - Key: HBASE-9866 URL: https://issues.apache.org/jira/browse/HBASE-9866 Project: HBase Issue Type: Improvement Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.1 Attachments: 9866-1.txt In one use case, someone was trying to authorize with the REST server as a proxy user. That mode is not supported today. The curl request would be something like (assuming SPNEGO auth) - {noformat} curl -i --negotiate -u : http://HOST:PORT/version/cluster?doas=USER {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-8018) Add Flaky Testcase Detector tool into dev-tools
[ https://issues.apache.org/jira/browse/HBASE-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-8018: - Description: jenkins-tools = A tool which pulls test case results from a Jenkins server. It displays a union of the failed test cases from the last 15 runs recorded on the Jenkins server (15 by default; the actual number of jobs can be less depending on availability) and tracks how each of them performed across those runs (passed, not run, or failed). *Pre-requirement (run under folder ./dev-support/jenkins-tools)* Please download jenkins-client from https://github.com/cosmin/jenkins-client 1) git clone git://github.com/cosmin/jenkins-client.git 2) make sure the dependency jenkins-client version in ./buildstats/pom.xml matches the downloaded jenkins-client (current value is 0.1.6-SNAPSHOT) Build command (run under folder jenkins-tools): {code} mvn clean package {code} Usage: {code} java -jar ./buildstats/target/buildstats.jar <Jenkins HTTP URL> <Job Name> [number of most recent jobs to check] {code} Sample command: {code} java -jar ./buildstats/target/buildstats.jar https://builds.apache.org HBase-TRUNK {code} Sample output (where 1 means PASSED, 0 means NOT RUN AT ALL, -1 means FAILED):
{noformat}
Failed Test Cases Stats
                                                                                 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369
org.apache.hadoop.hbase.backup.testhfilearchiving.testcleaningrace                 1    1    1    1    1    1    1    1   -1    0
org.apache.hadoop.hbase.migration.testnamespaceupgrade.testrenameusingsnapshots    1    1    1   -1    0    1    1    1    1    1

Skipped Test Cases Stats
=== 4360 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.replication.testreplicationkillmasterrscompressed
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfiles
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
=== 4361 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfiles
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
=== 4362 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfiles
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
=== 4363 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfiles
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
=== 4368 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.client.testadmin
org.apache.hadoop.hbase.client.testclonesnapshotfromclient
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
{noformat}
was: jenkins-tools = A tool which pulls test case results from a Jenkins server. It displays a union of the failed test cases from the last 15 runs recorded on the Jenkins server (15 by default; the actual number of jobs can be less depending on availability) and tracks how each of them performed across those runs (passed, not run, or failed). *Pre-requirement (run under folder jenkins-tools)* Please download jenkins-client from https://github.com/cosmin/jenkins-client 1) git clone git://github.com/cosmin/jenkins-client.git 2) make sure the dependency jenkins-client version in ./buildstats/pom.xml matches the downloaded jenkins-client (current value is 0.1.6-SNAPSHOT) Build command (run under folder jenkins-tools): {code} mvn clean package {code} Usage: {code} java -jar ./buildstats/target/buildstats.jar <Jenkins HTTP URL> <Job Name> [number of most recent jobs to check] {code} Sample command: {code} java -jar ./buildstats/target/buildstats.jar https://builds.apache.org HBase-TRUNK {code} Sample output (where 1 means PASSED, 0 means NOT RUN AT ALL, -1 means FAILED):
{noformat}
Failed Test Cases Stats
                                                                                 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369
org.apache.hadoop.hbase.backup.testhfilearchiving.testcleaningrace                 1    1    1    1    1    1    1    1   -1    0
org.apache.hadoop.hbase.migration.testnamespaceupgrade.testrenameusingsnapshots    1    1    1   -1    0    1    1    1    1    1

Skipped Test Cases Stats
=== 4360 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.replication.testreplicationkillmasterrscompressed
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
{noformat}
[jira] [Created] (HBASE-9896) Add an option to have strict number of mapper per job in HBase Streaming
Rishit Shroff created HBASE-9896: Summary: Add an option to have strict number of mapper per job in HBase Streaming Key: HBASE-9896 URL: https://issues.apache.org/jira/browse/HBASE-9896 Project: HBase Issue Type: New Feature Components: mapreduce Affects Versions: 0.89-fb Reporter: Rishit Shroff Assignee: Rishit Shroff Priority: Minor Fix For: 0.89-fb Currently there is only one configuration knob available for controlling the number of mappers per job in HBase Streaming: the number of mappers per region. This option tries to maintain locality between the mappers and the region servers. However, in certain scenarios where the table has a high number of regions, the mappers-per-region setting can lead to an explosion in the number of mappers. Hence, we need one more option to strictly cap the number of mappers per job. -- This message was sent by Atlassian JIRA (v6.1#6144)
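The interplay of the existing knob and the proposed one reduces to a simple cap, along these lines (property names are invented for illustration; the real 0.89-fb keys may differ):
{code}
import org.apache.hadoop.conf.Configuration;

class MapperCountSketch {
  static int numMappers(Configuration conf, int numRegions) {
    int perRegion = conf.getInt("hbase.streaming.mappers.per.region", 1);  // existing knob
    int maxPerJob = conf.getInt("hbase.streaming.max.mappers.per.job",    // proposed knob
        Integer.MAX_VALUE);
    return Math.min(numRegions * perRegion, maxPerJob);                    // strict cap
  }
}
{code}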
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814188#comment-13814188 ] Gary Helmling commented on HBASE-9890: -- I've looked through the secure bulk load code in a little more detail, but I still can't say I understand why use of SecureBulkLoadClient in LoadIncrementalHFiles is conditioned on isHBaseSecurityEnabled() instead of isHadoopSecurityEnabled(). It seems like they should be conditioned on isHadoopSecurityEnabled() instead, since this is all in place to pass through an HDFS delegation token for moving the HFiles on secure Hadoop. [~mbertozzi] Makes sense to me to change the LoadIncrementalHFiles conditions here as well, assuming that doesn't cascade into broken tests. But I'm also okay with pushing that part into a separate JIRA, since it's somewhat independent of the original issue. The rest of the patch looks good to me. [~toffer] Any insights into why SecureBulkLoadClient usage is conditioned on HBase security being enabled instead of HDFS security? MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception on obtaining a token, since the proxy user doesn't have the kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception.
{code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814207#comment-13814207 ] Hudson commented on HBASE-8942: --- SUCCESS: Integrated in HBase-0.94-security #329 (See [https://builds.apache.org/job/HBase-0.94-security/329/]) HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers; REVERT (stack: rev 1538869) * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java DFS errors during a read operation (get/scan), may cause write outliers --- Key: HBASE-8942 URL: https://issues.apache.org/jira/browse/HBASE-8942 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb, 0.95.2 Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14 Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt This is a similar issue to the one discussed in HBASE-8228. 1) A scanner holds the Store.ReadLock() while opening the store files ... encounters errors, and thus takes a long time to finish. 2) Meanwhile, a flush completes. It needs the write lock to commit() and update scanners, and hence ends up waiting. 3+) All Puts (and also Gets) to the CF, which will need a read lock, will have to wait for 1) and 2) to complete, thus blocking updates to the system for the duration of the DFS timeout. Fix: Open Store files outside the read lock. getScanners() already tries to do this optimisation. However, Store.getScanner(), which calls this function through the StoreScanner constructor, redundantly tries to grab the readLock, causing the readLock to be held while the storeFiles are being opened and seeked. We should get rid of the readLock() in Store.getScanner(); it is not required. The constructor for StoreScanner calls getScanners(xxx, xxx, xxx), which already does the required locking. -- This message was sent by Atlassian JIRA (v6.1#6144)
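To make the described fix concrete, here is a minimal, self-contained model of the locking change (generic Java, not the actual Store code): scanner creation no longer holds the read lock for its whole duration; only the brief store-file snapshot inside it does, so a slow DFS open/seek cannot pin the lock against a flush.
{code}
import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified model: getScanner() no longer wraps scanner creation in the read
// lock; only the short snapshot section inside createScanner() locks.
class StoreLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  Object getScanner() throws IOException {
    // was: lock.readLock().lock(); try { return createScanner(); } finally { lock.readLock().unlock(); }
    return createScanner(); // DFS open/seek can now block without holding the lock
  }

  private Object createScanner() throws IOException {
    Object files;
    lock.readLock().lock();
    try {
      files = snapshotStoreFiles(); // the only part that needs the read lock
    } finally {
      lock.readLock().unlock();
    }
    return openAndSeek(files); // slow DFS work happens outside the lock
  }

  private Object snapshotStoreFiles() { return new Object(); }

  private Object openAndSeek(Object files) throws IOException { return files; }
}
{code}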
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814213#comment-13814213 ] Nick Dimiduk commented on HBASE-9890: - bq. use of SecureBulkLoadClient in LoadIncrementalHFiles is conditioned on isHBaseSecurityEnabled(), instead of isHadoopSecurityEnabled(). I think this is a question of practicality -- LoadIncrementalHFiles can only use the SecureBulkLoadClient when the appropriate coprocessor is available on the RS. It's only available when HBase security is enabled. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception when obtaining the token, since the proxy user doesn't have kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception. {code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
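One way to avoid the failure above is to reuse a delegation token already present in the (proxy) user's credentials instead of unconditionally requesting a new one. The sketch below is a hedged illustration with an assumed helper shape, not the exact committed change:
{code}
// Hedged sketch: check the user's existing credentials for an
// HBASE_AUTH_TOKEN before making a Kerberos-only token request.
public static void obtainTokenForJob(Job job, UserGroupInformation user)
    throws IOException {
  for (Token<? extends TokenIdentifier> t : user.getTokens()) {
    if ("HBASE_AUTH_TOKEN".equals(t.getKind().toString())) {
      job.getCredentials().addToken(t.getService(), t);
      return; // the delegated user already holds a token; no Kerberos needed
    }
  }
  // Only fall through to a fresh TokenProvider request here, which does
  // require Kerberos-authenticated credentials. (Elided.)
}
{code}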
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814230#comment-13814230 ] Hudson commented on HBASE-8942: --- FAILURE: Integrated in HBase-0.94 #1195 (See [https://builds.apache.org/job/HBase-0.94/1195/]) HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers; REVERT (stack: rev 1538869) * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java DFS errors during a read operation (get/scan), may cause write outliers --- Key: HBASE-8942 URL: https://issues.apache.org/jira/browse/HBASE-8942 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb, 0.95.2 Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14 Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814247#comment-13814247 ] Dave Latham commented on HBASE-9865: Looks good to me. Thanks, Lars. WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM. A little background on this issue: we noticed that our source replication regionservers would get into gc storms and sometimes even OOM. We noticed a case where there were around 25k WALEdits to replicate, each one with an ArrayList of KeyValues. The array list had a capacity of around 90k (using 350KB of heap memory) but had around 6 non-null entries. When ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a WALEdit, it removes all kv's that are scoped other than local. But in doing so we don't account for the capacity of the ArrayList when determining heapSize for a WALEdit. The logic for shipping a batch is whether you have hit a size capacity or a number-of-entries capacity. Therefore, if we have a WALEdit with 25k entries and suppose all are removed: the size of the arrayList is 0 (we don't even count the collection's heap size currently) but the capacity is ignored. This will yield a heapSize() of 0 bytes, while in the best case it would be at least 10 bytes (provided you pass initialCapacity and you have a 32-bit JVM). I have some ideas on how to address this problem and want to know everyone's thoughts: 1. We use a probabilistic counter such as HyperLogLog and create something like: * class CapacityEstimateArrayList implements ArrayList ** this class overrides all additive methods to update the probabilistic counts ** it includes one additional method called estimateCapacity (we would take estimateCapacity - size() and fill in sizes for all references) * Then we can do something like this in WALEdit.heapSize: {code} public long heapSize() { long ret = ClassSize.ARRAYLIST; for (KeyValue kv : kvs) { ret += kv.heapSize(); } long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size(); ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE); if (scopes != null) { ret += ClassSize.TREEMAP; ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY); // TODO this isn't quite right, need help here } return ret; } {code} 2. In ReplicationSource.removeNonReplicableEdits() we know the size of the array originally, and we provide some percentage threshold. When that threshold is met (50% of the entries have been removed) we can call kvs.trimToSize(). 3. In the heapSize() method for WALEdit we could use reflection (please don't shoot me for this) to grab the actual capacity of the list.
Doing something like this: {code} public int getArrayListCapacity() { try { Field f = ArrayList.class.getDeclaredField("elementData"); f.setAccessible(true); return ((Object[]) f.get(kvs)).length; } catch (Exception e) { log.warn("Exception in trying to get capacity on ArrayList", e); return kvs.size(); } } {code} I am partial to (1), using HyperLogLog and creating a CapacityEstimateArrayList; this is reusable throughout the code for other classes that implement HeapSize and contain ArrayLists. The memory footprint is very small and it is very fast. The issue is that this is an estimate; although we can configure the precision, we will most likely always be conservative. The estimateCapacity will always be less than the actualCapacity, but it will be close. I think that putting the logic in removeNonReplicableEdits will work, but this only solves the heapSize problem in this particular scenario. Solution 3 is slow and horrible, but it gives us the exact answer. I would love to hear if anyone else has any other ideas on how to remedy this problem. I have code for trunk and 0.94 for all 3 ideas and can provide a patch if the community thinks any of these approaches is a viable one. -- This message was sent by Atlassian JIRA (v6.1#6144)
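Of the three options, (2) is the least invasive. A minimal sketch of what it could look like, assuming the WALEdit exposes its KeyValue list as an ArrayList (the accessor and cast below are assumptions for illustration):
{code}
// Hedged sketch of option 2: trim the backing array once most entries have
// been removed, so unused capacity stops distorting heapSize()-based batching.
protected void removeNonReplicableEdits(WALEdit edit) {
  ArrayList<KeyValue> kvs = (ArrayList<KeyValue>) edit.getKeyValues(); // assumed accessor
  int originalSize = kvs.size();
  NavigableMap<byte[], Integer> scopes = edit.getScopes();
  for (int i = kvs.size() - 1; i >= 0; i--) {
    if (scopes == null || !scopes.containsKey(kvs.get(i).getFamily())) {
      kvs.remove(i); // drop KVs scoped other than local
    }
  }
  if (kvs.size() < originalSize / 2) {
    kvs.trimToSize(); // release the large unused capacity (e.g. the ~90k slots above)
  }
}
{code}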
[jira] [Updated] (HBASE-9863) Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs
[ https://issues.apache.org/jira/browse/HBASE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9863: -- Fix Version/s: 0.98.0 Hadoop Flags: Reviewed Integrated to trunk. Thanks for the reviews. Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs --- Key: HBASE-9863 URL: https://issues.apache.org/jira/browse/HBASE-9863 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0 Attachments: 9863-v1.txt, 9863-v2.txt, 9863-v3.txt, 9863-v4.txt, 9863-v5.txt, 9863-v6.txt TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry sometimes hung. Here were two recent occurrences: https://builds.apache.org/job/PreCommit-HBASE-Build/7676/console https://builds.apache.org/job/PreCommit-HBASE-Build/7671/console There were 9 occurrences of the following in both stack traces: {code} FifoRpcScheduler.handler1-thread-5 daemon prio=10 tid=0x09df8800 nid=0xc17 waiting for monitor entry [0x6fdf8000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250) - waiting to lock 0x7f69b5f0 (a org.apache.hadoop.hbase.master.TableNamespaceManager) at org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146) at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1743) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1782) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1983) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) {code} The test hung here: {code} pool-1-thread-1 prio=10 tid=0x74f7b800 nid=0x5aa5 in Object.wait() [0x74efe000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1436) - locked 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.createTable(MasterProtos.java:40372) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.createTable(HConnectionManager.java:1931) at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:598) at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:594) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116) - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94) - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller) at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3124) at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:594) at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:485) at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486) {code} -- This 
message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9818: -- Attachment: 9818-v3.txt Patch v3 allows TestHRegion and TestAtomicOperation to reach iteration #46. Please comment. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt, 9818-v3.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got 
from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814278#comment-13814278 ] churro morales commented on HBASE-9865: --- One thing I noticed in WALEdit: we should be accounting for the ArrayList object as well. Instead of: {code} public long heapSize() { long ret = 0; {code} this would be correct, although it doesn't matter very much: {code} public long heapSize() { long ret = ClassSize.ARRAYLIST; {code} If you didn't want to bleed the ArrayList implementation that WALEdit uses, maybe something like this might work. For WALEdit: {code} public void removeIf(Predicate<KeyValue> predicate) { for (int i = kvs.size() - 1; i >= 0; i--) { KeyValue kv = kvs.get(i); if (predicate.apply(kv)) { kvs.remove(i); } } if (kvs.size() < size() / 2) { kvs.trimToSize(); } } {code} And ReplicationSource would change to: {code} protected void removeNonReplicableEdits(WALEdit edit) { final NavigableMap<byte[], Integer> scopes = edit.getScopes(); edit.removeIf(new Predicate<KeyValue>() { @Override public boolean apply(KeyValue keyValue) { return scopes == null || !scopes.containsKey(keyValue.getFamily()); } }); } {code} I don't think it adds much by doing this, but it is an alternative if we don't want to bleed that the WALEdit uses an ArrayList. WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814285#comment-13814285 ] Enis Soztutar commented on HBASE-9892: -- Great. Left some comments at RB. Add info port to ServerName to support multi instances in a node Key: HBASE-9892 URL: https://issues.apache.org/jira/browse/HBASE-9892 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff The full GC time of a regionserver with a big heap (>30G) usually cannot be kept under 30s. At the same time, servers with 64G of memory are the norm. So we try to deploy multiple RS instances (2-3) in a single node, with the heap of each RS at about 20G ~ 24G. Most things work fine, except the hbase web ui. The master gets the RS info port from conf, which is not suitable for the situation of multiple RS instances in a node. So we add the info port to ServerName. a. At startup, the RS reports its info port to HMaster. b. For the root region, the RS writes the servername with info port to the zookeeper root-region-server node. c. For meta regions, the RS writes the servername with info port to the root region. d. For user regions, the RS writes the servername with info port to meta regions. So the HMaster and clients can get the info port from the servername. To test this feature, I changed the RS num from 1 to 3 in standalone mode, so we can test it in standalone mode. I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know how Hoya handles this? PS: There are different formats for the servername in the zk node and the meta table; I think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1#6144)
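For illustration, one possible shape of the change described above; the class and method names here are assumptions rather than the actual patch:
{code}
// Hedged sketch: carry the web-UI info port alongside host/port/startcode so
// the master and clients can link to the right UI when several regionservers
// share one host.
public class ServerNameWithInfoPort {
  private final String hostname;
  private final int rpcPort;
  private final long startcode;
  private final int infoPort; // reported by the RS to the master at startup

  public ServerNameWithInfoPort(String hostname, int rpcPort,
      long startcode, int infoPort) {
    this.hostname = hostname;
    this.rpcPort = rpcPort;
    this.startcode = startcode;
    this.infoPort = infoPort;
  }

  // The string written to the zookeeper node / meta, e.g.
  // "host1.example.com,36020,1382638088230,60030"
  public String toSerializedForm() {
    return hostname + "," + rpcPort + "," + startcode + "," + infoPort;
  }
}
{code}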
[jira] [Updated] (HBASE-9836) Intermittent TestRegionObserverScannerOpenHook#testRegionObserverCompactionTimeStacking failure
[ https://issues.apache.org/jira/browse/HBASE-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9836: -- Resolution: Fixed Status: Resolved (was: Patch Available) Intermittent TestRegionObserverScannerOpenHook#testRegionObserverCompactionTimeStacking failure --- Key: HBASE-9836 URL: https://issues.apache.org/jira/browse/HBASE-9836 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.96.1 Attachments: 9836-v1.txt, 9836-v3.txt, 9836-v4.txt, 9836-v5.txt, 9836-v6.txt Here were two recent examples: https://builds.apache.org/job/hbase-0.96-hadoop2/99/testReport/org.apache.hadoop.hbase.coprocessor/TestRegionObserverScannerOpenHook/testRegionObserverCompactionTimeStacking/ https://builds.apache.org/job/PreCommit-HBASE-Build/7616/testReport/junit/org.apache.hadoop.hbase.coprocessor/TestRegionObserverScannerOpenHook/testRegionObserverCompactionTimeStacking/ From the second: {code} 2013-10-24 18:08:10,080 INFO [Priority.RpcServer.handler=1,port=58174] regionserver.HRegionServer(3672): Flushing testRegionObserverCompactionTimeStacking,,1382638088230.e96920e43ea374ba1bd559df115870cf. ... 2013-10-24 18:08:10,544 INFO [Priority.RpcServer.handler=1,port=58174] regionserver.HRegion(1645): Finished memstore flush of ~128.0/128, currentsize=0.0/0 for region testRegionObserverCompactionTimeStacking,,1382638088230.e96920e43ea374ba1bd559df115870cf. in 464ms, sequenceid=5, compaction requested=true 2013-10-24 18:08:10,546 DEBUG [Priority.RpcServer.handler=1,port=58174] regionserver.CompactSplitThread(319): Small Compaction requested: system; Because: Compaction through user triggered flush; compaction_queue=(0:0), split_queue=0, merge_queue=0 2013-10-24 18:08:10,547 DEBUG [RS:0;asf002:58174-smallCompactions-1382638090545] compactions.RatioBasedCompactionPolicy(92): Selecting compaction from 2 store files, 0 compacting, 2 eligible, 10 blocking 2013-10-24 18:08:10,547 DEBUG [pool-1-thread-1] catalog.CatalogTracker(209): Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4be179 2013-10-24 18:08:10,549 DEBUG [RS:0;asf002:58174-smallCompactions-1382638090545] compactions.ExploringCompactionPolicy(112): Exploring compaction algorithm has selected 2 files of size 1999 starting at candidate #0 after considering 1 permutations with 1 in ratio 2013-10-24 18:08:10,551 DEBUG [RS:0;asf002:58174-smallCompactions-1382638090545] regionserver.HStore(1329): e96920e43ea374ba1bd559df115870cf - A: Initiating major compaction 2013-10-24 18:08:10,551 INFO [RS:0;asf002:58174-smallCompactions-1382638090545] regionserver.HRegion(1294): Starting compaction on A in region testRegionObserverCompactionTimeStacking,,1382638088230.e96920e43ea374ba1bd559df115870cf. 2013-10-24 18:08:10,551 INFO [RS:0;asf002:58174-smallCompactions-1382638090545] regionserver.HStore(982): Starting compaction of 2 file(s) in A of testRegionObserverCompactionTimeStacking,,1382638088230.e96920e43ea374ba1bd559df115870cf. 
into tmpdir=hdfs://localhost:49506/user/jenkins/hbase/data/default/testRegionObserverCompactionTimeStacking/e96920e43ea374ba1bd559df115870cf/.tmp, totalSize=2.0k 2013-10-24 18:08:10,552 DEBUG [RS:0;asf002:58174-smallCompactions-1382638090545] compactions.Compactor(168): Compacting hdfs://localhost:49506/user/jenkins/hbase/data/default/testRegionObserverCompactionTimeStacking/e96920e43ea374ba1bd559df115870cf/A/44f87b94732149c08f20bdba00dd7140, keycount=1, bloomtype=ROW, size=992.0, encoding=NONE, seqNum=3, earliestPutTs=1382638089528 2013-10-24 18:08:10,552 DEBUG [RS:0;asf002:58174-smallCompactions-1382638090545] compactions.Compactor(168): Compacting hdfs://localhost:49506/user/jenkins/hbase/data/default/testRegionObserverCompactionTimeStacking/e96920e43ea374ba1bd559df115870cf/A/0b2e580cbda246718bbf64c21e81cd18, keycount=1, bloomtype=ROW, size=1007.0, encoding=NONE, seqNum=5, earliestPutTs=1382638090053 2013-10-24 18:08:10,564 DEBUG [RS:0;asf002:58174-smallCompactions-1382638090545] util.FSUtils(305): DFS Client does not support most favored nodes create; using default create ... Potentially hanging thread: RS:0;asf002:58174-smallCompactions-1382638090545 java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.ipc.Client.call(Client.java:1099) org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) $Proxy9.complete(Unknown Source) sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) java.lang.reflect.Method.invoke(Method.java:597)
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814302#comment-13814302 ] Lars Hofhansl commented on HBASE-9865: -- Thanks Churro (and Dave). While we're at it, we might as well fix WALEdit.heapSize(). The other change does not help with readability, I think. It's not so bad to leak this out of WALEdit; if anything it declares that this is a random-access list. I'll make a 0.94 patch as well. Any chance you would try it on a real cluster? WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814309#comment-13814309 ] Gary Helmling commented on HBASE-9890: -- bq. I think this is a question of practicality -- LoadIncrementalHFiles can only use the SecureBulkLoadClient when the appropriate coprocessor is available on the RS. It's only available when HBase security is enabled. Whether {{hbase.security.authentication == kerberos}} has nothing to do with whether SecureBulkLoadEndpoint is loaded on a table's regions. The coprocessor needs to be configured independently (via hbase.coprocessor.region.classes, hbase.coprocessor.user.region.classes, or directly on the table). It does also assume that the AccessController coprocessor is enabled, but that again can be independent of authentication. I may be missing something, but it seems like the main use of SecureBulkLoadEndpoint is to move the bulk load HFiles to a staging directory, proxying to HDFS as the end user. Even the AccessController checks (which should only happen if AccessController is enabled) can be done independently of whether HBase requires kerberos authentication (you can do access control without kerberos auth). So secure bulk loading seems to me to only be required when HDFS secure auth is enabled, and it should be usable in that case regardless of the value of hbase.security.authentication. There is a bigger issue here, in that we are amassing a pile of security configurations that are all exposed (and must be put together) by end users. But I think that is solvable by providing a simpler end-user configuration, while still retaining the correct granularity of configuration checks within the code itself. HBASE-4817 is a long-standing issue to simplify the end-user configuration. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
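A minimal sketch of the condition being argued for here, using real configuration checks but a hypothetical helper; the actual LoadIncrementalHFiles wiring differs:
{code}
// Hedged sketch: gate the secure bulk-load path on Hadoop/HDFS security as
// well, rather than on HBase security alone.
static boolean shouldUseSecureBulkLoad(Configuration conf) {
  return User.isHBaseSecurityEnabled(conf)          // the current check
      || UserGroupInformation.isSecurityEnabled();  // HDFS needs end-user creds
}
{code}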
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814312#comment-13814312 ] churro morales commented on HBASE-9865: --- Hi Lars, I'm sure at the very least we will be able to apply it to a few nodes in our cluster and monitor how this patch affects garbage collection. Upon gathering results, I will be sure to share. Cheers WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9818: -- Attachment: (was: 9818-v3.txt) NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false 
next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9865: - Attachment: 9865-trunk-v4.txt Aaaand. Trunk. WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.14 Attachments: 9865-0.94-v2.txt, 9865-0.94-v4.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk-v4.txt, 9865-trunk.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9818: -- Attachment: 9818-v3.txt NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt, 9818-v3.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: 
false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9865: - Attachment: 9865-0.94-v4.txt Updated 0.94 patch. WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.14 Attachments: 9865-0.94-v2.txt, 9865-0.94-v4.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk-v4.txt, 9865-trunk.txt WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM. A little background on this issue. We noticed that our source replication regionservers would get into gc storms and sometimes even OOM. We noticed a case where it showed that there were around 25k WALEdits to replicate, each one with an ArrayList of KeyValues. The array list had a capacity of around 90k (using 350KB of heap memory) but had around 6 non null entries. When the ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a WALEdit it removes all kv's that are scoped other than local. But in doing so we don't account for the capacity of the ArrayList when determining heapSize for a WALEdit. The logic for shipping a batch is whether you have hit a size capacity or number of entries capacity. Therefore if have a WALEdit with 25k entries and suppose all are removed: The size of the arrayList is 0 (we don't even count the collection's heap size currently) but the capacity is ignored. This will yield a heapSize() of 0 bytes while in the best case it would be at least 10 bytes (provided you pass initialCapacity and you have 32 bit JVM) I have some ideas on how to address this problem and want to know everyone's thoughts: 1. We use a probabalistic counter such as HyperLogLog and create something like: * class CapacityEstimateArrayList implements ArrayList ** this class overrides all additive methods to update the probabalistic counts ** it includes one additional method called estimateCapacity (we would take estimateCapacity - size() and fill in sizes for all references) * Then we can do something like this in WALEdit.heapSize: {code} public long heapSize() { long ret = ClassSize.ARRAYLIST; for (KeyValue kv : kvs) { ret += kv.heapSize(); } long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size(); ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE); if (scopes != null) { ret += ClassSize.TREEMAP; ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY); // TODO this isn't quite right, need help here } return ret; } {code} 2. In ReplicationSource.removeNonReplicableEdits() we know the size of the array originally, and we provide some percentage threshold. When that threshold is met (50% of the entries have been removed) we can call kvs.trimToSize() 3. in the heapSize() method for WALEdit we could use reflection (Please don't shoot me for this) to grab the actual capacity of the list. 
Doing something like this:
{code}
public int getArrayListCapacity() {
  try {
    Field f = ArrayList.class.getDeclaredField("elementData");
    f.setAccessible(true);
    return ((Object[]) f.get(kvs)).length;
  } catch (Exception e) {
    LOG.warn("Exception in trying to get capacity on ArrayList", e);
    return kvs.size();
  }
}
{code}
I am partial to (1), using HyperLogLog and creating a CapacityEstimateArrayList; this is reusable throughout the code for other classes that implement HeapSize and contain ArrayLists. The memory footprint is very small and it is very fast. The issue is that this is an estimate, although we can configure the precision; we will most likely always be conservative. The estimateCapacity will always be less than the actualCapacity, but it will be close. I think that putting the logic in removeNonReplicableEdits will work, but this only solves the heapSize problem in this particular scenario. Solution 3 is slow and horrible but it gives us the exact answer. I would love to hear if anyone else has any other ideas on how to remedy this problem. I have code for trunk and 0.94 for all 3 ideas and can provide a patch if the community thinks any of these approaches is a viable one. -- This message was sent by Atlassian JIRA (v6.1#6144)
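To make option (2) concrete, here is a minimal sketch of the trim-on-threshold idea; the method shape and the isReplicable() helper are illustrative assumptions, not code from the attached patches:
{code}
// Sketch of option (2): once enough entries have been removed, give the
// ArrayList's slack back so heapSize() stops undercounting. Assumes
// java.util.ArrayList/Iterator and HBase's KeyValue; isReplicable() is
// a stand-in for the real scope check.
void removeNonReplicableEdits(ArrayList<KeyValue> kvs) {
  int originalSize = kvs.size();
  Iterator<KeyValue> it = kvs.iterator();
  while (it.hasNext()) {
    if (!isReplicable(it.next())) {
      it.remove();
    }
  }
  // Threshold: at least half of the entries were dropped.
  if (originalSize > 0 && kvs.size() * 2 <= originalSize) {
    kvs.trimToSize(); // shrinks the backing array to the current size
  }
}
{code}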
[jira] [Updated] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9865: - Fix Version/s: 0.94.14 0.96.1 0.98.0 WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.14 Attachments: 9865-0.94-v2.txt, 9865-0.94-v4.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk-v4.txt, 9865-trunk.txt WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM. A little background on this issue. We noticed that our source replication regionservers would get into GC storms and sometimes even OOM. We noticed a case where there were around 25k WALEdits to replicate, each one with an ArrayList of KeyValues. The ArrayList had a capacity of around 90k (using 350KB of heap memory) but had around 6 non-null entries. When ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a WALEdit, it removes all KVs that are scoped other than local. But in doing so we don't account for the capacity of the ArrayList when determining heapSize for a WALEdit. The logic for shipping a batch is whether you have hit a size capacity or a number-of-entries capacity. Therefore, if we have a WALEdit with 25k entries and suppose all are removed: the size of the ArrayList is 0 (we don't even count the collection's heap size currently) but the capacity is ignored. This will yield a heapSize() of 0 bytes, while in the best case it would be at least 10 bytes (provided you pass an initialCapacity and you have a 32-bit JVM). I have some ideas on how to address this problem and want to know everyone's thoughts:
1. We use a probabilistic counter such as HyperLogLog and create something like:
* class CapacityEstimateArrayList implements ArrayList
** this class overrides all additive methods to update the probabilistic counts
** it includes one additional method called estimateCapacity (we would take estimateCapacity - size() and fill in sizes for all references)
* Then we can do something like this in WALEdit.heapSize:
{code}
public long heapSize() {
  long ret = ClassSize.ARRAYLIST;
  for (KeyValue kv : kvs) {
    ret += kv.heapSize();
  }
  long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size();
  ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
  if (scopes != null) {
    ret += ClassSize.TREEMAP;
    ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
    // TODO this isn't quite right, need help here
  }
  return ret;
}
{code}
2. In ReplicationSource.removeNonReplicableEdits() we know the size of the array originally, and we provide some percentage threshold. When that threshold is met (50% of the entries have been removed) we can call kvs.trimToSize().
3. In the heapSize() method for WALEdit we could use reflection (please don't shoot me for this) to grab the actual capacity of the list.
Doing something like this:
{code}
public int getArrayListCapacity() {
  try {
    Field f = ArrayList.class.getDeclaredField("elementData");
    f.setAccessible(true);
    return ((Object[]) f.get(kvs)).length;
  } catch (Exception e) {
    LOG.warn("Exception in trying to get capacity on ArrayList", e);
    return kvs.size();
  }
}
{code}
I am partial to (1), using HyperLogLog and creating a CapacityEstimateArrayList; this is reusable throughout the code for other classes that implement HeapSize and contain ArrayLists. The memory footprint is very small and it is very fast. The issue is that this is an estimate, although we can configure the precision; we will most likely always be conservative. The estimateCapacity will always be less than the actualCapacity, but it will be close. I think that putting the logic in removeNonReplicableEdits will work, but this only solves the heapSize problem in this particular scenario. Solution 3 is slow and horrible but it gives us the exact answer. I would love to hear if anyone else has any other ideas on how to remedy this problem. I have code for trunk and 0.94 for all 3 ideas and can provide a patch if the community thinks any of these approaches is a viable one. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9857) Blockcache prefetch for HFile V3
[ https://issues.apache.org/jira/browse/HBASE-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814352#comment-13814352 ] Andrew Purtell commented on HBASE-9857: --- Thanks for looking at the patch [~ndimiduk]. bq. don't see why it's limited to HFileV3. Can it be made a general feature I put the preload logic into the v3 reader because v3 is 'experimental'. Could trivially go into the v2 reader instead. bq. I think it could be smart about loading the blocks, load either sequentially or over a random distribution until the cache is full Files to be preloaded are queued and scheduled to be handled by a small threadpool. When a thread picks up work for a file, the blocks are loaded sequentially using a non-pread scanner from offset 0 to the end of the index. By random did you mean randomly select work from the file queue? bq. The until full part seems tricky as eviction detection isn't very straight-forward Right. If we had it, I could make use of it. Blockcache prefetch for HFile V3 Key: HBASE-9857 URL: https://issues.apache.org/jira/browse/HBASE-9857 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Priority: Minor Attachments: 9857.patch Attached patch implements a prefetching function for HFile (v3) blocks, if indicated by a column family or regionserver property. The purpose of this change is to warm the blockcache, as rapidly after region open as is reasonable, with all the data and index blocks of (presumably also in-memory) table data, without counting those block loads as cache misses. Great for fast reads and keeping the cache hit ratio high. The IO impact can be tuned against the time until all data blocks are in cache. Works a bit like CompactSplitThread. Makes some effort not to stampede. I have been using this for setting up various experiments and thought I'd polish it up a bit and throw it out there. If the data to be preloaded will not fit in the blockcache, or if it is large as a percentage of the blockcache, this is not a good idea; it will just blow out the cache and trigger a lot of useless GC activity. Might be useful as an expert tuning option though. Or not. -- This message was sent by Atlassian JIRA (v6.1#6144)
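The queue-plus-threadpool scheme described above could be wired up roughly as in this sketch; all names (BlockPrefetcher, PreloadableFile, the pool size) are invented for illustration and are not from 9857.patch:
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: files are queued onto a small pool, and each worker
// reads blocks sequentially (non-pread) from offset 0 to the index end,
// letting every read populate the block cache without counting as a miss.
class BlockPrefetcher {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  void schedule(final PreloadableFile file) {
    pool.submit(new Runnable() {
      @Override
      public void run() {
        long offset = 0;
        while (offset < file.indexEndOffset()) {
          offset += file.readBlockIntoCache(offset);
        }
      }
    });
  }

  interface PreloadableFile {
    long indexEndOffset();
    long readBlockIntoCache(long offset); // returns bytes consumed
  }
}
{code}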
[jira] [Created] (HBASE-9897) Clean up some security configuration checks in LoadIncrementalHFiles
Gary Helmling created HBASE-9897: Summary: Clean up some security configuration checks in LoadIncrementalHFiles Key: HBASE-9897 URL: https://issues.apache.org/jira/browse/HBASE-9897 Project: HBase Issue Type: Task Components: security Reporter: Gary Helmling In LoadIncrementalHFiles, use of SecureBulkLoadClient is conditioned on UserProvider.isHBaseSecurityEnabled() in a couple of places. However, use of secure bulk loading seems to be required by HDFS secure authentication rather than by HBase secure authentication. It should be possible to use secure bulk loading as long as SecureBulkLoadEndpoint is loaded and HDFS secure authentication is enabled, regardless of the HBase authentication configuration. In addition, SecureBulkLoadEndpoint does a direct check on permissions by referencing the AccessController loaded on the same region, i.e.:
{code}
getAccessController().prePrepareBulkLoad(env);
{code}
It seems like this will throw an NPE if AccessController is not configured. We need an additional null check to handle this case gracefully. -- This message was sent by Atlassian JIRA (v6.1#6144)
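The graceful handling asked for above could be as simple as the following sketch; the surrounding method is elided and only the null guard is the point:
{code}
// Sketch: guard the coprocessor reference before delegating the check.
AccessController accessController = getAccessController();
if (accessController != null) {
  accessController.prePrepareBulkLoad(env);
}
// When no AccessController is loaded there is no permission check to run,
// so proceed instead of hitting an NPE.
{code}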
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814363#comment-13814363 ] Gary Helmling commented on HBASE-9890: -- [~mbertozzi] I created HBASE-9897 to handle any additional LoadIncrementalHFiles changes separately. It seemed to be expanding the scope of this issue. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception on obtaining the token, since the proxy user doesn't have Kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception. {code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
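Conceptually, the fix amounts to checking whether a usable token is already present before asking the cluster for a new one, since only the latter requires Kerberos credentials. A hedged sketch (exception handling elided; obtainAuthTokenForJob() is a hypothetical helper, not necessarily what the patch does):
{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

// Sketch: skip token acquisition when the (proxy) user already carries one.
UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
boolean hasToken = false;
for (Token<?> t : ugi.getTokens()) {
  if ("HBASE_AUTH_TOKEN".equals(t.getKind().toString())) {
    hasToken = true;
    break;
  }
}
if (!hasToken) {
  // Only this path needs Kerberos credentials to mint a new token.
  obtainAuthTokenForJob(conf, job); // hypothetical helper
}
{code}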
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814378#comment-13814378 ] Gary Helmling commented on HBASE-9890: -- [~mbertozzi] somehow I missed your earlier comment that you would handle any secure bulk loading changes separately. Feel free to close my issue as a dupe if you've already opened one. +1 on the v1 patch. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception on obtaining the token, since the proxy user doesn't have Kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception. {code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814416#comment-13814416 ] Sergey Shelukhin commented on HBASE-9818: - is retrying intentional? probably we should find root cause and not just retry. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt, 9818-v3.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But 
the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA
[jira] [Commented] (HBASE-9870) HFileDataBlockEncoderImpl#diskToCacheFormat uses wrong format
[ https://issues.apache.org/jira/browse/HBASE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814420#comment-13814420 ] Jimmy Xiang commented on HBASE-9870: In BlockCacheKey, we do have the encoding format. However, the equals() method doesn't check the encoding format, which may be interesting. HFileDataBlockEncoderImpl#diskToCacheFormat uses wrong format - Key: HBASE-9870 URL: https://issues.apache.org/jira/browse/HBASE-9870 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang In this method, we have
{code}
if (block.getBlockType() == BlockType.ENCODED_DATA) {
  if (block.getDataBlockEncodingId() == onDisk.getId()) {
    // The block is already in the desired in-cache encoding.
    return block;
  }
}
{code}
This assumes the onDisk encoding is the same as that of inCache. This is not true when we change the encoding of a CF. This could be one of the reasons I got data loss with online encoding change? If I make sure onDisk == inCache all the time, my ITBLL with online encoding change worked once for me. -- This message was sent by Atlassian JIRA (v6.1#6144)
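If the encoding is meant to be part of cache identity, equals()/hashCode() would have to include it, along the lines of this simplified stand-in (not the real BlockCacheKey source; fields trimmed for illustration):
{code}
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

// Simplified stand-in for BlockCacheKey with the encoding made part of
// equality, so differently-encoded copies of a block cannot collide.
final class KeySketch {
  private final String hfileName;
  private final long offset;
  private final DataBlockEncoding encoding;

  KeySketch(String hfileName, long offset, DataBlockEncoding encoding) {
    this.hfileName = hfileName;
    this.offset = offset;
    this.encoding = encoding;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof KeySketch)) return false;
    KeySketch k = (KeySketch) o;
    return offset == k.offset
        && hfileName.equals(k.hfileName)
        && encoding == k.encoding; // the check the current equals() lacks
  }

  @Override
  public int hashCode() {
    int h = hfileName.hashCode();
    h = 31 * h + (int) (offset ^ (offset >>> 32));
    h = 31 * h + encoding.ordinal();
    return h;
  }
}
{code}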
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814417#comment-13814417 ] Sergey Shelukhin commented on HBASE-9818: - Logging LGTM. What does it say when it fails? NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt, 9818-v3.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client:
53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814428#comment-13814428 ] Sergey Shelukhin commented on HBASE-9818: - actually, I looked at it, encapsulating is not such a good idea (returning boolean and stream together), it will not remove the race with close unless a common lock is added. So root cause might be elsewhere... NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt, 9818-v3.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR 
[RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at
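For context, the encapsulation being ruled out above would snapshot the stream and the checksum flag together, roughly as below; as noted, without a lock shared with close() the stream can still be closed right after the snapshot is taken, so this alone does not fix the race (hypothetical sketch):
{code}
import org.apache.hadoop.fs.FSDataInputStream;

// Hypothetical snapshot object: returning both values together avoids
// reading them separately, but does not remove the race with close().
final class StreamAndChecksum {
  final FSDataInputStream stream;
  final boolean useHBaseChecksum;

  StreamAndChecksum(FSDataInputStream stream, boolean useHBaseChecksum) {
    this.stream = stream;
    this.useHBaseChecksum = useHBaseChecksum;
  }
}
{code}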
[jira] [Created] (HBASE-9898) Have a way to set a different default compression on HCD
Jean-Daniel Cryans created HBASE-9898: - Summary: Have a way to set a different default compression on HCD Key: HBASE-9898 URL: https://issues.apache.org/jira/browse/HBASE-9898 Project: HBase Issue Type: Improvement Reporter: Jean-Daniel Cryans Fix For: 0.98.0 I was exploring if there would be a nice way to set the compression by default to a different algorithm but I didn't find any that I can implement right now, dumping my ideas so that others can chime in. I think the best place to take it into account would be on the master's side. Basically you run a check when creating a new table to see if compression wasn't set, and if so then set it to the new default. The important thing is you don't want to replace NONE, because that might be the user's goal to set it like that. The main problem is that the normal HCD constructor calls the deprecated constructor that sets most of the properties to their defaults, including compression, which means that it will always be NONE instead of null. It appears that this constructor has been deprecated since February 2012 (https://github.com/apache/hbase/blame/0.94/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java#L292) so maybe we can remove it in the next major version and make our life easier? -- This message was sent by Atlassian JIRA (v6.1#6144)
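To make the master-side idea concrete, a sketch of the creation-time check; the config key is invented for illustration, and this presumes the HColumnDescriptor cleanup described above so that "unset" (null) can be told apart from an explicit NONE:
{code}
// Sketch: on createTable, fill in a cluster-default compression only for
// column families that left it genuinely unset.
// "hbase.hcd.default.compression" is a hypothetical key.
for (HColumnDescriptor hcd : desc.getColumnFamilies()) {
  if (hcd.getValue(HColumnDescriptor.COMPRESSION) == null) { // unset, not NONE
    hcd.setCompressionType(Compression.Algorithm.valueOf(
        conf.get("hbase.hcd.default.compression", "NONE")));
  }
}
{code}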
[jira] [Created] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
Sergey Shelukhin created HBASE-9899: --- Summary: for idempotent operation dups, return the result instead of throwing conflict exception Key: HBASE-9899 URL: https://issues.apache.org/jira/browse/HBASE-9899 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin After HBASE-3787, we could store the MVCC number in the operation context, and use it to convert the modification request into a read on dups instead of throwing OperationConflictException. MVCC tracking will have to be aware of such MVCC numbers being present. Given that scanners are usually relatively short-lived, that would prevent the low watermark from advancing for quite a bit more time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814458#comment-13814458 ] Sergey Shelukhin commented on HBASE-3787: - btw, the patch is ready to review. I filed HBASE-9899 for follow-up work Increment is non-idempotent but client retries RPC -- Key: HBASE-3787 URL: https://issues.apache.org/jira/browse/HBASE-3787 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.4, 0.95.2 Reporter: dhruba borthakur Assignee: Sergey Shelukhin Priority: Blocker Attachments: HBASE-3787-partial.patch, HBASE-3787-v0.patch, HBASE-3787-v1.patch, HBASE-3787-v2.patch, HBASE-3787-v3.patch, HBASE-3787-v4.patch, HBASE-3787-v5.patch, HBASE-3787-v5.patch, HBASE-3787-v6.patch, HBASE-3787-v7.patch, HBASE-3787-v8.patch The HTable.increment() operation is non-idempotent. The client retries the increment RPC a few times (as specified by configuration) before throwing an error to the application. This makes it possible that the same increment call be applied twice at the server. For increment operations, is it better to use HConnectionManager.getRegionServerWithoutRetries()? Another option would be to enhance the IPC module to make the RPC server correctly identify if the RPC is a retry attempt and handle accordingly. -- This message was sent by Atlassian JIRA (v6.1#6144)
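The "server identifies a retry" option mentioned in the description could be sketched as a per-operation-id dedup table; this is entirely illustrative, and the actual patch under review is more involved than this:
{code}
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: remember operation ids the server has already applied
// so a client retry of a non-idempotent op can be detected and absorbed.
class RetryDetector {
  private final ConcurrentHashMap<Long, Boolean> applied =
      new ConcurrentHashMap<Long, Boolean>();

  /** @return true if this operation id was seen before (i.e. a retry). */
  boolean markAndCheckDuplicate(long operationId) {
    return applied.putIfAbsent(operationId, Boolean.TRUE) != null;
  }
}
{code}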
[jira] [Commented] (HBASE-9863) Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs
[ https://issues.apache.org/jira/browse/HBASE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814493#comment-13814493 ] Hudson commented on HBASE-9863: --- SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #827 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/827/]) HBASE-9863 Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs (tedyu: rev 1539129) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableNamespaceManager.java Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs --- Key: HBASE-9863 URL: https://issues.apache.org/jira/browse/HBASE-9863 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0 Attachments: 9863-v1.txt, 9863-v2.txt, 9863-v3.txt, 9863-v4.txt, 9863-v5.txt, 9863-v6.txt TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry sometimes hung. Here were two recent occurrences: https://builds.apache.org/job/PreCommit-HBASE-Build/7676/console https://builds.apache.org/job/PreCommit-HBASE-Build/7671/console There were 9 occurrences of the following in both stack traces: {code} FifoRpcScheduler.handler1-thread-5 daemon prio=10 tid=0x09df8800 nid=0xc17 waiting for monitor entry [0x6fdf8000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250) - waiting to lock 0x7f69b5f0 (a org.apache.hadoop.hbase.master.TableNamespaceManager) at org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146) at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1743) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1782) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1983) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) {code} The test hung here: {code} pool-1-thread-1 prio=10 tid=0x74f7b800 nid=0x5aa5 in Object.wait() [0x74efe000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1436) - locked 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.createTable(MasterProtos.java:40372) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.createTable(HConnectionManager.java:1931) at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:598) at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:594) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116) - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94) - locked 0x7faa26d0 (a 
org.apache.hadoop.hbase.client.RpcRetryingCaller) at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3124) at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:594) at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:485) at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814505#comment-13814505 ] Liang Xie commented on HBASE-9894: -- [~saint@gmail.com], yes. [~lhofhansl], totally agreed with you; I also don't understand why he enabled it :) Anyway, let's remove it. It could be encountered even in a debug/dev env with -ea enabled, right? We shouldn't abort the whole server instance in this case. remove the inappropriate assert statement in Store.getSplitPoint() -- Key: HBASE-9894 URL: https://issues.apache.org/jira/browse/HBASE-9894 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.94.6, 0.94.12 Reporter: Liang Xie Assignee: Liang Xie Priority: Minor Attachments: HBase-9894-0.94.txt One of my friends encountered a RS abort issue frequently during loading data. Here is the log stack: FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server gdc-dn49-formal.i.nease.net,60020,138320 3883151: Uncaught exception in service thread regionserver60020.cacheFlusher java.lang.AssertionError: getSplitPoint() called on a region that can't split! at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1926) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:79) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:5603) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:415) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:387) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:250) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
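The change under discussion boils down to replacing the assert with a graceful bail-out, roughly as follows; the method body is simplified and midKey() stands in for the existing split-point computation:
{code}
// Sketch of the intended behavior in Store.getSplitPoint(): warn and
// return null instead of asserting, so running with -ea cannot take
// down the regionserver.
public byte[] getSplitPoint() {
  if (!canSplit()) {
    LOG.warn("getSplitPoint() called on a region that can't split!");
    return null; // callers already treat null as "no split point"
  }
  return midKey(); // placeholder for the existing computation
}
{code}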
[jira] [Updated] (HBASE-8541) implement flush-into-stripes in stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-8541: Resolution: Fixed Status: Resolved (was: Patch Available) in trunk implement flush-into-stripes in stripe compactions -- Key: HBASE-8541 URL: https://issues.apache.org/jira/browse/HBASE-8541 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-8541-latest-with-dependencies.patch, HBASE-8541-latest-with-dependencies.patch, HBASE-8541-latest-with-dependencies.patch, HBASE-8541-latest-with-dependencies.patch, HBASE-8541-v0.patch, HBASE-8541-v1.patch, HBASE-8541-v2.patch, HBASE-8541-v3.patch, HBASE-8541-v4.patch, HBASE-8541-v5.patch Flush will be able to flush into multiple files under this design, avoiding L0 I/O amplification. I have the patch which is missing just one feature - support for concurrent flushes and stripe changes. This can be done via extensive try-locking of stripe changes and flushes, or advisory flags without blocking flushes, dumping conflicting flushes into L0 in case of (very rare) collisions. For file loading for the latter, a set-cover-like problem needs to be solved to determine optimal stripes. That will also address Jimmy's concern of getting rid of metadata, btw. However currently I don't have time for that. I plan to attach the try-locking patch first, but this won't happen for a couple weeks probably and should not block main reviews. Hopefully this will be added on top of main reviews. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9894: - Fix Version/s: 0.94.13 remove the inappropriate assert statement in Store.getSplitPoint() -- Key: HBASE-9894 URL: https://issues.apache.org/jira/browse/HBASE-9894 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.94.6, 0.94.12 Reporter: Liang Xie Assignee: Liang Xie Priority: Minor Fix For: 0.94.13 Attachments: HBase-9894-0.94.txt One of my friends encountered a RS abort issue frequently during loading data. Here is the log stack: FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server gdc-dn49-formal.i.nease.net,60020,138320 3883151: Uncaught exception in service thread regionserver60020.cacheFlusher java.lang.AssertionError: getSplitPoint() called on a region that can't split! at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1926) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:79) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:5603) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:415) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:387) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:250) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HBASE-7967) implement compactor for stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HBASE-7967. - Resolution: Fixed The addendum went in long ago. implement compactor for stripe compactions -- Key: HBASE-7967 URL: https://issues.apache.org/jira/browse/HBASE-7967 Project: HBase Issue Type: Sub-task Components: Compaction Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.98.0 Attachments: HBASE-7967-javadoc-addendum.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-v0.patch, HBASE-7967-v1.patch, HBASE-7967-v10.patch, HBASE-7967-v11.patch, HBASE-7967-v2.patch, HBASE-7967-v3.patch, HBASE-7967-v4.patch, HBASE-7967-v5.patch, HBASE-7967-v6.patch, HBASE-7967-v7.patch, HBASE-7967-v7.patch, HBASE-7967-v7.patch, HBASE-7967-v8.patch, HBASE-7967-v9.patch Compactor needs to be implemented. See details in parent and blocking jira. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Work started] (HBASE-9854) initial documentation for stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-9854 started by Sergey Shelukhin. initial documentation for stripe compactions Key: HBASE-9854 URL: https://issues.apache.org/jira/browse/HBASE-9854 Project: HBase Issue Type: Sub-task Components: Compaction Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Initial documentation for stripe compactions (distill from attached docs, make up to date, put somewhere like book) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-9890: --- Attachment: HBASE-9890-94-v1.patch In 94 I have to use the HBASE_AUTH_TOKEN string because AuthenticationTokenIdentifier.AUTH_TOKEN_TYPE requires -Psecurity. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-94-v1.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception on obtaining the token, since the proxy user doesn't have Kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception. {code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
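In other words, the 0.94 build without -Psecurity cannot reference the constant's class, so the token kind is presumably spelled out as a literal; a sketch of the difference:
{code}
import org.apache.hadoop.io.Text;

// Without -Psecurity the constant's class isn't on the classpath,
// so the token kind is written as a literal:
Text kind = new Text("HBASE_AUTH_TOKEN");
// With -Psecurity one could use the constant instead:
// Text kind = AuthenticationTokenIdentifier.AUTH_TOKEN_TYPE;
{code}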
[jira] [Updated] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9894: - Fix Version/s: (was: 0.94.13) 0.94.14 remove the inappropriate assert statement in Store.getSplitPoint() -- Key: HBASE-9894 URL: https://issues.apache.org/jira/browse/HBASE-9894 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.94.6, 0.94.12 Reporter: Liang Xie Assignee: Liang Xie Priority: Minor Fix For: 0.94.14 Attachments: HBase-9894-0.94.txt One of my friends encountered a RS abort issue frequently during loading data. Here is the log stack: FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server gdc-dn49-formal.i.nease.net,60020,138320 3883151: Uncaught exception in service thread regionserver60020.cacheFlusher java.lang.AssertionError: getSplitPoint() called on a region that can't split! at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1926) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:79) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:5603) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:415) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:387) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:250) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9863) Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs
[ https://issues.apache.org/jira/browse/HBASE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814551#comment-13814551 ] Hudson commented on HBASE-9863: --- SUCCESS: Integrated in HBase-TRUNK #4669 (See [https://builds.apache.org/job/HBase-TRUNK/4669/]) HBASE-9863 Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs (tedyu: rev 1539129) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableNamespaceManager.java Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs --- Key: HBASE-9863 URL: https://issues.apache.org/jira/browse/HBASE-9863 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0 Attachments: 9863-v1.txt, 9863-v2.txt, 9863-v3.txt, 9863-v4.txt, 9863-v5.txt, 9863-v6.txt TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry sometimes hung. Here were two recent occurrences: https://builds.apache.org/job/PreCommit-HBASE-Build/7676/console https://builds.apache.org/job/PreCommit-HBASE-Build/7671/console There were 9 occurrences of the following in both stack traces: {code} FifoRpcScheduler.handler1-thread-5 daemon prio=10 tid=0x09df8800 nid=0xc17 waiting for monitor entry [0x6fdf8000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250) - waiting to lock 0x7f69b5f0 (a org.apache.hadoop.hbase.master.TableNamespaceManager) at org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146) at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1743) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1782) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1983) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) {code} The test hung here: {code} pool-1-thread-1 prio=10 tid=0x74f7b800 nid=0x5aa5 in Object.wait() [0x74efe000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1436) - locked 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.createTable(MasterProtos.java:40372) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.createTable(HConnectionManager.java:1931) at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:598) at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:594) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116) - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94) - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller) at 
org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3124) at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:594) at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:485) at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-7544) Transparent table/CF encryption
[ https://issues.apache.org/jira/browse/HBASE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-7544: -- Attachment: 7544.patch Transparent table/CF encryption --- Key: HBASE-7544 URL: https://issues.apache.org/jira/browse/HBASE-7544 Project: HBase Issue Type: New Feature Components: HFile, io Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 7544.patch, 7544.patch, 7544.patch, 7544.patch, 7544p1.patch, 7544p1.patch, 7544p2.patch, 7544p2.patch, 7544p3.patch, 7544p3.patch, 7544p4.patch, historical-7544.patch, historical-7544.pdf, historical-shell.patch Introduce transparent encryption of HBase on disk data. Depends on a separate contribution of an encryption codec framework to Hadoop core and an AES-NI (native code) codec. This is work done in the context of MAPREDUCE-4491 but I'd gather there will be additional JIRAs for common and HDFS parts of it. Requirements: - Transparent encryption at the CF or table level - Protect against all data leakage from files at rest - Two-tier key architecture for consistency with best practices for this feature in the RDBMS world - Built-in key management - Flexible and non-intrusive key rotation - Mechanisms not exposed to or modifiable by users - Hardware security module integration (via Java KeyStore) - HBCK support for transparently encrypted files (+ plugin architecture for HBCK) Additional goals: - Shell support for administrative functions - Avoid performance impact for the null crypto codec case - Play nicely with other changes underway: in HFile, block coding, etc. We're aiming for rough parity with Oracle's transparent tablespace encryption feature, described in http://www.oracle.com/technetwork/database/owp-security-advanced-security-11gr-133411.pdf as {quote} “Transparent Data Encryption uses a 2-tier key architecture for flexible and non-intrusive key rotation and least operational and performance impact: Each application table with at least one encrypted column has its own table key, which is applied to all encrypted columns in that table. Equally, each encrypted tablespace has its own tablespace key. Table keys are stored in the data dictionary of the database, while tablespace keys are stored in the header of the tablespace and additionally, the header of each underlying OS file that makes up the tablespace. Each of these keys is encrypted with the TDE master encryption key, which is stored outside of the database in an external security module: either the Oracle Wallet (a PKCS#12 formatted file that is encrypted using a passphrase supplied either by the designated security administrator or DBA during setup), or a Hardware Security Module (HSM) device for higher assurance […]” {quote} Further design details forthcoming in a design document and patch as soon as we have all of the clearances in place. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-7544) Transparent table/CF encryption
[ https://issues.apache.org/jira/browse/HBASE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-7544: -- Status: Patch Available (was: Open) Fix a findbug warning and kick off HadoopQA Transparent table/CF encryption --- Key: HBASE-7544 URL: https://issues.apache.org/jira/browse/HBASE-7544 Project: HBase Issue Type: New Feature Components: HFile, io Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 7544.patch, 7544.patch, 7544.patch, 7544.patch, 7544p1.patch, 7544p1.patch, 7544p2.patch, 7544p2.patch, 7544p3.patch, 7544p3.patch, 7544p4.patch, historical-7544.patch, historical-7544.pdf, historical-shell.patch Introduce transparent encryption of HBase on disk data. Depends on a separate contribution of an encryption codec framework to Hadoop core and an AES-NI (native code) codec. This is work done in the context of MAPREDUCE-4491 but I'd gather there will be additional JIRAs for common and HDFS parts of it. Requirements: - Transparent encryption at the CF or table level - Protect against all data leakage from files at rest - Two-tier key architecture for consistency with best practices for this feature in the RDBMS world - Built-in key management - Flexible and non-intrusive key rotation - Mechanisms not exposed to or modifiable by users - Hardware security module integration (via Java KeyStore) - HBCK support for transparently encrypted files (+ plugin architecture for HBCK) Additional goals: - Shell support for administrative functions - Avoid performance impact for the null crypto codec case - Play nicely with other changes underway: in HFile, block coding, etc. We're aiming for rough parity with Oracle's transparent tablespace encryption feature, described in http://www.oracle.com/technetwork/database/owp-security-advanced-security-11gr-133411.pdf as {quote} “Transparent Data Encryption uses a 2-tier key architecture for flexible and non-intrusive key rotation and least operational and performance impact: Each application table with at least one encrypted column has its own table key, which is applied to all encrypted columns in that table. Equally, each encrypted tablespace has its own tablespace key. Table keys are stored in the data dictionary of the database, while tablespace keys are stored in the header of the tablespace and additionally, the header of each underlying OS file that makes up the tablespace. Each of these keys is encrypted with the TDE master encryption key, which is stored outside of the database in an external security module: either the Oracle Wallet (a PKCS#12 formatted file that is encrypted using a passphrase supplied either by the designated security administrator or DBA during setup), or a Hardware Security Module (HSM) device for higher assurance […]” {quote} Further design details forthcoming in a design document and patch as soon as we have all of the clearances in place. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9874) Append and Increment operation drops Tags
[ https://issues.apache.org/jira/browse/HBASE-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-9874: -- Resolution: Fixed Release Note: During an Append/Increment operation, new cells will carry the tags from the old cell as well as the tags passed in with the cells of the Append/Increment. A new CP hook is provided which is called after the new cell is created and before it is written to the memstore/WAL. A user can use this hook to change the new cell. The naive merge of tags described above may result in duplicates; this hook can decide which tags are ultimately included. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk. Thanks for the reviews. Append and Increment operation drops Tags - Key: HBASE-9874 URL: https://issues.apache.org/jira/browse/HBASE-9874 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.98.0 Attachments: AccessController.postMutationBeforeWAL.txt, HBASE-9874.patch, HBASE-9874_V2.patch, HBASE-9874_V3.patch We should consider the tags in the existing cells as well as the tags coming in with the cells of an Increment/Append. -- This message was sent by Atlassian JIRA (v6.1#6144)
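For reference, a coprocessor that wants to control the merged tag set would implement the new hook roughly as below. This is a minimal sketch: the hook name matches the attachment above (postMutationBeforeWAL), but the exact signature, including the MutationType parameter, is an assumption based on this patch and may differ across releases.

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.RegionObserver.MutationType;

// Hedged sketch of a RegionObserver using the new hook.
public class TagDedupObserver extends BaseRegionObserver {
  @Override
  public Cell postMutationBeforeWAL(ObserverContext<RegionCoprocessorEnvironment> ctx,
      MutationType opType, Mutation mutation, Cell oldCell, Cell newCell)
      throws IOException {
    // newCell carries the naive union of the old cell's tags and the tags
    // passed in with the Append/Increment. A real implementation would build
    // and return a replacement cell with duplicate tags removed; returning
    // newCell unchanged keeps the default merge.
    return newCell;
  }
}
{code}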
[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9818: -- Attachment: 9818-v4.txt Patch v4 unifies FSDataInputStreamWrapper#getStream() and FSDataInputStreamWrapper#shouldUseHBaseChecksum() The tests, on Linux, have reached iteration #82. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt, 9818-v3.txt, 9818-v4.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: 
org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
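For context on what "unifies" means in the v4 comment above: before the patch a caller asked the wrapper two separate questions, shouldUseHBaseChecksum() and then getStream(boolean), and the wrapper's state could change between the calls, leaving the caller pairing a stale flag with a stream that had since been swapped out or closed. A minimal sketch of the unified shape, with illustrative field and method names rather than the actual patch contents:

{code}
import org.apache.hadoop.fs.FSDataInputStream;

// Hedged sketch: make the checksum decision and hand out the matching stream
// in one synchronized call instead of two separate, racy ones.
public class FSDataInputStreamWrapperSketch {
  private FSDataInputStream stream;              // HDFS-checksummed stream
  private FSDataInputStream streamNoFsChecksum;  // stream when HBase checksums

  public synchronized FSDataInputStream getStream(boolean useHBaseChecksum) {
    // Both the flag check and the stream selection happen under one lock,
    // so the caller can never observe an inconsistent (flag, stream) pair.
    return useHBaseChecksum && streamNoFsChecksum != null
        ? streamNoFsChecksum : stream;
  }
}
{code}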
[jira] [Commented] (HBASE-7544) Transparent table/CF encryption
[ https://issues.apache.org/jira/browse/HBASE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814581#comment-13814581 ] Hadoop QA commented on HBASE-7544: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612313/7544.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 82 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestStripeCompactor Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814586#comment-13814586 ] Jimmy Xiang commented on HBASE-9818: I was wondering how v4 solves the issue. With the patch, if the stream is closed somewhere, instead of an NPE we may get an IOException saying the stream is closed. If the stream is not closed somewhere, why is it null? Never initialized? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-7663) [Per-KV security] Visibility labels
[ https://issues.apache.org/jira/browse/HBASE-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-7663: -- Attachment: HBASE-7663_V6.patch Rebased patch for latest trunk. [Per-KV security] Visibility labels --- Key: HBASE-7663 URL: https://issues.apache.org/jira/browse/HBASE-7663 Project: HBase Issue Type: Sub-task Components: Coprocessors, security Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Anoop Sam John Fix For: 0.98.0 Attachments: HBASE-7663.patch, HBASE-7663_V2.patch, HBASE-7663_V3.patch, HBASE-7663_V4.patch, HBASE-7663_V5.patch, HBASE-7663_V6.patch Implement Accumulo-style visibility labels. Consider the following design principles: - Coprocessor based implementation - Minimal to no changes to core code - Use KeyValue tags (HBASE-7448) to carry labels - Use OperationWithAttributes# {get,set}Attribute for handling visibility labels in the API - Implement a new filter for evaluating visibility labels as KVs are streamed through. This approach would be consistent in deployment and API details with other per-KV security work, supporting environments where they might both be employed, even stacked on some tables. See the parent issue for more discussion. -- This message was sent by Atlassian JIRA (v6.1#6144)
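For a sense of the proposed client-facing surface: per the design notes, a visibility expression travels on the operation as an attribute and is converted into a KeyValue tag (HBASE-7448) server-side. The sketch below assumes an attribute key of "VISIBILITY" and an Accumulo-style boolean expression; both are illustrative guesses at this patch, not a confirmed API.

{code}
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Hedged sketch: attach a visibility expression to a Put via
// OperationWithAttributes#setAttribute; a coprocessor would translate it
// into a cell tag at write time.
public class VisibilityPutExample {
  public static Put labelledPut(byte[] row, byte[] family, byte[] qualifier,
      byte[] value) {
    Put put = new Put(row);
    put.add(family, qualifier, value);
    // "VISIBILITY" is an assumed attribute key; "secret&!public" is an
    // Accumulo-style expression meaning secret AND NOT public.
    put.setAttribute("VISIBILITY", Bytes.toBytes("secret&!public"));
    return put;
  }
}
{code}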
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814592#comment-13814592 ] Francis Liu commented on HBASE-9890: Sorry, late to the party here. I went through the patch and it looks good. We should probably address the case where we're talking to more than one HBase cluster, and hence have more than one HBase delegation token. We should probably support the mechanism HBase provides via QUORUM_ADDRESS, as well as Oozie outright retrieving a bunch of HBase delegation tokens with us just making sure they get passed on to the job. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-94-v1.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception when obtaining a token, since the proxy user doesn't have the kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception. {code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
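The shape of the underlying fix, for readers following along: token acquisition should be skipped when the submitting user's credentials already hold an HBase delegation token, as they do when Oozie pre-fetches one for a proxy user. A minimal sketch of that check, with illustrative names; the real change lives in the attached patches:

{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

// Hedged sketch: only fall back to Kerberos-backed token acquisition when no
// HBase delegation token is already present in the user's credentials.
public class TokenCheckSketch {
  // Assumed token kind string for the HBase authentication token.
  static final String HBASE_AUTH_TOKEN_KIND = "HBASE_AUTH_TOKEN";

  public static boolean hasHBaseToken() throws Exception {
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    for (Token<?> token : ugi.getTokens()) {
      if (HBASE_AUTH_TOKEN_KIND.equals(token.getKind().toString())) {
        return true; // token already handed over, e.g. by Oozie; don't re-obtain
      }
    }
    // Caller would now invoke the normal Kerberos-authenticated path,
    // e.g. the TokenProvider endpoint, to obtain a fresh token.
    return false;
  }
}
{code}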
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814594#comment-13814594 ] Francis Liu commented on HBASE-9890: To answer the question of why I chose SecureBulkLoad to be keyed on whether HBase security is enabled: it was mainly because I wanted to keep things simple. I was under the assumption that most would choose to secure the entire stack or secure none of it. For the non-secure HBase + secure HDFS case, I'd expect the user to just run chmod 777 before calling LoadIncrementalHFiles. Having said that, it probably won't hurt, as a convenience to the user, if we key it on isHadoopSecurityEnabled. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814608#comment-13814608 ] chunhui shen commented on HBASE-9000: - [~stepinto] I understand the scenario the patch is used for. Should we do the same thing in StoreFileScanner? If so, why not do this in StoreScanner, for example by calling next a few times before calling reseek... In my personal view, such an action seems a little crude. +0 from me Linear reseek in Memstore - Key: HBASE-9000 URL: https://issues.apache.org/jira/browse/HBASE-9000 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Shane Hogan Priority: Minor Fix For: 0.89-fb Attachments: hbase-9000-benchmark-program.patch, hbase-9000-port-fb.patch, hbase-9000.patch This is to address the linear reseek in MemStoreScanner. Currently reseek iterates over the kvset and the snapshot linearly by just calling next repeatedly. The new solution is to do this linear seek up to a configurable maximum number of times, then if the seek is not yet complete fall back to a logarithmic seek. -- This message was sent by Atlassian JIRA (v6.1#6144)
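For readers skimming the thread: the patch bounds the linear portion of a reseek and falls back to a logarithmic jump once the budget is exhausted. A minimal, generic sketch of that hybrid strategy with an illustrative threshold; the real code operates on MemStoreScanner's kvset and snapshot skip lists:

{code}
import java.util.Iterator;
import java.util.NavigableSet;

// Hedged sketch of the linear-then-logarithmic reseek described in the issue.
public class HybridReseekSketch {
  static <T extends Comparable<T>> T reseek(NavigableSet<T> set, Iterator<T> it,
      T target, int maxLinearSteps) {
    // Cheap path: step forward a bounded number of times, which wins when the
    // target is close to the current position (the common reseek case).
    for (int i = 0; i < maxLinearSteps && it.hasNext(); i++) {
      T next = it.next();
      if (next.compareTo(target) >= 0) {
        return next;
      }
    }
    // Budget exhausted: fall back to an O(log n) jump over the sorted set.
    // (A real scanner would also re-open its iterator from tailSet(target).)
    return set.ceiling(target);
  }
}
{code}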
[jira] [Updated] (HBASE-7544) Transparent table/CF encryption
[ https://issues.apache.org/jira/browse/HBASE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-7544: -- Status: Patch Available (was: Open) Remove an unwanted change in TestStripeCompactor and resubmit. Checked the FindBugs report here, and checked locally prior to patch submission, and didn't see new items on account of this patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814609#comment-13814609 ] chunhui shen commented on HBASE-9000: - I'm sorry, I have no better idea for optimizing performance in this scenario. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-7544) Transparent table/CF encryption
[ https://issues.apache.org/jira/browse/HBASE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-7544: -- Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-7544) Transparent table/CF encryption
[ https://issues.apache.org/jira/browse/HBASE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-7544: -- Attachment: 7544.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814611#comment-13814611 ] Hadoop QA commented on HBASE-9818: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612315/9818-v4.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.1#6144)