[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813737#comment-13813737 ] Hadoop QA commented on HBASE-9818:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612102/9818-v1.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7734//console

This message is automatically generated.

NPE in HFileBlock#AbstractFSReader#readAtOffset
-----------------------------------------------

Key: HBASE-9818
URL: https://issues.apache.org/jira/browse/HBASE-9818
Project: HBase
Issue Type: Bug
Reporter: Jimmy Xiang
Attachments: 9818-v1.txt

HFileBlock#istream seems to be null. I was wondering whether we should hide FSDataInputStreamWrapper#useHBaseChecksum.
By the way, this happened when online schema change is enabled (encoding):
{noformat}
2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359)
	at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166)
	at
{noformat}
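One defensive shape for the fix, tied to the question above about hiding FSDataInputStreamWrapper#useHBaseChecksum, would be to resolve the stream through the wrapper at the point of use instead of caching a possibly-null istream. A minimal sketch, assuming the wrapper exposes getStream/shouldUseHBaseChecksum accessors; this is not the attached 9818-v1.txt patch:
{code}
// Hypothetical sketch (not the attached patch): fetch the stream via the
// wrapper right before the positional read, so callers never juggle a stale
// null istream together with the useHBaseChecksum flag.
FSDataInputStream istream = streamWrapper.getStream(streamWrapper.shouldUseHBaseChecksum());
if (istream == null) {
  throw new IOException("No input stream available for " + path
      + "; was the reader closed underneath us?");
}
istream.readFully(fileOffset, dest, destOffset, size);
{code}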
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813739#comment-13813739 ] stack commented on HBASE-9892:
------------------------------
bq. Now, there is no data in regionserver's ephemeral node. It's a good idea to write static attributes like info port there.

It is a suggestion. Could be tricky setting this value then triggering watches. Will have to reset them. Maybe znode is not the right place? It is too awkward, and if only this one attribute, it's a bit of work adding it there.

It could be added to the server JMX bean, but you'd have to do RMI to find it, which requires a port (IIRC).

There is the RS heartbeat. Currently we send load. Seems a bit silly sending over constant attributes on each heartbeat, but might be easy to do.

Add info port to ServerName to support multi instances in a node
-----------------------------------------------------------------

Key: HBASE-9892
URL: https://issues.apache.org/jira/browse/HBASE-9892
Project: HBase
Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
Attachments: HBASE-9892-0.94-v1.diff

The full GC time of a regionserver with a big heap (30G) usually cannot be kept under 30s, while servers with 64G of memory are the norm. So we try to deploy multiple RS instances (2-3) on a single node, with each RS heap at about 20G~24G. Most things work fine, except the hbase web UI: the master gets the RS info port from conf, which is not suitable for this situation of multiple RS instances on a node. So we add the info port to ServerName:
a. At startup, the RS reports its info port to HMaster.
b. For the root region, the RS writes the servername with info port to the zookeeper root-region-server node.
c. For meta regions, the RS writes the servername with info port to the root region.
d. For user regions, the RS writes the servername with info port to meta regions.
So HMaster and clients can get the info port from the servername.
To test this feature, I changed the RS num from 1 to 3 in standalone mode, so we can test it in standalone mode. I think Hoya (hbase on yarn) will encounter the same problem. Anyone know how Hoya handles this problem?
PS: There are different formats for servername in the zk node and the meta table; I think we need to unify them and refactor the code.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9886) Optimize ServerName#compareTo
[ https://issues.apache.org/jira/browse/HBASE-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813743#comment-13813743 ] Hudson commented on HBASE-9886: --- SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #826 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/826/]) HBASE-9886 Optimize ServerName#compareTo (nkeywal: rev 1538679) * /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ServerName.java Optimize ServerName#compareTo - Key: HBASE-9886 URL: https://issues.apache.org/jira/browse/HBASE-9886 Project: HBase Issue Type: Bug Components: Client, regionserver Affects Versions: 0.98.0, 0.96.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Trivial Fix For: 0.98.0, 0.96.1 Attachments: 9886.v1.patch It shows up in the profiling... -- This message was sent by Atlassian JIRA (v6.1#6144)
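The committed change is a one-liner in ServerName.java. The general idea behind this kind of compareTo optimization, sketched with a simplified stand-in class (not the actual ServerName source), is to compare the components directly instead of materializing a "host,port,startcode" string on every call:
{code}
// Simplified field-wise compareTo (hypothetical stand-in class, not the
// HBase source): no string concatenation or parsing per comparison.
public final class ServerId implements Comparable<ServerId> {
  private final String hostname;
  private final int port;
  private final long startcode;

  public ServerId(String hostname, int port, long startcode) {
    this.hostname = hostname;
    this.port = port;
    this.startcode = startcode;
  }

  @Override
  public int compareTo(ServerId other) {
    int c = hostname.compareToIgnoreCase(other.hostname);
    if (c != 0) return c;
    c = Integer.compare(port, other.port);     // cheap primitive comparisons
    if (c != 0) return c;
    return Long.compare(startcode, other.startcode);
  }
}
{code}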
[jira] [Commented] (HBASE-9859) Canary Shouldn't go off if the table being read from is disabled
[ https://issues.apache.org/jira/browse/HBASE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813744#comment-13813744 ] Hudson commented on HBASE-9859:
-------------------------------
SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #826 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/826/])
HBASE-9859 Canary Shouldn't go off if the table being read from is disabled (eclark: rev 1538842)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java

Canary Shouldn't go off if the table being read from is disabled
----------------------------------------------------------------

Key: HBASE-9859
URL: https://issues.apache.org/jira/browse/HBASE-9859
Project: HBase
Issue Type: Bug
Components: util
Affects Versions: 0.96.1
Reporter: Elliott Clark
Assignee: Elliott Clark
Fix For: 0.98.0, 0.96.1
Attachments: HBASE-9859-0.patch, HBASE-9859-1.patch

Disabling a table causes the Canary to go off with an error message. We should make it so that it doesn't cause an error.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
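The patch touches HConnectionManager and Canary; a plausible shape of the check, hedged and simplified (the method below is illustrative, not the committed Canary code, though HBaseAdmin#isTableDisabled is existing client API):
{code}
// Hypothetical sketch: skip, rather than alarm on, tables that are disabled.
import java.io.IOException;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CanaryTableCheck {
  public static boolean shouldSniff(HBaseAdmin admin, String tableName) throws IOException {
    if (admin.isTableDisabled(tableName)) {
      System.out.println("Table " + tableName + " is disabled; skipping canary read.");
      return false;  // not an error, just nothing to probe
    }
    return true;
  }
}
{code}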
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813745#comment-13813745 ] Hudson commented on HBASE-8942:
-------------------------------
SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #826 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/826/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538867)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

DFS errors during a read operation (get/scan), may cause write outliers
------------------------------------------------------------------------

Key: HBASE-8942
URL: https://issues.apache.org/jira/browse/HBASE-8942
Project: HBase
Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14
Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt

This is a similar issue as discussed in HBASE-8228:
1) A scanner holds the Store.ReadLock() while opening the store files, encounters errors, and thus takes a long time to finish.
2) A flush completes in the meanwhile. It needs the write lock to commit() and update scanners, hence ends up waiting.
3+) All Puts (and also Gets) to the CF, which need a read lock, have to wait for 1) and 2) to complete, blocking updates to the system for the DFS timeout.

Fix: Open store files outside the read lock. getScanners() already tries to do this optimisation. However, Store.getScanner(), which calls this function through the StoreScanner constructor, redundantly tries to grab the readLock, causing the readLock to be held while the storeFiles are being opened and seeked. We should get rid of the readLock() in Store.getScanner(). This is not required: the constructor for StoreScanner calls getScanners(xxx, xxx, xxx), which has the required locking already.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
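The fix lands in HStore.java only; the underlying pattern, shown here as a generic illustration rather than the HBase code, is to do the slow DFS open/seek work without holding the shared lock and take the lock only briefly to publish:
{code}
// Generic illustration of the locking pattern behind the fix (not HStore):
// slow I/O outside the lock, short publish step under it.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class OpenOutsideLock {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // the list is synchronized on its own; the read lock only fences scanner
  // setup against a concurrent flush/close that holds the write lock
  private final List<String> scanners =
      Collections.synchronizedList(new ArrayList<String>());

  public void openScanner(String file) throws InterruptedException {
    String scanner = openAndSeek(file);  // slow part: NOT under the lock
    lock.readLock().lock();
    try {
      scanners.add(scanner);             // fast part: publish under the lock
    } finally {
      lock.readLock().unlock();
    }
  }

  private String openAndSeek(String file) throws InterruptedException {
    Thread.sleep(10);  // stand-in for a DFS open/seek that may stall on errors
    return "scanner:" + file;
  }
}
{code}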
[jira] [Updated] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated HBASE-9892:
-------------------------------
Attachment: HBASE-9892-0.94-v2.diff

New patch for hbase 0.94:
a. Write the RS info port to its ephemeral node.
b. RegionServerTracker in HMaster watches the regionservers node and keeps a map: servername -> infoport.
c. The web UI in HMaster gets an RS's info port from RegionServerTracker through HMaster.

Add info port to ServerName to support multi instances in a node
-----------------------------------------------------------------

Key: HBASE-9892
URL: https://issues.apache.org/jira/browse/HBASE-9892
Project: HBase
Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813753#comment-13813753 ] Liu Shaohui commented on HBASE-9892:
------------------------------------
{quote}
Could be tricky setting this value then triggering watches. Will have to reset them.
{quote}
No need to reset them. RegionServerTracker only gets the data from zk once.
{quote}
It could be added to the server JMX bean but you'd have to do rmi to find it which requires a port (IIRC). There is the RS heartbeat. Currently we send load. Seems a bit silly sending over constant attributes on each heartbeat but might be easy to do.
{quote}
I think the info port of an RS will not change after it starts up, so there is no need to send constant attributes over on each heartbeat.

Add info port to ServerName to support multi instances in a node
-----------------------------------------------------------------

Key: HBASE-9892
URL: https://issues.apache.org/jira/browse/HBASE-9892
Project: HBase
Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Created] (HBASE-9893) Incorrect assert condition in OrderedBytes decoding
He Liangliang created HBASE-9893:
------------------------------------
Summary: Incorrect assert condition in OrderedBytes decoding
Key: HBASE-9893
URL: https://issues.apache.org/jira/browse/HBASE-9893
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.96.0
Reporter: He Liangliang
Assignee: He Liangliang
Priority: Minor

The following assert condition is incorrect when decoding a blob var byte array.
assert t == 0 : "Unexpected bits remaining after decoding blob.";
When the number of bytes to decode is a multiple of 8 (i.e. the original number of bytes is a multiple of 7), this assert may fail.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (HBASE-9893) Incorrect assert condition in OrderedBytes decoding
[ https://issues.apache.org/jira/browse/HBASE-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Liangliang updated HBASE-9893:
-------------------------------------
Description:
The following assert condition is incorrect when decoding a blob var byte array.
code
assert t == 0 : "Unexpected bits remaining after decoding blob.";
/code
When the number of bytes to decode is a multiple of 8 (i.e. the original number of bytes is a multiple of 7), this assert may fail.

was:
The following assert condition is incorrect when decoding a blob var byte array.
assert t == 0 : "Unexpected bits remaining after decoding blob.";
When the number of bytes to decode is a multiple of 8 (i.e. the original number of bytes is a multiple of 7), this assert may fail.

Incorrect assert condition in OrderedBytes decoding
---------------------------------------------------

Key: HBASE-9893
URL: https://issues.apache.org/jira/browse/HBASE-9893
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.96.0
Reporter: He Liangliang
Assignee: He Liangliang
Priority: Minor

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (HBASE-9893) Incorrect assert condition in OrderedBytes decoding
[ https://issues.apache.org/jira/browse/HBASE-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Liangliang updated HBASE-9893:
-------------------------------------
Description:
The following assert condition is incorrect when decoding a blob var byte array.
{code}
assert t == 0 : "Unexpected bits remaining after decoding blob.";
{code}
When the number of bytes to decode is a multiple of 8 (i.e. the original number of bytes is a multiple of 7), this assert may fail.

was:
The following assert condition is incorrect when decoding a blob var byte array.
code
assert t == 0 : "Unexpected bits remaining after decoding blob.";
/code
When the number of bytes to decode is a multiple of 8 (i.e. the original number of bytes is a multiple of 7), this assert may fail.

Incorrect assert condition in OrderedBytes decoding
---------------------------------------------------

Key: HBASE-9893
URL: https://issues.apache.org/jira/browse/HBASE-9893
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.96.0
Reporter: He Liangliang
Assignee: He Liangliang
Priority: Minor

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9886) Optimize ServerName#compareTo
[ https://issues.apache.org/jira/browse/HBASE-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813754#comment-13813754 ] Hudson commented on HBASE-9886: --- FAILURE: Integrated in hbase-0.96-hadoop2 #113 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/113/]) HBASE-9886 Optimize ServerName#compareTo (nkeywal: rev 1538678) * /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/ServerName.java Optimize ServerName#compareTo - Key: HBASE-9886 URL: https://issues.apache.org/jira/browse/HBASE-9886 Project: HBase Issue Type: Bug Components: Client, regionserver Affects Versions: 0.98.0, 0.96.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Trivial Fix For: 0.98.0, 0.96.1 Attachments: 9886.v1.patch It shows up in the profiling... -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9859) Canary Shouldn't go off if the table being read from is disabled
[ https://issues.apache.org/jira/browse/HBASE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813755#comment-13813755 ] Hudson commented on HBASE-9859:
-------------------------------
FAILURE: Integrated in hbase-0.96-hadoop2 #113 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/113/])
HBASE-9859 Canary Shouldn't go off if the table being read from is disabled (eclark: rev 1538843)
* /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java

Canary Shouldn't go off if the table being read from is disabled
----------------------------------------------------------------

Key: HBASE-9859
URL: https://issues.apache.org/jira/browse/HBASE-9859
Project: HBase
Issue Type: Bug
Components: util
Affects Versions: 0.96.1
Reporter: Elliott Clark
Assignee: Elliott Clark
Fix For: 0.98.0, 0.96.1
Attachments: HBASE-9859-0.patch, HBASE-9859-1.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813757#comment-13813757 ] Hudson commented on HBASE-8942:
-------------------------------
FAILURE: Integrated in hbase-0.96-hadoop2 #113 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/113/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538868)
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

DFS errors during a read operation (get/scan), may cause write outliers
------------------------------------------------------------------------

Key: HBASE-8942
URL: https://issues.apache.org/jira/browse/HBASE-8942
Project: HBase
Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14
Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9880) client.TestAsyncProcess.testWithNoClearOnFail broke on 0.96 by HBASE-9867
[ https://issues.apache.org/jira/browse/HBASE-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813758#comment-13813758 ] Hudson commented on HBASE-9880: --- FAILURE: Integrated in hbase-0.96-hadoop2 #113 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/113/]) HBASE-9880 client.TestAsyncProcess.testWithNoClearOnFail broke on 0.96 by HBASE-9867 (nkeywal: rev 1538676) * /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java client.TestAsyncProcess.testWithNoClearOnFail broke on 0.96 by HBASE-9867 -- Key: HBASE-9880 URL: https://issues.apache.org/jira/browse/HBASE-9880 Project: HBase Issue Type: Test Reporter: stack Assignee: Nicolas Liochon Attachments: 9880.v1.patch It looks like the backport of HBASE-9867 broke 0.96 build (fine on trunk). This was my patch. Let me fix. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9893) Incorrect assert condition in OrderedBytes decoding
[ https://issues.apache.org/jira/browse/HBASE-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Liangliang updated HBASE-9893:
-------------------------------------
Attachment: HBASE-9893.patch

Incorrect assert condition in OrderedBytes decoding
---------------------------------------------------

Key: HBASE-9893
URL: https://issues.apache.org/jira/browse/HBASE-9893
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.96.0
Reporter: He Liangliang
Assignee: He Liangliang
Priority: Minor
Attachments: HBASE-9893.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9867) Save on array copies with a subclass of LiteralByteString
[ https://issues.apache.org/jira/browse/HBASE-9867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813756#comment-13813756 ] Hudson commented on HBASE-9867:
-------------------------------
FAILURE: Integrated in hbase-0.96-hadoop2 #113 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/113/])
HBASE-9880 client.TestAsyncProcess.testWithNoClearOnFail broke on 0.96 by HBASE-9867 (nkeywal: rev 1538676)
* /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java

Save on array copies with a subclass of LiteralByteString
---------------------------------------------------------

Key: HBASE-9867
URL: https://issues.apache.org/jira/browse/HBASE-9867
Project: HBase
Issue Type: Improvement
Components: Protobufs
Affects Versions: 0.96.0
Reporter: stack
Assignee: stack
Fix For: 0.98.0, 0.96.1
Attachments: 9867.096.txt, 9867.txt, 9867.txt, 9867v2.txt

Any time we add a byte array to a protobuf, it'll copy the byte array. I was playing with the client and noticed how a bunch of CPU and copying was being done just to copy basic arrays doing pb construction. I started to look at ByteString and then remembered a class Benoit sent me a while back that I did not understand from his new AsyncHBase. After looking in ByteString, it now made sense. So, rather than copy byte arrays everywhere, do a version of a ByteString that instead wraps the array.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
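For reference, a minimal sketch of the wrapping trick described above (class and method names are illustrative; the class committed under this issue may differ). Because LiteralByteString is package-private in protobuf 2.x, the subclass has to be declared inside the com.google.protobuf package:
{code}
// Illustrative zero-copy wrapper (assumed shape, not necessarily the committed
// class). LiteralByteString stores the array it is given without copying, so a
// wrap() factory avoids the array copy that ByteString.copyFrom() performs.
package com.google.protobuf;

public final class ZeroCopyByteStringExample extends LiteralByteString {
  private ZeroCopyByteStringExample(byte[] array) {
    super(array);  // wraps the array; no defensive copy
  }

  /** Wrap an existing array. The caller must not mutate it afterwards. */
  public static ByteString wrap(byte[] array) {
    return new ZeroCopyByteStringExample(array);
  }
}
{code}
The trade-off is the usual one for zero-copy wrappers: the caller gives up ownership of the array, since mutating it afterwards would silently corrupt the "immutable" ByteString.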
[jira] [Commented] (HBASE-9893) Incorrect assert condition in OrderedBytes decoding
[ https://issues.apache.org/jira/browse/HBASE-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813760#comment-13813760 ] He Liangliang commented on HBASE-9893:
--------------------------------------
[~ndimiduk] minor issue, a quick fix.

Incorrect assert condition in OrderedBytes decoding
---------------------------------------------------

Key: HBASE-9893
URL: https://issues.apache.org/jira/browse/HBASE-9893
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.96.0
Reporter: He Liangliang
Assignee: He Liangliang
Priority: Minor
Attachments: HBASE-9893.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-8369) MapReduce over snapshot files
[ https://issues.apache.org/jira/browse/HBASE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813808#comment-13813808 ] Hadoop QA commented on HBASE-8369:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612097/hbase-8369_v7.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 16 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:red}-1 core tests{color}. The patch failed these unit tests:
{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7735//console

This message is automatically generated.
MapReduce over snapshot files
-----------------------------

Key: HBASE-8369
URL: https://issues.apache.org/jira/browse/HBASE-8369
Project: HBase
Issue Type: New Feature
Components: mapreduce, snapshots
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 0.98.0
Attachments: HBASE-8369-0.94.patch, HBASE-8369-0.94_v2.patch, HBASE-8369-0.94_v3.patch, HBASE-8369-0.94_v4.patch, HBASE-8369-0.94_v5.patch, HBASE-8369-trunk_v1.patch, HBASE-8369-trunk_v2.patch, HBASE-8369-trunk_v3.patch, hbase-8369_v0.patch, hbase-8369_v5.patch, hbase-8369_v6.patch, hbase-8369_v7.patch

The idea is to add an InputFormat which can run the mapreduce job over snapshot files directly, bypassing the hbase server layer. The IF is similar in usage to TableInputFormat, taking a Scan object from the user, but instead of running from an online table, it runs from a table snapshot. We do one split per region in the snapshot, and open an HRegion inside the RecordReader. A RegionScanner is used internally for doing the scan without any HRegionServer bits.

Users have been asking and searching for ways to run MR jobs by reading directly from hfiles, so this allows new use cases if reading from stale data is ok:
- Take snapshots periodically, and run MR jobs only on snapshots.
- Export snapshots to a remote hdfs cluster, and run the MR jobs at that cluster without an HBase cluster.
- (Future use case) Combine snapshot data with online hbase data: scan from yesterday's snapshot, but read today's data from the online hbase cluster.

--
This message was sent by Atlassian JIRA
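A usage sketch of how a job would drive the proposed input format, assuming the TableMapReduceUtil#initTableSnapshotMapperJob helper this patch introduces (the snapshot name, mapper class, and restore directory below are placeholders, and details of the in-flight v7 patch may differ):
{code}
// Sketch: run a scan-based MR job over a snapshot instead of a live table.
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "count-over-snapshot");
Scan scan = new Scan();            // same Scan API as TableInputFormat
scan.setCacheBlocks(false);        // there is no region server cache to warm
TableMapReduceUtil.initTableSnapshotMapperJob(
    "my_snapshot",                 // read the snapshot, not the online table
    scan,
    MyCountingMapper.class,        // a TableMapper implementation (assumed)
    Text.class,
    LongWritable.class,
    job,
    true,                          // ship dependency jars with the job
    new Path("/tmp/snapshot-restore"));  // scratch dir for restored region refs
job.waitForCompletion(true);
{code}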
[jira] [Created] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
Liang Xie created HBASE-9894:
--------------------------------
Summary: remove the inappropriate assert statement in Store.getSplitPoint()
Key: HBASE-9894
URL: https://issues.apache.org/jira/browse/HBASE-9894
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.94.12, 0.94.6
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor

One of my friends encountered an RS abort issue frequently during data loading. Here is the log stack:
{noformat}
FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server gdc-dn49-formal.i.nease.net,60020,1383203883151: Uncaught exception in service thread regionserver60020.cacheFlusher
java.lang.AssertionError: getSplitPoint() called on a region that can't split!
	at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1926)
	at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:79)
	at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:5603)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:415)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:387)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:250)
	at java.lang.Thread.run(Thread.java:662)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-9894:
-----------------------------
Attachment: HBase-9894-0.94.txt

remove the inappropriate assert statement in Store.getSplitPoint()
------------------------------------------------------------------

Key: HBASE-9894
URL: https://issues.apache.org/jira/browse/HBASE-9894
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.94.6, 0.94.12
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
Attachments: HBase-9894-0.94.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813824#comment-13813824 ] Liang Xie commented on HBASE-9894:
----------------------------------
HBase version: 0.94.6-cdh4.3.0
java -version: java version "1.6.0_26", Java(TM) SE Runtime Environment (build 1.6.0_26-b03), Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
export HBASE_OPTS="-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"

To me, that assert statement in getSplitPoint() is inappropriate, and in the trunk code it has been removed already. Let's just make a one-line removal here, is that OK? [~lhofhansl]

remove the inappropriate assert statement in Store.getSplitPoint()
------------------------------------------------------------------

Key: HBASE-9894
URL: https://issues.apache.org/jira/browse/HBASE-9894
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.94.6, 0.94.12
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
Attachments: HBase-9894-0.94.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-9894:
-----------------------------
Status: Patch Available (was: Open)

remove the inappropriate assert statement in Store.getSplitPoint()
------------------------------------------------------------------

Key: HBASE-9894
URL: https://issues.apache.org/jira/browse/HBASE-9894
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.94.12, 0.94.6
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
Attachments: HBase-9894-0.94.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813843#comment-13813843 ] Hudson commented on HBASE-8942:
-------------------------------
SUCCESS: Integrated in hbase-0.96 #180 (See [https://builds.apache.org/job/hbase-0.96/180/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538868)
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

DFS errors during a read operation (get/scan), may cause write outliers
------------------------------------------------------------------------

Key: HBASE-8942
URL: https://issues.apache.org/jira/browse/HBASE-8942
Project: HBase
Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14
Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9859) Canary Shouldn't go off if the table being read from is disabled
[ https://issues.apache.org/jira/browse/HBASE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813842#comment-13813842 ] Hudson commented on HBASE-9859:
-------------------------------
SUCCESS: Integrated in hbase-0.96 #180 (See [https://builds.apache.org/job/hbase-0.96/180/])
HBASE-9859 Canary Shouldn't go off if the table being read from is disabled (eclark: rev 1538843)
* /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java

Canary Shouldn't go off if the table being read from is disabled
----------------------------------------------------------------

Key: HBASE-9859
URL: https://issues.apache.org/jira/browse/HBASE-9859
Project: HBase
Issue Type: Bug
Components: util
Affects Versions: 0.96.1
Reporter: Elliott Clark
Assignee: Elliott Clark
Fix For: 0.98.0, 0.96.1
Attachments: HBASE-9859-0.patch, HBASE-9859-1.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9859) Canary Shouldn't go off if the table being read from is disabled
[ https://issues.apache.org/jira/browse/HBASE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813851#comment-13813851 ] Hudson commented on HBASE-9859:
-------------------------------
SUCCESS: Integrated in HBase-TRUNK #4668 (See [https://builds.apache.org/job/HBase-TRUNK/4668/])
HBASE-9859 Canary Shouldn't go off if the table being read from is disabled (eclark: rev 1538842)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java

Canary Shouldn't go off if the table being read from is disabled
----------------------------------------------------------------

Key: HBASE-9859
URL: https://issues.apache.org/jira/browse/HBASE-9859
Project: HBase
Issue Type: Bug
Components: util
Affects Versions: 0.96.1
Reporter: Elliott Clark
Assignee: Elliott Clark
Fix For: 0.98.0, 0.96.1
Attachments: HBASE-9859-0.patch, HBASE-9859-1.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813852#comment-13813852 ] Hudson commented on HBASE-8942:
-------------------------------
SUCCESS: Integrated in HBase-TRUNK #4668 (See [https://builds.apache.org/job/HBase-TRUNK/4668/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538867)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

DFS errors during a read operation (get/scan), may cause write outliers
------------------------------------------------------------------------

Key: HBASE-8942
URL: https://issues.apache.org/jira/browse/HBASE-8942
Project: HBase
Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14
Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813853#comment-13813853 ] Hadoop QA commented on HBASE-9890:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612094/HBASE-9890-v1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7736//console

This message is automatically generated.
MR jobs are not working if started by a delegated user
-------------------------------------------------------

Key: HBASE-9890
URL: https://issues.apache.org/jira/browse/HBASE-9890
Project: HBase
Issue Type: Bug
Components: mapreduce, security
Affects Versions: 0.98.0, 0.94.12, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Fix For: 0.98.0, 0.94.13, 0.96.1
Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch

If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception on obtaining the token, since the proxy user doesn't have the kerberos auth. For example:
* If we use oozie to execute RowCounter, oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception.
* If we use oozie to execute LoadIncrementalHFiles, oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception.
{code}
org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients
	at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87)
{code}
{code}
org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
	at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783)
	at
{code}
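The shape of the fix is to check whether the current user's credentials already carry a token before asking the cluster for a new one. A sketch using the public UGI/token APIs; the token-kind string is the real HBase auth token kind, but the surrounding method is an assumption, not the attached patch:
{code}
// Hypothetical guard (not HBASE-9890-v1.patch): reuse an existing HBase auth
// token from the user's credentials instead of requesting a fresh one, which
// would fail for a non-kerberos proxy user such as oozie.
import org.apache.hadoop.hbase.security.User;
import org.apache.hadoop.security.token.Token;

public class TokenGuard {
  public static Token<?> findHBaseAuthToken(User user) {
    for (Token<?> token : user.getUGI().getTokens()) {
      if ("HBASE_AUTH_TOKEN".equals(token.getKind().toString())) {
        return token;  // already delegated: nothing to obtain
      }
    }
    return null;  // caller should obtain a new token via kerberos auth
  }
}
{code}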
[jira] [Commented] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813858#comment-13813858 ] Hadoop QA commented on HBASE-9894:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612145/HBase-9894-0.94.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7738//console

This message is automatically generated.

remove the inappropriate assert statement in Store.getSplitPoint()
------------------------------------------------------------------

Key: HBASE-9894
URL: https://issues.apache.org/jira/browse/HBASE-9894
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.94.6, 0.94.12
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
Attachments: HBase-9894-0.94.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (HBASE-9889) Make sure we clean up scannerReadPoints upon any exceptions
[ https://issues.apache.org/jira/browse/HBASE-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813868#comment-13813868 ] Hadoop QA commented on HBASE-9889:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12612003/hbase-9889.diff
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.security.access.TestNamespaceCommands
{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7737//console

This message is automatically generated.
Make sure we clean up scannerReadPoints upon any exceptions
-----------------------------------------------------------

Key: HBASE-9889
URL: https://issues.apache.org/jira/browse/HBASE-9889
Project: HBase
Issue Type: Sub-task
Affects Versions: 0.89-fb, 0.94.12, 0.96.0
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
Fix For: 0.96.1
Attachments: hbase-9889.diff

If there is an exception during the creation of a RegionScanner (for example, an exception while opening store files), the entry added to scannerReadPoints is not cleaned up. Having an unused old entry in scannerReadPoints means that flushes and compactions cannot garbage-collect older versions.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
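A generic illustration of the cleanup pattern the fix needs (not the attached hbase-9889.diff): register first, then roll the registration back if the rest of construction throws.
{code}
// Register/rollback pattern: an entry must not outlive a failed construction,
// or it will pin old cell versions against flush/compaction GC.
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ReadPointRegistry {
  private final ConcurrentMap<Object, Long> scannerReadPoints =
      new ConcurrentHashMap<Object, Long>();

  public void register(Object scanner, long readPoint) throws IOException {
    scannerReadPoints.put(scanner, readPoint);
    try {
      openStoreScanners(scanner);  // may throw, e.g. on a corrupt store file
    } catch (IOException e) {
      scannerReadPoints.remove(scanner);  // undo: don't pin versions forever
      throw e;
    }
  }

  private void openStoreScanners(Object scanner) throws IOException {
    // stand-in for the real work of RegionScanner construction
  }
}
{code}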
[jira] [Commented] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813985#comment-13813985 ] stack commented on HBASE-9894:
------------------------------
The assert looks a little silly. Without it, we return null and just do not split the region?

remove the inappropriate assert statement in Store.getSplitPoint()
------------------------------------------------------------------

Key: HBASE-9894
URL: https://issues.apache.org/jira/browse/HBASE-9894
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.94.6, 0.94.12
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
Attachments: HBase-9894-0.94.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
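That is the behavior without the assert, sketched below in simplified form (not the actual Store.getSplitPoint source; the midkey helper is a hypothetical stand-in):
{code}
// Hypothetical shape of the method once the assert is gone: a region that
// cannot split simply yields no split point, and the caller skips the split.
public byte[] getSplitPoint() {
  if (!canSplit()) {     // e.g. store still has references from a prior split
    return null;         // caller treats null as "do not split"
  }
  return computeMidKeyOfLargestStoreFile();  // stand-in for the real logic
}
{code}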
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813991#comment-13813991 ] stack commented on HBASE-9892: -- Patch looks fine. Did you intend to include this in the patch? Index: src/main/java/org/apache/hadoop/hbase/regionserver/RSDumpServlet.java ... and this? Index: src/main/java/org/apache/hadoop/hbase/regionserver/RSStatusServlet.java Should this be public? Can it be package protected? getRegionServerInfoPort Can we write the znode content as protobuf? That will make it easier to add new attributes, and in trunk all znodes are pb: +String nodePath = ZKUtil.joinZNode(watcher.rsZNode, n); +infoPort = Bytes.toInt(ZKUtil.getData(watcher, nodePath)); If you need help, I can help do the trunk patch, no problem. Good stuff. Add info port to ServerName to support multi instances in a node Key: HBASE-9892 URL: https://issues.apache.org/jira/browse/HBASE-9892 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff The full GC time of a regionserver with a big heap (30G) usually cannot be kept under 30s. At the same time, servers with 64G of memory are common. So we try to deploy multiple rs instances (2-3) on a single node, with the heap of each rs at about 20G~24G. Most things work fine, except the hbase web ui. The master gets the RS info port from conf, which is not suitable for this situation of multiple rs instances on a node. So we add the info port to ServerName. a. At startup, the rs reports its info port to HMaster. b. For the root region, the rs writes the servername with info port to the zookeeper root-region-server node. c. For meta regions, the rs writes the servername with info port to the root region. d. For user regions, the rs writes the servername with info port to the meta regions. So the hmaster and clients can get the info port from the servername. To test this feature, I changed the rs num from 1 to 3 in standalone mode, so we can test it in standalone mode. I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know how Hoya handles this? PS: There are different formats for the servername in the zk node and the meta table; I think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1#6144)
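For reference, the protobuf suggestion above would look roughly like the following. The RegionServerInfo message is hypothetical (it sketches the idea, not the actual trunk schema), while ZKUtil.joinZNode/ZKUtil.getData are the calls already quoted from the patch:
{code}
// Hypothetical .proto for the RS znode payload:
//   message RegionServerInfo {
//     optional int32 info_port = 1;
//   }
//
// Write side (regionserver), replacing the raw Bytes.toBytes(int) payload:
//   byte[] data = RegionServerInfo.newBuilder().setInfoPort(infoPort).build().toByteArray();
//   ZKUtil.createSetData(watcher, ZKUtil.joinZNode(watcher.rsZNode, n), data);
//
// Read side (master), replacing Bytes.toInt(...):
//   String nodePath = ZKUtil.joinZNode(watcher.rsZNode, n);
//   int infoPort = RegionServerInfo.parseFrom(ZKUtil.getData(watcher, nodePath)).getInfoPort();
{code}
The payoff stack points at: a new optional field can be added to the message later without breaking readers of the old payload, which a raw 4-byte int cannot do.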
[jira] [Commented] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814011#comment-13814011 ] Lars Hofhansl commented on HBASE-9894: -- Nobody should run in production with asserts enabled. remove the inappropriate assert statement in Store.getSplitPoint() -- Key: HBASE-9894 URL: https://issues.apache.org/jira/browse/HBASE-9894 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.94.6, 0.94.12 Reporter: Liang Xie Assignee: Liang Xie Priority: Minor Attachments: HBase-9894-0.94.txt One of my friend encountered a RS abort issue frequently during loading data. Here is the log stack: FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server gdc-dn49-formal.i.nease.net,60020,138320 3883151: Uncaught exception in service thread regionserver60020.cacheFlusher java.lang.AssertionError: getSplitPoint() called on a region that can't split! at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1926) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:79) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:5603) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:415) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:387) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:250) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
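Context for Lars's point: Java assert statements are no-ops unless the JVM is started with assertions enabled (java -ea), which is why the abort above only bites on clusters running with that flag. A trivial, self-contained demonstration:
{code}
public class AssertDemo {
  public static void main(String[] args) {
    // Throws java.lang.AssertionError only when run as: java -ea AssertDemo
    assert false : "assertions are enabled";
    System.out.println("assertions disabled (the JVM default), so we got here");
  }
}
{code}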
[jira] [Commented] (HBASE-9866) Support the mode where REST server authorizes proxy users
[ https://issues.apache.org/jira/browse/HBASE-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814046#comment-13814046 ] Francis Liu commented on HBASE-9866: This will make auditing a bit hard, since the real user is lost by the time the request hits the RS. Can we log a doAs message so we can trace it back? Given that we're adding doAs support in REST, it's probably a good idea to provide a way to refresh the ProxyUsers config without restarting the server. BTW, do the other web services support doAs (hdfs's proxy, webhcat, etc.)? Support the mode where REST server authorizes proxy users - Key: HBASE-9866 URL: https://issues.apache.org/jira/browse/HBASE-9866 Project: HBase Issue Type: Improvement Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.1 Attachments: 9866-1.txt In one use case, someone was trying to authorize with the REST server as a proxy user. That mode is not supported today. The curl request would be something like (assuming SPNEGO auth) - {noformat} curl -i --negotiate -u : http://HOST:PORT/version/cluster?doas=USER {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9866) Support the mode where REST server authorizes proxy users
[ https://issues.apache.org/jira/browse/HBASE-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814050#comment-13814050 ] Jimmy Xiang commented on HBASE-9866: In case the REST server shares the same configuration with rs/master, can we have a config and turn this feature off by default? Support the mode where REST server authorizes proxy users - Key: HBASE-9866 URL: https://issues.apache.org/jira/browse/HBASE-9866 Project: HBase Issue Type: Improvement Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.1 Attachments: 9866-1.txt In one use case, someone was trying to authorize with the REST server as a proxy user. That mode is not supported today. The curl request would be something like (assuming SPNEGO auth) - {noformat} curl -i --negotiate -u : http://HOST:PORT/version/cluster?doas=USER {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
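For context, Hadoop-style proxy-user authorization is driven by configuration of this shape. The hadoop.proxyuser.* keys are the standard Hadoop ones (restserver stands in for whatever user the REST server runs as); the final hbase.rest.support.proxyuser toggle is purely hypothetical, a stand-in for the off-by-default switch requested above:
{code}
<!-- hbase-site.xml sketch -->
<property>
  <name>hadoop.proxyuser.restserver.hosts</name>
  <value>rest-host.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.restserver.groups</name>
  <value>etl,analytics</value>
</property>
<!-- hypothetical on/off switch, default off as requested above -->
<property>
  <name>hbase.rest.support.proxyuser</name>
  <value>false</value>
</property>
{code}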
[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9818: -- Attachment: (was: 9818-v1.txt) NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 
53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9818: -- Attachment: 9818-v2.txt NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814071#comment-13814071 ] Ted Yu commented on HBASE-9818: --- I am looping TestHRegion and TestAtomicOperation 200 times, respectively. Previously TestAtomicOperation failed at iteration #7. Now the tests reach iteration #17 and are running. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: 
org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This
[jira] [Assigned] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-9818: - Assignee: Ted Yu NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false 
next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814101#comment-13814101 ] Nick Dimiduk commented on HBASE-9890: - It's also possible to go the other way, i.e. secured HBase but not secured HDFS: HBASE-9482. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception on obtaining a token, since the proxy user doesn't have the kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception. {code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
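One plausible shape for the guard such a fix needs (simplified, not the actual patch): only request a new token when the submitting user does not already carry one, so a proxy user that was handed tokens by oozie is left alone. UserGroupInformation.getTokens() is the real Hadoop API; the kind string matches the HBASE_AUTH_TOKEN mentioned above:
{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

class ObtainTokenGuardSketch {
  // Only ask the cluster for a token if the UGI does not already hold one of this kind.
  static boolean needsHBaseToken(UserGroupInformation ugi) {
    for (Token<?> token : ugi.getTokens()) {
      if ("HBASE_AUTH_TOKEN".equals(token.getKind().toString())) {
        return false; // already delegated; requesting another would require Kerberos auth
      }
    }
    return true;
  }
}
{code}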
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814111#comment-13814111 ] Sergey Shelukhin commented on HBASE-9818: - From the logs and code, it seems like compactions are closing the stream via the wrapper (through a long chain of classes). Beforehand, they are supposed to notify all scanners to rebuild the heap, which is a synchronized method (on StoreScanner), and only then close (next/etc. are also synchronized, so it should all be properly sequenced). But I suspect that somehow it's not happening. Also, in some cases the stacks are from some initialization, not next(), so I haven't looked at / am not sure how that was supposed to be synched. I didn't have a lot of time, so I just looked at the code. I looped testWritesWhileGetting 100 times and it never failed locally, which is sad. Is there any chance/tool to find out approximately when these failures started? NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at
org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814118#comment-13814118 ] Sergey Shelukhin commented on HBASE-9818: - But yeah maybe wrapper method needs to be encapsulated with getting stream together. Maybe there's no real sync issue, and it just needs to not throw, and then on next read it will rebuild the heap. Although it does seem pretty suspect, it has a stream and someone closes it in parallel. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at 
java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814126#comment-13814126 ] Hadoop QA commented on HBASE-9818: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612207/9818-v2.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.wal.TestHLog {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7739//console This message is automatically generated. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt HFileBlock#istream seems to be null. 
I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553)
[jira] [Created] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94
Jeffrey Zhong created HBASE-9895: Summary: 0.96 Import utility can't import an exported file from 0.94 Key: HBASE-9895 URL: https://issues.apache.org/jira/browse/HBASE-9895 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.96.0 Reporter: Jeffrey Zhong Basically we moved org.apache.hadoop.hbase.client.Result to protobuf serialization, so a 0.96 cluster cannot import files exported from 0.94. This issue is annoying because a user can't import old archive files after an upgrade, or archives from others who are still using 0.94. The ideal fix is to catch the deserialization error and then fall back to the 0.94 format for importing. -- This message was sent by Atlassian JIRA (v6.1#6144)
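The fallback Jeffrey describes would be shaped like this sketch; parseProtobufResult/parseWritableResult are hypothetical helpers standing in for the 0.96 and 0.94 deserializers:
{code}
import java.io.IOException;

class ImportFallbackSketch {
  Object parseResult(byte[] bytes) throws IOException {
    try {
      return parseProtobufResult(bytes);   // 0.96+ serialization, tried first
    } catch (IOException e) {
      return parseWritableResult(bytes);   // fall back to the 0.94 Writable format
    }
  }

  // Toy stand-ins so the sketch compiles; the real helpers would wrap the
  // protobuf parser and the legacy Writable reader respectively.
  private Object parseProtobufResult(byte[] bytes) throws IOException {
    throw new IOException("not a protobuf-encoded Result");
  }

  private Object parseWritableResult(byte[] bytes) {
    return new Object();
  }
}
{code}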
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814142#comment-13814142 ] Ted Yu commented on HBASE-9818: --- From https://builds.apache.org/job/PreCommit-HBASE-Build/7739/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestHLog/testAppendClose/ : {code} Stacktrace java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hdfs.util.LightWeightGSet.<init>(LightWeightGSet.java:81) at org.apache.hadoop.hdfs.server.namenode.BlocksMap.<init>(BlocksMap.java:320) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:223) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:299) at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:569) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1479) at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:278) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSClusterForTestHLog(HBaseTestingUtility.java:563) at org.apache.hadoop.hbase.regionserver.wal.TestHLog.testAppendClose(TestHLog.java:434) {code} Looks like an environment issue. The 'stream closed' exception added in the patch didn't show up. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer:
[jira] [Commented] (HBASE-9866) Support the mode where REST server authorizes proxy users
[ https://issues.apache.org/jira/browse/HBASE-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814149#comment-13814149 ] Devaraj Das commented on HBASE-9866: [~toffer], yes, services like webhcat support doAs, but they have different config knobs for configuring the groups/ip-addresses. Maybe they map these configurations to the underlying Hadoop configurations internally. [~jxiang], okay, will add a configuration for turning this feature on/off... Support the mode where REST server authorizes proxy users - Key: HBASE-9866 URL: https://issues.apache.org/jira/browse/HBASE-9866 Project: HBase Issue Type: Improvement Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.96.1 Attachments: 9866-1.txt In one use case, someone was trying to authorize with the REST server as a proxy user. That mode is not supported today. The curl request would be something like (assuming SPNEGO auth) - {noformat} curl -i --negotiate -u : http://HOST:PORT/version/cluster?doas=USER {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-8018) Add Flaky Testcase Detector tool into dev-tools
[ https://issues.apache.org/jira/browse/HBASE-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-8018: - Description: jenkins-tools = A tool which pulls test case results from a Jenkins server. It displays a union of the failed test cases from the last 15 runs recorded on the Jenkins server (15 by default; the actual number of jobs can be less depending on availability) and tracks how each of them performed across those runs (passed, not run, or failed). *Pre-requirement (run under folder ./dev-support/jenkins-tools)* Please download jenkins-client from https://github.com/cosmin/jenkins-client 1) git clone git://github.com/cosmin/jenkins-client.git 2) make sure the dependency jenkins-client version in ./buildstats/pom.xml matches the downloaded jenkins-client (current value is 0.1.6-SNAPSHOT) Build command (run under folder jenkins-tools): {code} mvn clean package {code} Usage: {code} java -jar ./buildstats/target/buildstats.jar <Jenkins HTTP URL> <Job Name> [number of most recent jobs to check] {code} Sample command: {code} java -jar ./buildstats/target/buildstats.jar https://builds.apache.org HBase-TRUNK {code} Sample output (where 1 means PASSED, 0 means NOT RUN AT ALL, -1 means FAILED):
{noformat}
Failed Test Cases Stats
                                                                                 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369
org.apache.hadoop.hbase.backup.testhfilearchiving.testcleaningrace                 1    1    1    1    1    1    1    1   -1    0
org.apache.hadoop.hbase.migration.testnamespaceupgrade.testrenameusingsnapshots    1    1    1   -1    0    1    1    1    1    1

Skipped Test Cases Stats
=== 4360 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.replication.testreplicationkillmasterrscompressed
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfiles
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
=== 4361 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfiles
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
=== 4362 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfiles
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
=== 4363 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfiles
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
=== 4368 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.client.testadmin
org.apache.hadoop.hbase.client.testclonesnapshotfromclient
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
{noformat}
was: jenkins-tools = A tool which pulls test case results from a Jenkins server. It displays a union of the failed test cases from the last 15 runs recorded on the Jenkins server (15 by default; the actual number of jobs can be less depending on availability) and tracks how each of them performed across those runs (passed, not run, or failed). *Pre-requirement (run under folder jenkins-tools)* Please download jenkins-client from https://github.com/cosmin/jenkins-client 1) git clone git://github.com/cosmin/jenkins-client.git 2) make sure the dependency jenkins-client version in ./buildstats/pom.xml matches the downloaded jenkins-client (current value is 0.1.6-SNAPSHOT) Build command (run under folder jenkins-tools): {code} mvn clean package {code} Usage: {code} java -jar ./buildstats/target/buildstats.jar <Jenkins HTTP URL> <Job Name> [number of most recent jobs to check] {code} Sample command: {code} java -jar ./buildstats/target/buildstats.jar https://builds.apache.org HBase-TRUNK {code} Sample output (where 1 means PASSED, 0 means NOT RUN AT ALL, -1 means FAILED):
{noformat}
Failed Test Cases Stats
                                                                                 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369
org.apache.hadoop.hbase.backup.testhfilearchiving.testcleaningrace                 1    1    1    1    1    1    1    1   -1    0
org.apache.hadoop.hbase.migration.testnamespaceupgrade.testrenameusingsnapshots    1    1    1   -1    0    1    1    1    1    1

Skipped Test Cases Stats
=== 4360 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.replication.testreplicationkillmasterrscompressed
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
{noformat}
[jira] [Created] (HBASE-9896) Add an option to have strict number of mapper per job in HBase Streaming
Rishit Shroff created HBASE-9896: Summary: Add an option to have strict number of mapper per job in HBase Streaming Key: HBASE-9896 URL: https://issues.apache.org/jira/browse/HBASE-9896 Project: HBase Issue Type: New Feature Components: mapreduce Affects Versions: 0.89-fb Reporter: Rishit Shroff Assignee: Rishit Shroff Priority: Minor Fix For: 0.89-fb Currently there is only one configuration knob available for controlling the number of mappers per job in HBase Streaming: the number of mappers per region. This option tries to maintain locality between the mappers and the region servers. However, in certain scenarios where the table has a high number of regions, the mappers-per-region setting can lead to an explosion in the number of mappers. Hence, we need one more option to strictly cap the number of mappers per job. -- This message was sent by Atlassian JIRA (v6.1#6144)
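The interplay of the existing knob and the proposed one reduces to a simple cap, along these lines (property names are invented for illustration; the real 0.89-fb keys may differ):
{code}
import org.apache.hadoop.conf.Configuration;

class MapperCountSketch {
  static int numMappers(Configuration conf, int numRegions) {
    int perRegion = conf.getInt("hbase.streaming.mappers.per.region", 1);  // existing knob
    int maxPerJob = conf.getInt("hbase.streaming.max.mappers.per.job",    // proposed knob
        Integer.MAX_VALUE);
    return Math.min(numRegions * perRegion, maxPerJob);                    // strict cap
  }
}
{code}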
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814188#comment-13814188 ] Gary Helmling commented on HBASE-9890: -- I've looked through the secure bulk load code in a little more detail, but I still can't say I understand why use of SecureBulkLoadClient in LoadIncrementalHFiles is conditioned on isHBaseSecurityEnabled() instead of isHadoopSecurityEnabled(). It seems like they should be conditioned on isHadoopSecurityEnabled() instead, since this is all in place to pass through an HDFS delegation token for moving the HFiles on secure Hadoop. [~mbertozzi] Makes sense to me to change the LoadIncrementalHFiles conditions here as well, assuming that doesn't cascade into broken tests. But I'm also okay with pushing that part into a separate JIRA, since it's somewhat independent of the original issue. The rest of the patch looks good to me. [~toffer] Any insights into why SecureBulkLoadClient usage is conditioned on HBase security being enabled instead of HDFS security? MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception on obtaining a token, since the proxy user doesn't have the kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception.
{code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814207#comment-13814207 ] Hudson commented on HBASE-8942: --- SUCCESS: Integrated in HBase-0.94-security #329 (See [https://builds.apache.org/job/HBase-0.94-security/329/]) HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers; REVERT (stack: rev 1538869) * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java DFS errors during a read operation (get/scan), may cause write outliers --- Key: HBASE-8942 URL: https://issues.apache.org/jira/browse/HBASE-8942 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb, 0.95.2 Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14 Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt This is a similar issue to the one discussed in HBASE-8228. 1) A scanner holds the Store.ReadLock() while opening the store files ... encounters errors, and thus takes a long time to finish. 2) Meanwhile, a flush completes. It needs the write lock to commit() and update scanners, and hence ends up waiting. 3+) All Puts (and also Gets) to the CF, which will need a read lock, will have to wait for 1) and 2) to complete, thus blocking updates to the system for the duration of the DFS timeout. Fix: Open Store files outside the read lock. getScanners() already tries to do this optimisation. However, Store.getScanner(), which calls this function through the StoreScanner constructor, redundantly tries to grab the readLock, causing the readLock to be held while the storeFiles are being opened and seeked. We should get rid of the readLock() in Store.getScanner(); it is not required. The constructor for StoreScanner calls getScanners(xxx, xxx, xxx), which already does the required locking. -- This message was sent by Atlassian JIRA (v6.1#6144)
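To make the described fix concrete, here is a minimal, self-contained model of the locking change (generic Java, not the actual Store code): scanner creation no longer holds the read lock for its whole duration; only the brief store-file snapshot inside it does, so a slow DFS open/seek cannot pin the lock against a flush.
{code}
import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified model: getScanner() no longer wraps scanner creation in the read
// lock; only the short snapshot section inside createScanner() locks.
class StoreLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  Object getScanner() throws IOException {
    // was: lock.readLock().lock(); try { return createScanner(); } finally { lock.readLock().unlock(); }
    return createScanner(); // DFS open/seek can now block without holding the lock
  }

  private Object createScanner() throws IOException {
    Object files;
    lock.readLock().lock();
    try {
      files = snapshotStoreFiles(); // the only part that needs the read lock
    } finally {
      lock.readLock().unlock();
    }
    return openAndSeek(files); // slow DFS work happens outside the lock
  }

  private Object snapshotStoreFiles() { return new Object(); }

  private Object openAndSeek(Object files) throws IOException { return files; }
}
{code}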
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814213#comment-13814213 ] Nick Dimiduk commented on HBASE-9890: - bq. use of SecureBulkLoadClient in LoadIncrementalHFiles is conditioned on isHBaseSecurityEnabled(), instead of isHadoopSecurityEnabled(). I think this is a question of practicality -- LoadIncrementalHFiles can only use the SecureBulkLoadClient when the appropriate coprocessor is available on the RS. It's only available when HBase security is enabled. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception when obtaining the token, since the proxy user doesn't have kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception. {code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
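One way to avoid the failure above is to reuse a delegation token already present in the (proxy) user's credentials instead of unconditionally requesting a new one. The sketch below is a hedged illustration with an assumed helper shape, not the exact committed change:
{code}
// Hedged sketch: check the user's existing credentials for an
// HBASE_AUTH_TOKEN before making a Kerberos-only token request.
public static void obtainTokenForJob(Job job, UserGroupInformation user)
    throws IOException {
  for (Token<? extends TokenIdentifier> t : user.getTokens()) {
    if ("HBASE_AUTH_TOKEN".equals(t.getKind().toString())) {
      job.getCredentials().addToken(t.getService(), t);
      return; // the delegated user already holds a token; no Kerberos needed
    }
  }
  // Only fall through to a fresh TokenProvider request here, which does
  // require Kerberos-authenticated credentials. (Elided.)
}
{code}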
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814230#comment-13814230 ] Hudson commented on HBASE-8942: --- FAILURE: Integrated in HBase-0.94 #1195 (See [https://builds.apache.org/job/HBase-0.94/1195/]) HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers; REVERT (stack: rev 1538869) * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java DFS errors during a read operation (get/scan), may cause write outliers --- Key: HBASE-8942 URL: https://issues.apache.org/jira/browse/HBASE-8942 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb, 0.95.2 Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14 Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814247#comment-13814247 ] Dave Latham commented on HBASE-9865: Looks good to me. Thanks, Lars. WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM. A little background on this issue: we noticed that our source replication regionservers would get into gc storms and sometimes even OOM. We noticed a case where there were around 25k WALEdits to replicate, each one with an ArrayList of KeyValues. The array list had a capacity of around 90k (using 350KB of heap memory) but had around 6 non-null entries. When ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a WALEdit, it removes all kv's that are scoped other than local. But in doing so we don't account for the capacity of the ArrayList when determining heapSize for a WALEdit. The logic for shipping a batch is whether you have hit a size capacity or a number-of-entries capacity. Therefore, if we have a WALEdit with 25k entries and suppose all are removed: the size of the arrayList is 0 (we don't even count the collection's heap size currently) but the capacity is ignored. This will yield a heapSize() of 0 bytes, while in the best case it would be at least 10 bytes (provided you pass initialCapacity and you have a 32-bit JVM). I have some ideas on how to address this problem and want to know everyone's thoughts: 1. We use a probabilistic counter such as HyperLogLog and create something like: * class CapacityEstimateArrayList implements ArrayList ** this class overrides all additive methods to update the probabilistic counts ** it includes one additional method called estimateCapacity (we would take estimateCapacity - size() and fill in sizes for all references) * Then we can do something like this in WALEdit.heapSize: {code} public long heapSize() { long ret = ClassSize.ARRAYLIST; for (KeyValue kv : kvs) { ret += kv.heapSize(); } long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size(); ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE); if (scopes != null) { ret += ClassSize.TREEMAP; ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY); // TODO this isn't quite right, need help here } return ret; } {code} 2. In ReplicationSource.removeNonReplicableEdits() we know the size of the array originally, and we provide some percentage threshold. When that threshold is met (50% of the entries have been removed) we can call kvs.trimToSize(). 3. In the heapSize() method for WALEdit we could use reflection (please don't shoot me for this) to grab the actual capacity of the list.
Doing something like this: {code} public int getArrayListCapacity() { try { Field f = ArrayList.class.getDeclaredField("elementData"); f.setAccessible(true); return ((Object[]) f.get(kvs)).length; } catch (Exception e) { log.warn("Exception in trying to get capacity on ArrayList", e); return kvs.size(); } } {code} I am partial to (1), using HyperLogLog and creating a CapacityEstimateArrayList; this is reusable throughout the code for other classes that implement HeapSize and contain ArrayLists. The memory footprint is very small and it is very fast. The issue is that this is an estimate; although we can configure the precision, we will most likely always be conservative. The estimateCapacity will always be less than the actualCapacity, but it will be close. I think that putting the logic in removeNonReplicableEdits will work, but this only solves the heapSize problem in this particular scenario. Solution 3 is slow and horrible, but it gives us the exact answer. I would love to hear if anyone else has any other ideas on how to remedy this problem. I have code for trunk and 0.94 for all 3 ideas and can provide a patch if the community thinks any of these approaches is a viable one. -- This message was sent by Atlassian JIRA (v6.1#6144)
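Of the three options, (2) is the least invasive. A minimal sketch of what it could look like, assuming the WALEdit exposes its KeyValue list as an ArrayList (the accessor and cast below are assumptions for illustration):
{code}
// Hedged sketch of option 2: trim the backing array once most entries have
// been removed, so unused capacity stops distorting heapSize()-based batching.
protected void removeNonReplicableEdits(WALEdit edit) {
  ArrayList<KeyValue> kvs = (ArrayList<KeyValue>) edit.getKeyValues(); // assumed accessor
  int originalSize = kvs.size();
  NavigableMap<byte[], Integer> scopes = edit.getScopes();
  for (int i = kvs.size() - 1; i >= 0; i--) {
    if (scopes == null || !scopes.containsKey(kvs.get(i).getFamily())) {
      kvs.remove(i); // drop KVs scoped other than local
    }
  }
  if (kvs.size() < originalSize / 2) {
    kvs.trimToSize(); // release the large unused capacity (e.g. the ~90k slots above)
  }
}
{code}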
[jira] [Updated] (HBASE-9863) Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs
[ https://issues.apache.org/jira/browse/HBASE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9863: -- Fix Version/s: 0.98.0 Hadoop Flags: Reviewed Integrated to trunk. Thanks for the reviews. Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs --- Key: HBASE-9863 URL: https://issues.apache.org/jira/browse/HBASE-9863 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0 Attachments: 9863-v1.txt, 9863-v2.txt, 9863-v3.txt, 9863-v4.txt, 9863-v5.txt, 9863-v6.txt TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry sometimes hung. Here were two recent occurrences: https://builds.apache.org/job/PreCommit-HBASE-Build/7676/console https://builds.apache.org/job/PreCommit-HBASE-Build/7671/console There were 9 occurrences of the following in both stack traces: {code} FifoRpcScheduler.handler1-thread-5 daemon prio=10 tid=0x09df8800 nid=0xc17 waiting for monitor entry [0x6fdf8000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250) - waiting to lock 0x7f69b5f0 (a org.apache.hadoop.hbase.master.TableNamespaceManager) at org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146) at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1743) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1782) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1983) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) {code} The test hung here: {code} pool-1-thread-1 prio=10 tid=0x74f7b800 nid=0x5aa5 in Object.wait() [0x74efe000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1436) - locked 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.createTable(MasterProtos.java:40372) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.createTable(HConnectionManager.java:1931) at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:598) at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:594) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116) - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94) - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller) at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3124) at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:594) at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:485) at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486) {code} -- This 
message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9818: -- Attachment: 9818-v3.txt Patch v3 allows TestHRegion and TestAtomicOperation to reach iteration #46. Please comment. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt, 9818-v3.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got 
from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814278#comment-13814278 ] churro morales commented on HBASE-9865: --- One thing I noticed in WALEdit: we should be accounting for the ArrayList object as well. Instead of: {code} public long heapSize() { long ret = 0; {code} this would be correct, although it doesn't matter very much: {code} public long heapSize() { long ret = ClassSize.ARRAYLIST; {code} If you didn't want to bleed the ArrayList implementation that WALEdit uses, maybe something like this might work. For WALEdit: {code} public void removeIf(Predicate<KeyValue> predicate) { for (int i = kvs.size() - 1; i >= 0; i--) { KeyValue kv = kvs.get(i); if (predicate.apply(kv)) { kvs.remove(i); } } if (kvs.size() < size() / 2) { kvs.trimToSize(); } } {code} And ReplicationSource would change to: {code} protected void removeNonReplicableEdits(WALEdit edit) { final NavigableMap<byte[], Integer> scopes = edit.getScopes(); edit.removeIf(new Predicate<KeyValue>() { @Override public boolean apply(KeyValue keyValue) { return scopes == null || !scopes.containsKey(keyValue.getFamily()); } }); } {code} I don't think it adds much by doing this, but it is an alternative if we don't want to bleed that the WALEdit uses an ArrayList. WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814285#comment-13814285 ] Enis Soztutar commented on HBASE-9892: -- Great. Left some comments at RB. Add info port to ServerName to support multi instances in a node Key: HBASE-9892 URL: https://issues.apache.org/jira/browse/HBASE-9892 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff The full GC time of a regionserver with a big heap (>30G) usually cannot be kept under 30s. At the same time, servers with 64G of memory are the norm. So we try to deploy multiple RS instances (2-3) in a single node, with the heap of each RS at about 20G ~ 24G. Most things work fine, except the hbase web ui. The master gets the RS info port from conf, which is not suitable for the situation of multiple RS instances in a node. So we add the info port to ServerName. a. At startup, the RS reports its info port to HMaster. b. For the root region, the RS writes the servername with info port to the zookeeper root-region-server node. c. For meta regions, the RS writes the servername with info port to the root region. d. For user regions, the RS writes the servername with info port to meta regions. So the HMaster and clients can get the info port from the servername. To test this feature, I changed the RS num from 1 to 3 in standalone mode, so we can test it in standalone mode. I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know how Hoya handles this? PS: There are different formats for the servername in the zk node and the meta table; I think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1#6144)
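For illustration, one possible shape of the change described above; the class and method names here are assumptions rather than the actual patch:
{code}
// Hedged sketch: carry the web-UI info port alongside host/port/startcode so
// the master and clients can link to the right UI when several regionservers
// share one host.
public class ServerNameWithInfoPort {
  private final String hostname;
  private final int rpcPort;
  private final long startcode;
  private final int infoPort; // reported by the RS to the master at startup

  public ServerNameWithInfoPort(String hostname, int rpcPort,
      long startcode, int infoPort) {
    this.hostname = hostname;
    this.rpcPort = rpcPort;
    this.startcode = startcode;
    this.infoPort = infoPort;
  }

  // The string written to the zookeeper node / meta, e.g.
  // "host1.example.com,36020,1382638088230,60030"
  public String toSerializedForm() {
    return hostname + "," + rpcPort + "," + startcode + "," + infoPort;
  }
}
{code}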
[jira] [Updated] (HBASE-9836) Intermittent TestRegionObserverScannerOpenHook#testRegionObserverCompactionTimeStacking failure
[ https://issues.apache.org/jira/browse/HBASE-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9836: -- Resolution: Fixed Status: Resolved (was: Patch Available) Intermittent TestRegionObserverScannerOpenHook#testRegionObserverCompactionTimeStacking failure --- Key: HBASE-9836 URL: https://issues.apache.org/jira/browse/HBASE-9836 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.96.1 Attachments: 9836-v1.txt, 9836-v3.txt, 9836-v4.txt, 9836-v5.txt, 9836-v6.txt Here were two recent examples: https://builds.apache.org/job/hbase-0.96-hadoop2/99/testReport/org.apache.hadoop.hbase.coprocessor/TestRegionObserverScannerOpenHook/testRegionObserverCompactionTimeStacking/ https://builds.apache.org/job/PreCommit-HBASE-Build/7616/testReport/junit/org.apache.hadoop.hbase.coprocessor/TestRegionObserverScannerOpenHook/testRegionObserverCompactionTimeStacking/ From the second: {code} 2013-10-24 18:08:10,080 INFO [Priority.RpcServer.handler=1,port=58174] regionserver.HRegionServer(3672): Flushing testRegionObserverCompactionTimeStacking,,1382638088230.e96920e43ea374ba1bd559df115870cf. ... 2013-10-24 18:08:10,544 INFO [Priority.RpcServer.handler=1,port=58174] regionserver.HRegion(1645): Finished memstore flush of ~128.0/128, currentsize=0.0/0 for region testRegionObserverCompactionTimeStacking,,1382638088230.e96920e43ea374ba1bd559df115870cf. in 464ms, sequenceid=5, compaction requested=true 2013-10-24 18:08:10,546 DEBUG [Priority.RpcServer.handler=1,port=58174] regionserver.CompactSplitThread(319): Small Compaction requested: system; Because: Compaction through user triggered flush; compaction_queue=(0:0), split_queue=0, merge_queue=0 2013-10-24 18:08:10,547 DEBUG [RS:0;asf002:58174-smallCompactions-1382638090545] compactions.RatioBasedCompactionPolicy(92): Selecting compaction from 2 store files, 0 compacting, 2 eligible, 10 blocking 2013-10-24 18:08:10,547 DEBUG [pool-1-thread-1] catalog.CatalogTracker(209): Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4be179 2013-10-24 18:08:10,549 DEBUG [RS:0;asf002:58174-smallCompactions-1382638090545] compactions.ExploringCompactionPolicy(112): Exploring compaction algorithm has selected 2 files of size 1999 starting at candidate #0 after considering 1 permutations with 1 in ratio 2013-10-24 18:08:10,551 DEBUG [RS:0;asf002:58174-smallCompactions-1382638090545] regionserver.HStore(1329): e96920e43ea374ba1bd559df115870cf - A: Initiating major compaction 2013-10-24 18:08:10,551 INFO [RS:0;asf002:58174-smallCompactions-1382638090545] regionserver.HRegion(1294): Starting compaction on A in region testRegionObserverCompactionTimeStacking,,1382638088230.e96920e43ea374ba1bd559df115870cf. 2013-10-24 18:08:10,551 INFO [RS:0;asf002:58174-smallCompactions-1382638090545] regionserver.HStore(982): Starting compaction of 2 file(s) in A of testRegionObserverCompactionTimeStacking,,1382638088230.e96920e43ea374ba1bd559df115870cf. 
into tmpdir=hdfs://localhost:49506/user/jenkins/hbase/data/default/testRegionObserverCompactionTimeStacking/e96920e43ea374ba1bd559df115870cf/.tmp, totalSize=2.0k 2013-10-24 18:08:10,552 DEBUG [RS:0;asf002:58174-smallCompactions-1382638090545] compactions.Compactor(168): Compacting hdfs://localhost:49506/user/jenkins/hbase/data/default/testRegionObserverCompactionTimeStacking/e96920e43ea374ba1bd559df115870cf/A/44f87b94732149c08f20bdba00dd7140, keycount=1, bloomtype=ROW, size=992.0, encoding=NONE, seqNum=3, earliestPutTs=1382638089528 2013-10-24 18:08:10,552 DEBUG [RS:0;asf002:58174-smallCompactions-1382638090545] compactions.Compactor(168): Compacting hdfs://localhost:49506/user/jenkins/hbase/data/default/testRegionObserverCompactionTimeStacking/e96920e43ea374ba1bd559df115870cf/A/0b2e580cbda246718bbf64c21e81cd18, keycount=1, bloomtype=ROW, size=1007.0, encoding=NONE, seqNum=5, earliestPutTs=1382638090053 2013-10-24 18:08:10,564 DEBUG [RS:0;asf002:58174-smallCompactions-1382638090545] util.FSUtils(305): DFS Client does not support most favored nodes create; using default create ... Potentially hanging thread: RS:0;asf002:58174-smallCompactions-1382638090545 java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.ipc.Client.call(Client.java:1099) org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) $Proxy9.complete(Unknown Source) sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) java.lang.reflect.Method.invoke(Method.java:597)
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814302#comment-13814302 ] Lars Hofhansl commented on HBASE-9865: -- Thanks Churro (and Dave). While we're at it, we might as well fix WALEdit.heapSize(). The other change does not help with readability, I think. It's not so bad to leak this out of WALEdit; if anything it declares that this is a random-access list. I'll make a 0.94 patch as well. Any chance you would try it on a real cluster? WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814309#comment-13814309 ] Gary Helmling commented on HBASE-9890: -- bq. I think this is a question of practicality -- LoadIncrementalHFiles can only use the SecureBulkLoadClient when the appropriate coprocessor is available on the RS. It's only available when HBase security is enabled. Whether {{hbase.security.authentication == kerberos}} has nothing to do with whether SecureBulkLoadEndpoint is loaded on a table's regions. The coprocessor needs to be configured independently (via hbase.coprocessor.region.classes, hbase.coprocessor.user.region.classes, or directly on the table). It does also assume that the AccessController coprocessor is enabled, but that again can be independent of authentication. I may be missing something, but it seems like the main use of SecureBulkLoadEndpoint is to move the bulk load HFiles to a staging directory, proxying to HDFS as the end user. Even the AccessController checks (which should only happen if AccessController is enabled) can be done independently of whether HBase requires kerberos authentication (you can do access control without kerberos auth). So secure bulk loading seems to me to only be required when HDFS secure auth is enabled, and it should be usable in that case regardless of the value of hbase.security.authentication. There is a bigger issue here, in that we are amassing a pile of security configurations that are all exposed (and must be put together) by end users. But I think that is solvable by providing a simpler end-user configuration, while still retaining the correct granularity of configuration checks within the code itself. HBASE-4817 is a long-standing issue to simplify the end-user configuration. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
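A minimal sketch of the condition being argued for here, using real configuration checks but a hypothetical helper; the actual LoadIncrementalHFiles wiring differs:
{code}
// Hedged sketch: gate the secure bulk-load path on Hadoop/HDFS security as
// well, rather than on HBase security alone.
static boolean shouldUseSecureBulkLoad(Configuration conf) {
  return User.isHBaseSecurityEnabled(conf)          // the current check
      || UserGroupInformation.isSecurityEnabled();  // HDFS needs end-user creds
}
{code}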
[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814312#comment-13814312 ] churro morales commented on HBASE-9865: --- Hi Lars, I'm sure at the very least we will be able to apply it to a few nodes in our cluster and monitor how this patch affects garbage collection. Upon gathering results, I will be sure to share. Cheers WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9818: -- Attachment: (was: 9818-v3.txt) NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false 
next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9865: - Attachment: 9865-trunk-v4.txt Aaaand. Trunk. WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.14 Attachments: 9865-0.94-v2.txt, 9865-0.94-v4.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk-v4.txt, 9865-trunk.txt -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9818: -- Attachment: 9818-v3.txt NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt, 9818-v3.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: 
false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9865: - Attachment: 9865-0.94-v4.txt Updated 0.94 patch. WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.14 Attachments: 9865-0.94-v2.txt, 9865-0.94-v4.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk-v4.txt, 9865-trunk.txt WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM. A little background on this issue. We noticed that our source replication regionservers would get into gc storms and sometimes even OOM. We noticed a case where it showed that there were around 25k WALEdits to replicate, each one with an ArrayList of KeyValues. The array list had a capacity of around 90k (using 350KB of heap memory) but had around 6 non null entries. When the ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a WALEdit it removes all kv's that are scoped other than local. But in doing so we don't account for the capacity of the ArrayList when determining heapSize for a WALEdit. The logic for shipping a batch is whether you have hit a size capacity or number of entries capacity. Therefore if have a WALEdit with 25k entries and suppose all are removed: The size of the arrayList is 0 (we don't even count the collection's heap size currently) but the capacity is ignored. This will yield a heapSize() of 0 bytes while in the best case it would be at least 10 bytes (provided you pass initialCapacity and you have 32 bit JVM) I have some ideas on how to address this problem and want to know everyone's thoughts: 1. We use a probabalistic counter such as HyperLogLog and create something like: * class CapacityEstimateArrayList implements ArrayList ** this class overrides all additive methods to update the probabalistic counts ** it includes one additional method called estimateCapacity (we would take estimateCapacity - size() and fill in sizes for all references) * Then we can do something like this in WALEdit.heapSize: {code} public long heapSize() { long ret = ClassSize.ARRAYLIST; for (KeyValue kv : kvs) { ret += kv.heapSize(); } long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size(); ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE); if (scopes != null) { ret += ClassSize.TREEMAP; ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY); // TODO this isn't quite right, need help here } return ret; } {code} 2. In ReplicationSource.removeNonReplicableEdits() we know the size of the array originally, and we provide some percentage threshold. When that threshold is met (50% of the entries have been removed) we can call kvs.trimToSize() 3. in the heapSize() method for WALEdit we could use reflection (Please don't shoot me for this) to grab the actual capacity of the list. 
Doing something like this:
{code}
public int getArrayListCapacity() {
  try {
    Field f = ArrayList.class.getDeclaredField("elementData");
    f.setAccessible(true);
    return ((Object[]) f.get(kvs)).length;
  } catch (Exception e) {
    LOG.warn("Exception in trying to get capacity on ArrayList", e);
    return kvs.size();
  }
}
{code}
I am partial to (1), using HyperLogLog and creating a CapacityEstimateArrayList; this is reusable throughout the code for other classes that implement HeapSize and contain ArrayLists. The memory footprint is very small and it is very fast. The issue is that this is an estimate, although we can configure the precision; we will most likely always be conservative. The estimateCapacity will always be less than the actualCapacity, but it will be close. I think that putting the logic in removeNonReplicableEdits will work, but this only solves the heapSize problem in this particular scenario. Solution 3 is slow and horrible but it gives us the exact answer. I would love to hear if anyone else has any other ideas on how to remedy this problem. I have code for trunk and 0.94 for all 3 ideas and can provide a patch if the community thinks any of these approaches is a viable one. -- This message was sent by Atlassian JIRA (v6.1#6144)
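To make option (2) concrete, here is a minimal sketch of the trim-on-threshold idea; the method shape and the isReplicable() helper are illustrative assumptions, not code from the attached patches:
{code}
// Sketch of option (2): once enough entries have been removed, give the
// ArrayList's slack back so heapSize() stops undercounting. Assumes
// java.util.ArrayList/Iterator and HBase's KeyValue; isReplicable() is
// a stand-in for the real scope check.
void removeNonReplicableEdits(ArrayList<KeyValue> kvs) {
  int originalSize = kvs.size();
  Iterator<KeyValue> it = kvs.iterator();
  while (it.hasNext()) {
    if (!isReplicable(it.next())) {
      it.remove();
    }
  }
  // Threshold: at least half of the entries were dropped.
  if (originalSize > 0 && kvs.size() * 2 <= originalSize) {
    kvs.trimToSize(); // shrinks the backing array to the current size
  }
}
{code}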
[jira] [Updated] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
[ https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9865: - Fix Version/s: 0.94.14 0.96.1 0.98.0 WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM Key: HBASE-9865 URL: https://issues.apache.org/jira/browse/HBASE-9865 Project: HBase Issue Type: Bug Affects Versions: 0.94.5, 0.95.0 Reporter: churro morales Assignee: Lars Hofhansl Fix For: 0.98.0, 0.96.1, 0.94.14 Attachments: 9865-0.94-v2.txt, 9865-0.94-v4.txt, 9865-sample-1.txt, 9865-sample.txt, 9865-trunk-v2.txt, 9865-trunk-v3.txt, 9865-trunk-v4.txt, 9865-trunk.txt WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM. A little background on this issue. We noticed that our source replication regionservers would get into GC storms and sometimes even OOM. We noticed a case where there were around 25k WALEdits to replicate, each one with an ArrayList of KeyValues. The ArrayList had a capacity of around 90k (using 350KB of heap memory) but had around 6 non-null entries. When ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a WALEdit, it removes all KVs that are scoped other than local. But in doing so we don't account for the capacity of the ArrayList when determining heapSize for a WALEdit. The logic for shipping a batch is whether you have hit a size capacity or a number-of-entries capacity. Therefore, if we have a WALEdit with 25k entries and suppose all are removed: the size of the ArrayList is 0 (we don't even count the collection's heap size currently) but the capacity is ignored. This will yield a heapSize() of 0 bytes, while in the best case it would be at least 10 bytes (provided you pass an initialCapacity and you have a 32-bit JVM). I have some ideas on how to address this problem and want to know everyone's thoughts:
1. We use a probabilistic counter such as HyperLogLog and create something like:
* class CapacityEstimateArrayList implements ArrayList
** this class overrides all additive methods to update the probabilistic counts
** it includes one additional method called estimateCapacity (we would take estimateCapacity - size() and fill in sizes for all references)
* Then we can do something like this in WALEdit.heapSize:
{code}
public long heapSize() {
  long ret = ClassSize.ARRAYLIST;
  for (KeyValue kv : kvs) {
    ret += kv.heapSize();
  }
  long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size();
  ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
  if (scopes != null) {
    ret += ClassSize.TREEMAP;
    ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
    // TODO this isn't quite right, need help here
  }
  return ret;
}
{code}
2. In ReplicationSource.removeNonReplicableEdits() we know the size of the array originally, and we provide some percentage threshold. When that threshold is met (50% of the entries have been removed) we can call kvs.trimToSize().
3. In the heapSize() method for WALEdit we could use reflection (please don't shoot me for this) to grab the actual capacity of the list.
Doing something like this:
{code}
public int getArrayListCapacity() {
  try {
    Field f = ArrayList.class.getDeclaredField("elementData");
    f.setAccessible(true);
    return ((Object[]) f.get(kvs)).length;
  } catch (Exception e) {
    LOG.warn("Exception in trying to get capacity on ArrayList", e);
    return kvs.size();
  }
}
{code}
I am partial to (1), using HyperLogLog and creating a CapacityEstimateArrayList; this is reusable throughout the code for other classes that implement HeapSize and contain ArrayLists. The memory footprint is very small and it is very fast. The issue is that this is an estimate, although we can configure the precision; we will most likely always be conservative. The estimateCapacity will always be less than the actualCapacity, but it will be close. I think that putting the logic in removeNonReplicableEdits will work, but this only solves the heapSize problem in this particular scenario. Solution 3 is slow and horrible but it gives us the exact answer. I would love to hear if anyone else has any other ideas on how to remedy this problem. I have code for trunk and 0.94 for all 3 ideas and can provide a patch if the community thinks any of these approaches is a viable one. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9857) Blockcache prefetch for HFile V3
[ https://issues.apache.org/jira/browse/HBASE-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814352#comment-13814352 ] Andrew Purtell commented on HBASE-9857: --- Thanks for looking at the patch [~ndimiduk]. bq. don't see why it's limited to HFileV3. Can it be made a general feature I put the preload logic into the v3 reader because v3 is 'experimental'. Could trivially go into the v2 reader instead. bq. I think it could be smart about loading the blocks, load either sequentially or over a random distribution until the cache is full Files to be preloaded are queued and scheduled to be handled by a small threadpool. When a thread picks up work for a file, the blocks are loaded sequentially using a non-pread scanner from offset 0 to the end of the index. By random did you mean randomly select work from the file queue? bq. The until full part seems tricky as eviction detection isn't very straight-forward Right. If we had it, I could make use of it. Blockcache prefetch for HFile V3 Key: HBASE-9857 URL: https://issues.apache.org/jira/browse/HBASE-9857 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Priority: Minor Attachments: 9857.patch Attached patch implements a prefetching function for HFile (v3) blocks, if indicated by a column family or regionserver property. The purpose of this change is to warm the blockcache, as rapidly after region open as is reasonable, with all the data and index blocks of (presumably also in-memory) table data, without counting those block loads as cache misses. Great for fast reads and keeping the cache hit ratio high. The IO impact can be tuned against the time until all data blocks are in cache. Works a bit like CompactSplitThread. Makes some effort not to stampede. I have been using this for setting up various experiments and thought I'd polish it up a bit and throw it out there. If the data to be preloaded will not fit in the blockcache, or if it is large as a percentage of the blockcache, this is not a good idea; it will just blow out the cache and trigger a lot of useless GC activity. Might be useful as an expert tuning option though. Or not. -- This message was sent by Atlassian JIRA (v6.1#6144)
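The queue-plus-threadpool scheme described above could be wired up roughly as in this sketch; all names (BlockPrefetcher, PreloadableFile, the pool size) are invented for illustration and are not from 9857.patch:
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: files are queued onto a small pool, and each worker
// reads blocks sequentially (non-pread) from offset 0 to the index end,
// letting every read populate the block cache without counting as a miss.
class BlockPrefetcher {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  void schedule(final PreloadableFile file) {
    pool.submit(new Runnable() {
      @Override
      public void run() {
        long offset = 0;
        while (offset < file.indexEndOffset()) {
          offset += file.readBlockIntoCache(offset);
        }
      }
    });
  }

  interface PreloadableFile {
    long indexEndOffset();
    long readBlockIntoCache(long offset); // returns bytes consumed
  }
}
{code}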
[jira] [Created] (HBASE-9897) Clean up some security configuration checks in LoadIncrementalHFiles
Gary Helmling created HBASE-9897: Summary: Clean up some security configuration checks in LoadIncrementalHFiles Key: HBASE-9897 URL: https://issues.apache.org/jira/browse/HBASE-9897 Project: HBase Issue Type: Task Components: security Reporter: Gary Helmling In LoadIncrementalHFiles, use of SecureBulkLoadClient is conditioned on UserProvider.isHBaseSecurityEnabled() in a couple of places. However, use of secure bulk loading seems to be required by HDFS secure authentication rather than by HBase secure authentication. It should be possible to use secure bulk loading as long as SecureBulkLoadEndpoint is loaded and HDFS secure authentication is enabled, regardless of the HBase authentication configuration. In addition, SecureBulkLoadEndpoint does a direct check on permissions by referencing the AccessController loaded on the same region, i.e.:
{code}
getAccessController().prePrepareBulkLoad(env);
{code}
It seems like this will throw an NPE if AccessController is not configured. We need an additional null check to handle this case gracefully. -- This message was sent by Atlassian JIRA (v6.1#6144)
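The graceful handling asked for above could be as simple as the following sketch; the surrounding method is elided and only the null guard is the point:
{code}
// Sketch: guard the coprocessor reference before delegating the check.
AccessController accessController = getAccessController();
if (accessController != null) {
  accessController.prePrepareBulkLoad(env);
}
// When no AccessController is loaded there is no permission check to run,
// so proceed instead of hitting an NPE.
{code}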
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814363#comment-13814363 ] Gary Helmling commented on HBASE-9890: -- [~mbertozzi] I created HBASE-9897 to handle any additional LoadIncrementalHFiles changes separately. It seemed to be expanding the scope of this issue. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception on obtaining the token, since the proxy user doesn't have Kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception. {code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
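Conceptually, the fix amounts to checking whether a usable token is already present before asking the cluster for a new one, since only the latter requires Kerberos credentials. A hedged sketch (exception handling elided; obtainAuthTokenForJob() is a hypothetical helper, not necessarily what the patch does):
{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

// Sketch: skip token acquisition when the (proxy) user already carries one.
UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
boolean hasToken = false;
for (Token<?> t : ugi.getTokens()) {
  if ("HBASE_AUTH_TOKEN".equals(t.getKind().toString())) {
    hasToken = true;
    break;
  }
}
if (!hasToken) {
  // Only this path needs Kerberos credentials to mint a new token.
  obtainAuthTokenForJob(conf, job); // hypothetical helper
}
{code}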
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814378#comment-13814378 ] Gary Helmling commented on HBASE-9890: -- [~mbertozzi] somehow I missed your earlier comment that you would handle any secure bulk loading changes separately. Feel free to close my issue as a dupe if you've already opened one. +1 on the v1 patch. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception on obtaining the token, since the proxy user doesn't have Kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception. {code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814416#comment-13814416 ] Sergey Shelukhin commented on HBASE-9818: - is retrying intentional? probably we should find root cause and not just retry. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt, 9818-v3.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But 
the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA
[jira] [Commented] (HBASE-9870) HFileDataBlockEncoderImpl#diskToCacheFormat uses wrong format
[ https://issues.apache.org/jira/browse/HBASE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814420#comment-13814420 ] Jimmy Xiang commented on HBASE-9870: In BlockCacheKey, we do have the encoding format. However, the equals() method doesn't check the encoding format, which may be interesting. HFileDataBlockEncoderImpl#diskToCacheFormat uses wrong format - Key: HBASE-9870 URL: https://issues.apache.org/jira/browse/HBASE-9870 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang In this method, we have
{code}
if (block.getBlockType() == BlockType.ENCODED_DATA) {
  if (block.getDataBlockEncodingId() == onDisk.getId()) {
    // The block is already in the desired in-cache encoding.
    return block;
  }
}
{code}
This assumes the onDisk encoding is the same as that of inCache. This is not true when we change the encoding of a CF. This could be one of the reasons I got data loss with online encoding change? If I make sure onDisk == inCache all the time, my ITBLL with online encoding change worked once for me. -- This message was sent by Atlassian JIRA (v6.1#6144)
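If the encoding is meant to be part of cache identity, equals()/hashCode() would have to include it, along the lines of this simplified stand-in (not the real BlockCacheKey source; fields trimmed for illustration):
{code}
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

// Simplified stand-in for BlockCacheKey with the encoding made part of
// equality, so differently-encoded copies of a block cannot collide.
final class KeySketch {
  private final String hfileName;
  private final long offset;
  private final DataBlockEncoding encoding;

  KeySketch(String hfileName, long offset, DataBlockEncoding encoding) {
    this.hfileName = hfileName;
    this.offset = offset;
    this.encoding = encoding;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof KeySketch)) return false;
    KeySketch k = (KeySketch) o;
    return offset == k.offset
        && hfileName.equals(k.hfileName)
        && encoding == k.encoding; // the check the current equals() lacks
  }

  @Override
  public int hashCode() {
    int h = hfileName.hashCode();
    h = 31 * h + (int) (offset ^ (offset >>> 32));
    h = 31 * h + encoding.ordinal();
    return h;
  }
}
{code}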
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814417#comment-13814417 ] Sergey Shelukhin commented on HBASE-9818: - Logging LGTM. What does it say when it fails? NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt, 9818-v3.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client:
53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814428#comment-13814428 ] Sergey Shelukhin commented on HBASE-9818: - actually, I looked at it, encapsulating is not such a good idea (returning boolean and stream together), it will not remove the race with close unless a common lock is added. So root cause might be elsewhere... NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt, 9818-v3.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR 
[RpcServer.handler=23,port=36020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at
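For context, the encapsulation being ruled out above would snapshot the stream and the checksum flag together, roughly as below; as noted, without a lock shared with close() the stream can still be closed right after the snapshot is taken, so this alone does not fix the race (hypothetical sketch):
{code}
import org.apache.hadoop.fs.FSDataInputStream;

// Hypothetical snapshot object: returning both values together avoids
// reading them separately, but does not remove the race with close().
final class StreamAndChecksum {
  final FSDataInputStream stream;
  final boolean useHBaseChecksum;

  StreamAndChecksum(FSDataInputStream stream, boolean useHBaseChecksum) {
    this.stream = stream;
    this.useHBaseChecksum = useHBaseChecksum;
  }
}
{code}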
[jira] [Created] (HBASE-9898) Have a way to set a different default compression on HCD
Jean-Daniel Cryans created HBASE-9898: - Summary: Have a way to set a different default compression on HCD Key: HBASE-9898 URL: https://issues.apache.org/jira/browse/HBASE-9898 Project: HBase Issue Type: Improvement Reporter: Jean-Daniel Cryans Fix For: 0.98.0 I was exploring if there would be a nice way to set the compression by default to a different algorithm but I didn't find any that I can implement right now, dumping my ideas so that others can chime in. I think the best place to take it into account would be on the master's side. Basically you run a check when creating a new table to see if compression wasn't set, and if so then set it to the new default. The important thing is you don't want to replace NONE, because that might be the user's goal to set it like that. The main problem is that the normal HCD constructor calls the deprecated constructor that sets most of the properties to their defaults, including compression, which means that it will always be NONE instead of null. It appears that this constructor has been deprecated since February 2012 (https://github.com/apache/hbase/blame/0.94/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java#L292) so maybe we can remove it in the next major version and make our life easier? -- This message was sent by Atlassian JIRA (v6.1#6144)
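To make the master-side idea concrete, a sketch of the creation-time check; the config key is invented for illustration, and this presumes the HColumnDescriptor cleanup described above so that "unset" (null) can be told apart from an explicit NONE:
{code}
// Sketch: on createTable, fill in a cluster-default compression only for
// column families that left it genuinely unset.
// "hbase.hcd.default.compression" is a hypothetical key.
for (HColumnDescriptor hcd : desc.getColumnFamilies()) {
  if (hcd.getValue(HColumnDescriptor.COMPRESSION) == null) { // unset, not NONE
    hcd.setCompressionType(Compression.Algorithm.valueOf(
        conf.get("hbase.hcd.default.compression", "NONE")));
  }
}
{code}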
[jira] [Created] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
Sergey Shelukhin created HBASE-9899: --- Summary: for idempotent operation dups, return the result instead of throwing conflict exception Key: HBASE-9899 URL: https://issues.apache.org/jira/browse/HBASE-9899 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin After HBASE-3787, we could store the MVCC number in the operation context, and use it to convert the modification request into a read on dups instead of throwing OperationConflictException. MVCC tracking will have to be aware of such MVCC numbers being present. Given that scanners are usually relatively short-lived, that would prevent the low watermark from advancing for quite a bit more time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814458#comment-13814458 ] Sergey Shelukhin commented on HBASE-3787: - btw, the patch is ready to review. I filed HBASE-9899 for follow-up work Increment is non-idempotent but client retries RPC -- Key: HBASE-3787 URL: https://issues.apache.org/jira/browse/HBASE-3787 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.4, 0.95.2 Reporter: dhruba borthakur Assignee: Sergey Shelukhin Priority: Blocker Attachments: HBASE-3787-partial.patch, HBASE-3787-v0.patch, HBASE-3787-v1.patch, HBASE-3787-v2.patch, HBASE-3787-v3.patch, HBASE-3787-v4.patch, HBASE-3787-v5.patch, HBASE-3787-v5.patch, HBASE-3787-v6.patch, HBASE-3787-v7.patch, HBASE-3787-v8.patch The HTable.increment() operation is non-idempotent. The client retries the increment RPC a few times (as specified by configuration) before throwing an error to the application. This makes it possible that the same increment call be applied twice at the server. For increment operations, is it better to use HConnectionManager.getRegionServerWithoutRetries()? Another option would be to enhance the IPC module to make the RPC server correctly identify if the RPC is a retry attempt and handle accordingly. -- This message was sent by Atlassian JIRA (v6.1#6144)
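The "server identifies a retry" option mentioned in the description could be sketched as a per-operation-id dedup table; this is entirely illustrative, and the actual patch under review is more involved than this:
{code}
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: remember operation ids the server has already applied
// so a client retry of a non-idempotent op can be detected and absorbed.
class RetryDetector {
  private final ConcurrentHashMap<Long, Boolean> applied =
      new ConcurrentHashMap<Long, Boolean>();

  /** @return true if this operation id was seen before (i.e. a retry). */
  boolean markAndCheckDuplicate(long operationId) {
    return applied.putIfAbsent(operationId, Boolean.TRUE) != null;
  }
}
{code}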
[jira] [Commented] (HBASE-9863) Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs
[ https://issues.apache.org/jira/browse/HBASE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814493#comment-13814493 ] Hudson commented on HBASE-9863: --- SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #827 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/827/]) HBASE-9863 Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs (tedyu: rev 1539129) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableNamespaceManager.java Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs --- Key: HBASE-9863 URL: https://issues.apache.org/jira/browse/HBASE-9863 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0 Attachments: 9863-v1.txt, 9863-v2.txt, 9863-v3.txt, 9863-v4.txt, 9863-v5.txt, 9863-v6.txt TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry sometimes hung. Here were two recent occurrences: https://builds.apache.org/job/PreCommit-HBASE-Build/7676/console https://builds.apache.org/job/PreCommit-HBASE-Build/7671/console There were 9 occurrences of the following in both stack traces: {code} FifoRpcScheduler.handler1-thread-5 daemon prio=10 tid=0x09df8800 nid=0xc17 waiting for monitor entry [0x6fdf8000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250) - waiting to lock 0x7f69b5f0 (a org.apache.hadoop.hbase.master.TableNamespaceManager) at org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146) at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1743) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1782) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1983) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) {code} The test hung here: {code} pool-1-thread-1 prio=10 tid=0x74f7b800 nid=0x5aa5 in Object.wait() [0x74efe000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1436) - locked 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.createTable(MasterProtos.java:40372) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.createTable(HConnectionManager.java:1931) at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:598) at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:594) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116) - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94) - locked 0x7faa26d0 (a 
org.apache.hadoop.hbase.client.RpcRetryingCaller) at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3124) at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:594) at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:485) at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814505#comment-13814505 ] Liang Xie commented on HBASE-9894: -- [~saint@gmail.com], yes. [~lhofhansl], totally agreed with you; I also don't understand why he enabled it :) Anyway, let's remove it. It could be encountered even in a debug/dev env with -ea enabled, right? We shouldn't abort the whole server instance in this case. remove the inappropriate assert statement in Store.getSplitPoint() -- Key: HBASE-9894 URL: https://issues.apache.org/jira/browse/HBASE-9894 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.94.6, 0.94.12 Reporter: Liang Xie Assignee: Liang Xie Priority: Minor Attachments: HBase-9894-0.94.txt One of my friends encountered a RS abort issue frequently during loading data. Here is the log stack: FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server gdc-dn49-formal.i.nease.net,60020,138320 3883151: Uncaught exception in service thread regionserver60020.cacheFlusher java.lang.AssertionError: getSplitPoint() called on a region that can't split! at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1926) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:79) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:5603) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:415) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:387) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:250) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
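The change under discussion boils down to replacing the assert with a graceful bail-out, roughly as follows; the method body is simplified and midKey() stands in for the existing split-point computation:
{code}
// Sketch of the intended behavior in Store.getSplitPoint(): warn and
// return null instead of asserting, so running with -ea cannot take
// down the regionserver.
public byte[] getSplitPoint() {
  if (!canSplit()) {
    LOG.warn("getSplitPoint() called on a region that can't split!");
    return null; // callers already treat null as "no split point"
  }
  return midKey(); // placeholder for the existing computation
}
{code}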
[jira] [Updated] (HBASE-8541) implement flush-into-stripes in stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-8541: Resolution: Fixed Status: Resolved (was: Patch Available) in trunk implement flush-into-stripes in stripe compactions -- Key: HBASE-8541 URL: https://issues.apache.org/jira/browse/HBASE-8541 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-8541-latest-with-dependencies.patch, HBASE-8541-latest-with-dependencies.patch, HBASE-8541-latest-with-dependencies.patch, HBASE-8541-latest-with-dependencies.patch, HBASE-8541-v0.patch, HBASE-8541-v1.patch, HBASE-8541-v2.patch, HBASE-8541-v3.patch, HBASE-8541-v4.patch, HBASE-8541-v5.patch Flush will be able to flush into multiple files under this design, avoiding L0 I/O amplification. I have the patch which is missing just one feature - support for concurrent flushes and stripe changes. This can be done via extensive try-locking of stripe changes and flushes, or advisory flags without blocking flushes, dumping conflicting flushes into L0 in case of (very rare) collisions. For file loading for the latter, a set-cover-like problem needs to be solved to determine optimal stripes. That will also address Jimmy's concern of getting rid of metadata, btw. However currently I don't have time for that. I plan to attach the try-locking patch first, but this won't happen for a couple weeks probably and should not block main reviews. Hopefully this will be added on top of main reviews. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9894: - Fix Version/s: 0.94.13 remove the inappropriate assert statement in Store.getSplitPoint() -- Key: HBASE-9894 URL: https://issues.apache.org/jira/browse/HBASE-9894 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.94.6, 0.94.12 Reporter: Liang Xie Assignee: Liang Xie Priority: Minor Fix For: 0.94.13 Attachments: HBase-9894-0.94.txt One of my friends encountered a RS abort issue frequently during loading data. Here is the log stack: FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server gdc-dn49-formal.i.nease.net,60020,138320 3883151: Uncaught exception in service thread regionserver60020.cacheFlusher java.lang.AssertionError: getSplitPoint() called on a region that can't split! at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1926) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:79) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:5603) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:415) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:387) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:250) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HBASE-7967) implement compactor for stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HBASE-7967. - Resolution: Fixed The addendum went in long ago. implement compactor for stripe compactions -- Key: HBASE-7967 URL: https://issues.apache.org/jira/browse/HBASE-7967 Project: HBase Issue Type: Sub-task Components: Compaction Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.98.0 Attachments: HBASE-7967-javadoc-addendum.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-latest-with-dependencies.patch, HBASE-7967-v0.patch, HBASE-7967-v1.patch, HBASE-7967-v10.patch, HBASE-7967-v11.patch, HBASE-7967-v2.patch, HBASE-7967-v3.patch, HBASE-7967-v4.patch, HBASE-7967-v5.patch, HBASE-7967-v6.patch, HBASE-7967-v7.patch, HBASE-7967-v7.patch, HBASE-7967-v7.patch, HBASE-7967-v8.patch, HBASE-7967-v9.patch Compactor needs to be implemented. See details in parent and blocking jira. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Work started] (HBASE-9854) initial documentation for stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-9854 started by Sergey Shelukhin. initial documentation for stripe compactions Key: HBASE-9854 URL: https://issues.apache.org/jira/browse/HBASE-9854 Project: HBase Issue Type: Sub-task Components: Compaction Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Initial documentation for stripe compactions (distill from attached docs, make up to date, put somewhere like book) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-9890: --- Attachment: HBASE-9890-94-v1.patch In 94 I have to use the HBASE_AUTH_TOKEN string because AuthenticationTokenIdentifier.AUTH_TOKEN_TYPE requires -Psecurity. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-94-v1.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception on obtaining the token, since the proxy user doesn't have Kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception. {code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
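In other words, the 0.94 build without -Psecurity cannot reference the constant's class, so the token kind is presumably spelled out as a literal; a sketch of the difference:
{code}
import org.apache.hadoop.io.Text;

// Without -Psecurity the constant's class isn't on the classpath,
// so the token kind is written as a literal:
Text kind = new Text("HBASE_AUTH_TOKEN");
// With -Psecurity one could use the constant instead:
// Text kind = AuthenticationTokenIdentifier.AUTH_TOKEN_TYPE;
{code}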
[jira] [Updated] (HBASE-9894) remove the inappropriate assert statement in Store.getSplitPoint()
[ https://issues.apache.org/jira/browse/HBASE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9894: - Fix Version/s: (was: 0.94.13) 0.94.14 remove the inappropriate assert statement in Store.getSplitPoint() -- Key: HBASE-9894 URL: https://issues.apache.org/jira/browse/HBASE-9894 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.94.6, 0.94.12 Reporter: Liang Xie Assignee: Liang Xie Priority: Minor Fix For: 0.94.14 Attachments: HBase-9894-0.94.txt One of my friends encountered a RS abort issue frequently during loading data. Here is the log stack: FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server gdc-dn49-formal.i.nease.net,60020,138320 3883151: Uncaught exception in service thread regionserver60020.cacheFlusher java.lang.AssertionError: getSplitPoint() called on a region that can't split! at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1926) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:79) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:5603) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:415) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:387) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:250) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9863) Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs
[ https://issues.apache.org/jira/browse/HBASE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814551#comment-13814551 ] Hudson commented on HBASE-9863: --- SUCCESS: Integrated in HBase-TRUNK #4669 (See [https://builds.apache.org/job/HBase-TRUNK/4669/]) HBASE-9863 Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs (tedyu: rev 1539129) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableNamespaceManager.java Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs --- Key: HBASE-9863 URL: https://issues.apache.org/jira/browse/HBASE-9863 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0 Attachments: 9863-v1.txt, 9863-v2.txt, 9863-v3.txt, 9863-v4.txt, 9863-v5.txt, 9863-v6.txt TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry sometimes hung. Here were two recent occurrences: https://builds.apache.org/job/PreCommit-HBASE-Build/7676/console https://builds.apache.org/job/PreCommit-HBASE-Build/7671/console There were 9 occurrences of the following in both stack traces: {code} FifoRpcScheduler.handler1-thread-5 daemon prio=10 tid=0x09df8800 nid=0xc17 waiting for monitor entry [0x6fdf8000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250) - waiting to lock 0x7f69b5f0 (a org.apache.hadoop.hbase.master.TableNamespaceManager) at org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146) at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1743) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1782) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1983) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) {code} The test hung here: {code} pool-1-thread-1 prio=10 tid=0x74f7b800 nid=0x5aa5 in Object.wait() [0x74efe000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1436) - locked 0xcc848348 (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.createTable(MasterProtos.java:40372) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.createTable(HConnectionManager.java:1931) at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:598) at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:594) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116) - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94) - locked 0x7faa26d0 (a org.apache.hadoop.hbase.client.RpcRetryingCaller) at 
org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3124) at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:594) at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:485) at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-7544) Transparent table/CF encryption
[ https://issues.apache.org/jira/browse/HBASE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-7544: -- Attachment: 7544.patch Transparent table/CF encryption --- Key: HBASE-7544 URL: https://issues.apache.org/jira/browse/HBASE-7544 Project: HBase Issue Type: New Feature Components: HFile, io Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 7544.patch, 7544.patch, 7544.patch, 7544.patch, 7544p1.patch, 7544p1.patch, 7544p2.patch, 7544p2.patch, 7544p3.patch, 7544p3.patch, 7544p4.patch, historical-7544.patch, historical-7544.pdf, historical-shell.patch Introduce transparent encryption of HBase on disk data. Depends on a separate contribution of an encryption codec framework to Hadoop core and an AES-NI (native code) codec. This is work done in the context of MAPREDUCE-4491 but I'd gather there will be additional JIRAs for common and HDFS parts of it. Requirements: - Transparent encryption at the CF or table level - Protect against all data leakage from files at rest - Two-tier key architecture for consistency with best practices for this feature in the RDBMS world - Built-in key management - Flexible and non-intrusive key rotation - Mechanisms not exposed to or modifiable by users - Hardware security module integration (via Java KeyStore) - HBCK support for transparently encrypted files (+ plugin architecture for HBCK) Additional goals: - Shell support for administrative functions - Avoid performance impact for the null crypto codec case - Play nicely with other changes underway: in HFile, block coding, etc. We're aiming for rough parity with Oracle's transparent tablespace encryption feature, described in http://www.oracle.com/technetwork/database/owp-security-advanced-security-11gr-133411.pdf as {quote} “Transparent Data Encryption uses a 2-tier key architecture for flexible and non-intrusive key rotation and least operational and performance impact: Each application table with at least one encrypted column has its own table key, which is applied to all encrypted columns in that table. Equally, each encrypted tablespace has its own tablespace key. Table keys are stored in the data dictionary of the database, while tablespace keys are stored in the header of the tablespace and additionally, the header of each underlying OS file that makes up the tablespace. Each of these keys is encrypted with the TDE master encryption key, which is stored outside of the database in an external security module: either the Oracle Wallet (a PKCS#12 formatted file that is encrypted using a passphrase supplied either by the designated security administrator or DBA during setup), or a Hardware Security Module (HSM) device for higher assurance […]” {quote} Further design details forthcoming in a design document and patch as soon as we have all of the clearances in place. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-7544) Transparent table/CF encryption
[ https://issues.apache.org/jira/browse/HBASE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-7544: -- Status: Patch Available (was: Open) Fix a findbug warning and kick off HadoopQA Transparent table/CF encryption --- Key: HBASE-7544 URL: https://issues.apache.org/jira/browse/HBASE-7544 Project: HBase Issue Type: New Feature Components: HFile, io Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 7544.patch, 7544.patch, 7544.patch, 7544.patch, 7544p1.patch, 7544p1.patch, 7544p2.patch, 7544p2.patch, 7544p3.patch, 7544p3.patch, 7544p4.patch, historical-7544.patch, historical-7544.pdf, historical-shell.patch Introduce transparent encryption of HBase on disk data. Depends on a separate contribution of an encryption codec framework to Hadoop core and an AES-NI (native code) codec. This is work done in the context of MAPREDUCE-4491 but I'd gather there will be additional JIRAs for common and HDFS parts of it. Requirements: - Transparent encryption at the CF or table level - Protect against all data leakage from files at rest - Two-tier key architecture for consistency with best practices for this feature in the RDBMS world - Built-in key management - Flexible and non-intrusive key rotation - Mechanisms not exposed to or modifiable by users - Hardware security module integration (via Java KeyStore) - HBCK support for transparently encrypted files (+ plugin architecture for HBCK) Additional goals: - Shell support for administrative functions - Avoid performance impact for the null crypto codec case - Play nicely with other changes underway: in HFile, block coding, etc. We're aiming for rough parity with Oracle's transparent tablespace encryption feature, described in http://www.oracle.com/technetwork/database/owp-security-advanced-security-11gr-133411.pdf as {quote} “Transparent Data Encryption uses a 2-tier key architecture for flexible and non-intrusive key rotation and least operational and performance impact: Each application table with at least one encrypted column has its own table key, which is applied to all encrypted columns in that table. Equally, each encrypted tablespace has its own tablespace key. Table keys are stored in the data dictionary of the database, while tablespace keys are stored in the header of the tablespace and additionally, the header of each underlying OS file that makes up the tablespace. Each of these keys is encrypted with the TDE master encryption key, which is stored outside of the database in an external security module: either the Oracle Wallet (a PKCS#12 formatted file that is encrypted using a passphrase supplied either by the designated security administrator or DBA during setup), or a Hardware Security Module (HSM) device for higher assurance […]” {quote} Further design details forthcoming in a design document and patch as soon as we have all of the clearances in place. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9874) Append and Increment operation drops Tags
[ https://issues.apache.org/jira/browse/HBASE-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-9874: -- Resolution: Fixed Release Note: During an Append/Increment operation, new cells will carry the tags from the old cell as well as the tags passed in with the cells of the Append/Increment. A new CP hook is provided which is called after the new cell is created and before it is written to the memstore/WAL. A user can use this hook to change the new cell. The naive merge of tags described above may result in duplicates; this hook can decide which tags are ultimately included. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk. Thanks for the reviews. Append and Increment operation drops Tags - Key: HBASE-9874 URL: https://issues.apache.org/jira/browse/HBASE-9874 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.98.0 Attachments: AccessController.postMutationBeforeWAL.txt, HBASE-9874.patch, HBASE-9874_V2.patch, HBASE-9874_V3.patch We should consider the tags in the existing cells as well as the tags coming in with the cells of an Increment/Append. -- This message was sent by Atlassian JIRA (v6.1#6144)
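For reference, a coprocessor that wants to control the merged tag set would implement the new hook roughly as below. This is a minimal sketch: the hook name matches the attachment above (postMutationBeforeWAL), but the exact signature, including the MutationType parameter, is an assumption based on this patch and may differ across releases.

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.RegionObserver.MutationType;

// Hedged sketch of a RegionObserver using the new hook.
public class TagDedupObserver extends BaseRegionObserver {
  @Override
  public Cell postMutationBeforeWAL(ObserverContext<RegionCoprocessorEnvironment> ctx,
      MutationType opType, Mutation mutation, Cell oldCell, Cell newCell)
      throws IOException {
    // newCell carries the naive union of the old cell's tags and the tags
    // passed in with the Append/Increment. A real implementation would build
    // and return a replacement cell with duplicate tags removed; returning
    // newCell unchanged keeps the default merge.
    return newCell;
  }
}
{code}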
[jira] [Updated] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9818: -- Attachment: 9818-v4.txt Patch v4 unifies FSDataInputStreamWrapper#getStream() and FSDataInputStreamWrapper#shouldUseHBaseChecksum() The tests, on Linux, have reached iteration #82. NPE in HFileBlock#AbstractFSReader#readAtOffset --- Key: HBASE-9818 URL: https://issues.apache.org/jira/browse/HBASE-9818 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Ted Yu Attachments: 9818-v2.txt, 9818-v3.txt, 9818-v4.txt HFileBlock#istream seems to be null. I was wondering should we hide FSDataInputStreamWrapper#useHBaseChecksum. By the way, this happened when online schema change is enabled (encoding) {noformat} 2013-10-22 10:58:43,321 ERROR [RpcServer.handler=28,port=36020] regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1200) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1436) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1318) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:359) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:553) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:245) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:166) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:361) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:336) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:293) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:258) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:603) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:476) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:129) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3546) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3616) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3494) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3485) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3079) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) 2013-10-22 10:58:43,665 ERROR [RpcServer.handler=23,port=36020] regionserver.HRegionServer: 
org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 53438 But the nextCallSeq got from client: 53437; request=scanner_id: 1252577470624375060 number_of_rows: 100 close_scanner: false next_call_seq: 53437 at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3030) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:27022) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1979) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:724) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
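For context on what "unifies" means in the v4 comment above: before the patch a caller asked the wrapper two separate questions, shouldUseHBaseChecksum() and then getStream(boolean), and the wrapper's state could change between the calls, leaving the caller pairing a stale flag with a stream that had since been swapped out or closed. A minimal sketch of the unified shape, with illustrative field and method names rather than the actual patch contents:

{code}
import org.apache.hadoop.fs.FSDataInputStream;

// Hedged sketch: make the checksum decision and hand out the matching stream
// in one synchronized call instead of two separate, racy ones.
public class FSDataInputStreamWrapperSketch {
  private FSDataInputStream stream;              // HDFS-checksummed stream
  private FSDataInputStream streamNoFsChecksum;  // stream when HBase checksums

  public synchronized FSDataInputStream getStream(boolean useHBaseChecksum) {
    // Both the flag check and the stream selection happen under one lock,
    // so the caller can never observe an inconsistent (flag, stream) pair.
    return useHBaseChecksum && streamNoFsChecksum != null
        ? streamNoFsChecksum : stream;
  }
}
{code}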
[jira] [Commented] (HBASE-7544) Transparent table/CF encryption
[ https://issues.apache.org/jira/browse/HBASE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814581#comment-13814581 ] Hadoop QA commented on HBASE-7544: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612313/7544.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 82 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestStripeCompactor Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7742//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814586#comment-13814586 ] Jimmy Xiang commented on HBASE-9818: I was wondering how v4 solves the issue. With the patch, if the stream is closed somewhere, instead of an NPE we may get an IOException saying the stream is closed. If the stream is not closed somewhere, why is it null? Never initialized? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-7663) [Per-KV security] Visibility labels
[ https://issues.apache.org/jira/browse/HBASE-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-7663: -- Attachment: HBASE-7663_V6.patch Rebased patch for latest trunk. [Per-KV security] Visibility labels --- Key: HBASE-7663 URL: https://issues.apache.org/jira/browse/HBASE-7663 Project: HBase Issue Type: Sub-task Components: Coprocessors, security Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Anoop Sam John Fix For: 0.98.0 Attachments: HBASE-7663.patch, HBASE-7663_V2.patch, HBASE-7663_V3.patch, HBASE-7663_V4.patch, HBASE-7663_V5.patch, HBASE-7663_V6.patch Implement Accumulo-style visibility labels. Consider the following design principles: - Coprocessor based implementation - Minimal to no changes to core code - Use KeyValue tags (HBASE-7448) to carry labels - Use OperationWithAttributes# {get,set}Attribute for handling visibility labels in the API - Implement a new filter for evaluating visibility labels as KVs are streamed through. This approach would be consistent in deployment and API details with other per-KV security work, supporting environments where they might both be employed, even stacked on some tables. See the parent issue for more discussion. -- This message was sent by Atlassian JIRA (v6.1#6144)
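For a sense of the proposed client-facing surface: per the design notes, a visibility expression travels on the operation as an attribute and is converted into a KeyValue tag (HBASE-7448) server-side. The sketch below assumes an attribute key of "VISIBILITY" and an Accumulo-style boolean expression; both are illustrative guesses at this patch, not a confirmed API.

{code}
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Hedged sketch: attach a visibility expression to a Put via
// OperationWithAttributes#setAttribute; a coprocessor would translate it
// into a cell tag at write time.
public class VisibilityPutExample {
  public static Put labelledPut(byte[] row, byte[] family, byte[] qualifier,
      byte[] value) {
    Put put = new Put(row);
    put.add(family, qualifier, value);
    // "VISIBILITY" is an assumed attribute key; "secret&!public" is an
    // Accumulo-style expression meaning secret AND NOT public.
    put.setAttribute("VISIBILITY", Bytes.toBytes("secret&!public"));
    return put;
  }
}
{code}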
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814592#comment-13814592 ] Francis Liu commented on HBASE-9890: Sorry, late to the party here. I went through the patch and it looks good. We should probably address the case where we're talking to more than one HBase cluster, and hence have more than one HBase delegation token. We should probably support the mechanism HBase provides via QUORUM_ADDRESS, as well as Oozie outright retrieving a bunch of HBase delegation tokens with us just making sure they get passed on to the job. MR jobs are not working if started by a delegated user -- Key: HBASE-9890 URL: https://issues.apache.org/jira/browse/HBASE-9890 Project: HBase Issue Type: Bug Components: mapreduce, security Affects Versions: 0.98.0, 0.94.12, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: HBASE-9890-94-v0.patch, HBASE-9890-94-v1.patch, HBASE-9890-v0.patch, HBASE-9890-v1.patch If Map-Reduce jobs are started by a proxy user that already has the delegation tokens, we get an exception when obtaining a token, since the proxy user doesn't have the kerberos auth. For example: * If we use oozie to execute RowCounter - oozie will get the tokens required (HBASE_AUTH_TOKEN) and it will start the RowCounter. Once the RowCounter tries to obtain the token, it will get an exception. * If we use oozie to execute LoadIncrementalHFiles - oozie will get the tokens required (HDFS_DELEGATION_TOKEN) and it will start the LoadIncrementalHFiles. Once the LoadIncrementalHFiles tries to obtain the token, it will get an exception. {code} org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only allowed for Kerberos authenticated clients at org.apache.hadoop.hbase.security.token.TokenProvider.getAuthenticationToken(TokenProvider.java:87) {code} {code} org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:783) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:868) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:509) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85) at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getDelegationTokens(TrackerDistributedCacheManager.java:949) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:854) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:173) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
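The shape of the underlying fix, for readers following along: token acquisition should be skipped when the submitting user's credentials already hold an HBase delegation token, as they do when Oozie pre-fetches one for a proxy user. A minimal sketch of that check, with illustrative names; the real change lives in the attached patches:

{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

// Hedged sketch: only fall back to Kerberos-backed token acquisition when no
// HBase delegation token is already present in the user's credentials.
public class TokenCheckSketch {
  // Assumed token kind string for the HBase authentication token.
  static final String HBASE_AUTH_TOKEN_KIND = "HBASE_AUTH_TOKEN";

  public static boolean hasHBaseToken() throws Exception {
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    for (Token<?> token : ugi.getTokens()) {
      if (HBASE_AUTH_TOKEN_KIND.equals(token.getKind().toString())) {
        return true; // token already handed over, e.g. by Oozie; don't re-obtain
      }
    }
    // Caller would now invoke the normal Kerberos-authenticated path,
    // e.g. the TokenProvider endpoint, to obtain a fresh token.
    return false;
  }
}
{code}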
[jira] [Commented] (HBASE-9890) MR jobs are not working if started by a delegated user
[ https://issues.apache.org/jira/browse/HBASE-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814594#comment-13814594 ] Francis Liu commented on HBASE-9890: To answer the question of why I chose SecureBulkLoad to be keyed on whether HBase security is enabled: it was mainly because I wanted to keep things simple. I was under the assumption that most would choose to secure the entire stack or secure none of it. For the non-secure HBase + secure HDFS case, I'd expect the user to just run chmod 777 before calling LoadIncrementalHFiles. Having said that, it probably won't hurt, as a convenience to the user, if we key it on isHadoopSecurityEnabled. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814608#comment-13814608 ] chunhui shen commented on HBASE-9000: - [~stepinto] I understand the scenario the patch is used for. Should we do the same thing in StoreFileScanner? If so, why not do this in StoreScanner, for example by calling next a few times before calling reseek... In my personal view, such an action seems a little crude. +0 from me Linear reseek in Memstore - Key: HBASE-9000 URL: https://issues.apache.org/jira/browse/HBASE-9000 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Shane Hogan Priority: Minor Fix For: 0.89-fb Attachments: hbase-9000-benchmark-program.patch, hbase-9000-port-fb.patch, hbase-9000.patch This is to address the linear reseek in MemStoreScanner. Currently reseek iterates over the kvset and the snapshot linearly by just calling next repeatedly. The new solution is to do this linear seek up to a configurable maximum number of times, then if the seek is not yet complete fall back to a logarithmic seek. -- This message was sent by Atlassian JIRA (v6.1#6144)
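For readers skimming the thread: the patch bounds the linear portion of a reseek and falls back to a logarithmic jump once the budget is exhausted. A minimal, generic sketch of that hybrid strategy with an illustrative threshold; the real code operates on MemStoreScanner's kvset and snapshot skip lists:

{code}
import java.util.Iterator;
import java.util.NavigableSet;

// Hedged sketch of the linear-then-logarithmic reseek described in the issue.
public class HybridReseekSketch {
  static <T extends Comparable<T>> T reseek(NavigableSet<T> set, Iterator<T> it,
      T target, int maxLinearSteps) {
    // Cheap path: step forward a bounded number of times, which wins when the
    // target is close to the current position (the common reseek case).
    for (int i = 0; i < maxLinearSteps && it.hasNext(); i++) {
      T next = it.next();
      if (next.compareTo(target) >= 0) {
        return next;
      }
    }
    // Budget exhausted: fall back to an O(log n) jump over the sorted set.
    // (A real scanner would also re-open its iterator from tailSet(target).)
    return set.ceiling(target);
  }
}
{code}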
[jira] [Updated] (HBASE-7544) Transparent table/CF encryption
[ https://issues.apache.org/jira/browse/HBASE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-7544: -- Status: Patch Available (was: Open) Remove an unwanted change in TestStripeCompactor and resubmit. Checked the FindBugs report here, and checked locally prior to patch submission, and didn't see new items on account of this patch. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9000) Linear reseek in Memstore
[ https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814609#comment-13814609 ] chunhui shen commented on HBASE-9000: - I'm sorry, I have no better idea for optimizing performance in this scenario. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-7544) Transparent table/CF encryption
[ https://issues.apache.org/jira/browse/HBASE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-7544: -- Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-7544) Transparent table/CF encryption
[ https://issues.apache.org/jira/browse/HBASE-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-7544: -- Attachment: 7544.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9818) NPE in HFileBlock#AbstractFSReader#readAtOffset
[ https://issues.apache.org/jira/browse/HBASE-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814611#comment-13814611 ] Hadoop QA commented on HBASE-9818: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612315/9818-v4.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7743//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.1#6144)