[jira] [Commented] (HBASE-13895) DATALOSS: Region assigned before WAL replay when abort

2015-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611580#comment-14611580
 ] 

Hudson commented on HBASE-13895:


FAILURE: Integrated in HBase-TRUNK #6624 (See 
[https://builds.apache.org/job/HBase-TRUNK/6624/])
HBASE-13895 DATALOSS: Region assigned before WAL replay when abort (Enis 
Soztutar) -- REAPPLY (stack: rev 20e855f2824d3d39c13560fedabbd985f3ae5d13)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAbortedException.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/WALPlayer.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestLoadAndVerify.java
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerStoppedException.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestWALPlayer.java


 DATALOSS: Region assigned before WAL replay when abort
 --

 Key: HBASE-13895
 URL: https://issues.apache.org/jira/browse/HBASE-13895
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0

 Attachments: 13895.branch-1.2.txt, 13895.master.patch, 
 hbase-13895_addendum-master.patch, hbase-13895_addendum.patch, 
 hbase-13895_v1-branch-1.1.patch


 Opening a placeholder till I finish analysis.
 I have dataloss running ITBLL at 3B (testing HBASE-13877). The most obvious 
 culprit is the double-assignment that I can see.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13500) Deprecate KVComparator and move to CellComparator

2015-07-02 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611589#comment-14611589
 ] 

Anoop Sam John commented on HBASE-13500:


[~ram_krish]  we can close this main jira now.

 Deprecate KVComparator and move to CellComparator
 -

 Key: HBASE-13500
 URL: https://issues.apache.org/jira/browse/HBASE-13500
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0








[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low

2015-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611530#comment-14611530
 ] 

Hadoop QA commented on HBASE-13832:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12743215/hbase-13832-v3.patch
  against master branch at commit f0e29c49a1f5f3773ba03b822805d863c149b443.
  ATTACHMENT ID: 12743215

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1898 checkstyle errors (more than the master's current 1897 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn post-site goal 
to fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestProcessBasedCluster
  org.apache.hadoop.hbase.mapreduce.TestImportExport
  org.apache.hadoop.hbase.TestRegionRebalancing
  
org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS

 {color:red}-1 core zombie tests{color}.  There are 5 zombie test(s):   
at 
org.apache.hadoop.hbase.mapreduce.TestTableSnapshotInputFormat.testInitTableSnapshotMapperJobConfig(TestTableSnapshotInputFormat.java:146)
at 
org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScanBase.testScan(TestTableInputFormatScanBase.java:244)
at 
org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan2.testScanOBBToQPP(TestTableInputFormatScan2.java:57)
at 
org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.testMultithreadedTableMapper(TestMultithreadedTableMapper.java:133)
at 
org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testRegionCrossingRowColBloom(TestLoadIncrementalHFiles.java:142)
at 
org.apache.hadoop.hbase.mapreduce.TestImportExport.testExportScannerBatching(TestImportExport.java:271)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14648//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14648//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14648//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14648//console

This message is automatically generated.

 Procedure V2: master fail to start due to WALProcedureStore sync failures 
 when HDFS data nodes count is low
 ---

 Key: HBASE-13832
 URL: https://issues.apache.org/jira/browse/HBASE-13832
 Project: HBase
  Issue Type: Sub-task
  Components: master, proc-v2
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Matteo Bertozzi
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, 
 HBASE-13832-v2.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, 
 hbase-13832-v3.patch


 when the data node count is < 3, we got a failure in WALProcedureStore#syncLoop() during 
 master start.  The failure prevents the master from starting.
 {noformat}
 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] 
 wal.WALProcedureStore: Sync slot failed, abort.
 java.io.IOException: Failed to replace a bad datanode on the existing 
 pipeline due to no more good datanodes being available to try. (Nodes: 
 current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  
 DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
  
 

[jira] [Commented] (HBASE-13895) DATALOSS: Region assigned before WAL replay when abort

2015-07-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611546#comment-14611546
 ] 

stack commented on HBASE-13895:
---

Ok. Added the missing patch and the addendum that fixes the failing 
TestAssignmentManagerOnCluster tests. Agree with the fix for the UT (I love unit tests).

For branch-1+ I applied the addendum and checked that I got all of the patch this time.

On branch-2, I applied the original patch plus a version of the master addendum. I 
made master the same as branch-1's. The master addendum makes the logic different. Why, 
[~enis]? I'll presume the master addendum is intended. I am talking about this hunk in the 
master addendum patch:

{code}
@@ -891,12 +891,16 @@ public class AssignmentManager {
         LOG.warn("Server " + server + " region CLOSE RPC returned false for " +
           region.getRegionNameAsString());
       } catch (Throwable t) {
+        long sleepTime = 0;
+        Configuration conf = this.server.getConfiguration();
         if (t instanceof RemoteException) {
           t = ((RemoteException)t).unwrapRemoteException();
         }
-        if (t instanceof NotServingRegionException
+        if (t instanceof RegionServerAbortedException
             || t instanceof RegionServerStoppedException
             || t instanceof ServerNotRunningYetException) {
+
+        } else if (t instanceof NotServingRegionException) {
           LOG.debug("Offline " + region.getRegionNameAsString()
             + ", it's not any more on " + server, t);
           regionStates.updateRegionState(region, State.OFFLINE);
{code}

whereas in the original patch we have this (setting a sleepTime...):

{code}
@@ -1866,11 +1867,19 @@ public class AssignmentManager extends ZooKeeperListener {
         LOG.warn("Server " + server + " region CLOSE RPC returned false for " +
           region.getRegionNameAsString());
       } catch (Throwable t) {
+        long sleepTime = 0;
+        Configuration conf = this.server.getConfiguration();
         if (t instanceof RemoteException) {
           t = ((RemoteException)t).unwrapRemoteException();
         }
         boolean logRetries = true;
-        if (t instanceof NotServingRegionException
+        if (t instanceof RegionServerAbortedException) {
+          // RS is aborting, we cannot offline the region since the region may need to do WAL
+          // recovery. Until we see the RS expiration, we should retry.
+          sleepTime = 1 + conf.getInt(RpcClient.FAILED_SERVER_EXPIRY_KEY,
+            RpcClient.FAILED_SERVER_EXPIRY_DEFAULT);
+
+        } else if (t instanceof NotServingRegionException
             || t instanceof RegionServerStoppedException
             || t instanceof ServerNotRunningYetException) {
{code}

Thanks for catching my misapply.
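The retry decision in the original hunk can be isolated into a small sketch. This is not HBase code: the exception classes below are stand-ins for the real org.apache.hadoop.hbase types, and the 2000 ms expiry constant is an assumed value standing in for RpcClient.FAILED_SERVER_EXPIRY_DEFAULT (the real code reads it from the Configuration via RpcClient.FAILED_SERVER_EXPIRY_KEY).

```java
// Sketch of the retry decision in the original HBASE-13895 patch,
// with stand-in exception types and an assumed expiry default.
public class CloseRetryPolicy {
    static class RegionServerAbortedException extends Exception {}
    static class NotServingRegionException extends Exception {}

    // Assumed stand-in for RpcClient.FAILED_SERVER_EXPIRY_DEFAULT (ms).
    static final long FAILED_SERVER_EXPIRY_MS = 2000;

    // How long to sleep before retrying the CLOSE RPC. An aborting server
    // must be retried until its failed-server entry expires, because the
    // region may still need WAL recovery and cannot be offlined yet; a
    // NotServingRegionException means the region can go OFFLINE now.
    static long sleepBeforeRetry(Throwable t) {
        if (t instanceof RegionServerAbortedException) {
            return 1 + FAILED_SERVER_EXPIRY_MS;
        }
        return 0;
    }
}
```

The point of the non-zero sleep is that the master keeps retrying the CLOSE until the failed-server entry expires, instead of offlining a region whose WAL may still need replay.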

 DATALOSS: Region assigned before WAL replay when abort
 --

 Key: HBASE-13895
 URL: https://issues.apache.org/jira/browse/HBASE-13895
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0

 Attachments: 13895.master.patch, hbase-13895_addendum-master.patch, 
 hbase-13895_addendum.patch, hbase-13895_v1-branch-1.1.patch


 Opening a placeholder till I finish analysis.
 I have dataloss running ITBLL at 3B (testing HBASE-13877). The most obvious 
 culprit is the double-assignment that I can see.





[jira] [Updated] (HBASE-13895) DATALOSS: Region assigned before WAL replay when abort

2015-07-02 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13895:
--
Attachment: 13895.branch-1.2.txt

branch-1.2 needed the same change in TestAssignmentManager as branch-1. Applied 
this. Hopefully that is it.

 DATALOSS: Region assigned before WAL replay when abort
 --

 Key: HBASE-13895
 URL: https://issues.apache.org/jira/browse/HBASE-13895
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0

 Attachments: 13895.branch-1.2.txt, 13895.master.patch, 
 hbase-13895_addendum-master.patch, hbase-13895_addendum.patch, 
 hbase-13895_v1-branch-1.1.patch


 Opening a placeholder till I finish analysis.
 I have dataloss running ITBLL at 3B (testing HBASE-13877). The most obvious 
 culprit is the double-assignment that I can see.





[jira] [Commented] (HBASE-13895) DATALOSS: Region assigned before WAL replay when abort

2015-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611592#comment-14611592
 ] 

Hudson commented on HBASE-13895:


SUCCESS: Integrated in HBase-1.1 #571 (See 
[https://builds.apache.org/job/HBase-1.1/571/])
HBASE-13895 DATALOSS: Region assigned before WAL replay when abort (Enis 
Soztutar) -- ADDENDUM (stack: rev a9cecf32a99caed3fccd0b6b00aca6d42d7979d3)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAbortedException.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


 DATALOSS: Region assigned before WAL replay when abort
 --

 Key: HBASE-13895
 URL: https://issues.apache.org/jira/browse/HBASE-13895
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0

 Attachments: 13895.branch-1.2.txt, 13895.master.patch, 
 hbase-13895_addendum-master.patch, hbase-13895_addendum.patch, 
 hbase-13895_v1-branch-1.1.patch


 Opening a placeholder till I finish analysis.
 I have dataloss running ITBLL at 3B (testing HBASE-13877). The most obvious 
 culprit is the double-assignment that I can see.





[jira] [Commented] (HBASE-13977) Convert getKey and related APIs to Cell

2015-07-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611559#comment-14611559
 ] 

stack commented on HBASE-13977:
---

What does this mean: "Gets the current key in the form of a cell"? That there 
is no value returned?

In getKeyAsCell, why bother with a ByteBuffer when all you are doing is passing 
an array?

getKeyAsCell is defined in multiple Interfaces? Can we avoid that?

Here we are creating a Cell every time:

  if (getComparator().compare(splitCell, getKeyAsCell()) >= 0) {

Previously we were passing the current array, with no allocation and no KeyValue 
creation (if I am reading this right). Do we have to do this? Anything we can do 
better here?

Ditto in next hunk of changes.

Otherwise, I like these changes.
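The allocation concern above can be made concrete with a toy sketch. KeyHolder is a hypothetical stand-in, not the real Cell API: both paths compare the same bytes, but one allocates a fresh wrapper object on every call, which is what per-compare Cell creation amounts to.

```java
import java.util.Arrays;

// Toy illustration of the review concern on getKeyAsCell: wrapping the
// current key in a fresh object per comparison vs. comparing the backing
// array directly. KeyHolder is a stand-in, not the real Cell interface.
public class KeyCompareSketch {
    interface KeyHolder { byte[] keyBytes(); }

    // Allocating style: a new wrapper object is created on every call.
    static int compareViaWrapper(byte[] splitKey, byte[] currentKey) {
        KeyHolder wrapper = () -> currentKey;  // fresh allocation each call
        return Arrays.compare(splitKey, wrapper.keyBytes());
    }

    // Allocation-free style: the old path compared the raw array in place.
    static int compareRaw(byte[] splitKey, byte[] currentKey) {
        return Arrays.compare(splitKey, currentKey);
    }
}
```

Both methods return identical results; the only difference is the per-call garbage, which matters on a hot compaction or scan path.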


 Convert getKey and related APIs to Cell
 ---

 Key: HBASE-13977
 URL: https://issues.apache.org/jira/browse/HBASE-13977
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-13977.patch, HBASE-13977_1.patch, 
 HBASE-13977_2.patch, HBASE-13977_3.patch


 During the course of changes for HBASE-11425 felt that more APIs can be 
 converted to return Cell instead of BB like getKey, getLastKey. 
 We can also rename the getKeyValue to getCell.





[jira] [Commented] (HBASE-13895) DATALOSS: Region assigned before WAL replay when abort

2015-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611541#comment-14611541
 ] 

Hudson commented on HBASE-13895:


FAILURE: Integrated in HBase-TRUNK #6623 (See 
[https://builds.apache.org/job/HBase-TRUNK/6623/])
HBASE-13895 DATALOSS: Region assigned before WAL replay when abort (Enis 
Soztutar) (stack: rev fca725a899984f57a6ad48ce9ae9cbc34e8ce752)
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestLoadAndVerify.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestWALPlayer.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAbortedException.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/WALPlayer.java
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerStoppedException.java
Revert HBASE-13895 DATALOSS: Region assigned before WAL replay when abort 
(Enis Soztutar) (stack: rev f0e29c49a1f5f3773ba03b822805d863c149b443)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestWALPlayer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/WALPlayer.java
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerStoppedException.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestLoadAndVerify.java
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAbortedException.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java


 DATALOSS: Region assigned before WAL replay when abort
 --

 Key: HBASE-13895
 URL: https://issues.apache.org/jira/browse/HBASE-13895
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0

 Attachments: 13895.master.patch, hbase-13895_addendum-master.patch, 
 hbase-13895_addendum.patch, hbase-13895_v1-branch-1.1.patch


 Opening a placeholder till I finish analysis.
 I have dataloss running ITBLL at 3B (testing HBASE-13877). The most obvious 
 culprit is the double-assignment that I can see.





[jira] [Updated] (HBASE-14010) TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster not balanced

2015-07-02 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14010:
--
Attachment: 14010.txt

Passed twice. Try a third time.


 TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster 
 not balanced
 -

 Key: HBASE-14010
 URL: https://issues.apache.org/jira/browse/HBASE-14010
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: stack
Assignee: stack
 Attachments: 14010.txt, 14010.txt, 14010.txt


 java.lang.AssertionError: null
   at 
 org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:144)
 from recent build 
 https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/14639/testReport/junit/org.apache.hadoop.hbase/TestRegionRebalancing/testRebalanceOnRegionServerNumberChange_0_/





[jira] [Updated] (HBASE-13998) Remove CellComparator#compareRows(byte[] left, int loffset, int llength, byte[] right, int roffset, int rlength)

2015-07-02 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-13998:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Added the code comments in MetaCache.
Thanks for the reviews Ram and Stack.

 Remove CellComparator#compareRows(byte[] left, int loffset, int llength, 
 byte[] right, int roffset, int rlength)
 

 Key: HBASE-13998
 URL: https://issues.apache.org/jira/browse/HBASE-13998
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0

 Attachments: HBASE-13998.patch


 A public API in CellComparator which takes old-style byte[], offset, length 
 alone is not correct.  CellComparator is supposed to compare cell(s).  At least 
 one side's param has to be a cell.  This is the agreement we discussed in 
 HBASE-10800.  Still, we could not remove the above method because it was 
 being used from multiple places.  Now most of the usage is removed.  This 
 jira aims at removing it fully and replacing the usage with other APIs.
 Note: CellComparator was added in 2.0 only, so removing the public API does 
 not create any BC issue.
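A minimal sketch of that agreement, using stand-in types rather than the real org.apache.hadoop.hbase.Cell: the comparator entry point keeps at least one Cell-typed parameter, and any raw row key stays on the other side.

```java
import java.util.Arrays;

// Sketch of the HBASE-13998 agreement: comparator entry points take at least
// one Cell; the raw byte[]/offset/length-only overload goes away. The Cell
// interface here is a minimal stand-in, not the real HBase type.
public class CellRowCompareSketch {
    interface Cell {
        byte[] getRowArray();
        int getRowOffset();
        int getRowLength();
    }

    // Allowed shape: left side is a Cell, right side may be a raw row key.
    static int compareRows(Cell left, byte[] rightRow) {
        return Arrays.compare(
            left.getRowArray(), left.getRowOffset(),
            left.getRowOffset() + left.getRowLength(),
            rightRow, 0, rightRow.length);
    }

    // Helper to wrap a bare row key as a Cell when no real Cell is at hand.
    static Cell cellOf(final byte[] row) {
        return new Cell() {
            public byte[] getRowArray() { return row; }
            public int getRowOffset() { return 0; }
            public int getRowLength() { return row.length; }
        };
    }
}
```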





[jira] [Commented] (HBASE-13998) Remove CellComparator#compareRows(byte[] left, int loffset, int llength, byte[] right, int roffset, int rlength)

2015-07-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611581#comment-14611581
 ] 

Hudson commented on HBASE-13998:


FAILURE: Integrated in HBase-TRUNK #6624 (See 
[https://builds.apache.org/job/HBase-TRUNK/6624/])
HBASE-13998 Remove CellComparator#compareRows(byte[] left, int loffset, int 
llength, byte[] right, int roffset, int rlength). (anoopsamjohn: rev 
62f56944919b436036dcac740d8a21c56289a164)
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/MetaCache.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/SyncTable.java
* 
hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestClientNoCluster.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StripeStoreFileManager.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/CellComparator.java


 Remove CellComparator#compareRows(byte[] left, int loffset, int llength, 
 byte[] right, int roffset, int rlength)
 

 Key: HBASE-13998
 URL: https://issues.apache.org/jira/browse/HBASE-13998
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0

 Attachments: HBASE-13998.patch


 A public API in CellComparator which takes old-style byte[], offset, length 
 alone is not correct.  CellComparator is supposed to compare cell(s).  At least 
 one side's param has to be a cell.  This is the agreement we discussed in 
 HBASE-10800.  Still, we could not remove the above method because it was 
 being used from multiple places.  Now most of the usage is removed.  This 
 jira aims at removing it fully and replacing the usage with other APIs.
 Note: CellComparator was added in 2.0 only, so removing the public API does 
 not create any BC issue.





[jira] [Commented] (HBASE-14010) TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster not balanced

2015-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611608#comment-14611608
 ] 

Hadoop QA commented on HBASE-14010:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12743230/14010.txt
  against master branch at commit f0e29c49a1f5f3773ba03b822805d863c149b443.
  ATTACHMENT ID: 12743230

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14649//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14649//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14649//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14649//console

This message is automatically generated.

 TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster 
 not balanced
 -

 Key: HBASE-14010
 URL: https://issues.apache.org/jira/browse/HBASE-14010
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: stack
Assignee: stack
 Attachments: 14010.txt, 14010.txt


 java.lang.AssertionError: null
   at 
 org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:144)
 from recent build 
 https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/14639/testReport/junit/org.apache.hadoop.hbase/TestRegionRebalancing/testRebalanceOnRegionServerNumberChange_0_/





[jira] [Updated] (HBASE-13895) DATALOSS: Region assigned before WAL replay when abort

2015-07-02 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-13895:
--
Attachment: 13895.master.addendum2.txt

A fix for the master compile failure. So, on master, we did the revert, then applied the 
new patch... and then added his addendum.

 DATALOSS: Region assigned before WAL replay when abort
 --

 Key: HBASE-13895
 URL: https://issues.apache.org/jira/browse/HBASE-13895
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0

 Attachments: 13895.branch-1.2.txt, 13895.master.addendum2.txt, 
 13895.master.patch, hbase-13895_addendum-master.patch, 
 hbase-13895_addendum.patch, hbase-13895_v1-branch-1.1.patch


 Opening a placeholder till I finish analysis.
 I have dataloss running ITBLL at 3B (testing HBASE-13877). The most obvious 
 culprit is the double-assignment that I can see.





[jira] [Commented] (HBASE-14008) REST - Throw an appropriate error during schema POST

2015-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611630#comment-14611630
 ] 

Hadoop QA commented on HBASE-14008:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12743241/14008.patch
  against master branch at commit f0e29c49a1f5f3773ba03b822805d863c149b443.
  ATTACHMENT ID: 12743241

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 2 zombie test(s):   
at 
org.apache.hadoop.hbase.filter.TestFilterWithScanLimits.testScanWithLimit(TestFilterWithScanLimits.java:71)
at 
org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd.testEndToEnd(TestFuzzyRowFilterEndToEnd.java:140)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14650//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14650//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14650//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14650//console

This message is automatically generated.

 REST - Throw an appropriate error during schema POST
 

 Key: HBASE-14008
 URL: https://issues.apache.org/jira/browse/HBASE-14008
 Project: HBase
  Issue Type: Bug
  Components: REST
Affects Versions: 0.98.13, 1.1.1
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
Priority: Minor
  Labels: REST
 Fix For: 1.2.0, 1.1.2

 Attachments: 14008.patch, HBASE-14008.patch


 When an update is done on the schema through REST and an error occurs, the 
 actual reason is not thrown back to the client. Right now we get a 
 javax.ws.rs.WebApplicationException instead of the actual error message.





[jira] [Commented] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles

2015-07-02 Thread Francesco MDE (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611718#comment-14611718
 ] 

Francesco MDE commented on HBASE-14005:
---

Apparently this patch was already present in HBASE-8495, but one LOC got lost.

 Set permission to .top hfile in LoadIncrementalHFiles
 -

 Key: HBASE-14005
 URL: https://issues.apache.org/jira/browse/HBASE-14005
 Project: HBase
  Issue Type: Bug
Reporter: Francesco MDE
Priority: Trivial
 Attachments: HBASE-14005.patch


 Set the same -rwxrwxrwx permission to .top file as .bottom and _tmp





[jira] [Commented] (HBASE-13970) NPE during compaction in trunk

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611805#comment-14611805
 ] 

ramkrishna.s.vasudevan commented on HBASE-13970:


I have lost my server for some time.  I think you can commit it. It may take 
some more time for me to test. So +1.

 NPE during compaction in trunk
 --

 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13970-v1.patch, HBASE-13970.patch


 Updated trunk.  Loaded the table with the PE tool.  Triggered a flush to ensure 
 all data is flushed out to disk.  When the first compaction is triggered we 
 get an NPE; this is very easy to reproduce:
 {code}
 015-06-25 21:33:46,041 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,051 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
 2015-06-25 21:33:46,159 ERROR 
 [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
 regionserver.CompactSplitThread: Compaction failed Request = 
 regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
 priority=3, time=7536968291719985
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
 at 
 org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
 at 
 org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
 at 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 21:33:46,745 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
 hasBloomFilter=true, into tmp file 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
 2015-06-25 21:33:46,772 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HStore: Added 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
  entries=68116, sequenceid=1534, filesize=68.7 M
 2015-06-25 21:33:46,773 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
 currentsize=0 B/0 for region 
 TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
  in 723ms, sequenceid=1534, compaction requested=true
 2015-06-25 21:33:46,780 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/reached/TestTable
 2015-06-25 21:33:46,790 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/abort/TestTable
 2015-06-25 21:33:46,791 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 2015-06-25 21:33:46,803 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,818 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 {code}
 Will check what the reason behind this is. 
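The stack trace above implicates PressureAwareCompactionThroughputController.finish(); a null coming out of a lookup for the tracked active compaction would produce exactly this NPE. As a hedged illustration only (the actual HBASE-13970 fix may differ, and the class and method names below are hypothetical), the defensive pattern looks like:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of a throughput controller that tracks active
// compactions by name; not the real HBase implementation.
class ThroughputControllerSketch {
    private final ConcurrentMap<String, Long> activeCompactions =
        new ConcurrentHashMap<>();

    void start(String compactionName) {
        activeCompactions.put(compactionName, System.currentTimeMillis());
    }

    // Guard against finish() being called for a compaction that was never
    // registered (or was already removed), instead of dereferencing null.
    long finish(String compactionName) {
        Long startTime = activeCompactions.remove(compactionName);
        if (startTime == null) {
            return -1L; // unknown compaction: nothing to finish
        }
        return System.currentTimeMillis() - startTime;
    }
}
```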



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-13977:
---
Attachment: HBASE-13977_4.patch

Updated patch as per Stack's comments.

 Convert getKey and related APIs to Cell
 ---

 Key: HBASE-13977
 URL: https://issues.apache.org/jira/browse/HBASE-13977
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-13977.patch, HBASE-13977_1.patch, 
 HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch


 During the course of changes for HBASE-11425, we felt that more APIs can be 
 converted to return Cell instead of BB, like getKey and getLastKey. 
 We can also rename getKeyValue to getCell.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-13977:
---
Status: Patch Available  (was: Open)

 Convert getKey and related APIs to Cell
 ---

 Key: HBASE-13977
 URL: https://issues.apache.org/jira/browse/HBASE-13977
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-13977.patch, HBASE-13977_1.patch, 
 HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch


 During the course of changes for HBASE-11425, we felt that more APIs can be 
 converted to return Cell instead of BB, like getKey and getLastKey. 
 We can also rename getKeyValue to getCell.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-13977:
---
Status: Open  (was: Patch Available)

 Convert getKey and related APIs to Cell
 ---

 Key: HBASE-13977
 URL: https://issues.apache.org/jira/browse/HBASE-13977
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-13977.patch, HBASE-13977_1.patch, 
 HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch


 During the course of changes for HBASE-11425, we felt that more APIs can be 
 converted to return Cell instead of BB, like getKey and getLastKey. 
 We can also rename getKeyValue to getCell.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13986) HMaster instance always returns false for isAborted() check.

2015-07-02 Thread Abhishek Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar updated HBASE-13986:
---
Attachment: HBASE-13986.patch

Attaching a patch for review that sets 'abortRequested' to true for the master 
instance, so that all master flows using this flag (or the isAborted() method) 
see the abort flag correctly set.

 HMaster instance always returns false for isAborted() check.
 

 Key: HBASE-13986
 URL: https://issues.apache.org/jira/browse/HBASE-13986
 Project: HBase
  Issue Type: Bug
Reporter: Abhishek Kumar
Assignee: Abhishek Kumar
Priority: Minor
 Attachments: HBASE-13986.patch


 It seems that HMaster never sets the abortRequested flag to true the way 
 HRegionServer does in its abort() method. We can see the isAborted() method 
 being used in a few places for the HMaster instance (like in 
 HMasterCommandLine.startMaster) where code flow is determined based on the 
 result of the isAborted() call.
 We can set this abortRequested flag in HMaster's abort() method as well, like 
 in HRegionServer's abort() method; let me know if that seems ok. 
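The proposal above can be sketched minimally (the class below is illustrative, mirroring the field name from the discussion; it is not HMaster itself):

```java
// Minimal sketch of the proposed change: an HMaster-like abort() setting
// the same volatile flag HRegionServer uses, so isAborted() reflects
// reality. Names are illustrative.
class MasterSketch {
    private volatile boolean abortRequested = false;

    void abort(String why, Throwable cause) {
        abortRequested = true; // the missing step the patch adds
        // ... existing shutdown/cleanup logic would follow ...
    }

    boolean isAborted() {
        return abortRequested;
    }
}
```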



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13986) HMaster instance always returns false for isAborted() check.

2015-07-02 Thread Abhishek Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar updated HBASE-13986:
---
Status: Patch Available  (was: Open)

 HMaster instance always returns false for isAborted() check.
 

 Key: HBASE-13986
 URL: https://issues.apache.org/jira/browse/HBASE-13986
 Project: HBase
  Issue Type: Bug
Reporter: Abhishek Kumar
Assignee: Abhishek Kumar
Priority: Minor
 Attachments: HBASE-13986.patch


 It seems that HMaster never sets the abortRequested flag to true the way 
 HRegionServer does in its abort() method. We can see the isAborted() method 
 being used in a few places for the HMaster instance (like in 
 HMasterCommandLine.startMaster) where code flow is determined based on the 
 result of the isAborted() call.
 We can set this abortRequested flag in HMaster's abort() method as well, like 
 in HRegionServer's abort() method; let me know if that seems ok. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles

2015-07-02 Thread Francesco MDE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francesco MDE updated HBASE-14005:
--
Attachment: HBASE-14005.patch

 Set permission to .top hfile in LoadIncrementalHFiles
 -

 Key: HBASE-14005
 URL: https://issues.apache.org/jira/browse/HBASE-14005
 Project: HBase
  Issue Type: Bug
Reporter: Francesco MDE
Priority: Trivial
 Attachments: HBASE-14005.patch


 Set the same -rwxrwxrwx permission on the .top file as on .bottom and _tmp



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-13977:
---
Issue Type: Sub-task  (was: Improvement)
Parent: HBASE-13500

 Convert getKey and related APIs to Cell
 ---

 Key: HBASE-13977
 URL: https://issues.apache.org/jira/browse/HBASE-13977
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-13977.patch, HBASE-13977_1.patch, 
 HBASE-13977_2.patch, HBASE-13977_3.patch


 During the course of changes for HBASE-11425, we felt that more APIs can be 
 converted to return Cell instead of BB, like getKey and getLastKey. 
 We can also rename getKeyValue to getCell.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13500) Deprecate KVComparator and move to CellComparator

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611796#comment-14611796
 ] 

ramkrishna.s.vasudevan commented on HBASE-13500:


Will close this after HBASE-13977 is done.

 Deprecate KVComparator and move to CellComparator
 -

 Key: HBASE-13500
 URL: https://issues.apache.org/jira/browse/HBASE-13500
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13970) NPE during compaction in trunk

2015-07-02 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611801#comment-14611801
 ] 

Anoop Sam John commented on HBASE-13970:


[~Apache9] You have 2 +1s on this JIRA. It is ready for commit.

 NPE during compaction in trunk
 --

 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13970-v1.patch, HBASE-13970.patch


 Updated the trunk. Loaded the table with the PE tool. Triggered a flush to 
 ensure all data is flushed out to disk. When the first compaction is 
 triggered we get an NPE, and this is very easy to reproduce:
 {code}
 2015-06-25 21:33:46,041 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,051 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
 2015-06-25 21:33:46,159 ERROR 
 [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
 regionserver.CompactSplitThread: Compaction failed Request = 
 regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
 priority=3, time=7536968291719985
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
 at 
 org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
 at 
 org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
 at 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 21:33:46,745 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
 hasBloomFilter=true, into tmp file 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
 2015-06-25 21:33:46,772 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HStore: Added 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
  entries=68116, sequenceid=1534, filesize=68.7 M
 2015-06-25 21:33:46,773 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
 currentsize=0 B/0 for region 
 TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
  in 723ms, sequenceid=1534, compaction requested=true
 2015-06-25 21:33:46,780 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/reached/TestTable
 2015-06-25 21:33:46,790 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/abort/TestTable
 2015-06-25 21:33:46,791 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 2015-06-25 21:33:46,803 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,818 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 {code}
 Will check what the reason behind this is. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13977) Convert getKey and related APIs to Cell

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611828#comment-14611828
 ] 

ramkrishna.s.vasudevan commented on HBASE-13977:


Oh yes. I thought you were asking if we really need to do a copy. Sorry about 
that. 

 Convert getKey and related APIs to Cell
 ---

 Key: HBASE-13977
 URL: https://issues.apache.org/jira/browse/HBASE-13977
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-13977.patch, HBASE-13977_1.patch, 
 HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch


 During the course of changes for HBASE-11425, we felt that more APIs can be 
 converted to return Cell instead of BB, like getKey and getLastKey. 
 We can also rename getKeyValue to getCell.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles

2015-07-02 Thread Francesco MDE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francesco MDE updated HBASE-14005:
--
Attachment: (was: HBASE-14005.patch)

 Set permission to .top hfile in LoadIncrementalHFiles
 -

 Key: HBASE-14005
 URL: https://issues.apache.org/jira/browse/HBASE-14005
 Project: HBase
  Issue Type: Bug
Reporter: Francesco MDE
Priority: Trivial

 Set the same -rwxrwxrwx permission on the .top file as on .bottom and _tmp



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13977) Convert getKey and related APIs to Cell

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611800#comment-14611800
 ] 

ramkrishna.s.vasudevan commented on HBASE-13977:


I went through the changes once again.
bq.Gets the current key in the form of a cell. That there is no value 
returned?
Yes, only the key part. Earlier it was a BB view of the key - now it is a Cell.
bq.In getKeyAsCell, why bother with a ByteBuffer when all you are doing is 
passing an array?
Ya, you are right. I changed that to just use System.arraycopy in 
BufferedDataEncoder.getKeyAsCell.
bq.getKeyAsCell is defined in multiple Interfaces? Can we avoid that?
For the DBE cases I think we cannot do it now, because the entire seeker is 
now the BufferedDataEncoder, so we need some API in the DataBlockEncoder to 
be used. Maybe another JIRA, if it is possible?
bq.if (getComparator().compare(splitCell, getKeyAsCell()) >= 0) {
Valid point. Previously we were creating a BB object for the key, but now we 
are creating a cell every time. Still, thinking in terms of 
BufferedBackedCell, it is better for this to be a cell. The current code is 
trying to do 
{code}
- ByteBuffer bb = getKey();
-  if (getComparator().compare(splitCell, bb.array(), bb.arrayOffset(),
-  bb.limit()) >= 0) {
{code}
After ByteBuffer backed cells come in, we cannot have it the above way, as 
array() and arrayOffset() are not expected to be used. Hence making it a cell 
encapsulates this inner detail for us.
I thought of using the instance-level keyOnlyKv in HFileScannerImpl, but 
since the HalfStoreFileReader is trying to cache the firstKey, we cannot use 
that instance-level variable in HFileScanner and just reset its byte[] every 
time.
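The change discussed above - copying the key bytes with System.arraycopy and returning a cell rather than a ByteBuffer view - can be sketched as follows (KeyOnlySketch and SeekerSketch are stand-ins for illustration, not HBase's KeyValue/Cell or seeker types):

```java
// Stand-in for a key-only cell; holds nothing but the copied key bytes.
final class KeyOnlySketch {
    final byte[] key;
    KeyOnlySketch(byte[] key) { this.key = key; }
}

// Stand-in for a DBE seeker holding the current entry's key bytes.
class SeekerSketch {
    private final byte[] keyBuffer; // backing bytes of the current key
    private final int keyLength;

    SeekerSketch(byte[] keyBuffer, int keyLength) {
        this.keyBuffer = keyBuffer;
        this.keyLength = keyLength;
    }

    // Return the current key as a cell-like object, copying rather than
    // wrapping, so callers never touch array()/arrayOffset() on a buffer.
    KeyOnlySketch getKeyAsCell() {
        byte[] copy = new byte[keyLength];
        System.arraycopy(keyBuffer, 0, copy, 0, keyLength);
        return new KeyOnlySketch(copy);
    }
}
```

Because the bytes are copied, later mutation of the seeker's backing buffer cannot corrupt a key a caller is holding on to.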

 Convert getKey and related APIs to Cell
 ---

 Key: HBASE-13977
 URL: https://issues.apache.org/jira/browse/HBASE-13977
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-13977.patch, HBASE-13977_1.patch, 
 HBASE-13977_2.patch, HBASE-13977_3.patch


 During the course of changes for HBASE-11425, we felt that more APIs can be 
 converted to return Cell instead of BB, like getKey and getLastKey. 
 We can also rename getKeyValue to getCell.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13977) Convert getKey and related APIs to Cell

2015-07-02 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611806#comment-14611806
 ] 

Anoop Sam John commented on HBASE-13977:


bq.In getKeyAsCell, why bother with a ByteBuffer when all you are doing is 
passing an array?
I asked the same thing in my comments, Ram.

bq.No need to go with BB create now.. Directly make the Cell out of 
current.keyBuffer? Do we need to clone that (if so also byte[] copy and create 
KV)?


 Convert getKey and related APIs to Cell
 ---

 Key: HBASE-13977
 URL: https://issues.apache.org/jira/browse/HBASE-13977
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-13977.patch, HBASE-13977_1.patch, 
 HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch


 During the course of changes for HBASE-11425, we felt that more APIs can be 
 converted to return Cell instead of BB, like getKey and getLastKey. 
 We can also rename getKeyValue to getCell.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14010) TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster not balanced

2015-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611745#comment-14611745
 ] 

Hadoop QA commented on HBASE-14010:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12743251/14010.txt
  against master branch at commit 272b025b25fed979da0e59ffd41615bbb9e105ea.
  ATTACHMENT ID: 12743251

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14651//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14651//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14651//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14651//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14651//console

This message is automatically generated.

 TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster 
 not balanced
 -

 Key: HBASE-14010
 URL: https://issues.apache.org/jira/browse/HBASE-14010
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: stack
Assignee: stack
 Attachments: 14010.txt, 14010.txt, 14010.txt


 java.lang.AssertionError: null
   at 
 org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:144)
 from recent build 
 https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/14639/testReport/junit/org.apache.hadoop.hbase/TestRegionRebalancing/testRebalanceOnRegionServerNumberChange_0_/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12224) Facilitate using ByteBuffer backed Cells in the HFileReader

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-12224:
---
Summary: Facilitate using ByteBuffer backed Cells in the HFileReader  (was: 
Facilitate using DBBs in the HFileReaders V2 and V3.)

 Facilitate using ByteBuffer backed Cells in the HFileReader
 ---

 Key: HBASE-12224
 URL: https://issues.apache.org/jira/browse/HBASE-12224
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver, Scanners
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13977) Convert getKey and related APIs to Cell

2015-07-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611963#comment-14611963
 ] 

stack commented on HBASE-13977:
---

Ok. +1 if hadoopqa passes and good by [~anoop.hbase]


 Convert getKey and related APIs to Cell
 ---

 Key: HBASE-13977
 URL: https://issues.apache.org/jira/browse/HBASE-13977
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-13977.patch, HBASE-13977_1.patch, 
 HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch


 During the course of changes for HBASE-11425, we felt that more APIs can be 
 converted to return Cell instead of BB, like getKey and getLastKey. 
 We can also rename getKeyValue to getCell.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14011) MultiByteBuffer position based reads does not work correctly

2015-07-02 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611866#comment-14611866
 ] 

Anoop Sam John commented on HBASE-14011:


+1

 MultiByteBuffer position based reads does not work correctly
 

 Key: HBASE-14011
 URL: https://issues.apache.org/jira/browse/HBASE-14011
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0

 Attachments: HBASE-14011.patch


 The positional based reads in MBBs are having some issues when we try to read 
 the first element from the 2nd BB when the MBB is formed with multiple BBs.
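The boundary case the description refers to can be illustrated with a minimal sketch (the class and method below are hypothetical, not HBase's MultiByteBuffer API):

```java
import java.nio.ByteBuffer;

// Illustrative multi-buffer reader showing the boundary case from the bug
// report: an absolute get() whose index lands exactly on the first byte of
// the second backing buffer.
class MultiBufferSketch {
    private final ByteBuffer[] items;

    MultiBufferSketch(ByteBuffer... items) { this.items = items; }

    byte get(int index) {
        int remaining = index;
        for (ByteBuffer bb : items) {
            // '<' (not '<=') is the crucial comparison: an index equal to
            // bb.limit() must fall through to the NEXT buffer, offset 0.
            if (remaining < bb.limit()) {
                return bb.get(remaining);
            }
            remaining -= bb.limit();
        }
        throw new IndexOutOfBoundsException("index " + index);
    }
}
```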



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14011) MultiByteBuffer position based reads does not work correctly

2015-07-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611949#comment-14611949
 ] 

stack commented on HBASE-14011:
---

+1

 MultiByteBuffer position based reads does not work correctly
 

 Key: HBASE-14011
 URL: https://issues.apache.org/jira/browse/HBASE-14011
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0

 Attachments: HBASE-14011.patch


 The positional based reads in MBBs are having some issues when we try to read 
 the first element from the 2nd BB when the MBB is formed with multiple BBs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14011) MultiByteBuffer position based reads does not work correctly

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-14011:
--

 Summary: MultiByteBuffer position based reads does not work 
correctly
 Key: HBASE-14011
 URL: https://issues.apache.org/jira/browse/HBASE-14011
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0


The positional based reads in MBBs are having some issues when we try to read 
the first element from the 2 MBB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14011) MultiByteBuffer position based reads does not work correctly

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-14011:
---
Status: Patch Available  (was: Open)

 MultiByteBuffer position based reads does not work correctly
 

 Key: HBASE-14011
 URL: https://issues.apache.org/jira/browse/HBASE-14011
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0

 Attachments: HBASE-14011.patch


 The positional based reads in MBBs are having some issues when we try to read 
 the first element from the 2nd BB when the MBB is formed with multiple BBs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12213) HFileBlock backed by Array of ByteBuffers

2015-07-02 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611862#comment-14611862
 ] 

Anoop Sam John commented on HBASE-12213:


Went through the patch quickly once. Some immediate comments:
ByteBufferUtils - getLong/getInt etc. methods were already added, named 
toInt/toLong to be consistent with Bytes.java, so avoid re-adding them. It 
also looks like some methods moved from one place to another; please avoid 
all these unwanted changes.
UnsafeAccess - here too, methods already exist to read int/long etc., and 
BBUtils makes use of them. compare(BB, BB) also makes use of this Unsafe 
based way; please keep the compare kind of logic out of this class.

Will take a closer look at the other areas. Thanks

 HFileBlock backed by Array of ByteBuffers
 -

 Key: HBASE-12213
 URL: https://issues.apache.org/jira/browse/HBASE-12213
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver, Scanners
Reporter: Anoop Sam John
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-12213_1.patch


 In L2 cache (offheap) an HFile block might have been cached into multiple 
 chunks of buffers. If HFileBlock need single BB, we will end up in recreation 
 of bigger BB and copying. Instead we can make HFileBlock to serve data from 
 an array of BBs.
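The idea in the description - serving a read directly from an array of cached chunks rather than recreating one bigger BB - can be sketched like this (a hypothetical helper, not the HBASE-12213 implementation):

```java
import java.nio.ByteBuffer;

// Sketch: serve a ranged read from an array of cached chunks without ever
// materializing the whole block into a single large ByteBuffer.
class ChunkedBlockSketch {
    private final ByteBuffer[] chunks;

    ChunkedBlockSketch(ByteBuffer... chunks) { this.chunks = chunks; }

    // Copy 'len' bytes starting at absolute 'offset' into dst, walking the
    // chunk array; only the requested range is copied, even when it spans
    // a chunk boundary.
    void read(int offset, byte[] dst, int dstOff, int len) {
        int skip = offset;
        for (ByteBuffer chunk : chunks) {
            int size = chunk.limit();
            if (skip >= size) { skip -= size; continue; }
            int n = Math.min(size - skip, len);
            for (int i = 0; i < n; i++) {
                dst[dstOff + i] = chunk.get(skip + i);
            }
            dstOff += n;
            len -= n;
            skip = 0;
            if (len == 0) return;
        }
        if (len > 0) throw new IndexOutOfBoundsException("short read");
    }
}
```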



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14011) MultiByteBuffer position based reads does not work correctly

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-14011:
---
Attachment: HBASE-14011.patch

Patch with UT.

 MultiByteBuffer position based reads does not work correctly
 

 Key: HBASE-14011
 URL: https://issues.apache.org/jira/browse/HBASE-14011
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0

 Attachments: HBASE-14011.patch


 The positional based reads in MBBs are having some issues when we try to read 
 the first element from the 2 MBB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14011) MultiByteBuffer position based reads does not work correctly

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-14011:
---
Description: The positional based reads in MBBs are having some issues when 
we try to read the first element from the 2nd BB when the MBB is formed with 
multiple BBs.  (was: The positional based reads in MBBs are having some issues 
when we try to read the first element from the 2 MBB.)

 MultiByteBuffer position based reads does not work correctly
 

 Key: HBASE-14011
 URL: https://issues.apache.org/jira/browse/HBASE-14011
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0

 Attachments: HBASE-14011.patch


 The position-based reads in MBBs have some issues when we try to read the 
 first element from the 2nd BB when the MBB is formed from multiple BBs.
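For illustration, the boundary condition in question can be reproduced with plain java.nio buffers: an absolute read at the first index of the second buffer must resolve to buffer 1, offset 0, and an off-by-one in the comparison (> instead of >=) would select the wrong buffer. This standalone sketch is not the HBase code; the actual fix is in the attached patch:

```java
import java.nio.ByteBuffer;

// Demonstrates the boundary case: an absolute read at the first index
// of the second buffer must select buffer 1, offset 0.
public class BoundaryRead {
    static byte get(ByteBuffer[] items, int index) {
        int itemIndex = 0;
        int begin = 0;
        // Walk the buffers until the one covering 'index' is found.
        // Using >= here is essential: with > the loop would stop one
        // buffer early when 'index' sits exactly on a boundary.
        while (index >= begin + items[itemIndex].limit()) {
            begin += items[itemIndex].limit();
            itemIndex++;
        }
        return items[itemIndex].get(index - begin);
    }

    public static void main(String[] args) {
        ByteBuffer first = ByteBuffer.wrap(new byte[]{10, 20, 30});
        ByteBuffer second = ByteBuffer.wrap(new byte[]{40, 50});
        // index 3 is the first element of the second buffer.
        System.out.println(get(new ByteBuffer[]{first, second}, 3)); // prints 40
    }
}
```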





[jira] [Updated] (HBASE-13970) NPE during compaction in trunk

2015-07-02 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-13970:
--
   Resolution: Fixed
 Assignee: Duo Zhang  (was: ramkrishna.s.vasudevan)
 Hadoop Flags: Reviewed
Fix Version/s: (was: 1.0.2)
   Status: Resolved  (was: Patch Available)

Pushed to all branches except branch-1.0 (HBASE-8329 has not been applied to 
branch-1.0).

Thanks [~anoopsamjohn] and [~ram_krish].

 NPE during compaction in trunk
 --

 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1
Reporter: ramkrishna.s.vasudevan
Assignee: Duo Zhang
 Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13970-v1.patch, HBASE-13970.patch


 Updated trunk, loaded the table with the PE tool, and triggered a flush to 
 ensure all data was flushed out to disk. When the first compaction is 
 triggered we get an NPE, and this is very easy to reproduce:
 {code}
 2015-06-25 21:33:46,041 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,051 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
 2015-06-25 21:33:46,159 ERROR 
 [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
 regionserver.CompactSplitThread: Compaction failed Request = 
 regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
 priority=3, time=7536968291719985
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
 at 
 org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
 at 
 org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
 at 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 21:33:46,745 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
 hasBloomFilter=true, into tmp file 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
 2015-06-25 21:33:46,772 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HStore: Added 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
  entries=68116, sequenceid=1534, filesize=68.7 M
 2015-06-25 21:33:46,773 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
 currentsize=0 B/0 for region 
 TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
  in 723ms, sequenceid=1534, compaction requested=true
 2015-06-25 21:33:46,780 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/reached/TestTable
 2015-06-25 21:33:46,790 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/abort/TestTable
 2015-06-25 21:33:46,791 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 2015-06-25 21:33:46,803 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,818 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: 

[jira] [Updated] (HBASE-12213) HFileBlock backed by Array of ByteBuffers

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-12213:
---
Attachment: HBASE-12213_1.patch

Trying for QA. I got a clean run locally. Let me see what QA bot says.

 HFileBlock backed by Array of ByteBuffers
 -

 Key: HBASE-12213
 URL: https://issues.apache.org/jira/browse/HBASE-12213
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver, Scanners
Reporter: Anoop Sam John
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-12213_1.patch


 In the L2 (offheap) cache an HFile block might have been cached into multiple 
 chunks of buffers. If HFileBlock needs a single BB, we end up recreating a 
 bigger BB and copying. Instead we can make HFileBlock serve data from an 
 array of BBs.





[jira] [Commented] (HBASE-14010) TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster not balanced

2015-07-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611980#comment-14611980
 ] 

stack commented on HBASE-14010:
---

Can I have a +1 here? It seems to help with the TestRegionRebalancing failures.

 TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster 
 not balanced
 -

 Key: HBASE-14010
 URL: https://issues.apache.org/jira/browse/HBASE-14010
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: stack
Assignee: stack
 Attachments: 14010.txt, 14010.txt, 14010.txt


 java.lang.AssertionError: null
   at 
 org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:144)
 from recent build 
 https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/14639/testReport/junit/org.apache.hadoop.hbase/TestRegionRebalancing/testRebalanceOnRegionServerNumberChange_0_/





[jira] [Updated] (HBASE-12213) HFileBlock backed by Array of ByteBuffers

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-12213:
---
Status: Patch Available  (was: Open)

Uses MBB in the read path.
- Since HBASE-12295 is not in yet, we cannot allow the Bucket Cache to serve 
the actual MBB-based blocks without a copy. For now we copy from the 
BucketCache and create an MBB out of it.
- The HFileReader uses MBB, and the relative reads have been replaced with 
absolute position-based reads.
The absolute position-based reads use the UnsafeAccess APIs, so it is better 
to make use of them. Tried a small micro benchmark mimicking the logic in 
blockSeek with and without positional reads using MBB.
- There are some TODOs that will change once BufferBackedCells is in place.
- PrefixTree and blooms need to be handled to work with offheap MBBs. That 
can be done later.

 HFileBlock backed by Array of ByteBuffers
 -

 Key: HBASE-12213
 URL: https://issues.apache.org/jira/browse/HBASE-12213
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver, Scanners
Reporter: Anoop Sam John
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-12213_1.patch


 In the L2 (offheap) cache an HFile block might have been cached into multiple 
 chunks of buffers. If HFileBlock needs a single BB, we end up recreating a 
 bigger BB and copying. Instead we can make HFileBlock serve data from an 
 array of BBs.





[jira] [Updated] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles

2015-07-02 Thread Francesco MDE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francesco MDE updated HBASE-14005:
--
Status: Open  (was: Patch Available)

 Set permission to .top hfile in LoadIncrementalHFiles
 -

 Key: HBASE-14005
 URL: https://issues.apache.org/jira/browse/HBASE-14005
 Project: HBase
  Issue Type: Bug
Reporter: Francesco MDE
Priority: Trivial
 Attachments: HBASE-14005.patch


 Set the same -rwxrwxrwx permission to .top file as .bottom and _tmp





[jira] [Updated] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles

2015-07-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14005:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the patch, Francesco

 Set permission to .top hfile in LoadIncrementalHFiles
 -

 Key: HBASE-14005
 URL: https://issues.apache.org/jira/browse/HBASE-14005
 Project: HBase
  Issue Type: Bug
Reporter: Francesco MDE
Assignee: Francesco MDE
Priority: Trivial
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-14005.patch


 Set the same -rwxrwxrwx permission to .top file as .bottom and _tmp





[jira] [Updated] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles

2015-07-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14005:
---
 Assignee: Francesco MDE
 Hadoop Flags: Reviewed
Fix Version/s: 1.3.0
   1.1.2
   1.2.0
   1.0.2
   0.98.14
   2.0.0

 Set permission to .top hfile in LoadIncrementalHFiles
 -

 Key: HBASE-14005
 URL: https://issues.apache.org/jira/browse/HBASE-14005
 Project: HBase
  Issue Type: Bug
Reporter: Francesco MDE
Assignee: Francesco MDE
Priority: Trivial
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-14005.patch


 Set the same -rwxrwxrwx permission to .top file as .bottom and _tmp





[jira] [Updated] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles

2015-07-02 Thread Francesco MDE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francesco MDE updated HBASE-14005:
--
Status: Patch Available  (was: Open)

 Set permission to .top hfile in LoadIncrementalHFiles
 -

 Key: HBASE-14005
 URL: https://issues.apache.org/jira/browse/HBASE-14005
 Project: HBase
  Issue Type: Bug
Reporter: Francesco MDE
Priority: Trivial
 Attachments: HBASE-14005.patch


 Set the same -rwxrwxrwx permission to .top file as .bottom and _tmp





[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low

2015-07-02 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612488#comment-14612488
 ] 

Enis Soztutar commented on HBASE-13832:
---

bq. to get the same behavior you need to force running to false when you set 
syncException. so you prevent other procedures from being added.
Not sure whether we gain anything by ensuring that running is set to false 
before the next iteration of syncLoop. The WAL store will abort when the master 
calls abort. Before that happens, concurrent calls to {{pushData()}} will still 
get the exception, because the exception from sync is never cleared. So the 
semantics are that if {{sync()}} plus a WAL roll fails, we effectively start 
rejecting all {{pushData()}} requests, which is similar to making sure to 
check isRunning(). 

 Procedure V2: master fail to start due to WALProcedureStore sync failures 
 when HDFS data nodes count is low
 ---

 Key: HBASE-13832
 URL: https://issues.apache.org/jira/browse/HBASE-13832
 Project: HBase
  Issue Type: Sub-task
  Components: master, proc-v2
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Matteo Bertozzi
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, 
 HBASE-13832-v2.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, 
 hbase-13832-v3.patch


 when the data node count < 3, we got failures in WALProcedureStore#syncLoop() 
 during master start. The failure prevents the master from starting.
 {noformat}
 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] 
 wal.WALProcedureStore: Sync slot failed, abort.
 java.io.IOException: Failed to replace a bad datanode on the existing 
 pipeline due to no more good datanodes being available to try. (Nodes: 
 current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  
 DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
  
 original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-
 490ece56c772,DISK]]). The current failed datanode replacement policy is 
 DEFAULT, and a client may configure this via 
 'dfs.client.block.write.replace-datanode-on-failure.policy'  in its 
 configuration.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
 {noformat}
 One proposal is to implement logic similar to FSHLog: if an IOException is 
 thrown during syncLoop in WALProcedureStore#start(), instead of aborting 
 immediately, we could try to roll the log and see whether that resolves the 
 issue; if the new log cannot be created, or rolling the log throws more 
 exceptions, we then abort.
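The proposed recovery flow can be sketched in plain Java. {{LogStore}} here is a hypothetical stand-in for the WALProcedureStore internals, not the real API:

```java
import java.io.IOException;

// Sketch of the FSHLog-like proposal: on a sync failure, attempt one log
// roll (to get a fresh datanode pipeline) before giving up; abort only
// when the roll also fails or the retried sync still throws.
public class RollOnSyncFailure {
    interface LogStore {
        void sync() throws IOException;
        boolean rollWriter();     // true if a fresh log was created
        void abort(String why);
    }

    static boolean syncWithRecovery(LogStore store) {
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                store.sync();
                return true;
            } catch (IOException e) {
                // First failure: try rolling onto a fresh log before aborting.
                if (attempt == 0 && store.rollWriter()) {
                    continue;
                }
                store.abort("sync failed and log roll did not help");
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LogStore failsOnce = new LogStore() {
            int calls = 0;
            public void sync() throws IOException {
                if (calls++ == 0) throw new IOException("bad datanode pipeline");
            }
            public boolean rollWriter() { return true; }
            public void abort(String why) { System.out.println("abort: " + why); }
        };
        System.out.println(syncWithRecovery(failsOnce)); // prints true: the roll recovered it
    }
}
```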





[jira] [Updated] (HBASE-14008) REST - Throw an appropriate error during schema POST

2015-07-02 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-14008:

Fix Version/s: (was: 1.1.2)
   (was: 1.2.0)
   1.3.0
   Status: Open  (was: Patch Available)

Changing the error that comes back will break operational compatibility, so I'm 
re-targeting this to avoid patch releases in 1.y.

Please add a test that verifies you get the expected error.

 REST - Throw an appropriate error during schema POST
 

 Key: HBASE-14008
 URL: https://issues.apache.org/jira/browse/HBASE-14008
 Project: HBase
  Issue Type: Bug
  Components: REST
Affects Versions: 1.1.1, 0.98.13
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
Priority: Minor
  Labels: REST
 Fix For: 1.3.0

 Attachments: 14008.patch, HBASE-14008.patch


 When an update is done on the schema through REST and an error occurs, the 
 actual reason is not thrown back to the client. Right now we get a 
 javax.ws.rs.WebApplicationException instead of the actual error message.





[jira] [Updated] (HBASE-13667) Backport HBASE-12975 to 1.0 and 0.98 without changing coprocessors hooks

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13667:
--
Fix Version/s: (was: 1.0.2)
   1.0.3

 Backport HBASE-12975 to 1.0 and 0.98 without changing coprocessors hooks
 

 Key: HBASE-13667
 URL: https://issues.apache.org/jira/browse/HBASE-13667
 Project: HBase
  Issue Type: Bug
Reporter: Rajeshbabu Chintaguntla
Assignee: Rajeshbabu Chintaguntla
 Fix For: 0.98.14, 1.0.3


 We can backport Split transaction, region merge transaction interfaces to 
 branch 1.0 and 0.98 without changing coprocessor hooks. Then it should be 
 compatible.





[jira] [Updated] (HBASE-13857) Slow WAL Append count in ServerMetricsTmpl.jamon is hardcoded to zero

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13857:
--
Fix Version/s: (was: 1.0.2)
   1.0.3

 Slow WAL Append count in ServerMetricsTmpl.jamon is hardcoded to zero
 -

 Key: HBASE-13857
 URL: https://issues.apache.org/jira/browse/HBASE-13857
 Project: HBase
  Issue Type: Bug
  Components: regionserver, UI
Affects Versions: 0.98.0
Reporter: Lars George
  Labels: beginner
 Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3


 The template has this:
 {noformat}
  <tr>
 ...
 <th>Slow WAL Append Count</th>
 </tr>
 <tr>
 ...
 <td><% 0 %></td>
 </tr>
 {noformat}





[jira] [Updated] (HBASE-14015) Allow setting a richer state value when toString a pv2

2015-07-02 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14015:
--
Fix Version/s: 1.3.0
   1.2.0
   2.0.0
   Status: Patch Available  (was: Open)

 Allow setting a richer state value when toString a pv2
 --

 Key: HBASE-14015
 URL: https://issues.apache.org/jira/browse/HBASE-14015
 Project: HBase
  Issue Type: Improvement
  Components: proc-v2
Reporter: stack
Assignee: stack
Priority: Minor
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: 
 0001-HBASE-14015-Allow-setting-a-richer-state-value-when-.patch


 Debugging, my procedure after a crash was loaded out of the store and its 
 state was RUNNING. It would help to know at which StateMachineProcedure state 
 it was going to resume RUNNING.
 Chatting w/ Matteo, he suggested allowing Procedures to customize the String.
 Here is a patch that makes StateMachineProcedure print out the base state -- 
 RUNNING, FINISHED -- followed by a ':' and then the StateMachineProcedure 
 state, e.g. SimpleStateMachineProcedure state=RUNNABLE:SERVER_CRASH_ASSIGN
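A minimal sketch of the idea, with invented enum and class names rather than the real Procedure v2 types:

```java
// Sketch of the patch's idea: a state-machine procedure reports both the
// executor-level state and its own internal state, joined by ':'.
public class ProcedureToString {
    enum BaseState { RUNNABLE, FINISHED }
    enum CrashState { SERVER_CRASH_START, SERVER_CRASH_ASSIGN }

    // Builds the richer state string, e.g. "state=RUNNABLE:SERVER_CRASH_ASSIGN".
    static String describe(BaseState base, CrashState machine) {
        return "state=" + base + ":" + machine;
    }

    public static void main(String[] args) {
        System.out.println(describe(BaseState.RUNNABLE, CrashState.SERVER_CRASH_ASSIGN));
        // prints state=RUNNABLE:SERVER_CRASH_ASSIGN
    }
}
```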





[jira] [Commented] (HBASE-13267) Deprecate or remove isFileDeletable from SnapshotHFileCleaner

2015-07-02 Thread Dave Latham (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612699#comment-14612699
 ] 

Dave Latham commented on HBASE-13267:
-

To be fair, I'm not suggesting removing it - I left it in there to begin with 
for the same reason you mentioned in your last comment.  I was just providing a 
way to do it if desired.

 Deprecate or remove isFileDeletable from SnapshotHFileCleaner
 -

 Key: HBASE-13267
 URL: https://issues.apache.org/jira/browse/HBASE-13267
 Project: HBase
  Issue Type: Task
Reporter: Andrew Purtell
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3


 The isFileDeletable method in SnapshotHFileCleaner became vestigial after 
 HBASE-12627, lets remove it. 





[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low

2015-07-02 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612713#comment-14612713
 ] 

Enis Soztutar commented on HBASE-13832:
---

bq. we know that we must die there. why not exit from that loop?
We can call {{stop(true)}} directly from the sync loop; that is fine. It was 
not there in your original patch, which is why I did not change it. 
bq. with the actual implementation of abort we know that running will be false 
after a sendAbortProcessSignal() but that may not be the case in the future
The store can cause an abort of the whole procedure executor or the master 
itself. Right now it does this through the ProcedureStoreListener calls. I'm 
fine with passing an {{Abortable}} directly to the store. These parts mostly 
come from the initial proc v2 patch. 

Does the test rely on 1s/2s timing? It may end up being flaky on slow Jenkins 
hosts. Other than that, +1 for the v4 patch. If you want to do the abort 
changes, we can do them here or in a follow-up. 

 Procedure V2: master fail to start due to WALProcedureStore sync failures 
 when HDFS data nodes count is low
 ---

 Key: HBASE-13832
 URL: https://issues.apache.org/jira/browse/HBASE-13832
 Project: HBase
  Issue Type: Sub-task
  Components: master, proc-v2
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Matteo Bertozzi
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, 
 HBASE-13832-v2.patch, HBASE-13832-v4.patch, HDFSPipeline.java, 
 hbase-13832-test-hang.patch, hbase-13832-v3.patch


 when the data node count < 3, we got failures in WALProcedureStore#syncLoop() 
 during master start. The failure prevents the master from starting.
 {noformat}
 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] 
 wal.WALProcedureStore: Sync slot failed, abort.
 java.io.IOException: Failed to replace a bad datanode on the existing 
 pipeline due to no more good datanodes being available to try. (Nodes: 
 current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  
 DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
  
 original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-
 490ece56c772,DISK]]). The current failed datanode replacement policy is 
 DEFAULT, and a client may configure this via 
 'dfs.client.block.write.replace-datanode-on-failure.policy'  in its 
 configuration.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
 {noformat}
 One proposal is to implement logic similar to FSHLog: if an IOException is 
 thrown during syncLoop in WALProcedureStore#start(), instead of aborting 
 immediately, we could try to roll the log and see whether that resolves the 
 issue; if the new log cannot be created, or rolling the log throws more 
 exceptions, we then abort.





[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion

2015-07-02 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-14017:
---
Description: 
[~syuanjiang] found a concurrency issue in the procedure queue deletion, where 
we don't take an exclusive lock before deleting the table:
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false 
Thread 2: markTableAsDeleted see the queue empty and wlock= false
Thread 1: tryWrite() set wlock=true; too late
Thread 2: delete the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}

  was:
[~syuanjiang] found a concurrency issue in the procedure queue deletion, where 
we don't take an exclusive lock before deleting the table:
{noformat}
Thread 1: Create table is running - tryWrite() acquire the lock, before set 
wlock=true;
Thread 2: markTableAsDeleted see the queue empty and wlock= false
Thread 1: set wlock=true; too late
Thread 2: delete the queue
Thread 1: never able to release the lock
{noformat}
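The interleaving above can be sketched and fixed in standalone Java with invented names (not the actual MasterProcedureQueue code): if deletion takes the same exclusive lock writers use, its "empty and unlocked" observation cannot change underneath it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the race and the shape of a fix. 'wlock' plays the role of the
// table queue's exclusive write lock.
public class TableQueue {
    private final AtomicBoolean wlock = new AtomicBoolean(false);
    private volatile boolean deleted = false;

    public boolean tryExclusiveLock() {
        return !deleted && wlock.compareAndSet(false, true);
    }

    public void releaseLock() {
        wlock.set(false);
    }

    // Safe delete: acquiring the write lock first closes the window in which
    // a creator thread has passed its "empty && unlocked" check but has not
    // locked yet -- one of the two CAS operations must lose.
    public boolean markAsDeletedIfEmpty(boolean queueEmpty) {
        if (queueEmpty && wlock.compareAndSet(false, true)) {
            deleted = true;   // the lock is never released: the queue is gone
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TableQueue q = new TableQueue();
        boolean writerWon = q.tryExclusiveLock();       // creator locks first
        boolean deletedAnyway = q.markAsDeletedIfEmpty(true);
        System.out.println(writerWon + " " + deletedAnyway); // prints true false
    }
}
```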


 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue 
 deletion
 -

 Key: HBASE-14017
 URL: https://issues.apache.org/jira/browse/HBASE-14017
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Affects Versions: 2.0.0, 1.2.0, 1.1.0.1
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Fix For: 2.0.0, 1.2.0, 1.1.2

 Attachments: HBASE-14017-v0.patch


 [~syuanjiang] found a concurrency issue in the procedure queue deletion, where 
 we don't take an exclusive lock before deleting the table:
 {noformat}
 Thread 1: Create table is running - the queue is empty and wlock is false 
 Thread 2: markTableAsDeleted see the queue empty and wlock= false
 Thread 1: tryWrite() set wlock=true; too late
 Thread 2: delete the queue
 Thread 1: never able to release the lock - NPE when trying to get the queue
 {noformat}





[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion

2015-07-02 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612787#comment-14612787
 ] 

Matteo Bertozzi commented on HBASE-14017:
-

Not every run queue needs tryExclusiveLock()/releaseLock() logic, but every one 
must be able to lock, to prevent operations while a delete is in progress. 
That's the main reason acquireDeleteLock() is exposed and there is nothing else 
like a release; the fact that it is implemented as a tryExclusiveLock() is just 
a coincidence.

 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue 
 deletion
 -

 Key: HBASE-14017
 URL: https://issues.apache.org/jira/browse/HBASE-14017
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Affects Versions: 2.0.0, 1.2.0, 1.1.0.1
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Fix For: 2.0.0, 1.2.0, 1.1.2

 Attachments: HBASE-14017-v0.patch


 [~syuanjiang] found a concurrency issue in the procedure queue deletion, where 
 we don't take an exclusive lock before deleting the table:
 {noformat}
 Thread 1: Create table is running - the queue is empty and wlock is false 
 Thread 2: markTableAsDeleted see the queue empty and wlock= false
 Thread 1: tryWrite() set wlock=true; too late
 Thread 2: delete the queue
 Thread 1: never able to release the lock - NPE when trying to get the queue
 {noformat}





[jira] [Commented] (HBASE-12596) bulkload needs to follow locality

2015-07-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612803#comment-14612803
 ] 

Ted Yu commented on HBASE-12596:


{code}
212 if (null == tableName || tableName.isEmpty()) {
213   LOG.warn("table name is null, so use default writer");
{code}
When would the above happen ?
Table name is set in configureIncrementalLoad(), right ?
{code}
218 Connection connection = 
ConnectionFactory.createConnection(conf);
219 RegionLocator locator = 
connection.getRegionLocator(TableName.valueOf(tableName));
{code}
You can use try-with-resources.
{code}
231   if (null == loc) {
232 LOG.warn("failed to get region location, so use default writer");
{code}
Should the log level be at TRACE ?
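Ted's try-with-resources suggestion, in shape: both resources close automatically, in reverse declaration order, even if the body throws. Plain AutoCloseable stand-ins are used here in place of HBase's Connection and RegionLocator (the class names mirror the review, but these classes are invented):

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResources {
    static final List<String> EVENTS = new ArrayList<>();

    static class Connection implements AutoCloseable {
        RegionLocator getRegionLocator(String table) { return new RegionLocator(); }
        public void close() { EVENTS.add("connection closed"); }
    }
    static class RegionLocator implements AutoCloseable {
        String getHostForRow(byte[] row) { return "region-host-1"; }
        public void close() { EVENTS.add("locator closed"); }
    }

    public static void main(String[] args) {
        try (Connection connection = new Connection();
             RegionLocator locator = connection.getRegionLocator("TestTable")) {
            EVENTS.add(locator.getHostForRow(new byte[]{0}));
        } // resources close in reverse order of declaration
        System.out.println(EVENTS);
        // prints [region-host-1, locator closed, connection closed]
    }
}
```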


 bulkload needs to follow locality
 -

 Key: HBASE-12596
 URL: https://issues.apache.org/jira/browse/HBASE-12596
 Project: HBase
  Issue Type: Improvement
  Components: HFile, regionserver
Affects Versions: 0.98.8
 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7
Reporter: Victor Xu
Assignee: Victor Xu
 Fix For: 0.98.14

 Attachments: HBASE-12596-0.98-v1.patch, HBASE-12596-master-v1.patch, 
 HBASE-12596.patch


 Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles 
 to be loaded; 2. Move these HFiles to the right hdfs directory. However, the 
 locality could be loss during the first step. Why not just write the HFiles 
 directly into the right place? We can do this easily because 
 StoreFile.WriterBuilder has the withFavoredNodes method, and we just need 
 to call it in HFileOutputFormat's getNewWriter().
 This feature is disabled by default, and we could use 
 'hbase.bulkload.locality.sensitive.enabled=true' to enable it.





[jira] [Commented] (HBASE-12988) [Replication]Parallel apply edits across regions

2015-07-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612667#comment-14612667
 ] 

Lars Hofhansl commented on HBASE-12988:
---

Good to commit this way?

 [Replication]Parallel apply edits across regions
 

 Key: HBASE-12988
 URL: https://issues.apache.org/jira/browse/HBASE-12988
 Project: HBase
  Issue Type: Improvement
  Components: Replication
Reporter: hongyu bi
Assignee: Lars Hofhansl
 Attachments: 12988-v2.txt, 12988-v3.txt, 12988-v4.txt, 12988.txt, 
 HBASE-12988-0.98.patch, ParallelReplication-v2.txt


 We can apply edits to the slave cluster in parallel at table level to speed up 
 replication.
 Update: per the conversation below, it's better to apply edits in parallel at 
 row level.
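Row-level parallel apply can be sketched as hashing each edit's row key onto a fixed set of single-threaded executors: edits for the same row stay ordered, while different rows proceed in parallel. This is an illustrative sketch with invented names, not the attached patch:

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelApply {
    private final ExecutorService[] lanes;

    ParallelApply(int parallelism) {
        lanes = new ExecutorService[parallelism];
        for (int i = 0; i < parallelism; i++) {
            // Single-threaded lanes keep per-row ordering.
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    void apply(byte[] row, Runnable edit) {
        // Same row -> same lane -> serial order preserved for that row.
        int lane = (Arrays.hashCode(row) & 0x7fffffff) % lanes.length;
        lanes[lane].execute(edit);
    }

    void shutdown() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
            try {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        ParallelApply pool = new ParallelApply(4);
        java.util.concurrent.atomic.AtomicInteger applied =
            new java.util.concurrent.atomic.AtomicInteger();
        for (int i = 0; i < 100; i++) {
            pool.apply(("row-" + i).getBytes(), applied::incrementAndGet);
        }
        pool.shutdown();
        System.out.println(applied.get()); // prints 100
    }
}
```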





[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low

2015-07-02 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-13832:

Attachment: (was: HBASE-13832-v4.patch)

 Procedure V2: master fail to start due to WALProcedureStore sync failures 
 when HDFS data nodes count is low
 ---

 Key: HBASE-13832
 URL: https://issues.apache.org/jira/browse/HBASE-13832
 Project: HBase
  Issue Type: Sub-task
  Components: master, proc-v2
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Matteo Bertozzi
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, 
 HBASE-13832-v2.patch, HBASE-13832-v4.patch, HDFSPipeline.java, 
 hbase-13832-test-hang.patch, hbase-13832-v3.patch


 when the data node count < 3, we got failures in WALProcedureStore#syncLoop() 
 during master start. The failure prevents the master from starting.
 {noformat}
 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] 
 wal.WALProcedureStore: Sync slot failed, abort.
 java.io.IOException: Failed to replace a bad datanode on the existing 
 pipeline due to no more good datanodes being available to try. (Nodes: 
 current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  
 DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
  
 original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-
 490ece56c772,DISK]]). The current failed datanode replacement policy is 
 DEFAULT, and a client may configure this via 
 'dfs.client.block.write.replace-datanode-on-failure.policy'  in its 
 configuration.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
 {noformat}
 One proposal is to implement logic similar to FSHLog's: if an IOException is 
 thrown during syncLoop in WALProcedureStore#start(), instead of aborting 
 immediately, we could try to roll the log and see whether that resolves the 
 issue; if the new log cannot be created, or rolling the log throws further 
 exceptions, we then abort.
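A minimal sketch of that roll-before-abort idea, using hypothetical method names rather than the actual WALProcedureStore API:

```java
import java.io.IOException;

// Sketch of the proposed sync-failure handling: on IOException, try one log
// roll and retry before giving up. All method names here are illustrative
// placeholders, not the real WALProcedureStore API.
public class SyncRecoverySketch {
  private boolean running = true;

  // Simulates syncing the pending slots; the real store writes to the WAL.
  protected void syncSlots() throws IOException {}

  // Simulates rolling to a fresh log; false if a new log cannot be created.
  protected boolean rollWriter() { return true; }

  public boolean isRunning() { return running; }

  public void onSyncFailure() {
    try {
      if (rollWriter()) {
        syncSlots();   // retry against the fresh log
        return;        // recovered, keep the master running
      }
    } catch (IOException e) {
      // fall through: the retry on the new log failed too
    }
    running = false;   // abort: rolling did not resolve the issue
  }
}
```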



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low

2015-07-02 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-13832:

Attachment: HBASE-13832-v4.patch

v4 is Enis's patch with a fix and a new test for a case with queued writers that 
were hanging.

Still, that caught exception in syncLoop(), with the loop continuing instead of 
aborting or at least spinning until !isRunning(), seems strange to me. We know 
that we must die there; why not exit from that loop? (With the current 
implementation of abort we know that running will be false after a 
sendAbortProcessSignal(), but that may not be the case in the future.)

 Procedure V2: master fail to start due to WALProcedureStore sync failures 
 when HDFS data nodes count is low
 ---

 Key: HBASE-13832
 URL: https://issues.apache.org/jira/browse/HBASE-13832
 Project: HBase
  Issue Type: Sub-task
  Components: master, proc-v2
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Matteo Bertozzi
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, 
 HBASE-13832-v2.patch, HBASE-13832-v4.patch, HDFSPipeline.java, 
 hbase-13832-test-hang.patch, hbase-13832-v3.patch


 when the data node count is < 3, we get a failure in WALProcedureStore#syncLoop() 
 during master start. The failure prevents the master from starting.
 {noformat}
 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] 
 wal.WALProcedureStore: Sync slot failed, abort.
 java.io.IOException: Failed to replace a bad datanode on the existing 
 pipeline due to no more good datanodes being available to try. (Nodes: 
 current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  
 DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
  
 original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-
 490ece56c772,DISK]]). The current failed datanode replacement policy is 
 DEFAULT, and a client may configure this via 
 'dfs.client.block.write.replace-datanode-on-failure.policy'  in its 
 configuration.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
 {noformat}
 One proposal is to implement logic similar to FSHLog's: if an IOException is 
 thrown during syncLoop in WALProcedureStore#start(), instead of aborting 
 immediately, we could try to roll the log and see whether that resolves the 
 issue; if the new log cannot be created, or rolling the log throws further 
 exceptions, we then abort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low

2015-07-02 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-13832:

Attachment: HBASE-13832-v4.patch

 Procedure V2: master fail to start due to WALProcedureStore sync failures 
 when HDFS data nodes count is low
 ---

 Key: HBASE-13832
 URL: https://issues.apache.org/jira/browse/HBASE-13832
 Project: HBase
  Issue Type: Sub-task
  Components: master, proc-v2
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Matteo Bertozzi
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, 
 HBASE-13832-v2.patch, HBASE-13832-v4.patch, HDFSPipeline.java, 
 hbase-13832-test-hang.patch, hbase-13832-v3.patch


 when the data node count is < 3, we get a failure in WALProcedureStore#syncLoop() 
 during master start. The failure prevents the master from starting.
 {noformat}
 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] 
 wal.WALProcedureStore: Sync slot failed, abort.
 java.io.IOException: Failed to replace a bad datanode on the existing 
 pipeline due to no more good datanodes being available to try. (Nodes: 
 current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  
 DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
  
 original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-
 490ece56c772,DISK]]). The current failed datanode replacement policy is 
 DEFAULT, and a client may configure this via 
 'dfs.client.block.write.replace-datanode-on-failure.policy'  in its 
 configuration.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
 {noformat}
 One proposal is to implement logic similar to FSHLog's: if an IOException is 
 thrown during syncLoop in WALProcedureStore#start(), instead of aborting 
 immediately, we could try to roll the log and see whether that resolves the 
 issue; if the new log cannot be created, or rolling the log throws further 
 exceptions, we then abort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HBASE-13867) Add endpoint coprocessor guide to HBase book

2015-07-02 Thread Gaurav Bhardwaj (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-13867 started by Gaurav Bhardwaj.
---
 Add endpoint coprocessor guide to HBase book
 

 Key: HBASE-13867
 URL: https://issues.apache.org/jira/browse/HBASE-13867
 Project: HBase
  Issue Type: Task
  Components: Coprocessors, documentation
Reporter: Vladimir Rodionov
Assignee: Gaurav Bhardwaj

 Endpoint coprocessors are very poorly documented.
 Coprocessor section of HBase book must be updated either with its own 
 endpoint coprocessors HOW-TO guide or, at least, with the link(s) to some 
 other guides. There is good description here:
 http://www.3pillarglobal.com/insights/hbase-coprocessors



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13267) Deprecate or remove isFileDeletable from SnapshotHFileCleaner

2015-07-02 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612701#comment-14612701
 ] 

Mikhail Antonov commented on HBASE-13267:
-

Oh yeah, sorry I misinterpreted your comment a bit I guess. I meant if we want 
to remove it, we can do it the way you described.

 Deprecate or remove isFileDeletable from SnapshotHFileCleaner
 -

 Key: HBASE-13267
 URL: https://issues.apache.org/jira/browse/HBASE-13267
 Project: HBase
  Issue Type: Task
Reporter: Andrew Purtell
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3


 The isFileDeletable method in SnapshotHFileCleaner became vestigial after 
 HBASE-12627; let's remove it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13480) ShortCircuitConnection doesn't short-circuit all calls as expected

2015-07-02 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612700#comment-14612700
 ] 

Enis Soztutar commented on HBASE-13480:
---

I think Josh is right. The ConnectionAdapter being a proxy is the problem. If 
it had extended the ConnectionImplementation, it would have worked. The net 
result is that we are not doing short circuit connections at all.  

 ShortCircuitConnection doesn't short-circuit all calls as expected
 --

 Key: HBASE-13480
 URL: https://issues.apache.org/jira/browse/HBASE-13480
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 1.0.0, 2.0.0, 1.1.0
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1, 1.0.3


 Noticed the following situation while debugging unexpected unit test failures 
 in HBASE-13351.
 {{ConnectionUtils#createShortCircuitHConnection(Connection, ServerName, 
 AdminService.BlockingInterface, ClientService.BlockingInterface)}} is 
 intended to avoid the extra RPC by calling the server's instantiation of the 
 protobuf rpc stub directly for the AdminService and ClientService.
 The problem is that this is insufficient to actually avoid extra remote 
 RPCs as all other calls to the Connection are routed to a real Connection 
 instance. As such, any object created by the real Connection (such as an 
 HTable) will use the real Connection, not the SSC.
 The end result is that 
 {{MasterRpcService#reportRegionStateTransition(RpcController, 
 ReportRegionStateTransitionRequest)}} will make additional remote RPCs over 
 what it thinks is an SSC through a {{Get}} on {{HTable}} which was 
 constructed using the SSC, but the {{Get}} itself will use the underlying 
 real Connection instead of the SSC. With insufficiently sized thread pools, 
 this has been observed to result in RPC deadlock in the HMaster where an RPC 
 attempts to make another RPC but there are no more threads available to 
 service the second RPC so the first RPC blocks indefinitely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13480) ShortCircuitConnection doesn't short-circuit all calls as expected

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13480:
--
Fix Version/s: (was: 1.0.2)
   1.0.3

 ShortCircuitConnection doesn't short-circuit all calls as expected
 --

 Key: HBASE-13480
 URL: https://issues.apache.org/jira/browse/HBASE-13480
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 1.0.0, 2.0.0, 1.1.0
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1, 1.0.3


 Noticed the following situation while debugging unexpected unit test failures 
 in HBASE-13351.
 {{ConnectionUtils#createShortCircuitHConnection(Connection, ServerName, 
 AdminService.BlockingInterface, ClientService.BlockingInterface)}} is 
 intended to avoid the extra RPC by calling the server's instantiation of the 
 protobuf rpc stub directly for the AdminService and ClientService.
 The problem is that this is insufficient to actually avoid extra remote 
 RPCs as all other calls to the Connection are routed to a real Connection 
 instance. As such, any object created by the real Connection (such as an 
 HTable) will use the real Connection, not the SSC.
 The end result is that 
 {{MasterRpcService#reportRegionStateTransition(RpcController, 
 ReportRegionStateTransitionRequest)}} will make additional remote RPCs over 
 what it thinks is an SSC through a {{Get}} on {{HTable}} which was 
 constructed using the SSC, but the {{Get}} itself will use the underlying 
 real Connection instead of the SSC. With insufficiently sized thread pools, 
 this has been observed to result in RPC deadlock in the HMaster where an RPC 
 attempts to make another RPC but there are no more threads available to 
 service the second RPC so the first RPC blocks indefinitely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion

2015-07-02 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612737#comment-14612737
 ] 

Stephen Yuan Jiang commented on HBASE-14017:


We should move the tryExclusiveLock() up:

{code}
public synchronized boolean tryExclusiveLock(final TableLockManager lockManager,
    final TableName tableName, final String purpose) {
  if (!tryExclusiveLock()) return false;  // <==
  // Take zk-write-lock
  tableLock = lockManager.writeLock(tableName, purpose);
  try {
    tableLock.acquire();
  } catch (IOException e) {
    LOG.error("failed acquire write lock on " + tableName, e);
    tableLock = null;
    releaseExclusiveLock(); // <==
    return false;
  }
  return true;
}
{code}

 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue 
 deletion
 -

 Key: HBASE-14017
 URL: https://issues.apache.org/jira/browse/HBASE-14017
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Affects Versions: 2.0.0, 1.2.0, 1.1.0.1
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Fix For: 2.0.0, 1.2.0, 1.1.2

 Attachments: HBASE-14017-v0.patch


 [~syuanjiang] found a concurrency issue in the procedure queue delete, where we 
 don't have an exclusive lock before deleting the table
 {noformat}
 Thread 1: Create table is running -> tryWrite() acquires the lock, before 
 setting wlock=true
 Thread 2: markTableAsDeleted sees the queue empty and wlock == false
 Thread 1: sets wlock=true; too late
 Thread 2: deletes the queue
 Thread 1: never able to release the lock
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-07-02 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-13702:

Attachment: HBASE-13702-branch-1-v3.patch

[~tedyu] fixed the test. One of the tests was failing because an existing test 
was directly changing the global configuration (util.getConfiguration()) in its 
test body, which affected any tests that ran later. :-/

 ImportTsv: Add dry-run functionality and log bad rows
 -

 Key: HBASE-13702
 URL: https://issues.apache.org/jira/browse/HBASE-13702
 Project: HBase
  Issue Type: New Feature
Reporter: Apekshit Sharma
Assignee: Apekshit Sharma
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13702-branch-1-v2.patch, 
 HBASE-13702-branch-1-v3.patch, HBASE-13702-branch-1.patch, 
 HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch, 
 HBASE-13702-v5.patch, HBASE-13702.patch


 ImportTSV job skips bad records by default (though it keeps a count). 
 -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is 
 encountered. 
 Being able to easily determine which rows of an input are corrupted, rather 
 than failing on one row at a time, seems like a good feature to have.
 Moreover, there should be 'dry-run' functionality in these kinds of tools, 
 which essentially does a quick run of the tool without making any changes, 
 reporting any errors/warnings and success/failure.
 To identify corrupted rows, simply logging them should be enough. In the worst 
 case, all rows will be logged and the size of the logs will be the same as the 
 input size, which seems fine. However, the user might have to do some work 
 figuring out where the logs are. Is there some link we can show to the user 
 when the tool starts which can help them with that?
 For the dry run, we can simply use if-else to skip over writing out KVs, and 
 any other mutations, if present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13561:
--
Fix Version/s: (was: 1.0.2)
   1.0.3

 ITBLL.Verify doesn't actually evaluate counters after job completes
 ---

 Key: HBASE-13561
 URL: https://issues.apache.org/jira/browse/HBASE-13561
 Project: HBase
  Issue Type: Bug
  Components: integration tests
Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3


 Was digging through ITBLL and noticed this oddity:
 The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the 
 {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that 
 were written given the current arguments, but it's not even checking to see 
 whether there are things like UNDEFINED records found.
 It seems to me that {{Verify}} should really be doing *some* checking on the 
 counters like {{Loop}} does, and not just leaving it up to the visual 
 inspection of whoever launched the task.
 Am I missing something?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13605) RegionStates should not keep its list of dead servers

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13605:
--
Fix Version/s: (was: 1.0.2)

 RegionStates should not keep its list of dead servers
 -

 Key: HBASE-13605
 URL: https://issues.apache.org/jira/browse/HBASE-13605
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 2.0.0, 1.1.2

 Attachments: hbase-13605_v1.patch, hbase-13605_v3-branch-1.1.patch, 
 hbase-13605_v4-branch-1.1.patch, hbase-13605_v4-master.patch


 As mentioned in 
 https://issues.apache.org/jira/browse/HBASE-9514?focusedCommentId=13769761&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13769761
  and HBASE-12844, we should have only one source of cluster membership. 
 The list of dead servers, and RegionStates doing its own liveliness check 
 (ServerManager.isServerReachable()), has caused an assignment problem again in 
 a test cluster, where the region states think that the server is dead and 
 SSH will handle the region assignment. However, the RS is not dead at all, 
 living happily, and never gets zk expiry or YouAreDeadException or anything. 
 This leaves the list of regions unassigned in OFFLINE state. 
 master assigning the region:
 {code}
 15-04-20 09:02:25,780 DEBUG [AM.ZK.Worker-pool3-t330] master.RegionStates: 
 Onlined 77dddcd50c22e56bfff133c0e1f9165b on 
 os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 {ENCODED = 
 77dddcd50c
 {code}
 Master then disabled the table, and unassigned the region:
 {code}
 2015-04-20 09:02:27,158 WARN  [ProcedureExecutorThread-1] 
 zookeeper.ZKTableStateManager: Moving table loadtest_d1 state from DISABLING 
 to DISABLING
  Starting unassign of 
 loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b. (offlining), 
 current state: {77dddcd50c22e56bfff133c0e1f9165b state=OPEN, 
 ts=1429520545780,   
 server=os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268}
 bleProcedure$BulkDisabler-0] master.AssignmentManager: Sent CLOSE to 
 os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 for region 
 loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b.
 2015-04-20 09:02:27,414 INFO  [AM.ZK.Worker-pool3-t316] master.RegionStates: 
 Offlined 77dddcd50c22e56bfff133c0e1f9165b from 
 os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
 {code}
 On table re-enable, AM does not assign the region: 
 {code}
 2015-04-20 09:02:30,415 INFO  [ProcedureExecutorThread-3] 
 balancer.BaseLoadBalancer: Reassigned 25 regions. 25 retained the pre-restart 
 assignment.·
 2015-04-20 09:02:30,415 INFO  [ProcedureExecutorThread-3] 
 procedure.EnableTableProcedure: Bulk assigning 25 region(s) across 5 
 server(s), retainAssignment=true
 l,16000,1429515659726-GeneralBulkAssigner-4] master.RegionStates: Couldn't 
 reach online server 
 os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
 l,16000,1429515659726-GeneralBulkAssigner-4] master.AssignmentManager: 
 Updating the state to OFFLINE to allow to be reassigned by SSH
 nmentManager: Skip assigning 
 loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b., it is on a dead 
 but not processed yet server: 
 os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely

2015-07-02 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612715#comment-14612715
 ] 

Stephen Yuan Jiang commented on HBASE-14016:


[~mbertozzi]  I think we should do something like the following: 

{code}
  public boolean tryAcquireTableWrite(final TableName table, final String purpose) {
    boolean lockAcquired = false;
    lock.lock();
    try {
      lockAcquired = getRunQueueOrCreate(table).tryWrite(lockManager, table, purpose);
    } finally {
      lock.unlock();
    }
    return lockAcquired;
  }
{code}

 Procedure V2: NPE in a delete table follow by create table closely
 --

 Key: HBASE-14016
 URL: https://issues.apache.org/jira/browse/HBASE-14016
 Project: HBase
  Issue Type: Bug
  Components: proc-v2
Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang

 In our internal test for HBASE 1.1, we found a race condition where a delete 
 table followed closely by a create table would leak a zk lock, due to an NPE in 
 ProcedureFairRunQueues
 {noformat}
 Exception in thread "ProcedureExecutorThread-0" java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279)
   at 
 org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280)
   at 
 org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58)
   at 
 org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674)
 {noformat}
 Here is the code that causes the race condition:
 {code}
 protected boolean markTableAsDeleted(final TableName table) {
   TableRunQueue queue = getRunQueue(table);
   if (queue != null) {
     ...
     if (queue.isEmpty() && !queue.isLocked()) {
       fairq.remove(table);
     ...
 }

 public boolean tryWrite(final TableLockManager lockManager,
     final TableName tableName, final String purpose) {
   ...
   tableLock = lockManager.writeLock(tableName, purpose);
   try {
     tableLock.acquire();
     ...
   wlock = true;
   ...
 }
 {code}
 The root cause is that wlock is set too late and does not protect the queue 
 from being deleted:
 - Thread 1: create table is running; queue is empty -> tryWrite() acquires the 
 lock (now wlock is still false)
 - Thread 2: markTableAsDeleted sees the queue empty and wlock == false
 - Thread 1: sets wlock=true -> too late
 - Thread 2: deletes the queue
 - Thread 1: never able to release the lock -> NPE trying to get the queue
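One way to model the ordering fix suggested in this thread (set the in-memory write flag before the slow external lock acquisition, and roll it back on failure) is the following simplified, standalone model; the names are illustrative, not the actual HBase classes:

```java
// Simplified model of the queue-deletion race fix: the in-memory wlock is
// set *before* any slow external work, so a concurrent markTableAsDeleted()
// can never observe an in-use queue as unlocked. Illustrative names only.
public class QueueLockSketch {
  private boolean wlock = false;
  private boolean deleted = false;

  // Fixed tryWrite: flag the queue as locked up front; on a failed external
  // acquire (modeled by the boolean parameter), roll the flag back.
  public synchronized boolean tryWrite(boolean externalAcquireSucceeds) {
    if (wlock) return false;   // already held
    wlock = true;              // <-- moved up: protects the queue from deletion
    if (!externalAcquireSucceeds) {
      wlock = false;           // roll back so the queue stays deletable
      return false;
    }
    return true;
  }

  public synchronized void releaseWrite() { wlock = false; }

  // May only delete an empty, unlocked queue.
  public synchronized boolean markTableAsDeleted() {
    if (!wlock) deleted = true;
    return deleted;
  }
}
```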



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion

2015-07-02 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-14017:

Attachment: HBASE-14017-v0.patch

 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue 
 deletion
 -

 Key: HBASE-14017
 URL: https://issues.apache.org/jira/browse/HBASE-14017
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Affects Versions: 2.0.0, 1.2.0, 1.1.0.1
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Fix For: 2.0.0, 1.2.0, 1.1.2

 Attachments: HBASE-14017-v0.patch


 [~syuanjiang] found a concurrency issue in the procedure queue delete, where we 
 don't have an exclusive lock before deleting the table
 {noformat}
 Thread 1: Create table is running -> tryWrite() acquires the lock, before 
 setting wlock=true
 Thread 2: markTableAsDeleted sees the queue empty and wlock == false
 Thread 1: sets wlock=true; too late
 Thread 2: deletes the queue
 Thread 1: never able to release the lock
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14011) MultiByteBuffer position based reads does not work correctly

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-14011:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to master. Thanks for the reviews.

 MultiByteBuffer position based reads does not work correctly
 

 Key: HBASE-14011
 URL: https://issues.apache.org/jira/browse/HBASE-14011
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0

 Attachments: HBASE-14011.patch


 The positional based reads in MBBs are having some issues when we try to read 
 the first element from the 2nd BB when the MBB is formed with multiple BBs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12596) bulkload needs to follow locality

2015-07-02 Thread Victor Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victor Xu updated HBASE-12596:
--
Description: 
Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles 
to be loaded; 2. Move these HFiles to the right hdfs directory. However, the 
locality could be lost during the first step. Why not just write the HFiles 
directly into the right place? We can do this easily because 
StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to 
call it in HFileOutputFormat's getNewWriter().
This feature is disabled by default, and we could use 
'hbase.bulkload.locality.sensitive.enabled' to enable it.

  was:Normally, we have 2 steps to perform a bulkload: 1. use a job to write 
HFiles to be loaded; 2. Move these HFiles to the right hdfs directory. However, 
the locality could be lost during the first step. Why not just write the HFiles 
directly into the right place? We can do this easily because 
StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to 
call it in HFileOutputFormat's getNewWriter().


 bulkload needs to follow locality
 -

 Key: HBASE-12596
 URL: https://issues.apache.org/jira/browse/HBASE-12596
 Project: HBase
  Issue Type: Improvement
  Components: HFile, regionserver
Affects Versions: 0.98.8
 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7
Reporter: Victor Xu
Assignee: Victor Xu
 Attachments: HBASE-12596-0.98-v1.patch, HBASE-12596-master-v1.patch, 
 HBASE-12596.patch


 Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles 
 to be loaded; 2. Move these HFiles to the right hdfs directory. However, the 
 locality could be lost during the first step. Why not just write the HFiles 
 directly into the right place? We can do this easily because 
 StoreFile.WriterBuilder has the withFavoredNodes method, and we just need 
 to call it in HFileOutputFormat's getNewWriter().
 This feature is disabled by default, and we could use 
 'hbase.bulkload.locality.sensitive.enabled' to enable it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12596) bulkload needs to follow locality

2015-07-02 Thread Victor Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victor Xu updated HBASE-12596:
--
Attachment: HBASE-12596-master-v1.patch
HBASE-12596-0.98-v1.patch

Add patches for both 0.98 and master branches. This feature is disabled by 
default, and we could use 'hbase.bulkload.locality.sensitive.enabled' to enable 
it.

 bulkload needs to follow locality
 -

 Key: HBASE-12596
 URL: https://issues.apache.org/jira/browse/HBASE-12596
 Project: HBase
  Issue Type: Improvement
  Components: HFile, regionserver
Affects Versions: 0.98.8
 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7
Reporter: Victor Xu
Assignee: Victor Xu
 Attachments: HBASE-12596-0.98-v1.patch, HBASE-12596-master-v1.patch, 
 HBASE-12596.patch


 Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles 
 to be loaded; 2. Move these HFiles to the right hdfs directory. However, the 
 locality could be lost during the first step. Why not just write the HFiles 
 directly into the right place? We can do this easily because 
 StoreFile.WriterBuilder has the withFavoredNodes method, and we just need 
 to call it in HFileOutputFormat's getNewWriter().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13506) AES-GCM cipher support where available

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13506:
--
Fix Version/s: (was: 1.0.2)
   1.0.3

 AES-GCM cipher support where available
 --

 Key: HBASE-13506
 URL: https://issues.apache.org/jira/browse/HBASE-13506
 Project: HBase
  Issue Type: Sub-task
  Components: encryption, security
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3


 The initial encryption drop only had AES-CTR support because authenticated 
 modes such as GCM are only available in Java 7 and up, and our trunk at the 
 time was targeted at Java 6. However we can optionally use AES-GCM cipher 
 support where available. For HBase 1.0 and up, Java 7 is now the minimum so 
 use of AES-GCM can go in directly. It's probably possible to add support in 
 0.98 too using reflection for cipher object initialization. 
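For reference, a standalone sketch of the JCE usage that Java 7 makes available — this shows the plain AES/GCM cipher mode via the standard API, not HBase's encryption SPI:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Minimal AES-GCM round trip with the plain JCE API (Java 7+).
public final class GcmDemo {
  private static final int TAG_BITS = 128; // authentication tag length

  public static SecretKey newKey() {
    try {
      KeyGenerator kg = KeyGenerator.getInstance("AES");
      kg.init(128);
      return kg.generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static byte[] newIv() {
    byte[] iv = new byte[12]; // 96-bit IV, the recommended size for GCM
    new SecureRandom().nextBytes(iv);
    return iv;
  }

  private static byte[] run(int mode, SecretKey key, byte[] iv, byte[] data) {
    try {
      Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
      c.init(mode, key, new GCMParameterSpec(TAG_BITS, iv));
      return c.doFinal(data); // on decrypt, this also verifies the auth tag
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static byte[] seal(SecretKey key, byte[] iv, byte[] plaintext) {
    return run(Cipher.ENCRYPT_MODE, key, iv, plaintext);
  }

  public static byte[] open(SecretKey key, byte[] iv, byte[] ciphertext) {
    return run(Cipher.DECRYPT_MODE, key, iv, ciphertext);
  }
}
```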



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13511) Derive data keys with HKDF

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13511:
--
Fix Version/s: (was: 1.0.2)
   1.0.3

 Derive data keys with HKDF
 --

 Key: HBASE-13511
 URL: https://issues.apache.org/jira/browse/HBASE-13511
 Project: HBase
  Issue Type: Sub-task
  Components: encryption, security
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3


 When we are locally managing master key material, when users have supplied 
 their own data key material, derive the actual data keys using HKDF 
 (https://tools.ietf.org/html/rfc5869)
 DK' = HKDF(S, DK, MK)
 where
 S = salt
 DK = user supplied data key
 MK = master key
 DK' = derived data key for the HFile
 User supplied key material may be weak or an attacker may have some partial 
 knowledge of it.
 Where we generate random data keys we can still use HKDF as a way to mix more 
 entropy into the secure random generator. 
 DK' = HKDF(R, MK)
 where
 R = random key material drawn from the system's secure random generator
 MK = master key
 (Salting isn't useful here because salt S and R would be drawn from the same 
 pool, so will not have statistical independence.)
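A standalone sketch of HKDF's extract-then-expand construction (RFC 5869) with HMAC-SHA256; the mapping of S, DK, and MK onto salt, IKM, and info follows the formulas above and is an assumption of this sketch, not HBase code:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

// HKDF (RFC 5869) with HMAC-SHA256: PRK = HMAC(salt, IKM), then
// OKM = T(1) | T(2) | ... with T(i) = HMAC(PRK, T(i-1) | info | i).
public final class Hkdf {
  private static final String ALG = "HmacSHA256";
  private static final int HASH_LEN = 32;

  private static byte[] hmac(byte[] key, byte[]... chunks) {
    try {
      Mac mac = Mac.getInstance(ALG);
      // RFC 5869: an absent salt defaults to HashLen zero bytes.
      mac.init(new SecretKeySpec(key.length == 0 ? new byte[HASH_LEN] : key, ALG));
      for (byte[] c : chunks) mac.update(c);
      return mac.doFinal();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static byte[] extract(byte[] salt, byte[] ikm) {
    return hmac(salt, ikm);
  }

  public static byte[] expand(byte[] prk, byte[] info, int length) {
    byte[] okm = new byte[length];
    byte[] t = new byte[0];
    for (int pos = 0, i = 1; pos < length; i++) {
      t = hmac(prk, t, info, new byte[] { (byte) i });
      int n = Math.min(HASH_LEN, length - pos);
      System.arraycopy(t, 0, okm, pos, n);
      pos += n;
    }
    return okm;
  }

  // DK' = HKDF(S, DK, MK), reading S as salt, DK as IKM, MK as info
  // (this mapping is an interpretation of the formulas in the description).
  public static byte[] deriveDataKey(byte[] s, byte[] dk, byte[] mk) {
    return expand(extract(s, dk), mk, HASH_LEN);
  }
}
```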





[jira] [Updated] (HBASE-13347) RowCounter using special filter is broken

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13347:
--
Fix Version/s: (was: 1.0.2)
   1.0.3

 RowCounter using special filter is broken
 -

 Key: HBASE-13347
 URL: https://issues.apache.org/jira/browse/HBASE-13347
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 1.0.0
Reporter: Lars George
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1, 1.0.3


 The {{RowCounter}} in the {{mapreduce}} package is supposed to check if the 
 row count scan has a column selection added to it, and if so, use a different 
 filter that finds the row and counts it. But the {{qualifier.add()}} call is 
 missing in the {{for}} loop. See 
 https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java#L165
 Needs fixing or row count might be wrong when using {{--range}}.





[jira] [Updated] (HBASE-13352) Add hbase.import.version to Import usage.

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13352:
--
Attachment: hbase-13352_v3.patch

Rebased the patch. Will commit this version unless there is an objection. It does not exit 
with a non-zero code when the output version != input version; it just logs a warning. 

 Add hbase.import.version to Import usage.
 -

 Key: HBASE-13352
 URL: https://issues.apache.org/jira/browse/HBASE-13352
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.1.2, 1.3.0, 1.2.1

 Attachments: 13352-v2.txt, 13352.txt, hbase-13352_v3.patch


 We just tried to export some (small amount of) data out of a 0.94 cluster to a 
 0.98 cluster. We used Export/Import for that.
 By default we found that the import M/R job correctly reports the number of 
 records seen, but _silently_ does not import anything. After looking at the 
 0.98 code it's obvious there's an hbase.import.version option 
 (-Dhbase.import.version=0.94) to make this work.
 Two issues:
 # -Dhbase.import.version=0.94 should be shown in the Import usage
 # If not given, it should not just silently import nothing
 In this issue I'll just trivially add this option to the Import tool's 
 usage.





[jira] [Updated] (HBASE-14015) Allow setting a richer state value when toString a pv2

2015-07-02 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14015:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to branch-1, branch-1.2 and master ([~busbey] FYI -- helps debugging 
sir).

 Allow setting a richer state value when toString a pv2
 --

 Key: HBASE-14015
 URL: https://issues.apache.org/jira/browse/HBASE-14015
 Project: HBase
  Issue Type: Improvement
  Components: proc-v2
Reporter: stack
Assignee: stack
Priority: Minor
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: 
 0001-HBASE-14015-Allow-setting-a-richer-state-value-when-.patch


 Debugging, my procedure after a crash was loaded out of the store and its 
 state was RUNNING. It would help if I knew which of the states of the 
 StateMachineProcedure it was going to resume RUNNING at.
 Chatting w/ Matteo, he suggested allowing Procedures to customize the string.
 Here is a patch that makes it so StateMachineProcedure will now print out the 
 base state -- RUNNING, FINISHED -- followed by a ':' and then the 
 StateMachineProcedure state, e.g. SimpleStateMachineProcedure 
 state=RUNNABLE:SERVER_CRASH_ASSIGN
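The scheme is easy to see outside HBase with a tiny stand-in hierarchy (illustrative names, not the actual pv2 classes): the base toString() prints the framework state, and a subclass hook appends the machine state after a ':'.

```java
public class ProcedureToStringSketch {
    enum BaseState { RUNNABLE, FINISHED }

    // Minimal stand-in for a pv2 Procedure.
    static abstract class Procedure {
        BaseState state = BaseState.RUNNABLE;

        // Hook subclasses override to enrich the printed state.
        protected String toStringState() { return state.toString(); }

        @Override
        public String toString() {
            return getClass().getSimpleName() + " state=" + toStringState();
        }
    }

    // Stand-in for a StateMachineProcedure: appends its machine state after ':'.
    static class ServerCrashProcedure extends Procedure {
        String machineState = "SERVER_CRASH_ASSIGN";

        @Override
        protected String toStringState() {
            return super.toStringState() + ":" + machineState;
        }
    }

    public static void main(String[] args) {
        // prints: ServerCrashProcedure state=RUNNABLE:SERVER_CRASH_ASSIGN
        System.out.println(new ServerCrashProcedure());
    }
}
```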





[jira] [Updated] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely

2015-07-02 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-14016:
---
Affects Version/s: (was: 2.0.0)

 Procedure V2: NPE in a delete table follow by create table closely
 --

 Key: HBASE-14016
 URL: https://issues.apache.org/jira/browse/HBASE-14016
 Project: HBase
  Issue Type: Bug
  Components: proc-v2
Affects Versions: 1.2.0, 1.1.1, 1.3.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang

 In our internal test for HBase 1.1, we found a race condition where a delete 
 table followed closely by a create table would leak a zk lock due to an NPE in 
 ProcedureFairRunQueues
 {noformat}
 Exception in thread "ProcedureExecutorThread-0" java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279)
   at 
 org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280)
   at 
 org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58)
   at 
 org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674)
 {noformat}
 Here is the code that causes the race condition:
 {code}
 protected boolean markTableAsDeleted(final TableName table) {
 TableRunQueue queue = getRunQueue(table);
 if (queue != null) {
 ...
 if (queue.isEmpty() && !queue.isLocked()) {
   fairq.remove(table);
 ...
 }
 public boolean tryWrite(final TableLockManager lockManager,
 final TableName tableName, final String purpose) {
 ...
 tableLock = lockManager.writeLock(tableName, purpose);
 try {
   tableLock.acquire();
   ...
 wlock = true;
 ...
 }
 {code}
 The root cause: wlock is set too late and does not protect the queue from 
 being deleted.
 - Thread 1: create table is running; queue is empty - tryWrite() acquires the 
 lock (wlock is still false)
 - Thread 2: markTableAsDeleted sees the queue empty and wlock == false
 - Thread 1: sets wlock=true - too late
 - Thread 2: deletes the queue
 - Thread 1: never able to release the lock - NPE trying to get the queue
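A minimal sketch of the fix direction (a stand-in, not the actual HBASE-14017 patch; all names are illustrative): set the exclusive flag atomically with the check, before any slow zk acquisition would happen, and have deletion refuse any queue whose flag is set. That closes the window between "acquired" and "flagged" that the trace above walks through.

```java
import java.util.concurrent.ConcurrentHashMap;

public class QueueLockSketch {

    // Minimal stand-in for a per-table run queue.
    static class TableRunQueue {
        private boolean locked = false;

        // The flag is set atomically with the check, BEFORE any slow
        // external (zk) lock acquisition, so a concurrent deletion can
        // never observe "acquired but not yet flagged".
        synchronized boolean tryExclusiveLock() {
            if (locked) return false;
            locked = true;
            return true;
        }

        synchronized void release() { locked = false; }

        synchronized boolean isLocked() { return locked; }
    }

    static final ConcurrentHashMap<String, TableRunQueue> fairq = new ConcurrentHashMap<>();

    static TableRunQueue getOrCreate(String table) {
        return fairq.computeIfAbsent(table, t -> new TableRunQueue());
    }

    // Deletion refuses to remove a queue that a writer has claimed.
    static boolean markTableAsDeleted(String table) {
        TableRunQueue queue = fairq.get(table);
        if (queue == null) return true;
        synchronized (queue) {
            if (queue.isLocked()) return false;
            fairq.remove(table);
            return true;
        }
    }

    public static void main(String[] args) {
        TableRunQueue q = getOrCreate("t1");
        System.out.println(q.tryExclusiveLock());     // true: writer claims the queue first
        System.out.println(markTableAsDeleted("t1")); // false: deletion refused while locked
    }
}
```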





[jira] [Created] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely

2015-07-02 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-14016:
--

 Summary: Procedure V2: NPE in a delete table follow by create 
table closely
 Key: HBASE-14016
 URL: https://issues.apache.org/jira/browse/HBASE-14016
 Project: HBase
  Issue Type: Bug
  Components: proc-v2
Affects Versions: 1.1.1, 2.0.0, 1.2.0, 1.3.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang


In our internal test for HBase 1.1, we found a race condition where a delete table 
followed closely by a create table would leak a zk lock due to an NPE in 
ProcedureFairRunQueues
{noformat}
Exception in thread "ProcedureExecutorThread-0" java.lang.NullPointerException
at 
org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279)
at 
org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280)
at 
org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674)
{noformat}

Here is the code that causes the race condition:
{code}
protected boolean markTableAsDeleted(final TableName table) {
TableRunQueue queue = getRunQueue(table);
if (queue != null) {
...
if (queue.isEmpty() && !queue.isLocked()) {
  fairq.remove(table);
...
}

public boolean tryWrite(final TableLockManager lockManager,
final TableName tableName, final String purpose) {
...
tableLock = lockManager.writeLock(tableName, purpose);
try {
  tableLock.acquire();
  ...
wlock = true;
...
}
{code}

The root cause: wlock is set too late and does not protect the queue from being deleted.
- Thread 1: create table is running; queue is empty - tryWrite() acquires the 
lock (wlock is still false)
- Thread 2: markTableAsDeleted sees the queue empty and wlock == false
- Thread 1: sets wlock=true - too late
- Thread 2: deletes the queue
- Thread 1: never able to release the lock - NPE trying to get the queue





[jira] [Resolved] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely

2015-07-02 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi resolved HBASE-14016.
-
Resolution: Duplicate

sorry closing as duplicate of HBASE-14017
(we don't need a full lock)

 Procedure V2: NPE in a delete table follow by create table closely
 --

 Key: HBASE-14016
 URL: https://issues.apache.org/jira/browse/HBASE-14016
 Project: HBase
  Issue Type: Bug
  Components: proc-v2
Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang

 In our internal test for HBase 1.1, we found a race condition where a delete 
 table followed closely by a create table would leak a zk lock due to an NPE in 
 ProcedureFairRunQueues
 {noformat}
 Exception in thread "ProcedureExecutorThread-0" java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279)
   at 
 org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280)
   at 
 org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58)
   at 
 org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674)
 {noformat}
 Here is the code that causes the race condition:
 {code}
 protected boolean markTableAsDeleted(final TableName table) {
 TableRunQueue queue = getRunQueue(table);
 if (queue != null) {
 ...
 if (queue.isEmpty() && !queue.isLocked()) {
   fairq.remove(table);
 ...
 }
 public boolean tryWrite(final TableLockManager lockManager,
 final TableName tableName, final String purpose) {
 ...
 tableLock = lockManager.writeLock(tableName, purpose);
 try {
   tableLock.acquire();
   ...
 wlock = true;
 ...
 }
 {code}
 The root cause: wlock is set too late and does not protect the queue from 
 being deleted.
 - Thread 1: create table is running; queue is empty - tryWrite() acquires the 
 lock (wlock is still false)
 - Thread 2: markTableAsDeleted sees the queue empty and wlock == false
 - Thread 1: sets wlock=true - too late
 - Thread 2: deletes the queue
 - Thread 1: never able to release the lock - NPE trying to get the queue





[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low

2015-07-02 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612722#comment-14612722
 ] 

Matteo Bertozzi commented on HBASE-13832:
-

Calling stop() directly was not what I was proposing; what I was proposing was 
just exiting from the syncLoop(). Before, with the while (isRunning()), we were 
spinning after the signal, making it clear that there would be no other run of the 
syncLoop(). In this case we may do another round of the loop and execute work 
which, in theory, is not what you expect after sending the abort signal.

The test does not rely on the 1s/2s timing; it passes even without it. But I was 
trying to make the problem clearer to someone reading the code.

 Procedure V2: master fail to start due to WALProcedureStore sync failures 
 when HDFS data nodes count is low
 ---

 Key: HBASE-13832
 URL: https://issues.apache.org/jira/browse/HBASE-13832
 Project: HBase
  Issue Type: Sub-task
  Components: master, proc-v2
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Matteo Bertozzi
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, 
 HBASE-13832-v2.patch, HBASE-13832-v4.patch, HDFSPipeline.java, 
 hbase-13832-test-hang.patch, hbase-13832-v3.patch


 When the data node count is < 3, we got failures in WALProcedureStore#syncLoop() during 
 master start.  The failure prevents the master from starting.  
 {noformat}
 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] 
 wal.WALProcedureStore: Sync slot failed, abort.
 java.io.IOException: Failed to replace a bad datanode on the existing 
 pipeline due to no more good datanodes being available to try. (Nodes: 
 current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  
 DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
  
 original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-
 490ece56c772,DISK]]). The current failed datanode replacement policy is 
 DEFAULT, and a client may configure this via 
 'dfs.client.block.write.replace-datanode-on-failure.policy'  in its 
 configuration.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
 {noformat}
 One proposal is to implement some logic similar to FSHLog: if an IOException is 
 thrown during syncLoop in WALProcedureStore#start(), instead of aborting 
 immediately, we could try to roll the log and see whether this resolves the issue; 
 if the new log cannot be created, or more exceptions come from rolling the log, we 
 then abort.
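The proposal can be sketched as a small retry wrapper. These are illustrative stand-in types, not the actual WALProcedureStore or FSHLog API:

```java
import java.io.IOException;

public class SyncRollSketch {

    // Pluggable WAL operations so the retry logic can be exercised.
    interface Wal {
        void sync() throws IOException;
        void roll() throws IOException;
    }

    // On a sync failure, roll the log once and retry; only give up (and let
    // the caller abort the master) if the roll or the retried sync also fails.
    static boolean syncWithRollFallback(Wal wal) {
        try {
            wal.sync();
            return true;
        } catch (IOException first) {
            try {
                wal.roll();  // a new writer may land on a healthy pipeline
                wal.sync();
                return true;
            } catch (IOException second) {
                return false; // caller aborts, as today
            }
        }
    }

    public static void main(String[] args) {
        // A WAL whose first sync fails (bad pipeline) but recovers after a roll.
        Wal flaky = new Wal() {
            int syncs = 0;
            public void sync() throws IOException {
                if (syncs++ == 0) throw new IOException("bad datanode pipeline");
            }
            public void roll() { }
        };
        System.out.println(syncWithRollFallback(flaky)); // prints: true
    }
}
```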





[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion

2015-07-02 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612736#comment-14612736
 ] 

Stephen Yuan Jiang commented on HBASE-14017:


[~mbertozzi]  This would not solve the problem, because the result of the inner 
'tryExclusiveLock()' call is not checked: 

{code}
public synchronized boolean tryExclusiveLock(final TableLockManager 
lockManager,
final TableName tableName, final String purpose) {
  if (isLocked()) return false;
  // Take zk-write-lock
  tableLock = lockManager.writeLock(tableName, purpose);
  try {
tableLock.acquire();
  } catch (IOException e) {
LOG.error("failed acquire write lock on " + tableName, e);
tableLock = null;
return false;
  }
  tryExclusiveLock();
  return true;
}
{code}

 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue 
 deletion
 -

 Key: HBASE-14017
 URL: https://issues.apache.org/jira/browse/HBASE-14017
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Affects Versions: 2.0.0, 1.2.0, 1.1.0.1
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Fix For: 2.0.0, 1.2.0, 1.1.2

 Attachments: HBASE-14017-v0.patch


 [~syuanjiang] found a concurrency issue in the procedure queue delete where we 
 don't have an exclusive lock before deleting the table
 {noformat}
 Thread 1: Create table is running - tryWrite() acquires the lock, before setting 
 wlock=true
 Thread 2: markTableAsDeleted sees the queue empty and wlock == false
 Thread 1: sets wlock=true; too late
 Thread 2: deletes the queue
 Thread 1: never able to release the lock
 {noformat}





[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion

2015-07-02 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612766#comment-14612766
 ] 

Matteo Bertozzi commented on HBASE-14017:
-

It does not make any difference; we are under synchronized. We check for 
isLocked() so that tryLock will always lock successfully. Not the best looking 
thing ever, but correct anyway [~stack] 

 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue 
 deletion
 -

 Key: HBASE-14017
 URL: https://issues.apache.org/jira/browse/HBASE-14017
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Affects Versions: 2.0.0, 1.2.0, 1.1.0.1
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Fix For: 2.0.0, 1.2.0, 1.1.2

 Attachments: HBASE-14017-v0.patch


 [~syuanjiang] found a concurrency issue in the procedure queue delete where we 
 don't have an exclusive lock before deleting the table
 {noformat}
 Thread 1: Create table is running - tryWrite() acquires the lock, before setting 
 wlock=true
 Thread 2: markTableAsDeleted sees the queue empty and wlock == false
 Thread 1: sets wlock=true; too late
 Thread 2: deletes the queue
 Thread 1: never able to release the lock
 {noformat}





[jira] [Assigned] (HBASE-12596) bulkload needs to follow locality

2015-07-02 Thread Victor Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victor Xu reassigned HBASE-12596:
-

Assignee: Victor Xu

 bulkload needs to follow locality
 -

 Key: HBASE-12596
 URL: https://issues.apache.org/jira/browse/HBASE-12596
 Project: HBase
  Issue Type: Improvement
  Components: HFile, regionserver
Affects Versions: 0.98.8
 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7
Reporter: Victor Xu
Assignee: Victor Xu
 Attachments: HBASE-12596.patch


 Normally, we have 2 steps to perform a bulkload: 1. use a job to write the HFiles 
 to be loaded; 2. move these HFiles to the right hdfs directory. However, the 
 locality could be lost during the first step. Why not just write the HFiles 
 directly into the right place? We can do this easily because 
 StoreFile.WriterBuilder has the withFavoredNodes method, and we just need 
 to call it in HFileOutputFormat's getNewWriter().





[jira] [Updated] (HBASE-13941) Backport HBASE-13917 (Remove string comparison to identify request priority) to release branches

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13941:
--
Fix Version/s: (was: 1.0.2)
   1.0.3

 Backport HBASE-13917 (Remove string comparison to identify request priority) 
 to release branches
 

 Key: HBASE-13941
 URL: https://issues.apache.org/jira/browse/HBASE-13941
 Project: HBase
  Issue Type: Task
Reporter: Andrew Purtell
 Fix For: 0.98.14, 1.1.2, 1.0.3


 Backport HBASE-13917 (Remove string comparison to identify request priority) 
 to release branches.





[jira] [Commented] (HBASE-13988) Add exception handler for lease thread

2015-07-02 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612656#comment-14612656
 ] 

Enis Soztutar commented on HBASE-13988:
---

The patch aborts the RS with the throwable which will get logged as well, no? 

{code}
uncaughtExceptionHandler = new UncaughtExceptionHandler() {
  @Override
  public void uncaughtException(Thread t, Throwable e) {
    abort("Uncaught exception in service thread " + t.getName(), e);
  }
};
...
  public void abort(String reason, Throwable cause) {
    String msg = "ABORTING region server " + this + ": " + reason;
    if (cause != null) {
      LOG.fatal(msg, cause);
    } else {
      LOG.fatal(msg);
    }
{code}

We were already aborting the RS in case the lease thread dies, so this does not 
change the semantics. +1. 
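The pattern under discussion, a per-thread UncaughtExceptionHandler that records the throwable and then aborts, can be shown with plain java.lang.Thread (the thread name and messages here are illustrative):

```java
public class LeaseHandlerSketch {

    // Run a "lease" thread that dies with an uncaught exception and capture
    // what the handler records; in the RS the handler calls abort(...), which
    // logs the throwable at FATAL before stopping the server.
    static String runAndCapture() throws InterruptedException {
        final StringBuffer log = new StringBuffer(); // StringBuffer: appended from another thread

        Thread leases = new Thread(() -> {
            throw new RuntimeException("lease bookkeeping bug");
        }, "RegionServer.leaseChecker");

        leases.setUncaughtExceptionHandler((t, e) ->
            log.append("ABORTING: uncaught exception in service thread ")
               .append(t.getName()).append(": ").append(e.getMessage()));

        leases.start();
        leases.join(); // the handler has run by the time join() returns
        return log.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAndCapture());
    }
}
```

With no handler installed, the stack trace would go to stderr and the thread would die silently from the server's point of view, which is exactly the debugging gap the patch closes.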


 Add exception handler for lease thread
 --

 Key: HBASE-13988
 URL: https://issues.apache.org/jira/browse/HBASE-13988
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Fix For: 2.0.0, 1.0.2, 1.1.2, 0.98.15

 Attachments: HBASE-13988-v001.diff


 In a prod cluster, a region server exited because some important 
 threads were not alive. After excluding other threads from the log, we 
 suspected the lease thread was the root cause. 
 So we need to add an exception handler to the lease thread, so that in the 
 future we can debug why it exited.
  
 {quote}
 2015-06-29,12:46:09,222 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: One or more 
 threads are no longer alive -- stop
 2015-06-29,12:46:09,223 INFO org.apache.hadoop.ipc.HBaseServer: Stopping 
 server on 21600
 ...
 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.LogRoller: 
 LogRoller exiting.
 2015-06-29,12:46:09,330 INFO 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Thread-37 exiting
 2015-06-29,12:46:09,330 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: 
 regionserver21600.compactionChecker exiting
 2015-06-29,12:46:12,403 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher: 
 regionserver21600.periodicFlusher exiting
 {quote}





[jira] [Updated] (HBASE-13452) HRegion warning about memstore size miscalculation is not actionable

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13452:
--
Fix Version/s: (was: 1.0.2)
   1.0.3

 HRegion warning about memstore size miscalculation is not actionable
 

 Key: HBASE-13452
 URL: https://issues.apache.org/jira/browse/HBASE-13452
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Dev Lakhani
Assignee: Mikhail Antonov
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.2.1, 1.0.3


 During normal operation the HRegion class reports a message related to 
 memstore flushing in HRegion.class :
   if (!canFlush) {
 addAndGetGlobalMemstoreSize(-memstoreSize.get());
   } else if (memstoreSize.get() != 0) {
 LOG.error("Memstore size is " + memstoreSize.get());
   }
 The log file is filled with lots of 
 Memstore size is 558744
 Memstore size is 4390632
 Memstore size is 558744 
 ...
 These messages are uninformative, clog up the logs, and offer no root cause 
 or solution. Maybe the message needs to be more informative, changed to WARN, 
 or some further information provided.





[jira] [Updated] (HBASE-13267) Deprecate or remove isFileDeletable from SnapshotHFileCleaner

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13267:
--
Fix Version/s: (was: 1.0.2)
   1.0.3

 Deprecate or remove isFileDeletable from SnapshotHFileCleaner
 -

 Key: HBASE-13267
 URL: https://issues.apache.org/jira/browse/HBASE-13267
 Project: HBase
  Issue Type: Task
Reporter: Andrew Purtell
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3


 The isFileDeletable method in SnapshotHFileCleaner became vestigial after 
 HBASE-12627; let's remove it. 





[jira] [Updated] (HBASE-13221) HDFS Transparent Encryption breaks WAL writing

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13221:
--
Fix Version/s: (was: 1.0.2)
   1.0.3

 HDFS Transparent Encryption breaks WAL writing
 --

 Key: HBASE-13221
 URL: https://issues.apache.org/jira/browse/HBASE-13221
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.98.0, 1.0.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Fix For: 2.0.0, 0.98.14, 1.1.2, 1.0.3


 We need to detect when HDFS Transparent Encryption (Hadoop 2.6.0+) is enabled 
 and fall back to more synchronization in the WAL to prevent catastrophic 
 failure under load.
 See HADOOP-11708 for more details.





[jira] [Updated] (HBASE-13271) Table#puts(List<Put>) operation is indeterminate; needs fixing

2015-07-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-13271:
--
Fix Version/s: (was: 1.0.2)
   1.0.3

 Table#puts(List<Put>) operation is indeterminate; needs fixing
 --

 Key: HBASE-13271
 URL: https://issues.apache.org/jira/browse/HBASE-13271
 Project: HBase
  Issue Type: Improvement
  Components: API
Affects Versions: 1.0.0
Reporter: stack
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1, 1.0.3


 Another API issue found by [~larsgeorge]:
 Table.put(List<Put>) is questionable after the API change.
 {code}
 [Mar-17 9:21 AM] Lars George: Table.put(List<Put>) is weird since you cannot 
 flush partial lists
 [Mar-17 9:21 AM] Lars George: Say out of 5 the third is broken, then the 
 put() call returns with a local exception (say empty Put) and then you have 2 
 that are in the buffer
 [Mar-17 9:21 AM] Lars George: but how do you force commit them?
 [Mar-17 9:22 AM] Lars George: In the past you would call flushCache(), but 
 that is gone now
 [Mar-17 9:22 AM] Lars George: and flush() is not available on a Table
 [Mar-17 9:22 AM] Lars George: And you cannot access the underlying 
 BufferedMutator either
 [Mar-17 9:23 AM] Lars George: You can *only* add more Puts, or 
 call close()
 [Mar-17 9:23 AM] Lars George: that is just weird to explain
 {code}
 So, either Table needs to get flush() back, or we deprecate this method, or it 
 flushes immediately and does not return until complete in the implementation.





[jira] [Commented] (HBASE-13267) Deprecate or remove isFileDeletable from SnapshotHFileCleaner

2015-07-02 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612695#comment-14612695
 ] 

Mikhail Antonov commented on HBASE-13267:
-

I guess there's some action to be taken here. Should we do as Dave suggested 
and proceed with removing it? I can take this one.

 Deprecate or remove isFileDeletable from SnapshotHFileCleaner
 -

 Key: HBASE-13267
 URL: https://issues.apache.org/jira/browse/HBASE-13267
 Project: HBase
  Issue Type: Task
Reporter: Andrew Purtell
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3


 The isFileDeletable method in SnapshotHFileCleaner became vestigial after 
 HBASE-12627; let's remove it. 





[jira] [Updated] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely

2015-07-02 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-14016:
---
Affects Version/s: 2.0.0

 Procedure V2: NPE in a delete table follow by create table closely
 --

 Key: HBASE-14016
 URL: https://issues.apache.org/jira/browse/HBASE-14016
 Project: HBase
  Issue Type: Bug
  Components: proc-v2
Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang

 In our internal test for HBase 1.1, we found a race condition where a delete 
 table followed closely by a create table would leak a zk lock due to an NPE in 
 ProcedureFairRunQueues
 {noformat}
 Exception in thread "ProcedureExecutorThread-0" java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279)
   at 
 org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280)
   at 
 org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58)
   at 
 org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674)
 {noformat}
 Here is the code that causes the race condition:
 {code}
 protected boolean markTableAsDeleted(final TableName table) {
 TableRunQueue queue = getRunQueue(table);
 if (queue != null) {
 ...
 if (queue.isEmpty() && !queue.isLocked()) {
   fairq.remove(table);
 ...
 }
 public boolean tryWrite(final TableLockManager lockManager,
 final TableName tableName, final String purpose) {
 ...
 tableLock = lockManager.writeLock(tableName, purpose);
 try {
   tableLock.acquire();
   ...
 wlock = true;
 ...
 }
 {code}
 The root cause: wlock is set too late and does not protect the queue from 
 being deleted.
 - Thread 1: create table is running; queue is empty - tryWrite() acquires the 
 lock (wlock is still false)
 - Thread 2: markTableAsDeleted sees the queue empty and wlock == false
 - Thread 1: sets wlock=true - too late
 - Thread 2: deletes the queue
 - Thread 1: never able to release the lock - NPE trying to get the queue





[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-13977:
---
Status: Open  (was: Patch Available)

 Convert getKey and related APIs to Cell
 ---

 Key: HBASE-13977
 URL: https://issues.apache.org/jira/browse/HBASE-13977
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-13977.patch, HBASE-13977_1.patch, 
 HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch, 
 HBASE-13977_4.patch


 During the course of changes for HBASE-11425, we felt that more APIs could be 
 converted to return Cell instead of ByteBuffer, e.g. getKey and getLastKey. 
 We can also rename getKeyValue to getCell.





[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-13977:
---
Attachment: HBASE-13977_4.patch

Try QA.

 Convert getKey and related APIs to Cell
 ---

 Key: HBASE-13977
 URL: https://issues.apache.org/jira/browse/HBASE-13977
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-13977.patch, HBASE-13977_1.patch, 
 HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch, 
 HBASE-13977_4.patch


 During the course of changes for HBASE-11425, we felt that more APIs could be 
 converted to return Cell instead of ByteBuffer, e.g. getKey and getLastKey. 
 We can also rename getKeyValue to getCell.





[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell

2015-07-02 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-13977:
---
Status: Patch Available  (was: Open)

 Convert getKey and related APIs to Cell
 ---

 Key: HBASE-13977
 URL: https://issues.apache.org/jira/browse/HBASE-13977
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-13977.patch, HBASE-13977_1.patch, 
 HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch, 
 HBASE-13977_4.patch


 During the course of changes for HBASE-11425, we felt that more APIs could be 
 converted to return Cell instead of ByteBuffer, e.g. getKey and getLastKey. 
 We can also rename getKeyValue to getCell.





[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion

2015-07-02 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612780#comment-14612780
 ] 

Stephen Yuan Jiang commented on HBASE-14017:


+1 - it should work. One suggestion: you can just reuse 'tryExclusiveLock()' 
(exposing it in the interface) instead of creating a new 'acquireDeleteLock()'.

 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue 
 deletion
 -

 Key: HBASE-14017
 URL: https://issues.apache.org/jira/browse/HBASE-14017
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Affects Versions: 2.0.0, 1.2.0, 1.1.0.1
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Fix For: 2.0.0, 1.2.0, 1.1.2

 Attachments: HBASE-14017-v0.patch


 [~syuanjiang] found a concurrency issue in the procedure queue deletion, where 
 we don't hold an exclusive lock before deleting the table queue:
 {noformat}
 Thread 1: Create table is running - the queue is empty and wlock is false 
 Thread 2: markTableAsDeleted sees the queue empty and wlock == false
 Thread 1: tryWrite() sets wlock=true; too late
 Thread 2: deletes the queue
 Thread 1: never able to release the lock - NPE when trying to get the queue
 {noformat}
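The race and the fix can be sketched with a minimal, self-contained Java model (hypothetical `TableQueue`/`MasterQueue` names; this is not the actual MasterProcedureQueue code): deletion must first acquire the same exclusive lock procedures use, so `markTableAsDeleted` can never remove a queue out from under a procedure that is about to lock it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a per-table procedure queue.
class TableQueue {
    private boolean exclusiveLock = false; // guarded by synchronized(this)

    // Same lock used by running procedures; deletion reuses it rather than
    // introducing a separate acquireDeleteLock().
    synchronized boolean tryExclusiveLock() {
        if (exclusiveLock) return false;
        exclusiveLock = true;
        return true;
    }

    synchronized void releaseExclusiveLock() {
        exclusiveLock = false;
    }
}

class MasterQueue {
    private final ConcurrentMap<String, TableQueue> queues = new ConcurrentHashMap<>();

    TableQueue getOrCreate(String table) {
        return queues.computeIfAbsent(table, t -> new TableQueue());
    }

    // Fixed markTableAsDeleted: the queue is only removed while holding its
    // exclusive lock, so a concurrent tryExclusiveLock() from Thread 1 either
    // wins (and deletion backs off) or fails cleanly.
    boolean markTableAsDeleted(String table) {
        TableQueue q = queues.get(table);
        if (q == null) return true;
        if (!q.tryExclusiveLock()) return false; // a procedure holds it; retry later
        queues.remove(table, q);
        return true;
    }
}
```

With this shape, the window in the trace above closes: between "queue is empty" and "delete the queue" the deleter now owns the exclusive lock, so Thread 1's `tryWrite()` can no longer sneak in and later NPE on a vanished queue.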





[jira] [Commented] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-07-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612781#comment-14612781
 ] 

Ted Yu commented on HBASE-13702:


Waiting for Jenkins to come back so that QA can test the patch.

 ImportTsv: Add dry-run functionality and log bad rows
 -

 Key: HBASE-13702
 URL: https://issues.apache.org/jira/browse/HBASE-13702
 Project: HBase
  Issue Type: New Feature
Reporter: Apekshit Sharma
Assignee: Apekshit Sharma
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13702-branch-1-v2.patch, 
 HBASE-13702-branch-1-v3.patch, HBASE-13702-branch-1.patch, 
 HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch, 
 HBASE-13702-v5.patch, HBASE-13702.patch


 The ImportTSV job skips bad records by default (though it keeps a count). 
 -Dimporttsv.skip.bad.lines=false can be used to fail when a bad row is 
 encountered. 
 Being able to easily determine which rows of an input are corrupted, rather 
 than failing on one row at a time, seems like a good feature to have.
 Moreover, tools of this kind should have 'dry-run' functionality, which 
 essentially does a quick run of the tool without making any changes, 
 reporting any errors/warnings and overall success/failure.
 To identify corrupted rows, simply logging them should be enough. In the 
 worst case, all rows will be logged and the size of the logs will match the 
 input size, which seems fine. However, the user might have to do some work to 
 figure out where the logs are. Is there some link we can show to the user 
 when the tool starts that would help them with that?
 For the dry run, we can simply use an if-else to skip writing out KVs, and 
 any other mutations, if present.
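The proposed behavior can be sketched with a small, self-contained Java model (hypothetical `TsvLoader` class, not the actual ImportTsv mapper): parse each line, log and count bad rows rather than failing, and guard the actual write behind a dry-run flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the ImportTsv mapper logic described above.
class TsvLoader {
    final boolean dryRun;
    final boolean skipBadLines;
    long badLineCount = 0;
    final List<String> written = new ArrayList<>();   // stands in for context.write(...)
    final List<String> badLines = new ArrayList<>();  // stands in for the bad-row log

    TsvLoader(boolean dryRun, boolean skipBadLines) {
        this.dryRun = dryRun;
        this.skipBadLines = skipBadLines;
    }

    void map(String line) {
        String[] cols = line.split("\t");
        if (cols.length < 2 || cols[0].isEmpty()) {   // "bad row": no row key or no value
            badLineCount++;
            badLines.add(line);                       // log it so the user can find all bad rows
            if (!skipBadLines) throw new IllegalArgumentException("Bad line: " + line);
            return;
        }
        if (!dryRun) {                                // the if-else mentioned above
            written.add(cols[0]);                     // real run: emit the KV/Put
        }                                             // dry run: validate only, write nothing
    }
}
```

A dry run thus exercises the full parse/validate path and reports every bad row, while leaving the table untouched.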





[jira] [Updated] (HBASE-12596) bulkload needs to follow locality

2015-07-02 Thread Victor Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victor Xu updated HBASE-12596:
--
Description: 
Normally, we have 2 steps to perform a bulkload: 1. use a job to write the 
HFiles to be loaded; 2. move these HFiles to the right HDFS directory. However, 
locality can be lost during the first step. Why not just write the HFiles 
directly into the right place? We can do this easily because 
StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to 
call it in HFileOutputFormat's getNewWriter().
This feature is disabled by default, and we could use 
'hbase.bulkload.locality.sensitive.enabled=true' to enable it.

  was:
Normally, we have 2 steps to perform a bulkload: 1. use a job to write the 
HFiles to be loaded; 2. move these HFiles to the right HDFS directory. However, 
locality can be lost during the first step. Why not just write the HFiles 
directly into the right place? We can do this easily because 
StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to 
call it in HFileOutputFormat's getNewWriter().
This feature is disabled by default, and we could use 
'hbase.bulkload.locality.sensitive.enabled' to enable it.


 bulkload needs to follow locality
 -

 Key: HBASE-12596
 URL: https://issues.apache.org/jira/browse/HBASE-12596
 Project: HBase
  Issue Type: Improvement
  Components: HFile, regionserver
Affects Versions: 0.98.8
 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7
Reporter: Victor Xu
Assignee: Victor Xu
 Fix For: 0.98.14

 Attachments: HBASE-12596-0.98-v1.patch, HBASE-12596-master-v1.patch, 
 HBASE-12596.patch


 Normally, we have 2 steps to perform a bulkload: 1. use a job to write the 
 HFiles to be loaded; 2. move these HFiles to the right HDFS directory. 
 However, locality can be lost during the first step. Why not just write the 
 HFiles directly into the right place? We can do this easily because 
 StoreFile.WriterBuilder has the withFavoredNodes method, and we just need 
 to call it in HFileOutputFormat's getNewWriter().
 This feature is disabled by default, and we could use 
 'hbase.bulkload.locality.sensitive.enabled=true' to enable it.
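The locality idea above can be sketched with a self-contained Java model (hypothetical `FavoredNodePicker`, not the actual HFileOutputFormat or StoreFile.WriterBuilder code): when opening a new writer for a row, look up the host serving that row's region and treat it as the favored node, so the HFile's blocks land on the datanode co-located with the region server.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of favored-node selection: the region start-key map
// stands in for a region locator; the returned host is what would be passed
// (as favored nodes) when building the writer in getNewWriter().
class FavoredNodePicker {
    // region start key -> host of the serving region server
    private final TreeMap<String, String> regionHosts = new TreeMap<>();

    void addRegion(String startKey, String host) {
        regionHosts.put(startKey, host);
    }

    // A row belongs to the region with the greatest start key <= the row key,
    // so floorEntry gives the region server hosting that row.
    String favoredNodeFor(String rowKey) {
        Map.Entry<String, String> e = regionHosts.floorEntry(rowKey);
        return e == null ? null : e.getValue();
    }
}
```

Writing each HFile with its region's host as a favored node means the subsequent bulkload move keeps block locality instead of reading most blocks over the network.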




