[jira] [Updated] (HBASE-13871) Change RegionScannerImpl to deal with Cell instead of byte[], int, int
[ https://issues.apache.org/jira/browse/HBASE-13871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-13871: --- Attachment: HBASE-13871.patch Change RegionScannerImpl to deal with Cell instead of byte[], int, int -- Key: HBASE-13871 URL: https://issues.apache.org/jira/browse/HBASE-13871 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13871.patch, HBASE-13871.patch This is also a sub item for splitting HBASE-13387 into smaller chunks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
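The shape of the change in this issue's title — passing a Cell instead of a loose (byte[], int, int) triple — can be sketched as below. This is a minimal illustration, not HBase's actual RegionScannerImpl code; the class and method names are hypothetical.

```java
// Sketch: replacing (byte[], int, int) row parameters with a Cell-style
// object that carries its own backing array, offset, and length.
public class CellApiSketch {

  // Minimal stand-in for the row accessors of org.apache.hadoop.hbase.Cell.
  interface Cell {
    byte[] getRowArray();
    int getRowOffset();
    short getRowLength();
  }

  // Old style: three loosely coupled parameters per row reference.
  static boolean rowMatches(byte[] row, int offset, int length,
                            byte[] other, int otherOffset, int otherLength) {
    if (length != otherLength) {
      return false;
    }
    for (int i = 0; i < length; i++) {
      if (row[offset + i] != other[otherOffset + i]) {
        return false;
      }
    }
    return true;
  }

  // New style: one Cell parameter; the triple travels as a unit, so callers
  // cannot pass a mismatched array/offset/length combination.
  static boolean rowMatches(Cell a, Cell b) {
    return rowMatches(a.getRowArray(), a.getRowOffset(), a.getRowLength(),
                      b.getRowArray(), b.getRowOffset(), b.getRowLength());
  }

  // Helper to wrap a whole byte[] as a row-only Cell.
  static Cell cellForRow(byte[] row) {
    return new Cell() {
      public byte[] getRowArray() { return row; }
      public int getRowOffset() { return 0; }
      public short getRowLength() { return (short) row.length; }
    };
  }
}
```

The second overload also makes it easier to later back a Cell by an off-heap buffer without touching every call site.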
[jira] [Commented] (HBASE-13848) Access InfoServer SSL passwords through Credential Provider API
[ https://issues.apache.org/jira/browse/HBASE-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578670#comment-14578670 ] Hadoop QA commented on HBASE-13848: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12738535/HBASE-13848.1.patch against master branch at commit 487e4aa74fcc6ef4201f6ffdcfd1a7169c754562. ATTACHMENT ID: 12738535 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.TestRegionRebalancing Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14343//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14343//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14343//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14343//console This message is automatically generated. Access InfoServer SSL passwords through Credential Provider API -- Key: HBASE-13848 URL: https://issues.apache.org/jira/browse/HBASE-13848 Project: HBase Issue Type: Improvement Components: security Reporter: Sean Busbey Assignee: Sean Busbey Attachments: HBASE-13848.1.patch, HBASE-13848.1.patch HBASE-11810 took care of getting our SSL passwords out of the Hadoop Credential Provider API, but we also get several out of clear text configuration for the InfoServer class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13855) NPE in PartitionedMobCompactor
[ https://issues.apache.org/jira/browse/HBASE-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578640#comment-14578640 ] ramkrishna.s.vasudevan commented on HBASE-13855: +1 for commit. NPE in PartitionedMobCompactor -- Key: HBASE-13855 URL: https://issues.apache.org/jira/browse/HBASE-13855 Project: HBase Issue Type: Sub-task Components: mob Affects Versions: hbase-11339 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: HBASE-13855.diff In PartitionedMobCompactor, mob files are split into partitions, the compactions of partitions run in parallel. The partitions share the same set of del files. There might be race conditions when open readers of del store files in each partition which can cause NPE. In this patch, we will pre-create the reader for each del store file to avoid this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
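The fix described in the issue — pre-creating one reader per del file on a single thread before the partition compactions go parallel, so partitions only ever read a shared, already-open list — can be sketched as follows. All names here are hypothetical stand-ins, not the PartitionedMobCompactor code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PrecreatedReadersSketch {

  // Stand-in for a del-store-file reader; the real one wraps an HFile.
  static class DelFileReader {
    final String path;
    DelFileReader(String path) { this.path = path; } // "opens" the file
  }

  static List<String> compactPartitions(List<String> partitions, List<String> delFiles) {
    // Pre-create every del-file reader BEFORE submitting parallel work.
    // The racy variant the bug report describes would instead open readers
    // lazily inside each partition task, letting two tasks race on the
    // shared del files (and NPE).
    List<DelFileReader> readers = new ArrayList<>();
    for (String f : delFiles) {
      readers.add(new DelFileReader(f));
    }
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<String>> results = new ArrayList<>();
      for (String p : partitions) {
        // Every task gets the same read-only reader list.
        results.add(pool.submit(() -> compactOne(p, readers)));
      }
      List<String> out = new ArrayList<>();
      for (Future<String> f : results) {
        out.add(f.get());
      }
      return out;
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }

  static String compactOne(String partition, List<DelFileReader> sharedReaders) {
    // Only reads sharedReaders; never opens or closes them, so no race.
    return partition + ":compacted-with-" + sharedReaders.size() + "-del-readers";
  }
}
```

The design point is that reader creation (the racy step) is hoisted out of the parallel region; the tasks share immutable state only.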
[jira] [Commented] (HBASE-13836) Do not reset the mvcc for bulk loaded mob reference cells in reading
[ https://issues.apache.org/jira/browse/HBASE-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578660#comment-14578660 ] Anoop Sam John commented on HBASE-13836: -if (this.cur != null && this.reader.isBulkLoaded()) { +if (this.cur != null && !this.reader.isSkipResetSeqId()) { It will be better for readability that this check is this.cur != null && this.reader.isBulkLoaded() && !this.reader.isSkipResetSeqId() Also no need to call the setter on the reader for non bulk loaded cases.. Just set it only for bulk loaded files. And please add some comments on why we do this skip. Do not reset the mvcc for bulk loaded mob reference cells in reading Key: HBASE-13836 URL: https://issues.apache.org/jira/browse/HBASE-13836 Project: HBase Issue Type: Sub-task Components: mob Affects Versions: hbase-11339 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: HBASE-13836.diff Now in scanner, the cells mvcc of the bulk loaded files are reset by the seqId parsed from the file name. We need to skip this if the hfiles are bulkloaded in mob compactions. In mob compaction, the bulk loaded ref cell might not be the latest cell among the ones that have the same row key. In reading, the mvcc is reset by the largest one, it will cover the real latest ref cell. We have to skip the resetting in this case. The solution is we add a new field to fileinfo, when this field is set as true, we skip the resetting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13836) Do not reset the mvcc for bulk loaded mob reference cells in reading
[ https://issues.apache.org/jira/browse/HBASE-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578649#comment-14578649 ] ramkrishna.s.vasudevan commented on HBASE-13836: {code} public static final byte[] SKIP_RESET_SEQ_ID = Bytes.toBytes("SKIP_RESET_SEQ_ID"); {code} Maybe you can add the reason and specify that it is used in MOB cases. It will help to understand later because it is specific to MOB. Also, can you always make this 'true' and avoid calling setSkipResetSeqId(true) every time for a non-bulk load case? If it is a bulk load you can set it based on the metadata info. {code} private boolean skipResetSeqId = false; {code} Do not reset the mvcc for bulk loaded mob reference cells in reading Key: HBASE-13836 URL: https://issues.apache.org/jira/browse/HBASE-13836 Project: HBase Issue Type: Sub-task Components: mob Affects Versions: hbase-11339 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: HBASE-13836.diff Now in scanner, the cells mvcc of the bulk loaded files are reset by the seqId parsed from the file name. We need to skip this if the hfiles are bulkloaded in mob compactions. In mob compaction, the bulk loaded ref cell might not be the latest cell among the ones that have the same row key. In reading, the mvcc is reset by the largest one, it will cover the real latest ref cell. We have to skip the resetting in this case. The solution is we add a new field to fileinfo, when this field is set as true, we skip the resetting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13872) Typo in dfshealth.html - Decomissioning
nijel created HBASE-13872: - Summary: Typo in dfshealth.html - Decomissioning Key: HBASE-13872 URL: https://issues.apache.org/jira/browse/HBASE-13872 Project: HBase Issue Type: Bug Reporter: nijel Assignee: nijel Priority: Trivial <div class="page-header"><h1><small>Decomissioning</small></h1></div> change to <div class="page-header"><h1><small>Decommissioning</small></h1></div> in dfshealth.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13451) Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators
[ https://issues.apache.org/jira/browse/HBASE-13451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578616#comment-14578616 ] Hudson commented on HBASE-13451: FAILURE: Integrated in HBase-TRUNK #6554 (See [https://builds.apache.org/job/HBase-TRUNK/6554/]) HBASE-13451 - Make the HFileBlockIndex blockKeys to Cells so that it could (ramkrishna: rev 487e4aa74fcc6ef4201f6ffdcfd1a7169c754562) * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java * hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV3.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java * hbase-server/src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilterWriter.java * hbase-server/src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java * hbase-server/src/main/java/org/apache/hadoop/hbase/util/BloomFilterChunk.java * hbase-common/src/main/java/org/apache/hadoop/hbase/CellUtil.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionFileSystem.java * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileSeek.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java * hbase-server/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java * hbase-server/src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java * 
hbase-server/src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilterBase.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CompoundBloomFilterWriter.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CompoundBloomFilter.java * hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CompoundBloomFilterBase.java Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators -- Key: HBASE-13451 URL: https://issues.apache.org/jira/browse/HBASE-13451 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-13451.patch, HBASE-13451_1.patch, HBASE-13451_2.patch, HBASE-13451_3.patch, HBASE-13451_4.patch, HBASE-13451_5.patch, HBASE-13451_6.patch, HBASE-13451_7.patch After HBASE-10800 we could ensure that all the blockKeys in the BlockReader are converted to Cells (KeyOnlyKeyValue) so that we could use CellComparators. Note that this can be done only for the keys that are written using CellComparators and not for the ones using RawBytesComparator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578769#comment-14578769 ] Matteo Bertozzi commented on HBASE-13829: - +1 on v2 Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13829-v1.patch, HBASE-13829-v2.patch, HBASE-13829.patch HBASE-11598 added simple throttling for hbase, but in the client it doesn't let the user set a ThrottleType like WRITE_NUM, WRITE_SIZE, READ_NUM, READ_SIZE. REVIEW BOARD: https://reviews.apache.org/r/34989/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
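The throttle dimensions this issue exposes (request count vs. bytes, split by read/write) can be illustrated with a minimal per-interval quota check. This is a self-contained sketch of the idea only; it is not HBase's quota implementation, and the enum constants mirror the names quoted in the description rather than any confirmed API.

```java
public class ThrottleTypeSketch {

  // Illustrative throttle dimensions, after the names in the issue text.
  enum ThrottleType { WRITE_NUM, WRITE_SIZE, READ_NUM, READ_SIZE }

  // A limit for one throttle type over one refresh interval.
  static class Quota {
    final ThrottleType type;
    final long limitPerInterval;
    long usedThisInterval = 0;

    Quota(ThrottleType type, long limitPerInterval) {
      this.type = type;
      this.limitPerInterval = limitPerInterval;
    }

    // Records the operation and returns true if it fits under the limit;
    // amount is a request count for *_NUM types, bytes for *_SIZE types.
    boolean tryConsume(long amount) {
      if (usedThisInterval + amount > limitPerInterval) {
        return false;
      }
      usedThisInterval += amount;
      return true;
    }
  }
}
```

Splitting the type out this way is what lets a client throttle, say, write volume without also capping reads.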
[jira] [Commented] (HBASE-13871) Change RegionScannerImpl to deal with Cell instead of byte[], int, int
[ https://issues.apache.org/jira/browse/HBASE-13871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578773#comment-14578773 ] ramkrishna.s.vasudevan commented on HBASE-13871: LGTM. nit, createFirstOnRow can have a small doc. Change RegionScannerImpl to deal with Cell instead of byte[], int, int -- Key: HBASE-13871 URL: https://issues.apache.org/jira/browse/HBASE-13871 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13871.patch, HBASE-13871.patch This is also a sub item for splitting HBASE-13387 into smaller chunks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13871) Change RegionScannerImpl to deal with Cell instead of byte[], int, int
[ https://issues.apache.org/jira/browse/HBASE-13871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578790#comment-14578790 ] Hadoop QA commented on HBASE-13871: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12738544/HBASE-13871.patch against master branch at commit 487e4aa74fcc6ef4201f6ffdcfd1a7169c754562. ATTACHMENT ID: 12738544 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14344//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14344//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14344//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14344//console This message is automatically generated. Change RegionScannerImpl to deal with Cell instead of byte[], int, int -- Key: HBASE-13871 URL: https://issues.apache.org/jira/browse/HBASE-13871 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13871.patch, HBASE-13871.patch This is also a sub item for splitting HBASE-13387 into smaller chunks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13755) Provide single super user check implementation
[ https://issues.apache.org/jira/browse/HBASE-13755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578392#comment-14578392 ] Hadoop QA commented on HBASE-13755: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12738491/HBASE-13755-v4.patch against master branch at commit c62b396f9f4537d121e2661f328674c99a8b7fb7. ATTACHMENT ID: 12738491 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 16 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14341//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14341//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14341//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14341//console This message is automatically generated. Provide single super user check implementation -- Key: HBASE-13755 URL: https://issues.apache.org/jira/browse/HBASE-13755 Project: HBase Issue Type: Improvement Reporter: Anoop Sam John Assignee: Mikhail Antonov Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13755-v1.patch, HBASE-13755-v2.patch, HBASE-13755-v3.patch, HBASE-13755-v3.patch, HBASE-13755-v3.patch, HBASE-13755-v3.patch, HBASE-13755-v3.patch, HBASE-13755-v4.patch Followup for HBASE-13375. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13870) Aborted major compaction does not clean up after itself
Gautam Gopalakrishnan created HBASE-13870: - Summary: Aborted major compaction does not clean up after itself Key: HBASE-13870 URL: https://issues.apache.org/jira/browse/HBASE-13870 Project: HBase Issue Type: Bug Components: Compaction Affects Versions: 0.98.10 Reporter: Gautam Gopalakrishnan Priority: Minor When a major compaction is aborted, incomplete HFiles can be left behind in the .tmp directory of the region. They are cleaned up when the region is reassigned but in long running clusters, this does not happen often leading to excess disk usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13847) getWriteRequestCount function in HRegionServer uses int variable to return the count.
[ https://issues.apache.org/jira/browse/HBASE-13847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578388#comment-14578388 ] Hudson commented on HBASE-13847: FAILURE: Integrated in HBase-1.1 #531 (See [https://builds.apache.org/job/HBase-1.1/531/]) HBASE-13847 getWriteRequestCount function in HRegionServer uses int variable to return the count (Abhilash) (stack: rev 4afad40d5a935b5e5de2cd65ad7ffc8a28d1e893) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java getWriteRequestCount function in HRegionServer uses int variable to return the count. - Key: HBASE-13847 URL: https://issues.apache.org/jira/browse/HBASE-13847 Project: HBase Issue Type: Bug Components: hbase, regionserver Affects Versions: 1.0.0 Reporter: Abhilash Assignee: Abhilash Labels: easyfix Fix For: 2.0.0, 1.0.1, 1.2.0, 1.1.1 Attachments: HBASE-13847.patch, HBASE-13847.patch, HBASE-13847.patch, HBASE-13847.patch, screenshot-1.png Variable used to return the value of getWriteRequestCount is int, must be long. I think it causes cluster UI to show negative Write Request Count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
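The bug in the quoted description — accumulating long per-region counters into an int — silently overflows once the total passes Integer.MAX_VALUE, which is exactly what makes the UI show a negative Write Request Count. A self-contained illustration (not the actual HRegionServer code):

```java
public class IntOverflowDemo {

  // Buggy shape: int accumulator for long per-region counters. The compound
  // assignment compiles because it narrows implicitly, hiding the overflow.
  static int totalAsInt(long[] perRegionWriteRequests) {
    int writeCount = 0;
    for (long c : perRegionWriteRequests) {
      writeCount += c; // wraps negative past Integer.MAX_VALUE (2,147,483,647)
    }
    return writeCount;
  }

  // Fixed shape: keep the accumulator a long, matching the counter type.
  static long totalAsLong(long[] perRegionWriteRequests) {
    long writeCount = 0;
    for (long c : perRegionWriteRequests) {
      writeCount += c;
    }
    return writeCount;
  }
}
```

Two regions with 1.5 billion writes each are already enough to push the int variant negative while the long variant reports 3 billion correctly.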
[jira] [Commented] (HBASE-13847) getWriteRequestCount function in HRegionServer uses int variable to return the count.
[ https://issues.apache.org/jira/browse/HBASE-13847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578382#comment-14578382 ] Hudson commented on HBASE-13847: FAILURE: Integrated in HBase-TRUNK #6553 (See [https://builds.apache.org/job/HBase-TRUNK/6553/]) HBASE-13847 getWriteRequestCount function in HRegionServer uses int variable to return the count (Abhilash) (stack: rev c62b396f9f4537d121e2661f328674c99a8b7fb7) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java getWriteRequestCount function in HRegionServer uses int variable to return the count. - Key: HBASE-13847 URL: https://issues.apache.org/jira/browse/HBASE-13847 Project: HBase Issue Type: Bug Components: hbase, regionserver Affects Versions: 1.0.0 Reporter: Abhilash Assignee: Abhilash Labels: easyfix Fix For: 2.0.0, 1.0.1, 1.2.0, 1.1.1 Attachments: HBASE-13847.patch, HBASE-13847.patch, HBASE-13847.patch, HBASE-13847.patch, screenshot-1.png Variable used to return the value of getWriteRequestCount is int, must be long. I think it causes cluster UI to show negative Write Request Count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13871) Change RegionScannerImpl to deal with Cell instead of byte[], int, int
Anoop Sam John created HBASE-13871: -- Summary: Change RegionScannerImpl to deal with Cell instead of byte[], int, int Key: HBASE-13871 URL: https://issues.apache.org/jira/browse/HBASE-13871 Project: HBase Issue Type: Sub-task Reporter: Anoop Sam John Assignee: Anoop Sam John This is also a sub item for splitting HBASE-13387 into smaller chunks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13848) Access InfoServer SSL passwords through Credential Provider API
[ https://issues.apache.org/jira/browse/HBASE-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13848: Status: Patch Available (was: Open) Access InfoServer SSL passwords through Credential Provider API -- Key: HBASE-13848 URL: https://issues.apache.org/jira/browse/HBASE-13848 Project: HBase Issue Type: Improvement Components: security Reporter: Sean Busbey Assignee: Sean Busbey Attachments: HBASE-13848.1.patch, HBASE-13848.1.patch HBASE-11810 took care of getting our SSL passwords out of the Hadoop Credential Provider API, but we also get several out of clear text configuration for the InfoServer class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13855) NPE in PartitionedMobCompactor
[ https://issues.apache.org/jira/browse/HBASE-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578484#comment-14578484 ] Anoop Sam John commented on HBASE-13855: +1 [~jmhsieh] Pls check and commit ? NPE in PartitionedMobCompactor -- Key: HBASE-13855 URL: https://issues.apache.org/jira/browse/HBASE-13855 Project: HBase Issue Type: Sub-task Components: mob Affects Versions: hbase-11339 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: HBASE-13855.diff In PartitionedMobCompactor, mob files are split into partitions, the compactions of partitions run in parallel. The partitions share the same set of del files. There might be race conditions when open readers of del store files in each partition which can cause NPE. In this patch, we will pre-create the reader for each del store file to avoid this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13848) Access InfoServer SSL passwords through Credential Provider API
[ https://issues.apache.org/jira/browse/HBASE-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13848: Status: Open (was: Patch Available) Access InfoServer SSL passwords through Credential Provider API -- Key: HBASE-13848 URL: https://issues.apache.org/jira/browse/HBASE-13848 Project: HBase Issue Type: Improvement Components: security Reporter: Sean Busbey Assignee: Sean Busbey Attachments: HBASE-13848.1.patch HBASE-11810 took care of getting our SSL passwords out of the Hadoop Credential Provider API, but we also get several out of clear text configuration for the InfoServer class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13871) Change RegionScannerImpl to deal with Cell instead of byte[], int, int
[ https://issues.apache.org/jira/browse/HBASE-13871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578492#comment-14578492 ] Hadoop QA commented on HBASE-13871: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12738516/HBASE-13871.patch against master branch at commit c62b396f9f4537d121e2661f328674c99a8b7fb7. ATTACHMENT ID: 12738516 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1917 checkstyle errors (more than the master's current 1912 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestMinVersions org.apache.hadoop.hbase.regionserver.TestKeepDeletes org.apache.hadoop.hbase.regionserver.TestScanWithBloomError org.apache.hadoop.hbase.regionserver.TestStoreFileRefresherChore org.apache.hadoop.hbase.regionserver.TestScanner org.apache.hadoop.hbase.filter.TestInvocationRecordFilter org.apache.hadoop.hbase.procedure.TestProcedureManager org.apache.hadoop.hbase.regionserver.TestResettingCounters Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14342//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14342//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14342//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14342//console This message is automatically generated. Change RegionScannerImpl to deal with Cell instead of byte[], int, int -- Key: HBASE-13871 URL: https://issues.apache.org/jira/browse/HBASE-13871 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13871.patch This is also a sub item for splitting HBASE-13387 into smaller chunks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13848) Access InfoServer SSL passwords through Credential Provider API
[ https://issues.apache.org/jira/browse/HBASE-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13848: Attachment: HBASE-13848.1.patch reattaching for another run. based on conversation on HBASE-13811 I'm sure the failure in TestDistributedLogSplitting isn't related. Let's see if the fix from there helped. Access InfoServer SSL passwords through Credential Provider API -- Key: HBASE-13848 URL: https://issues.apache.org/jira/browse/HBASE-13848 Project: HBase Issue Type: Improvement Components: security Reporter: Sean Busbey Assignee: Sean Busbey Attachments: HBASE-13848.1.patch, HBASE-13848.1.patch HBASE-11810 took care of getting our SSL passwords out of the Hadoop Credential Provider API, but we also get several out of clear text configuration for the InfoServer class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13870) Aborted major compaction does not clean up after itself
[ https://issues.apache.org/jira/browse/HBASE-13870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578409#comment-14578409 ] Anoop Sam John commented on HBASE-13870: Planning to give a patch? Aborted major compaction does not clean up after itself --- Key: HBASE-13870 URL: https://issues.apache.org/jira/browse/HBASE-13870 Project: HBase Issue Type: Bug Components: Compaction Affects Versions: 0.98.10 Reporter: Gautam Gopalakrishnan Priority: Minor When a major compaction is aborted, incomplete HFiles can be left behind in the .tmp directory of the region. They are cleaned up when the region is reassigned but in long running clusters, this does not happen often leading to excess disk usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13451) Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators
[ https://issues.apache.org/jira/browse/HBASE-13451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13451: --- Attachment: HBASE-13451_7.patch Patch that was committed. Thanks for the reviews Ted, Stack and Anoop. Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators -- Key: HBASE-13451 URL: https://issues.apache.org/jira/browse/HBASE-13451 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-13451.patch, HBASE-13451_1.patch, HBASE-13451_2.patch, HBASE-13451_3.patch, HBASE-13451_4.patch, HBASE-13451_5.patch, HBASE-13451_6.patch, HBASE-13451_7.patch After HBASE-10800 we could ensure that all the blockKeys in the BlockReader are converted to Cells (KeyOnlyKeyValue) so that we could use CellComparators. Note that this can be done only for the keys that are written using CellComparators and not for the ones using RawBytesComparator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13451) Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators
[ https://issues.apache.org/jira/browse/HBASE-13451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13451: --- Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to master. Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators -- Key: HBASE-13451 URL: https://issues.apache.org/jira/browse/HBASE-13451 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-13451.patch, HBASE-13451_1.patch, HBASE-13451_2.patch, HBASE-13451_3.patch, HBASE-13451_4.patch, HBASE-13451_5.patch, HBASE-13451_6.patch, HBASE-13451_7.patch After HBASE-10800 we could ensure that all the blockKeys in the BlockReader are converted to Cells (KeyOnlyKeyValue) so that we could use CellComparators. Note that this can be done only for the keys that are written using CellComparators and not for the ones using RawBytesComparator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13451) Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators
[ https://issues.apache.org/jira/browse/HBASE-13451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578417#comment-14578417 ] Anoop Sam John commented on HBASE-13451: Thanks [~ram_krish] for the nice work. It was a much needed cleanup, and we reduce a lot of short-lived object creations (KeyOnlyKeyValue). Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators -- Key: HBASE-13451 URL: https://issues.apache.org/jira/browse/HBASE-13451 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-13451.patch, HBASE-13451_1.patch, HBASE-13451_2.patch, HBASE-13451_3.patch, HBASE-13451_4.patch, HBASE-13451_5.patch, HBASE-13451_6.patch, HBASE-13451_7.patch After HBASE-10800 we could ensure that all the blockKeys in the BlockReader are converted to Cells (KeyOnlyKeyValue) so that we could use CellComparators. Note that this can be done only for the keys that are written using CellComparators and not for the ones using RawBytesComparator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13836) Do not reset the mvcc for bulk loaded mob reference cells in reading
[ https://issues.apache.org/jira/browse/HBASE-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-13836: - Attachment: HBASE-13836.diff Upload the patch. Please take a look, thanks a lot! Do not reset the mvcc for bulk loaded mob reference cells in reading Key: HBASE-13836 URL: https://issues.apache.org/jira/browse/HBASE-13836 Project: HBase Issue Type: Sub-task Components: mob Affects Versions: hbase-11339 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: HBASE-13836.diff Now in the scanner, the mvcc of cells from bulk loaded files is reset to the seqId parsed from the file name. We need to skip this if the hfiles were bulk loaded by mob compactions. In a mob compaction, the bulk loaded ref cell might not be the latest cell among the ones that have the same row key. In reading, the mvcc is reset to the largest one, which would make it shadow the real latest ref cell. We have to skip the resetting in this case. The solution is to add a new field to fileinfo; when this field is set to true, we skip the resetting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
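The proposed behavior can be sketched abstractly: whether a cell keeps its own mvcc or takes the bulk-load sequence id depends on a per-file flag. This is a simplified illustration with invented names, not the actual HBase fileinfo API:

```java
public class MobSeqIdDemo {

    // Simplified model of the proposal above (names are invented): a bulk
    // loaded file normally forces its cells' mvcc/seqId to the seqId parsed
    // from the file name, but a file written by a mob compaction sets a
    // fileinfo flag that skips the reset, preserving each ref cell's
    // original mvcc.
    static long effectiveSeqId(long cellSeqId, long bulkLoadSeqId,
                               boolean skipResetSeqId) {
        return skipResetSeqId ? cellSeqId : bulkLoadSeqId;
    }

    public static void main(String[] args) {
        // A ref cell originally written at mvcc 7, in a file bulk loaded
        // with seqId 42:
        System.out.println(effectiveSeqId(7L, 42L, false)); // normal bulk load: 42
        System.out.println(effectiveSeqId(7L, 42L, true));  // mob compaction: 7
    }
}
```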
[jira] [Commented] (HBASE-13865) Default value of hbase.hregion.memstore.block.multiplier in HBase book is wrong
[ https://issues.apache.org/jira/browse/HBASE-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578374#comment-14578374 ] Vladimir Rodionov commented on HBASE-13865: --- It does, but I would move default (4) to HConstants. Default value of hbase.hregion.memstore.block.multiplier in HBase book is wrong --- Key: HBASE-13865 URL: https://issues.apache.org/jira/browse/HBASE-13865 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 2.0.0 Reporter: Vladimir Rodionov Priority: Trivial Attachments: HBASE-13865.1.patch Its 4 in the book and 2 in a current master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13871) Change RegionScannerImpl to deal with Cell instead of byte[], int, int
[ https://issues.apache.org/jira/browse/HBASE-13871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-13871: --- Fix Version/s: 2.0.0 Status: Patch Available (was: Open) Change RegionScannerImpl to deal with Cell instead of byte[], int, int -- Key: HBASE-13871 URL: https://issues.apache.org/jira/browse/HBASE-13871 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13871.patch This is also a sub item for splitting HBASE-13387 into smaller chunks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13871) Change RegionScannerImpl to deal with Cell instead of byte[], int, int
[ https://issues.apache.org/jira/browse/HBASE-13871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-13871: --- Attachment: HBASE-13871.patch Change RegionScannerImpl to deal with Cell instead of byte[], int, int -- Key: HBASE-13871 URL: https://issues.apache.org/jira/browse/HBASE-13871 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13871.patch This is also a sub item for splitting HBASE-13387 into smaller chunks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13840) Master UI should rename column labels from KVs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars George updated HBASE-13840: Component/s: regionserver Master UI should rename column labels from KVs to Cell -- Key: HBASE-13840 URL: https://issues.apache.org/jira/browse/HBASE-13840 Project: HBase Issue Type: Bug Components: master, regionserver, UI Affects Versions: 1.1.0 Reporter: Lars George Fix For: 2.0.0, 1.2.0 Currently the master UI still refers to KVs in some of the tables. We should do a sweep and rename to Cell. Also do for RS templates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13840) Master UI should rename column labels from KVs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578977#comment-14578977 ] Lars George commented on HBASE-13840: - This also applies to the region server UI. I will rename the issue. Master UI should rename column labels from KVs to Cell -- Key: HBASE-13840 URL: https://issues.apache.org/jira/browse/HBASE-13840 Project: HBase Issue Type: Bug Components: master, regionserver, UI Affects Versions: 1.1.0 Reporter: Lars George Fix For: 2.0.0, 1.2.0 Currently the master UI still refers to KVs in some of the tables. We should do a sweep and rename to Cell. Also do for RS templates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13873) LoadTestTool addAuthInfoToConf throws UnsupportedOperationException
sunyerui created HBASE-13873: Summary: LoadTestTool addAuthInfoToConf throws UnsupportedOperationException Key: HBASE-13873 URL: https://issues.apache.org/jira/browse/HBASE-13873 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 0.98.13 Reporter: sunyerui Fix For: 0.98.14 When running IntegrationTestIngestWithACL on distributed clusters with kerberos security enabled, the method addAuthInfoToConf() in LoadTestTool is invoked and throws UnsupportedOperationException, with the stack trace as follows: {panel} 2015-06-09 22:15:33,605 ERROR \[main\] util.AbstractHBaseTool: Error running command-line tool java.lang.UnsupportedOperationException at java.util.AbstractList.add(AbstractList.java:148) at java.util.AbstractList.add(AbstractList.java:108) at org.apache.hadoop.hbase.util.LoadTestTool.addAuthInfoToConf(LoadTestTool.java:811) at org.apache.hadoop.hbase.util.LoadTestTool.loadTable(LoadTestTool.java:516) at org.apache.hadoop.hbase.util.LoadTestTool.doWork(LoadTestTool.java:479) at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112) at org.apache.hadoop.hbase.IntegrationTestIngest.runIngestTest(IntegrationTestIngest.java:151) at org.apache.hadoop.hbase.IntegrationTestIngest.internalRunIngestTest(IntegrationTestIngest.java:114) at org.apache.hadoop.hbase.IntegrationTestIngest.runTestFromCommandLine(IntegrationTestIngest.java:97) at org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:115) at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.IntegrationTestIngestWithACL.main(IntegrationTestIngestWithACL.java:136) {panel} The corresponding code is below and the reason is obvious: Arrays.asList returns a java.util.Arrays$ArrayList, not a java.util.ArrayList. 
Both inherit from java.util.AbstractList, but the former does not override add(), so the parent method java.util.AbstractList.add() is invoked and the exception is thrown. {code} private void addAuthInfoToConf(Properties authConfig, Configuration conf, String owner, String userList) throws IOException { List<String> users = Arrays.asList(userList.split(",")); users.add(owner); ... } {code} Has anyone else run into this? I think it's an obvious bug, but no one has reported it, so please tell me if I am misunderstanding it. If it is indeed a bug, it can be fixed very easily as below: {code} List<String> users = new ArrayList<String>(Arrays.asList(userList.split(","))); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
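The fixed-size behavior of Arrays.asList and the proposed fix can be demonstrated in isolation. This is a standalone sketch, not the LoadTestTool code itself, and the user names are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FixedSizeListDemo {

    // The fix suggested in the report: copy the fixed-size view returned by
    // Arrays.asList into a mutable ArrayList before calling add().
    static List<String> addOwner(String userList, String owner) {
        List<String> users = new ArrayList<>(Arrays.asList(userList.split(",")));
        users.add(owner);
        return users;
    }

    public static void main(String[] args) {
        // Arrays.asList returns java.util.Arrays$ArrayList, a fixed-size view
        // backed by the source array; add() falls through to
        // AbstractList.add(), which throws UnsupportedOperationException.
        List<String> fixed = Arrays.asList("alice,bob".split(","));
        try {
            fixed.add("owner");
        } catch (UnsupportedOperationException e) {
            System.out.println("fixed-size list rejected add()");
        }
        System.out.println(addOwner("alice,bob", "owner")); // [alice, bob, owner]
    }
}
```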
[jira] [Commented] (HBASE-13873) LoadTestTool addAuthInfoToConf throws UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HBASE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579114#comment-14579114 ] Anoop Sam John commented on HBASE-13873: Planning to put a patch? LoadTestTool addAuthInfoToConf throws UnsupportedOperationException --- Key: HBASE-13873 URL: https://issues.apache.org/jira/browse/HBASE-13873 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 0.98.13 Reporter: sunyerui Fix For: 0.98.14 When run IntegrationTestIngestWithACL on distributed clusters with kerberos security enabled, the method addAuthInfoToConf() in LoadTestTool will be invoked and throws UnsupportedOperationException, stack as follows: {code} 2015-06-09 22:15:33,605 ERROR [main] util.AbstractHBaseTool: Error running command-line tool java.lang.UnsupportedOperationException at java.util.AbstractList.add(AbstractList.java:148) at java.util.AbstractList.add(AbstractList.java:108) at org.apache.hadoop.hbase.util.LoadTestTool.addAuthInfoToConf(LoadTestTool.java:811) at org.apache.hadoop.hbase.util.LoadTestTool.loadTable(LoadTestTool.java:516) at org.apache.hadoop.hbase.util.LoadTestTool.doWork(LoadTestTool.java:479) at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112) at org.apache.hadoop.hbase.IntegrationTestIngest.runIngestTest(IntegrationTestIngest.java:151) at org.apache.hadoop.hbase.IntegrationTestIngest.internalRunIngestTest(IntegrationTestIngest.java:114) at org.apache.hadoop.hbase.IntegrationTestIngest.runTestFromCommandLine(IntegrationTestIngest.java:97) at org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:115) at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.IntegrationTestIngestWithACL.main(IntegrationTestIngestWithACL.java:136) {code} The corresponding code is below and the reason is obvious. 
Arrays.asList returns a java.util.Arrays$ArrayList, not a java.util.ArrayList. Both inherit from java.util.AbstractList, but the former does not override add(), so the parent method java.util.AbstractList.add() is invoked and the exception is thrown. {code} private void addAuthInfoToConf(Properties authConfig, Configuration conf, String owner, String userList) throws IOException { List<String> users = Arrays.asList(userList.split(",")); users.add(owner); ... } {code} Has anyone else run into this? I think it's an obvious bug, but no one has reported it, so please tell me if I am misunderstanding it. If it is indeed a bug, it can be fixed very easily as below: {code} List<String> users = new ArrayList<String>(Arrays.asList(userList.split(","))); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13873) LoadTestTool addAuthInfoToConf throws UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HBASE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunyerui updated HBASE-13873: - Description: When run IntegrationTestIngestWithACL on distributed clusters with kerberos security enabled, the method addAuthInfoToConf() in LoadTestTool will be invoked and throws UnsupportedOperationException, stack as follows: {code} 2015-06-09 22:15:33,605 ERROR [main] util.AbstractHBaseTool: Error running command-line tool java.lang.UnsupportedOperationException at java.util.AbstractList.add(AbstractList.java:148) at java.util.AbstractList.add(AbstractList.java:108) at org.apache.hadoop.hbase.util.LoadTestTool.addAuthInfoToConf(LoadTestTool.java:811) at org.apache.hadoop.hbase.util.LoadTestTool.loadTable(LoadTestTool.java:516) at org.apache.hadoop.hbase.util.LoadTestTool.doWork(LoadTestTool.java:479) at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112) at org.apache.hadoop.hbase.IntegrationTestIngest.runIngestTest(IntegrationTestIngest.java:151) at org.apache.hadoop.hbase.IntegrationTestIngest.internalRunIngestTest(IntegrationTestIngest.java:114) at org.apache.hadoop.hbase.IntegrationTestIngest.runTestFromCommandLine(IntegrationTestIngest.java:97) at org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:115) at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.IntegrationTestIngestWithACL.main(IntegrationTestIngestWithACL.java:136) {code} The corresponding code is below and the reason is obvious. Arrays.asList return a java.util.Arrays$ArrayList but not java.util.ArrayList. Both of them are inherited from java.util.AbstractList, but the former didn't override the method add(), so the parent method java.util.AbstractList.add() will be invoked and the exception threw. 
{code} private void addAuthInfoToConf(Properties authConfig, Configuration conf, String owner, String userList) throws IOException { List<String> users = Arrays.asList(userList.split(",")); users.add(owner); ... } {code} Has anyone else run into this? I think it's an obvious bug, but no one has reported it, so please tell me if I am misunderstanding it. If it is indeed a bug, it can be fixed very easily as below: {code} List<String> users = new ArrayList<String>(Arrays.asList(userList.split(","))); {code} was: When running IntegrationTestIngestWithACL on distributed clusters with kerberos security enabled, the method addAuthInfoToConf() in LoadTestTool is invoked and throws UnsupportedOperationException, with the stack trace as follows: {panel} 2015-06-09 22:15:33,605 ERROR \[main\] util.AbstractHBaseTool: Error running command-line tool java.lang.UnsupportedOperationException at java.util.AbstractList.add(AbstractList.java:148) at java.util.AbstractList.add(AbstractList.java:108) at org.apache.hadoop.hbase.util.LoadTestTool.addAuthInfoToConf(LoadTestTool.java:811) at org.apache.hadoop.hbase.util.LoadTestTool.loadTable(LoadTestTool.java:516) at org.apache.hadoop.hbase.util.LoadTestTool.doWork(LoadTestTool.java:479) at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112) at org.apache.hadoop.hbase.IntegrationTestIngest.runIngestTest(IntegrationTestIngest.java:151) at org.apache.hadoop.hbase.IntegrationTestIngest.internalRunIngestTest(IntegrationTestIngest.java:114) at org.apache.hadoop.hbase.IntegrationTestIngest.runTestFromCommandLine(IntegrationTestIngest.java:97) at org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:115) at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.IntegrationTestIngestWithACL.main(IntegrationTestIngestWithACL.java:136) {panel} The corresponding code is below and the reason is obvious. 
Arrays.asList returns a java.util.Arrays$ArrayList, not a java.util.ArrayList. Both inherit from java.util.AbstractList, but the former does not override add(), so the parent method java.util.AbstractList.add() is invoked and the exception is thrown. {code} private void addAuthInfoToConf(Properties authConfig, Configuration conf, String owner, String userList) throws IOException { List<String> users = Arrays.asList(userList.split(",")); users.add(owner); ... } {code} Has anyone else run into this? I think it's an obvious bug, but no one has reported it, so please tell me if I am misunderstanding it. If it is indeed a bug, it can be fixed very easily as below: {code} List<String> users = new ArrayList<String>(Arrays.asList(userList.split(","))); {code}
[jira] [Updated] (HBASE-13840) Server UIs should rename column labels from KVs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars George updated HBASE-13840: Summary: Server UIs should rename column labels from KVs to Cell (was: Master UI should rename column labels from KVs to Cell) Server UIs should rename column labels from KVs to Cell --- Key: HBASE-13840 URL: https://issues.apache.org/jira/browse/HBASE-13840 Project: HBase Issue Type: Bug Components: master, regionserver, UI Affects Versions: 1.1.0 Reporter: Lars George Fix For: 2.0.0, 1.2.0 Currently the master UI still refers to KVs in some of the tables. We should do a sweep and rename to Cell. Also do for RS templates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13873) LoadTestTool addAuthInfoToConf throws UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HBASE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunyerui updated HBASE-13873: - Description: When run IntegrationTestIngestWithACL on distributed clusters with kerberos security enabled, the method addAuthInfoToConf() in LoadTestTool will be invoked and throws UnsupportedOperationException, stack as follows: {panel} 2015-06-09 22:15:33,605 ERROR \[main\] util.AbstractHBaseTool: Error running command-line tool java.lang.UnsupportedOperationException at java.util.AbstractList.add(AbstractList.java:148) at java.util.AbstractList.add(AbstractList.java:108) at org.apache.hadoop.hbase.util.LoadTestTool.addAuthInfoToConf(LoadTestTool.java:811) at org.apache.hadoop.hbase.util.LoadTestTool.loadTable(LoadTestTool.java:516) at org.apache.hadoop.hbase.util.LoadTestTool.doWork(LoadTestTool.java:479) at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112) at org.apache.hadoop.hbase.IntegrationTestIngest.runIngestTest(IntegrationTestIngest.java:151) at org.apache.hadoop.hbase.IntegrationTestIngest.internalRunIngestTest(IntegrationTestIngest.java:114) at org.apache.hadoop.hbase.IntegrationTestIngest.runTestFromCommandLine(IntegrationTestIngest.java:97) at org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:115) at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.IntegrationTestIngestWithACL.main(IntegrationTestIngestWithACL.java:136) {panel} The corresponding code is below and the reason is obvious. Arrays.asList return a java.util.Arrays$ArrayList but not java.util.ArrayList. Both of them are inherited from java.util.AbstractList, but the former didn't override the method add(), so the parent method java.util.AbstractList.add() will be invoked and the exception threw. 
{code} private void addAuthInfoToConf(Properties authConfig, Configuration conf, String owner, String userList) throws IOException { List<String> users = Arrays.asList(userList.split(",")); users.add(owner); ... } {code} Has anyone else run into this? I think it's an obvious bug, but no one has reported it, so please tell me if I am misunderstanding it. If it is indeed a bug, it can be fixed very easily as below: {code} List<String> users = new ArrayList<String>(Arrays.asList(userList.split(","))); {code} was: When running IntegrationTestIngestWithACL on distributed clusters with kerberos security enabled, the method addAuthInfoToConf() in LoadTestTool is invoked and throws UnsupportedOperationException, with the stack trace as follows: {panel} 2015-06-09 22:15:33,605 ERROR \[main\] util.AbstractHBaseTool: Error running command-line tool java.lang.UnsupportedOperationException at java.util.AbstractList.add(AbstractList.java:148) at java.util.AbstractList.add(AbstractList.java:108) at org.apache.hadoop.hbase.util.LoadTestTool.addAuthInfoToConf(LoadTestTool.java:811) at org.apache.hadoop.hbase.util.LoadTestTool.loadTable(LoadTestTool.java:516) at org.apache.hadoop.hbase.util.LoadTestTool.doWork(LoadTestTool.java:479) at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112) at org.apache.hadoop.hbase.IntegrationTestIngest.runIngestTest(IntegrationTestIngest.java:151) at org.apache.hadoop.hbase.IntegrationTestIngest.internalRunIngestTest(IntegrationTestIngest.java:114) at org.apache.hadoop.hbase.IntegrationTestIngest.runTestFromCommandLine(IntegrationTestIngest.java:97) at org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:115) at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.IntegrationTestIngestWithACL.main(IntegrationTestIngestWithACL.java:136) {panel} The corresponding code is below and the reason is obvious. 
Arrays.asList returns a java.util.Arrays$ArrayList, not a java.util.ArrayList. Both inherit from java.util.AbstractList, but the former does not override add(), so the parent method java.util.AbstractList.add() is invoked and the exception is thrown. {code} private void addAuthInfoToConf(Properties authConfig, Configuration conf, String owner, String userList) throws IOException { List<String> users = Arrays.asList(userList.split(",")); users.add(owner); ... } {code} Has anyone else run into this? I think it's an obvious bug, but no one has reported it, so please tell me if I am misunderstanding it. If it is indeed a bug, it can be fixed very easily as below: {code} List<String> users = new ArrayList<String>(Arrays.asList(userList.split(","))); {code}
[jira] [Commented] (HBASE-13329) Memstore flush fails if data has always the same value, breaking the region
[ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579205#comment-14579205 ] Ted Yu commented on HBASE-13329: lgtm Memstore flush fails if data has always the same value, breaking the region --- Key: HBASE-13329 URL: https://issues.apache.org/jira/browse/HBASE-13329 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.0.1 Environment: linux-debian-jessie ec2 - t2.micro instances Reporter: Ruben Aguiar Assignee: Ruben Aguiar Priority: Critical Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1 Attachments: 13329-v1.patch While trying to benchmark my opentsdb cluster, I've created a script that sends to hbase always the same value (in this case 1). After a few minutes, the whole region server crashes and the region itself becomes impossible to open again (cannot assign or unassign). After some investigation, what I saw on the logs is that when a Memstore flush is called on a large region (128mb) the process errors, killing the regionserver. On restart, replaying the edits generates the same error, making the region unavailable. Tried to manually unassign, assign or close_region. That didn't work because the code that reads/replays it crashes. From my investigation this seems to be an overflow issue. The logs show that the function getMinimumMidpointArray tried to access index -32743 of an array, extremely close to the minimum short value in Java. Upon investigation of the source code, it seems an index short is used, being incremented as long as the two vectors are the same, probably making it overflow on large vectors with equal data. Changing it to int should solve the problem. Here follows the hadoop logs of when the regionserver went down. Any help is appreciated. 
Any other information you need please do tell me: 2015-03-24 18:00:56,187 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220018516 with entries=143, filesize=134.70 MB; new WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220056140 2015-03-24 18:00:56,188 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Archiving hdfs://10.2.0.74:8020/hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 to hdfs://10.2.0.74:8020/hbase/oldWALs/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 2015-03-24 18:04:35,722 INFO [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2., current region memstore size 128.04 MB 2015-03-24 18:04:36,154 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server 10.2.0.73,16020,1427216382590: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2. 
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1999) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1770) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1702) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) at
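The short-index overflow described in this report can be reproduced in isolation. The code below is an illustrative re-creation of the pattern, not the actual getMinimumMidpointArray code: a short loop index scanning the common prefix of two byte arrays wraps from 32767 to -32768, and the next array access fails with a negative index, much like the -32743 in the stack trace above.

```java
public class ShortIndexOverflowDemo {

    // Buggy pattern resembling the reported one: a short index is
    // incremented while the two arrays agree, and silently wraps negative
    // once the common prefix exceeds Short.MAX_VALUE (32767).
    static int commonPrefixShortIndex(byte[] a, byte[] b) {
        short i = 0;
        while (i < a.length && i < b.length && a[i] == b[i]) {
            i++; // wraps from 32767 to -32768; a[i] then throws
        }
        return i;
    }

    // The fix suggested in the report: an int index cannot overflow for any
    // array length Java allows.
    static int commonPrefixIntIndex(byte[] a, byte[] b) {
        int i = 0;
        while (i < a.length && i < b.length && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] a = new byte[40000]; // two identical all-zero arrays,
        byte[] b = new byte[40000]; // longer than Short.MAX_VALUE
        try {
            commonPrefixShortIndex(a, b);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("short index wrapped negative: " + e.getMessage());
        }
        System.out.println(commonPrefixIntIndex(a, b)); // 40000
    }
}
```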
[jira] [Updated] (HBASE-13845) Expire of one region server carrying meta can bring down the master
[ https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-13845: - Attachment: HBASE-13845-branch-1.patch Expire of one region server carrying meta can bring down the master --- Key: HBASE-13845 URL: https://issues.apache.org/jira/browse/HBASE-13845 Project: HBase Issue Type: Bug Components: master Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Jerry He Assignee: Jerry He Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13845-branch-1.1.patch, HBASE-13845-branch-1.patch There seems to be a code bug that can cause the expiration of one region server carrying meta to bring down the master in certain cases. Here is the sequence of events. a) The master detects the expiration of a region server on ZK, and starts to expire the region server. b) Since the failed region server carries meta, the shutdown handler will call verifyAndAssignMetaWithRetries() while processing the expired rs. c) In verifyAndAssignMeta(), there is logic to verify the meta region location: {code} if (!server.getMetaTableLocator().verifyMetaRegionLocation(server.getConnection(), this.server.getZooKeeper(), timeout)) { this.services.getAssignmentManager().assignMeta(HRegionInfo.FIRST_META_REGIONINFO); } else if (serverName.equals(server.getMetaTableLocator().getMetaRegionLocation(this.server.getZooKeeper()))) { throw new IOException("hbase:meta is onlined on the dead server " + serverName); } {code} If we see the meta region is still alive on the expired rs, we throw an exception. We do some retries (default 10x1000ms) for verifyAndAssignMeta. If we still get the exception after the retries, we abort the master. 
{code} 2015-05-27 06:58:30,156 FATAL [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] master.HMaster: Master server abort: loaded coprocessors are: [] 2015-05-27 06:58:30,156 FATAL [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] master.HMaster: verifyAndAssignMeta failed after10 times retries, aborting java.io.IOException: hbase:meta is onlined on the dead server bdvs1164.svl.ibm.com,16020,1432681743203 at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMeta(MetaServerShutdownHandler.java:162) at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMetaWithRetries(MetaServerShutdownHandler.java:184) at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:93) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-05-27 06:58:30,156 INFO [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] regionserver.HRegionServer: STOPPED: verifyAndAssignMeta failed after10 times retries, aborting {code} The problem happens when the expired rs is slow processing its own expiration or has a slow death, and is still able to respond to the master's meta verification in the meantime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
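The retry-then-abort flow described above (default 10 attempts, 1000 ms apart, then a master abort) follows a generic pattern that can be sketched as below. The helper is an illustration with invented names, not the actual MetaServerShutdownHandler code:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryThenAbortDemo {

    // Retry a verification task up to maxRetries times, sleeping between
    // attempts; rethrow the last failure so the caller can escalate
    // (in the HBase case, by aborting the master).
    static <T> T withRetries(Callable<T> task, int maxRetries, long sleepMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e;
                Thread.sleep(sleepMs);
            }
        }
        throw last; // caller escalates, e.g. aborts the master
    }

    public static void main(String[] args) throws Exception {
        // A task that fails twice, then succeeds: the retries absorb the
        // transient failures and no escalation is needed.
        int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("hbase:meta still on dead server");
            }
            return "meta verified";
        }, 10, 1L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```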
[jira] [Commented] (HBASE-13871) Change RegionScannerImpl to deal with Cell instead of byte[], int, int
[ https://issues.apache.org/jira/browse/HBASE-13871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579239#comment-14579239 ] Anoop Sam John commented on HBASE-13871: Yes, this class need not be top level. Let me see how we can make it inner. bq. line 5252: this.isScan = scan.isGetScan() ? -1 : 0; was changed to this.isScan = scan.isGetScan() ? 1 : 0; Yes, this change is intended. Please see that the compare in isStopRow is changed, so we need this change. Change RegionScannerImpl to deal with Cell instead of byte[], int, int -- Key: HBASE-13871 URL: https://issues.apache.org/jira/browse/HBASE-13871 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13871.patch, HBASE-13871.patch This is also a sub item for splitting HBASE-13387 into smaller chunks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13845) Expire of one region server carrying meta can bring down the master
[ https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579253#comment-14579253 ] Nick Dimiduk commented on HBASE-13845: -- Nice one [~jerryhe] Expire of one region server carrying meta can bring down the master --- Key: HBASE-13845 URL: https://issues.apache.org/jira/browse/HBASE-13845 Project: HBase Issue Type: Bug Components: master Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Jerry He Assignee: Jerry He Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13845-branch-1.1.patch, HBASE-13845-branch-1.patch There seems to be a code bug that can cause the expiration of one region server carrying meta to bring down the master in certain cases. Here is the sequence of events. a) The master detects the expiration of a region server on ZK, and starts to expire the region server. b) Since the failed region server carries meta, the shutdown handler will call verifyAndAssignMetaWithRetries() while processing the expired rs. c) In verifyAndAssignMeta(), there is logic to verify the meta region location: {code} if (!server.getMetaTableLocator().verifyMetaRegionLocation(server.getConnection(), this.server.getZooKeeper(), timeout)) { this.services.getAssignmentManager().assignMeta(HRegionInfo.FIRST_META_REGIONINFO); } else if (serverName.equals(server.getMetaTableLocator().getMetaRegionLocation(this.server.getZooKeeper()))) { throw new IOException("hbase:meta is onlined on the dead server " + serverName); } {code} If we see the meta region is still alive on the expired rs, we throw an exception. We do some retries (default 10x1000ms) for verifyAndAssignMeta. If we still get the exception after the retries, we abort the master. 
{code} 2015-05-27 06:58:30,156 FATAL [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] master.HMaster: Master server abort: loaded coprocessors are: [] 2015-05-27 06:58:30,156 FATAL [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] master.HMaster: verifyAndAssignMeta failed after10 times retries, aborting java.io.IOException: hbase:meta is onlined on the dead server bdvs1164.svl.ibm.com,16020,1432681743203 at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMeta(MetaServerShutdownHandler.java:162) at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMetaWithRetries(MetaServerShutdownHandler.java:184) at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:93) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-05-27 06:58:30,156 INFO [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] regionserver.HRegionServer: STOPPED: verifyAndAssignMeta failed after10 times retries, aborting {code} The problem happens when the expired rs is slow processing its own expiration or has a slow death, and is still able to respond to the master's meta verification in the meantime.
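The retry-then-abort shape described above (verify, sleep, retry a fixed number of times, abort on exhaustion) can be sketched generically. The names `withRetries` and `Check` here are illustrative, not the actual MetaServerShutdownHandler API:

```java
public class RetryThenAbort {
    interface Check {
        boolean verify() throws Exception;
    }

    // Run the check up to `retries` times, sleeping between attempts.
    // Returns true on the first success; false means the caller should
    // give up (as the master aborts after verifyAndAssignMeta retries).
    static boolean withRetries(Check check, int retries, long sleepMs) {
        for (int attempt = 1; attempt <= retries; attempt++) {
            try {
                if (check.verify()) {
                    return true;
                }
            } catch (Exception e) {
                // log and fall through to the next attempt
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Succeeds on the third attempt; with 10 retries this returns true.
        boolean ok = withRetries(() -> ++calls[0] >= 3, 10, 1L);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

Note the failure mode the bug report describes: if the check keeps throwing for all attempts (the dead server still answers the verification), the method returns false and the caller aborts.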
[jira] [Commented] (HBASE-13329) Memstore flush fails if data has always the same value, breaking the region
[ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579196#comment-14579196 ] Ruben Aguiar commented on HBASE-13329: -- Done Memstore flush fails if data has always the same value, breaking the region --- Key: HBASE-13329 URL: https://issues.apache.org/jira/browse/HBASE-13329 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.0.1 Environment: linux-debian-jessie ec2 - t2.micro instances Reporter: Ruben Aguiar Assignee: Ruben Aguiar Priority: Critical Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1 Attachments: 13329-v1.patch While trying to benchmark my opentsdb cluster, I created a script that always sends the same value (in this case 1) to hbase. After a few minutes, the whole region server crashes and the region itself becomes impossible to open again (it cannot be assigned or unassigned). What I saw in the logs is that when a Memstore flush is called on a large region (128 MB), the flush errors out, killing the regionserver. On restart, replaying the edits generates the same error, making the region unavailable. I tried to manually unassign, assign, or close_region it; that didn't work because the code that reads/replays the edits crashes. From my investigation this seems to be an overflow issue. The logs show that the function getMinimumMidpointArray tried to access index -32743 of an array, extremely close to the minimum short value in Java. Looking at the source code, a short index is used and incremented as long as the two vectors are the same, probably making it overflow on large vectors with equal data. Changing it to int should solve the problem. The hadoop logs from when the regionserver went down follow. Any help is appreciated. 
Any other information you need please do tell me: 2015-03-24 18:00:56,187 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220018516 with entries=143, filesize=134.70 MB; new WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220056140 2015-03-24 18:00:56,188 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Archiving hdfs://10.2.0.74:8020/hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 to hdfs://10.2.0.74:8020/hbase/oldWALs/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 2015-03-24 18:04:35,722 INFO [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2., current region memstore size 128.04 MB 2015-03-24 18:04:36,154 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server 10.2.0.73,16020,1427216382590: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2. 
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1999) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1770) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1702) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) at
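The overflow described above is easy to reproduce outside HBase. A hypothetical prefix-length helper with a `short` index wraps to a negative value once more than Short.MAX_VALUE (32767) leading bytes are equal, producing exactly the kind of negative ArrayIndexOutOfBoundsException seen in the log; widening the index to `int` fixes it. This is a standalone demo, not the actual getMinimumMidpointArray code:

```java
public class ShortIndexOverflow {
    // Buggy shape: the short index wraps to -32768 past 32767 equal bytes,
    // so a[i] throws ArrayIndexOutOfBoundsException with a negative index.
    static int buggyPrefixLength(byte[] a, byte[] b) {
        short i = 0;
        while (i < a.length && i < b.length && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Fixed shape: an int index cannot overflow for array-sized inputs.
    static int fixedPrefixLength(byte[] a, byte[] b) {
        int i = 0;
        while (i < a.length && i < b.length && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] left = new byte[40000];   // all zeros: > 32767 equal bytes
        byte[] right = new byte[40000];
        try {
            buggyPrefixLength(left, right);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("buggy version threw: " + e.getMessage());
        }
        System.out.println("fixed version: " + fixedPrefixLength(left, right));
    }
}
```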
[jira] [Commented] (HBASE-13845) Expire of one region server carrying meta can bring down the master
[ https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579219#comment-14579219 ] Jerry He commented on HBASE-13845: -- Thanks for the review, [~stack] Committed to branch-1.1 and branch-1. I am thinking about putting only the test case into the master branch, as a regression guard. What do you think, [~stack]? Expire of one region server carrying meta can bring down the master --- Key: HBASE-13845 URL: https://issues.apache.org/jira/browse/HBASE-13845 Project: HBase Issue Type: Bug Components: master Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Jerry He Assignee: Jerry He Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13845-branch-1.1.patch, HBASE-13845-branch-1.patch
[jira] [Commented] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579147#comment-14579147 ] Hudson commented on HBASE-13829: FAILURE: Integrated in HBase-TRUNK #6555 (See [https://builds.apache.org/job/HBase-TRUNK/6555/]) HBASE-13829 Add more ThrottleType (Guanghao Zhang) (tedyu: rev 6cc42c8cd16d01cded9936bf53bf35e6e2ff5b66) * hbase-shell/src/main/ruby/shell/commands/set_quota.rb * hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestQuotaThrottle.java * hbase-client/src/main/java/org/apache/hadoop/hbase/quotas/ThrottleSettings.java * hbase-client/src/main/java/org/apache/hadoop/hbase/quotas/ThrottleType.java * src/main/asciidoc/_chapters/ops_mgt.adoc * hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * hbase-client/src/main/java/org/apache/hadoop/hbase/quotas/QuotaSettingsFactory.java * hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestQuotaAdmin.java * hbase-shell/src/main/ruby/hbase/quotas.rb Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13829-v1.patch, HBASE-13829-v2.patch, HBASE-13829.patch HBASE-11598 added simple throttling for hbase, but the client does not let the user set a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM, or READ_SIZE. REVIEW BOARD: https://reviews.apache.org/r/34989/
[jira] [Commented] (HBASE-13871) Change RegionScannerImpl to deal with Cell instead of byte[], int, int
[ https://issues.apache.org/jira/browse/HBASE-13871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579228#comment-14579228 ] stack commented on HBASE-13871: --- Do we have to have a class named FirstOnRowFakeCell at the top level of our class hierarchy? Is it only used in CellUtil? Could it be an inner class of it? Any reason for this change? 5252 this.isScan = scan.isGetScan() ? -1 : 0; 5252 this.isScan = scan.isGetScan() ? 1 : 0; Otherwise, seems good. Change RegionScannerImpl to deal with Cell instead of byte[], int, int -- Key: HBASE-13871 URL: https://issues.apache.org/jira/browse/HBASE-13871 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13871.patch, HBASE-13871.patch
[jira] [Updated] (HBASE-13329) Memstore flush fails if data has always the same value, breaking the region
[ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruben Aguiar updated HBASE-13329: - Attachment: 13329-v1.patch Memstore flush fails if data has always the same value, breaking the region --- Key: HBASE-13329 URL: https://issues.apache.org/jira/browse/HBASE-13329 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.0.1 Environment: linux-debian-jessie ec2 - t2.micro instances Reporter: Ruben Aguiar Assignee: Ruben Aguiar Priority: Critical Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1 Attachments: 13329-v1.patch
[jira] [Commented] (HBASE-13845) Expire of one region server carrying meta can bring down the master
[ https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579236#comment-14579236 ] Hadoop QA commented on HBASE-13845: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12738613/HBASE-13845-branch-1.patch against branch-1 branch at commit 6cc42c8cd16d01cded9936bf53bf35e6e2ff5b66. ATTACHMENT ID: 12738613 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14346//console This message is automatically generated. Expire of one region server carrying meta can bring down the master --- Key: HBASE-13845 URL: https://issues.apache.org/jira/browse/HBASE-13845 Project: HBase Issue Type: Bug Components: master Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Jerry He Assignee: Jerry He Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13845-branch-1.1.patch, HBASE-13845-branch-1.patch
[jira] [Commented] (HBASE-13845) Expire of one region server carrying meta can bring down the master
[ https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579412#comment-14579412 ] Jerry He commented on HBASE-13845: -- It seems there was a glitch applying the branch-1 patch from Hadoop QA; it applied fine with 'git apply' locally. It was mentioned on the hbase dev mailing list recently, as I recall. Committed the test case to the master branch. Expire of one region server carrying meta can bring down the master --- Key: HBASE-13845 URL: https://issues.apache.org/jira/browse/HBASE-13845 Project: HBase Issue Type: Bug Components: master Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Jerry He Assignee: Jerry He Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13845-branch-1.1.patch, HBASE-13845-branch-1.patch, HBASE-13845-master-test-case-only.patch
[jira] [Comment Edited] (HBASE-13378) RegionScannerImpl synchronized for READ_UNCOMMITTED Isolation Levels
[ https://issues.apache.org/jira/browse/HBASE-13378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579309#comment-14579309 ] Andrew Purtell edited comment on HBASE-13378 at 6/9/15 5:53 PM: {quote} bq. Up to this point READ_UNCOMMITTED meant: You might see partially finished rows. bq. Now it means: You might see partially finished rows, and you may not see cells that have existed when the scanner started. This sounds to me like an incompatible change for a patch release {quote} Precisely, why does this sound like an *incompatible* change? was (Author: apurtell): {quote} bq. Up to this point READ_UNCOMMITTED meant: You might see partially finished rows. Now it means: You might see partially finished rows, and you may not see cells that have existed when the scanner started. This sounds to me like an incompatible change for a patch release {quote} Precisely, why does this sound like an *incompatible* change? RegionScannerImpl synchronized for READ_UNCOMMITTED Isolation Levels Key: HBASE-13378 URL: https://issues.apache.org/jira/browse/HBASE-13378 Project: HBase Issue Type: New Feature Reporter: John Leach Assignee: John Leach Priority: Minor Attachments: HBASE-13378.patch, HBASE-13378.txt Original Estimate: 2h Time Spent: 2h Remaining Estimate: 0h This block of code below, coupled with the close method, could be changed so that READ_UNCOMMITTED does not synchronize. {code:java} // synchronize on scannerReadPoints so that nobody calculates // getSmallestReadPoint, before scannerReadPoints is updated. IsolationLevel isolationLevel = scan.getIsolationLevel(); synchronized(scannerReadPoints) { this.readPt = getReadpoint(isolationLevel); scannerReadPoints.put(this, this.readPt); } {code} This hotspots for me under heavy get requests.
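A hedged sketch of the change the issue proposes: only take the scannerReadPoints lock when the isolation level needs a coordinated read point, and hand READ_UNCOMMITTED scanners a read point without contending on the monitor. The class and field names mirror the snippet above but are illustrative, not the actual RegionScannerImpl code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReadPointSketch {
    enum IsolationLevel { READ_COMMITTED, READ_UNCOMMITTED }

    private final Map<Object, Long> scannerReadPoints = new ConcurrentHashMap<>();
    private volatile long memstoreReadPoint = 100L; // stand-in for the MVCC read point

    long openScanner(Object scanner, IsolationLevel isolationLevel) {
        if (isolationLevel == IsolationLevel.READ_UNCOMMITTED) {
            // No snapshot guarantee required, so skip the monitor entirely.
            // (HBase uses Long.MAX_VALUE as the "see everything" read point.)
            long readPt = Long.MAX_VALUE;
            scannerReadPoints.put(scanner, readPt);
            return readPt;
        }
        // READ_COMMITTED still synchronizes so getSmallestReadPoint()
        // cannot run between reading the point and publishing it.
        synchronized (scannerReadPoints) {
            long readPt = memstoreReadPoint;
            scannerReadPoints.put(scanner, readPt);
            return readPt;
        }
    }

    public static void main(String[] args) {
        ReadPointSketch region = new ReadPointSketch();
        System.out.println(region.openScanner(new Object(), IsolationLevel.READ_UNCOMMITTED));
        System.out.println(region.openScanner(new Object(), IsolationLevel.READ_COMMITTED));
    }
}
```

This is the trade-off the quoted discussion is about: the uncontended path avoids the get-request hotspot, at the cost of READ_UNCOMMITTED scanners possibly not seeing a consistent snapshot relative to scanner open time.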
[jira] [Updated] (HBASE-13845) Expire of one region server carrying meta can bring down the master
[ https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-13845: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Expire of one region server carrying meta can bring down the master --- Key: HBASE-13845 URL: https://issues.apache.org/jira/browse/HBASE-13845 Project: HBase Issue Type: Bug Components: master Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Jerry He Assignee: Jerry He Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13845-branch-1.1.patch, HBASE-13845-branch-1.patch, HBASE-13845-master-test-case-only.patch
[jira] [Commented] (HBASE-13845) Expire of one region server carrying meta can bring down the master
[ https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579266#comment-14579266 ] stack commented on HBASE-13845: --- bq. I am thinking about putting only the test case into master branch, as a regression guide. What do you think, stack? I think it is a good idea. +1 Expire of one region server carrying meta can bring down the master --- Key: HBASE-13845 URL: https://issues.apache.org/jira/browse/HBASE-13845 Project: HBase Issue Type: Bug Components: master Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Jerry He Assignee: Jerry He Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13845-branch-1.1.patch, HBASE-13845-branch-1.patch
[jira] [Updated] (HBASE-13845) Expire of one region server carrying meta can bring down the master
[ https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-13845: - Attachment: HBASE-13845-master-test-case-only.patch Expire of one region server carrying meta can bring down the master --- Key: HBASE-13845 URL: https://issues.apache.org/jira/browse/HBASE-13845 Project: HBase Issue Type: Bug Components: master Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Jerry He Assignee: Jerry He Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13845-branch-1.1.patch, HBASE-13845-branch-1.patch, HBASE-13845-master-test-case-only.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13378) RegionScannerImpl synchronized for READ_UNCOMMITTED Isolation Levels
[ https://issues.apache.org/jira/browse/HBASE-13378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579309#comment-14579309 ] Andrew Purtell commented on HBASE-13378: {quote} bq. Up to this point READ_UNCOMMITTED meant: You might see partially finished rows. Now it means: You might see partially finished rows, and you may not see cells that have existed when the scanner started. This sounds to me like an incompatible change for a patch release {quote} Precisely, why does this sound like an *incompatible* change? RegionScannerImpl synchronized for READ_UNCOMMITTED Isolation Levels Key: HBASE-13378 URL: https://issues.apache.org/jira/browse/HBASE-13378 Project: HBase Issue Type: New Feature Reporter: John Leach Assignee: John Leach Priority: Minor Attachments: HBASE-13378.patch, HBASE-13378.txt Original Estimate: 2h Time Spent: 2h Remaining Estimate: 0h The block of code below, coupled with the close method, could be changed so that READ_UNCOMMITTED does not synchronize.
{code:java}
// synchronize on scannerReadPoints so that nobody calculates
// getSmallestReadPoint, before scannerReadPoints is updated.
IsolationLevel isolationLevel = scan.getIsolationLevel();
synchronized(scannerReadPoints) {
  this.readPt = getReadpoint(isolationLevel);
  scannerReadPoints.put(this, this.readPt);
}
{code}
This hotspots for me under heavy get requests.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
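An illustrative sketch of the optimization the issue asks for (this is not HBase's actual patch): only isolation levels that must coordinate with `getSmallestReadPoint` take the `scannerReadPoints` lock, while READ_UNCOMMITTED can use `Long.MAX_VALUE` as its read point without registering. All class and field names besides `scannerReadPoints` and `IsolationLevel` are made up for the demo.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model: scanners at READ_COMMITTED register a read point under the lock;
// READ_UNCOMMITTED scanners see everything, so they skip the hot lock entirely.
public class ReadPointSketch {
    enum IsolationLevel { READ_COMMITTED, READ_UNCOMMITTED }

    final Map<Object, Long> scannerReadPoints = new ConcurrentHashMap<>();
    volatile long memstoreReadPoint = 42L;   // stand-in for the MVCC read point

    long registerScanner(Object scanner, IsolationLevel level) {
        if (level == IsolationLevel.READ_UNCOMMITTED) {
            // No bookkeeping needed: this scanner may see partial rows anyway.
            return Long.MAX_VALUE;
        }
        synchronized (scannerReadPoints) {   // as in the quoted block above
            long readPt = memstoreReadPoint;
            scannerReadPoints.put(scanner, readPt);
            return readPt;
        }
    }

    public static void main(String[] args) {
        ReadPointSketch s = new ReadPointSketch();
        System.out.println(s.registerScanner(new Object(), IsolationLevel.READ_UNCOMMITTED));
        System.out.println(s.registerScanner(new Object(), IsolationLevel.READ_COMMITTED));
    }
}
```

The quoted follow-up comments debate the cost of this: a READ_UNCOMMITTED scanner that never registers also never pins `getSmallestReadPoint`, which is the semantic change Andrew Purtell is questioning.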
[jira] [Commented] (HBASE-13845) Expire of one region server carrying meta can bring down the master
[ https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579382#comment-14579382 ] Hudson commented on HBASE-13845: SUCCESS: Integrated in HBase-1.2 #140 (See [https://builds.apache.org/job/HBase-1.2/140/]) HBASE-13845 Expire of one region server carrying meta can bring down the master (jerryjch: rev d37d9c43de6c919a58ff34548a36af1e22e6cc2a) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMetaShutdownHandler.java Expire of one region server carrying meta can bring down the master --- Key: HBASE-13845 URL: https://issues.apache.org/jira/browse/HBASE-13845 Project: HBase Issue Type: Bug Components: master Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Jerry He Assignee: Jerry He Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13845-branch-1.1.patch, HBASE-13845-branch-1.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13873) LoadTestTool addAuthInfoToConf throws UnsupportedOperationException
[ https://issues.apache.org/jira/browse/HBASE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579318#comment-14579318 ] Andrew Purtell commented on HBASE-13873: Would you mind providing a patch [~sunyerui] ? LoadTestTool addAuthInfoToConf throws UnsupportedOperationException --- Key: HBASE-13873 URL: https://issues.apache.org/jira/browse/HBASE-13873 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 0.98.13 Reporter: sunyerui Fix For: 0.98.14 When running IntegrationTestIngestWithACL on distributed clusters with kerberos security enabled, the method addAuthInfoToConf() in LoadTestTool is invoked and throws UnsupportedOperationException, with the following stack trace:
{code}
2015-06-09 22:15:33,605 ERROR [main] util.AbstractHBaseTool: Error running command-line tool
java.lang.UnsupportedOperationException
	at java.util.AbstractList.add(AbstractList.java:148)
	at java.util.AbstractList.add(AbstractList.java:108)
	at org.apache.hadoop.hbase.util.LoadTestTool.addAuthInfoToConf(LoadTestTool.java:811)
	at org.apache.hadoop.hbase.util.LoadTestTool.loadTable(LoadTestTool.java:516)
	at org.apache.hadoop.hbase.util.LoadTestTool.doWork(LoadTestTool.java:479)
	at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
	at org.apache.hadoop.hbase.IntegrationTestIngest.runIngestTest(IntegrationTestIngest.java:151)
	at org.apache.hadoop.hbase.IntegrationTestIngest.internalRunIngestTest(IntegrationTestIngest.java:114)
	at org.apache.hadoop.hbase.IntegrationTestIngest.runTestFromCommandLine(IntegrationTestIngest.java:97)
	at org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:115)
	at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.hbase.IntegrationTestIngestWithACL.main(IntegrationTestIngestWithACL.java:136)
{code}
The corresponding code is below and the reason is obvious.
Arrays.asList returns a java.util.Arrays$ArrayList, not a java.util.ArrayList. Both inherit from java.util.AbstractList, but the former does not override the method add(), so the parent method java.util.AbstractList.add() is invoked and the exception is thrown.
{code}
private void addAuthInfoToConf(Properties authConfig, Configuration conf,
    String owner, String userList) throws IOException {
  List<String> users = Arrays.asList(userList.split(","));
  users.add(owner);
  ...
}
{code}
Has anyone else run into this? I think it's an obvious bug but no one has reported it, so please tell me if I am misunderstanding it. If it's actually a bug here, then it can be fixed very easily as below:
{code}
List<String> users = new ArrayList<String>(Arrays.asList(userList.split(",")));
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
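The bug above can be reproduced in isolation: `Arrays.asList` returns a fixed-size view backed by the array, so `add()` falls through to `AbstractList.add` and throws. The class and method names below are for the demo only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal reproduction of the LoadTestTool bug and the proposed fix.
public class FixedSizeListDemo {
    // As in the buggy addAuthInfoToConf: a fixed-size view, add() will throw.
    static List<String> broken(String userList) {
        return Arrays.asList(userList.split(","));
    }

    // The proposed fix: copy into a growable ArrayList first.
    static List<String> fixed(String userList) {
        return new ArrayList<>(Arrays.asList(userList.split(",")));
    }

    public static void main(String[] args) {
        List<String> users = fixed("alice,bob");
        users.add("owner");
        System.out.println(users);              // [alice, bob, owner]
        try {
            broken("alice,bob").add("owner");
        } catch (UnsupportedOperationException e) {
            System.out.println("fixed-size list rejected add()");
        }
    }
}
```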
[jira] [Assigned] (HBASE-13862) TestRegionRebalancing is flaky again
[ https://issues.apache.org/jira/browse/HBASE-13862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov reassigned HBASE-13862: --- Assignee: Sergey Soldatov TestRegionRebalancing is flaky again Key: HBASE-13862 URL: https://issues.apache.org/jira/browse/HBASE-13862 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0 Reporter: Mikhail Antonov Assignee: Sergey Soldatov I can reproduce it by running mvn test -Dtest=TestRegionRebalancing on fresh master about 1 out of 3-4 runs.
{code}
Running org.apache.hadoop.hbase.TestRegionRebalancing
2015-06-08 12:00:52.125 java[45610:5873722] Unable to load realm info from SCDynamicStore
Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 26.743 sec <<< FAILURE! - in org.apache.hadoop.hbase.TestRegionRebalancing
testRebalanceOnRegionServerNumberChange[0](org.apache.hadoop.hbase.TestRegionRebalancing) Time elapsed: 15.599 sec <<< FAILURE!
java.lang.AssertionError: null
	at org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:144)
testRebalanceOnRegionServerNumberChange[1](org.apache.hadoop.hbase.TestRegionRebalancing) Time elapsed: 10.671 sec <<< FAILURE!
java.lang.AssertionError: null
	at org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:144)
Results :
Failed tests:
  TestRegionRebalancing.testRebalanceOnRegionServerNumberChange:144 null
  TestRegionRebalancing.testRebalanceOnRegionServerNumberChange:144 null
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13871) Change RegionScannerImpl to deal with Cell instead of byte[], int, int
[ https://issues.apache.org/jira/browse/HBASE-13871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579279#comment-14579279 ] stack commented on HBASE-13871: --- bq. Pls see the compare in isStopRow is changed. So we need this change I see. Thanks. Change RegionScannerImpl to deal with Cell instead of byte[], int, int -- Key: HBASE-13871 URL: https://issues.apache.org/jira/browse/HBASE-13871 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13871.patch, HBASE-13871.patch This is also a sub item for splitting HBASE-13387 into smaller chunks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13874) Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap
[ https://issues.apache.org/jira/browse/HBASE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579351#comment-14579351 ] Esteban Gutierrez commented on HBASE-13874: --- Right, HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD as a percentage doesn't make sense. 20% of the heap could be anywhere from 4GB to 20GB with large heap sizes. The easiest way is to make it configurable, but then we are just adding another knob without being able to track overall resource consumption. Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap -- Key: HBASE-13874 URL: https://issues.apache.org/jira/browse/HBASE-13874 Project: HBase Issue Type: Task Reporter: stack Fix this in HBaseConfiguration:
{code}
private static void checkForClusterFreeMemoryLimit(Configuration conf) {
  float globalMemstoreLimit = conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
  int gml = (int)(globalMemstoreLimit * CONVERT_TO_PERCENTAGE);
  float blockCacheUpperLimit =
      conf.getFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY,
          HConstants.HFILE_BLOCK_CACHE_SIZE_DEFAULT);
  int bcul = (int)(blockCacheUpperLimit * CONVERT_TO_PERCENTAGE);
  if (CONVERT_TO_PERCENTAGE - (gml + bcul)
      < (int)(CONVERT_TO_PERCENTAGE *
          HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD)) {
    throw new RuntimeException(
        "Current heap configuration for MemStore and BlockCache exceeds " +
        "the threshold required for successful cluster operation. " +
        "The combined value cannot exceed 0.8. Please check " +
        "the settings for hbase.regionserver.global.memstore.upperLimit and " +
        "hfile.block.cache.size in your configuration. " +
        "hbase.regionserver.global.memstore.upperLimit is " +
        globalMemstoreLimit +
        " hfile.block.cache.size is " + blockCacheUpperLimit);
  }
}
{code}
Hardcoding 0.8 doesn't make much sense in a heap of 100G+ (that is 20G over for hbase itself -- more than enough).
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13874) Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap
[ https://issues.apache.org/jira/browse/HBASE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579358#comment-14579358 ] Vladimir Rodionov commented on HBASE-13874: --- Should log WARN if HBase's own heap is below, say, 4GB. No errors should be thrown in any case. If the heap is lower than the minimum (4GB?), then thresholds need to be adjusted and a WARN message should be logged. Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap -- Key: HBASE-13874 URL: https://issues.apache.org/jira/browse/HBASE-13874 Project: HBase Issue Type: Task Reporter: stack
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13874) Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap
stack created HBASE-13874: - Summary: Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap Key: HBASE-13874 URL: https://issues.apache.org/jira/browse/HBASE-13874 Project: HBase Issue Type: Task Reporter: stack Fix this in HBaseConfiguration (see checkForClusterFreeMemoryLimit, quoted above). Hardcoding 0.8 doesn't make much sense in a heap of 100G+ (that is 20G over for hbase itself -- more than enough).
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12465) HBase master start fails due to incorrect file creations
[ https://issues.apache.org/jira/browse/HBASE-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579354#comment-14579354 ] Alicia Ying Shu commented on HBASE-12465: - [~clayb] Which version of HBase are you using? Is it a secure or insecure cluster? Thanks. HBase master start fails due to incorrect file creations Key: HBASE-12465 URL: https://issues.apache.org/jira/browse/HBASE-12465 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.0 Environment: Ubuntu Reporter: Biju Nair Assignee: Alicia Ying Shu Labels: hbase, hbase-bulkload - Start of the HBase master fails due to the following error found in the log:
2014-11-11 20:25:58,860 WARN org.apache.hadoop.hbase.backup.HFileArchiver: Failed to archive class org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, file:hdfs:///hbase/.tmp/data/default/tbl/00820520f5cb7839395e83f40c8d97c2/e/52bf9eee7a27460c8d9e2a26fa43c918_SeqId_282271246_ on try #1
org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=WRITE, inode=/hbase/.tmp/data/default/tbl/00820520f5cb7839395e83f40c8d97c2/e/52bf9eee7a27460c8d9e2a26fa43c918_SeqId_282271246_:devuser:supergroup:-rwxr-xr-x
- All the files that the HBase master was complaining about were created under a user's user-id instead of the hbase user, resulting in incorrect access permissions for the master to act on them. - Looks like this was due to a bulk load done using the LoadIncrementalHFiles program. - HBASE-12052 is another scenario similar to this one.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13874) Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap
[ https://issues.apache.org/jira/browse/HBASE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579418#comment-14579418 ] Nick Dimiduk commented on HBASE-13874: -- bq. Should log WARN if HBase own heap is below, say 4GB Yes, I like this approach. What's our decided lower bound? 4g? 2g? Folks have been running with 8-12g total heap for ages; {{8 * 0.2 = 1.6}}. Let's warn if own heap is less than 1.5g or less than 20%, whichever is smaller. [~vrodionov] you think 1.5g is too low? Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap -- Key: HBASE-13874 URL: https://issues.apache.org/jira/browse/HBASE-13874 Project: HBase Issue Type: Task Reporter: stack
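The warn rule proposed in the comment above, "warn if own heap is less than 1.5g or less than 20%, whichever is smaller", can be sketched as follows. The class and method names are illustrative, not committed HBase code; only the 0.4 defaults and the 1.5 GB / 20% thresholds come from the thread.

```java
// Hedged sketch of a heap-reservation warn check per the discussion above.
public class FreeHeapWarnRule {
    static final long GB = 1024L * 1024 * 1024;

    // "Own heap" = what is left after memstore + block cache fractions.
    static boolean shouldWarn(long totalHeapBytes, float memstoreFrac, float blockCacheFrac) {
        long ownHeap = (long) (totalHeapBytes * (1.0 - memstoreFrac - blockCacheFrac));
        // min(1.5 GB, 20% of heap): on big heaps the absolute floor dominates,
        // so a 100 GB heap is not forced to reserve 20 GB.
        long floor = Math.min((long) (1.5 * GB), (long) (0.2 * totalHeapBytes));
        return ownHeap < floor;
    }

    public static void main(String[] args) {
        // Default 0.4 + 0.4 leaves ~20% (~1.6 GB of an 8 GB heap): no warning.
        System.out.println(shouldWarn(8 * GB, 0.4f, 0.4f));
        // 0.5 + 0.4 leaves only ~0.8 GB of an 8 GB heap: warn.
        System.out.println(shouldWarn(8 * GB, 0.5f, 0.4f));
    }
}
```

This reproduces the arithmetic in the comment: 8 GB × 0.2 = 1.6 GB, just above the 1.5 GB floor, so today's common 8-12 GB heaps would not warn under the defaults.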
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13846) Run MiniCluster on top of other MiniDfsCluster
[ https://issues.apache.org/jira/browse/HBASE-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579492#comment-14579492 ] Jesse Yates commented on HBASE-13846: - Thanks for taking a look, Ted! I assume otherwise you are +1? Posting another patch momentarily; if QA is happy, I'll commit unless I hear otherwise. Run MiniCluster on top of other MiniDfsCluster -- Key: HBASE-13846 URL: https://issues.apache.org/jira/browse/HBASE-13846 Project: HBase Issue Type: Improvement Components: test Affects Versions: 2.0.0, 0.98.14, 1.2.0, 1.1.1 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.1 Attachments: hbase-13846-0.98-v0.patch, hbase-13846-master-v0.patch Similar to how we don't start a mini-zk cluster when we already have one specified, this will skip starting a mini-dfs cluster if the user specifies a different one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13875) Clock skew between master and region server may render restored region without server address
Ted Yu created HBASE-13875: -- Summary: Clock skew between master and region server may render restored region without server address Key: HBASE-13875 URL: https://issues.apache.org/jira/browse/HBASE-13875 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu We observed the following issue in cluster testing on a restored table (table_gwbh9rxyz3). {code} 2015-06-08 14:29:47,313|beaver.component.hbase|INFO|6196|140144585275136|MainThread| 'get 'table_gwbh9rxyz3','row1', {COLUMN = 'family1'}' ... 2015-06-08 14:31:38,203|beaver.machine|INFO|6196|140144585275136|MainThread|ERROR: No server address listed in hbase:meta for region table_gwbh9rxyz3,,1433773371699. 48652273628a291653d8c43aaa02179a. containing row row1 {code} Here was related log snippet from master - part for RestoreSnapshotHandler#handleTableOperation(): {code} 2015-06-08 14:28:41,968 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: starting restore 2015-06-08 14:28:41,969 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: get table regions: hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3 2015-06-08 14:28:41,984 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: found 1 regions for table=table_gwbh9rxyz3 2015-06-08 14:28:41,984 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: region to restore: 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:42,001 DEBUG [RestoreSnapshot-pool584-t1] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, file:hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66, to 
hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/archive/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [] 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 0 2015-06-08 14:28:42,014 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Verify snapshot=table_gwbh9rxyz3-ru-20150608 against=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Sentinel is not yet finished with restoring snapshot={ ss=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 type=FLUSH } 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 2 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Overwritten [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] {code} Here was log snippet from region server - corresponding to table being enabled after snapshot restore: {code} 2015-06-08 14:28:41,914 DEBUG [RS_OPEN_REGION-ip-172-31-46-239:51852-2] zookeeper.ZKAssign: regionserver:51852-0x24dd2833c34000b, quorum=ip-172-31-46-239.ec2.internal:2181,ip-172-31-46-241.ec2.internal:2181,ip-172-31-46-242.ec2.internal:2181, baseZNode=/services/slider/users/hbase/hbasesliderapp Attempting to retransition opening state of node 
48652273628a291653d8c43aaa02179a 2015-06-08 14:28:41,916 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] regionserver.HRegionServer: Post open deploy tasks for table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. 2015-06-08 14:28:41,920 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] hbase.MetaTableAccessor: Updated row table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. with server=ip-172-31-46-239.ec2.internal,51852,1433758173941 2015-06-08 14:28:41,920 DEBUG [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] regionserver.HRegionServer: Finished post open deploy task for table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a {code} What happened was that due to clock skew, server location (ip-172-31-46-239.ec2.internal) for the region was eclipsed by the delete marker put in by
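The masking effect behind this bug can be modeled simply: in HBase, a delete marker at timestamp T hides any put at timestamp <= T. With the master's clock ahead of the region server's, the master's Delete of the meta row carries a later timestamp than the region server's subsequent Put of the server location, so the location update is invisible. This is a toy model, not HBase code; the timestamps below are shortened stand-ins for the ones in the logs, and the final put illustrates the {{now+1}}-style fix discussed in the comments.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of HBase cell visibility under a delete marker.
public class TimestampMasking {
    long deleteTs = Long.MIN_VALUE;              // newest delete marker on the cell
    final TreeMap<Long, String> puts = new TreeMap<>();

    void delete(long ts) { deleteTs = Math.max(deleteTs, ts); }
    void put(long ts, String value) { puts.put(ts, value); }

    // Visible value = newest put strictly newer than the delete marker.
    String read() {
        Map.Entry<Long, String> e = puts.lastEntry();
        return (e != null && e.getKey() > deleteTs) ? e.getValue() : null;
    }

    public static void main(String[] args) {
        TimestampMasking serverLocation = new TimestampMasking();
        serverLocation.delete(142842014L);       // master's Delete (clock ahead)
        serverLocation.put(142841920L, "ip-172-31-46-239,51852"); // RS's Put (clock behind)
        System.out.println(serverLocation.read());   // null: no server address in meta
        // The fix: write the new location with a timestamp past the delete marker.
        serverLocation.put(serverLocation.deleteTs + 1, "ip-172-31-46-239,51852");
        System.out.println(serverLocation.read());   // location visible again
    }
}
```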
[jira] [Commented] (HBASE-13875) Clock skew between master and region server may render restored region without server address
[ https://issues.apache.org/jira/browse/HBASE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579585#comment-14579585 ] Enis Soztutar commented on HBASE-13875: --- Otherwise looks good. +1 if tests pass. Clock skew between master and region server may render restored region without server address - Key: HBASE-13875 URL: https://issues.apache.org/jira/browse/HBASE-13875 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: 13875-branch-1.txt
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13875) Clock skew between master and region server may render restored region without server address
[ https://issues.apache.org/jira/browse/HBASE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579584#comment-14579584 ] Enis Soztutar commented on HBASE-13875: --- You can remove the first line, and change the second one to be {{now+1}}. We do not need to add 20. {code} +// Threads.sleep(20); +addRegionsToMeta(connection, regionInfos, regionReplication, now+20); {code} Clock skew between master and region server may render restored region without server address - Key: HBASE-13875 URL: https://issues.apache.org/jira/browse/HBASE-13875 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: 13875-branch-1.txt We observed the following issue in cluster testing on a restored table (table_gwbh9rxyz3). {code} 2015-06-08 14:29:47,313|beaver.component.hbase|INFO|6196|140144585275136|MainThread| 'get 'table_gwbh9rxyz3','row1', {COLUMN = 'family1'}' ... 2015-06-08 14:31:38,203|beaver.machine|INFO|6196|140144585275136|MainThread|ERROR: No server address listed in hbase:meta for region table_gwbh9rxyz3,,1433773371699. 48652273628a291653d8c43aaa02179a. 
containing row row1 {code} Here was related log snippet from master - part for RestoreSnapshotHandler#handleTableOperation(): {code} 2015-06-08 14:28:41,968 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: starting restore 2015-06-08 14:28:41,969 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: get table regions: hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3 2015-06-08 14:28:41,984 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: found 1 regions for table=table_gwbh9rxyz3 2015-06-08 14:28:41,984 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: region to restore: 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:42,001 DEBUG [RestoreSnapshot-pool584-t1] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, file:hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66, to hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/archive/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [] 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 0 2015-06-08 14:28:42,014 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] 
snapshot.SnapshotManager: Verify snapshot=table_gwbh9rxyz3-ru-20150608 against=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Sentinel is not yet finished with restoring snapshot={ ss=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 type=FLUSH } 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 2 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Overwritten [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] {code} Here was log snippet from region server - corresponding to table being enabled after snapshot restore: {code} 2015-06-08 14:28:41,914 DEBUG [RS_OPEN_REGION-ip-172-31-46-239:51852-2] zookeeper.ZKAssign: regionserver:51852-0x24dd2833c34000b, quorum=ip-172-31-46-239.ec2.internal:2181,ip-172-31-46-241.ec2.internal:2181,ip-172-31-46-242.ec2.internal:2181, baseZNode=/services/slider/users/hbase/hbasesliderapp Attempting to retransition opening state of node 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:41,916 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] regionserver.HRegionServer: Post open deploy tasks for table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.
[jira] [Commented] (HBASE-13846) Run MiniCluster on top of other MiniDfsCluster
[ https://issues.apache.org/jira/browse/HBASE-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579459#comment-14579459 ] Ted Yu commented on HBASE-13846: {code} 2985 * @param requireDown require the that cluster no be up before it is set. {code} I think I know what the above means: {code} 2985 * @param requireDown require that the cluster not be up before it is set. {code} Please add Apache license header to test class. Run MiniCluster on top of other MiniDfsCluster -- Key: HBASE-13846 URL: https://issues.apache.org/jira/browse/HBASE-13846 Project: HBase Issue Type: Improvement Components: test Affects Versions: 2.0.0, 0.98.14, 1.2.0, 1.1.1 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.1 Attachments: hbase-13846-0.98-v0.patch, hbase-13846-master-v0.patch Similar to how we don't start a mini-zk cluster when we already have one specified, this will skip starting a mini-dfs cluster if the user specifies a different one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13846) Run MiniCluster on top of other MiniDfsCluster
[ https://issues.apache.org/jira/browse/HBASE-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-13846: Attachment: hbase-13846-master-v1.patch Updated master version - fixing javadoc, extra imports, header. Run MiniCluster on top of other MiniDfsCluster -- Key: HBASE-13846 URL: https://issues.apache.org/jira/browse/HBASE-13846 Project: HBase Issue Type: Improvement Components: test Affects Versions: 2.0.0, 0.98.14, 1.2.0, 1.1.1 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.1 Attachments: hbase-13846-0.98-v0.patch, hbase-13846-master-v0.patch, hbase-13846-master-v1.patch Similar to how we don't start a mini-zk cluster when we already have one specified, this will skip starting a mini-dfs cluster if the user specifies a different one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13846) Run MiniCluster on top of other MiniDfsCluster
[ https://issues.apache.org/jira/browse/HBASE-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-13846: Status: Patch Available (was: Open) Run MiniCluster on top of other MiniDfsCluster -- Key: HBASE-13846 URL: https://issues.apache.org/jira/browse/HBASE-13846 Project: HBase Issue Type: Improvement Components: test Affects Versions: 2.0.0, 0.98.14, 1.2.0, 1.1.1 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.1 Attachments: hbase-13846-0.98-v0.patch, hbase-13846-master-v0.patch, hbase-13846-master-v1.patch Similar to how we don't start a mini-zk cluster when we already have one specified, this will skip starting a mini-dfs cluster if the user specifies a different one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13846) Run MiniCluster on top of other MiniDfsCluster
[ https://issues.apache.org/jira/browse/HBASE-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-13846: Status: Open (was: Patch Available) Run MiniCluster on top of other MiniDfsCluster -- Key: HBASE-13846 URL: https://issues.apache.org/jira/browse/HBASE-13846 Project: HBase Issue Type: Improvement Components: test Affects Versions: 2.0.0, 0.98.14, 1.2.0, 1.1.1 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.1 Attachments: hbase-13846-0.98-v0.patch, hbase-13846-master-v0.patch, hbase-13846-master-v1.patch Similar to how we don't start a mini-zk cluster when we already have one specified, this will skip starting a mini-dfs cluster if the user specifies a different one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13876) Improving performance of HeapMemoryManager
[ https://issues.apache.org/jira/browse/HBASE-13876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13876: - Description: I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rare, so I am trying to weaken these checks to improve its performance. Check the current memstore size and the current block cache size: if we are using less than 50% of the currently available block cache size, we say the block cache is sufficient, and likewise for the memstore. This check will be very effective when the server is either load heavy or write heavy. The earlier version just waited for the number of evictions / number of flushes to be zero, which is very rare. Otherwise, based on the percent change in the number of cache misses and the number of flushes, we increase / decrease the memory provided for caching / memstore. After doing so, on the next call of HeapMemoryTuner we verify that the last change has indeed decreased the combined number of evictions / flushes. I am doing this analysis by comparing the percent change (which is basically a normalized derivative) of the number of evictions and the number of flushes during the last two periods. The main motive for doing this is that if we have random reads, then even after increasing the block cache we won't be able to decrease the number of cache misses, and this way we will not waste memory on the block cache. was: I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rare, so I am trying to weaken these checks to improve its performance. Check the current memstore size and the current block cache size: if we are using less than 50% of the currently available block cache size, we say the block cache is sufficient, and likewise for the memstore. This check will be very effective when the server is either load heavy or write heavy. The earlier version just waited for the number of evictions / number of flushes to be zero, which is very rare. Otherwise, based on the percent change in the number of cache misses and the number of flushes, we increase / decrease the memory provided for caching / memstore. After doing so, on the next call of HeapMemoryTuner we verify that the last change has indeed decreased the combined number of evictions / flushes. I am doing this analysis by comparing the percent change (which is basically a normalized derivative) of the number of evictions and the number of flushes in the last two periods. The main motive for doing this is that if we have random reads, then even after increasing the block cache we won't be able to decrease the number of cache misses, and this way we will not waste memory on the block cache. Improving performance of HeapMemoryManager -- Key: HBASE-13876 URL: https://issues.apache.org/jira/browse/HBASE-13876 Project: HBase Issue Type: Improvement Components: hbase, regionserver Affects Versions: 2.0.0, 1.0.1, 1.1.0, 1.1.1 Reporter: Abhilash Assignee: Abhilash Priority: Minor Attachments: HBASE-13876.patch I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rare, so I am trying to weaken these checks to improve its performance. Check the current memstore size and the current block cache size: if we are using less than 50% of the currently available block cache size, we say the block cache is sufficient, and likewise for the memstore. This check will be very effective when the server is either load heavy or write heavy. The earlier version just waited for the number of evictions / number of flushes to be zero, which is very rare. Otherwise, based on the percent change in the number of cache misses and the number of flushes, we increase / decrease the memory provided for caching / memstore. After doing so, on the next call of HeapMemoryTuner we verify that the last change has indeed decreased the combined number of evictions / flushes. I am doing this analysis by comparing the percent change (which is basically a normalized derivative) of the number of evictions and the number of flushes during the last two periods. The main motive for doing this is that if we have random reads, then even after increasing the block cache we won't be able to decrease the number of cache misses, and this way we will not waste memory on the block cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
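The tuning heuristic described above can be sketched as follows. This is a hedged illustration, not the DefaultHeapMemoryTuner code: the class, method names, and the exact decision shape are assumptions; only the 50%-usage "sufficient" check and the percent-change (normalized-derivative) comparison come from the description.

```java
public class TunerSketch {
    /** Percent change between two consecutive period counts; a normalized derivative. */
    static double percentChange(long previous, long current) {
        if (previous == 0) {
            return current == 0 ? 0.0 : 1.0;
        }
        return (current - previous) / (double) previous;
    }

    /** +1 = grow the block cache, -1 = grow the memstore, 0 = leave sizes alone. */
    static int decide(double cacheUsedFraction, double memstoreUsedFraction,
                      double missChange, double flushChange) {
        // Using under half of what is already granted means that side is "sufficient".
        boolean cacheSufficient = cacheUsedFraction < 0.5;
        boolean memstoreSufficient = memstoreUsedFraction < 0.5;
        if (cacheSufficient && memstoreSufficient) return 0;
        if (cacheSufficient) return -1;   // write heavy: give memory to the memstore
        if (memstoreSufficient) return 1; // read heavy: give memory to the block cache
        // Both sides under pressure: shift toward whichever signal is degrading faster,
        // comparing the percent change of misses vs. flushes over the last periods.
        return missChange > flushChange ? 1 : -1;
    }
}
```

For random-read workloads the miss percent change stays high even after growing the cache, which is the case the verification step on the next tuner call is meant to catch.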
[jira] [Commented] (HBASE-13671) More classes to add to the invoking repository of org.apache.hadoop.hbase.mapreduce.driver
[ https://issues.apache.org/jira/browse/HBASE-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579430#comment-14579430 ] li xiang commented on HBASE-13671: -- Thanks Jerry and Ted for reviewing ! More classes to add to the invoking repository of org.apache.hadoop.hbase.mapreduce.driver -- Key: HBASE-13671 URL: https://issues.apache.org/jira/browse/HBASE-13671 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: li xiang Assignee: li xiang Fix For: 2.0.0, 0.98.13, 1.2.0, 1.1.1 Attachments: HBASE-13671-v1.patch In org.apache.hadoop.hbase.mapreduce.driver, only the following classes are added and can be invoked by hbase-server.jar: - RowCounter - CellCounter - Export - Import - ImportTsv - LoadIncrementalHFiles - CopyTable - VerifyReplication More classes(valid programs) to add, such as ExportSnapshot, WALPlayer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13876) Improving performance of HeapMemoryTunerManager
Abhilash created HBASE-13876: Summary: Improving performance of HeapMemoryTunerManager Key: HBASE-13876 URL: https://issues.apache.org/jira/browse/HBASE-13876 Project: HBase Issue Type: Improvement Components: hbase, regionserver Affects Versions: 1.1.0, 1.0.1, 2.0.0, 1.1.1 Reporter: Abhilash Assignee: Abhilash Priority: Minor I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The current checks under which the DefaultHeapMemoryTuner works are very rare so I am trying to weaken these checks to improve its performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13845) Expire of one region server carrying meta can bring down the master
[ https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579432#comment-14579432 ] Hudson commented on HBASE-13845: FAILURE: Integrated in HBase-1.1 #532 (See [https://builds.apache.org/job/HBase-1.1/532/]) HBASE-13845 Expire of one region server carrying meta can bring down the master (jerryjch: rev f95430a8428a5a3e9bfd7b9d38d434eb298133ae) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMetaShutdownHandler.java Expire of one region server carrying meta can bring down the master --- Key: HBASE-13845 URL: https://issues.apache.org/jira/browse/HBASE-13845 Project: HBase Issue Type: Bug Components: master Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Jerry He Assignee: Jerry He Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13845-branch-1.1.patch, HBASE-13845-branch-1.patch, HBASE-13845-master-test-case-only.patch There seems to be a code bug that can cause expiration of one region server carrying meta to bring down the master under certain case. Here is the sequence of event. a) The master detects the expiration of a region server on ZK, and starts to expire the region server. b) Since the failed region server carries meta, the shutdown handler will call verifyAndAssignMetaWithRetries() during processing the expired rs. 
c) In verifyAndAssignMeta(), there is logic to verifyMetaRegionLocation:
{code}
if (!server.getMetaTableLocator().verifyMetaRegionLocation(server.getConnection(),
    this.server.getZooKeeper(), timeout)) {
  this.services.getAssignmentManager().assignMeta(HRegionInfo.FIRST_META_REGIONINFO);
} else if (serverName.equals(server.getMetaTableLocator().getMetaRegionLocation(
    this.server.getZooKeeper()))) {
  throw new IOException("hbase:meta is onlined on the dead server " + serverName);
}
{code}
If we see the meta region is still alive on the expired rs, we throw an exception. We do some retries (default 10x1000ms) for verifyAndAssignMeta. If we still get the exception after the retries, we abort the master.
{code}
2015-05-27 06:58:30,156 FATAL [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] master.HMaster: Master server abort: loaded coprocessors are: []
2015-05-27 06:58:30,156 FATAL [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] master.HMaster: verifyAndAssignMeta failed after10 times retries, aborting
java.io.IOException: hbase:meta is onlined on the dead server bdvs1164.svl.ibm.com,16020,1432681743203
at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMeta(MetaServerShutdownHandler.java:162)
at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMetaWithRetries(MetaServerShutdownHandler.java:184)
at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:93)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-05-27 06:58:30,156 INFO [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] regionserver.HRegionServer: STOPPED: verifyAndAssignMeta failed after10 times retries, aborting
{code}
The problem happens when the expired rs is slow processing its own expiration or has a slow death, and is still able to respond to the master's meta verification in the meantime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
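The retry-then-abort flow around verifyAndAssignMetaWithRetries can be sketched generically. This is a minimal illustration of the pattern only; the helper name and shape are assumptions, not the HBase code:

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    /**
     * Runs a verification task up to maxAttempts times, pausing between tries.
     * Returns true on the first success; false means the caller should give up,
     * which is where the master aborts after its 10 x 1000ms retries.
     */
    static boolean retryOrGiveUp(Callable<Boolean> task, int maxAttempts, long pauseMs) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                if (task.call()) {
                    return true;
                }
            } catch (Exception e) {
                // The IOException ("hbase:meta is onlined on the dead server")
                // lands here; swallow it and try again.
            }
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

If the dying region server keeps answering the meta verification, every attempt throws, the loop exhausts its attempts, and the caller aborts, exactly the failure mode in this issue.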
[jira] [Commented] (HBASE-13874) Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap
[ https://issues.apache.org/jira/browse/HBASE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579484#comment-14579484 ] Esteban Gutierrez commented on HBASE-13874: --- I think we should fail fast in the same way we do now and only make this configurable if required. Finding the right thresholds will always be a problem for users. Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap -- Key: HBASE-13874 URL: https://issues.apache.org/jira/browse/HBASE-13874 Project: HBase Issue Type: Task Reporter: stack Fix this in HBaseConfiguration:
{code}
private static void checkForClusterFreeMemoryLimit(Configuration conf) {
  float globalMemstoreLimit = conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
  int gml = (int) (globalMemstoreLimit * CONVERT_TO_PERCENTAGE);
  float blockCacheUpperLimit =
      conf.getFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY,
          HConstants.HFILE_BLOCK_CACHE_SIZE_DEFAULT);
  int bcul = (int) (blockCacheUpperLimit * CONVERT_TO_PERCENTAGE);
  if (CONVERT_TO_PERCENTAGE - (gml + bcul)
      < (int) (CONVERT_TO_PERCENTAGE *
          HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD)) {
    throw new RuntimeException(
        "Current heap configuration for MemStore and BlockCache exceeds " +
        "the threshold required for successful cluster operation. " +
        "The combined value cannot exceed 0.8. Please check " +
        "the settings for hbase.regionserver.global.memstore.upperLimit and " +
        "hfile.block.cache.size in your configuration. " +
        "hbase.regionserver.global.memstore.upperLimit is " + globalMemstoreLimit +
        " hfile.block.cache.size is " + blockCacheUpperLimit);
  }
}
{code}
Hardcoding 0.8 doesn't make much sense in a heap of 100G+ (that is 20G over for hbase itself -- more than enough). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
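To make the arithmetic of the check concrete, here is a standalone sketch of the same integer-percentage comparison (the constant names mirror the snippet; everything else is simplified and illustrative). With the defaults of 0.4 memstore + 0.4 block cache, exactly 20% of the heap stays free, the minimum the check allows; on a 100G heap that reserved slice is 20G no matter how large the heap grows, which is the complaint.

```java
public class HeapLimitCheck {
    static final int CONVERT_TO_PERCENTAGE = 100;
    // Stand-in for HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD: 1 - 0.8.
    static final float MINIMUM_FREE_HEAP = 0.2f;

    /** True when memstore + block cache leave at least 20% of the heap free. */
    static boolean combinedLimitOk(float memstoreUpperLimit, float blockCacheUpperLimit) {
        int gml = (int) (memstoreUpperLimit * CONVERT_TO_PERCENTAGE);
        int bcul = (int) (blockCacheUpperLimit * CONVERT_TO_PERCENTAGE);
        return CONVERT_TO_PERCENTAGE - (gml + bcul)
            >= (int) (CONVERT_TO_PERCENTAGE * MINIMUM_FREE_HEAP);
    }
}
```

combinedLimitOk(0.4f, 0.4f) passes at exactly the 0.8 boundary; pushing either fraction higher fails it, and that failure is what the RuntimeException above enforces.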
[jira] [Commented] (HBASE-13329) Memstore flush fails if data has always the same value, breaking the region
[ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579511#comment-14579511 ] Nick Dimiduk commented on HBASE-13329: -- I'd like this fix for 1.1.1 but would feel better about it if there was a test along with it. Memstore flush fails if data has always the same value, breaking the region --- Key: HBASE-13329 URL: https://issues.apache.org/jira/browse/HBASE-13329 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.0.1 Environment: linux-debian-jessie ec2 - t2.micro instances Reporter: Ruben Aguiar Assignee: Ruben Aguiar Priority: Critical Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1 Attachments: 13329-v1.patch While trying to benchmark my opentsdb cluster, I've created a script that sends to hbase always the same value (in this case 1). After a few minutes, the whole region server crashes and the region itself becomes impossible to open again (cannot assign or unassign). After some investigation, what I saw on the logs is that when a Memstore flush is called on a large region (128mb) the process errors, killing the regionserver. On restart, replaying the edits generates the same error, making the region unavailable. Tried to manually unassign, assign or close_region. That didn't work because the code that reads/replays it crashes. From my investigation this seems to be an overflow issue. The logs show that the function getMinimumMidpointArray tried to access index -32743 of an array, extremely close to the minimum short value in Java. Upon investigation of the source code, it seems an index short is used, being incremented as long as the two vectors are the same, probably making it overflow on large vectors with equal data. Changing it to int should solve the problem. Here follows the hadoop logs of when the regionserver went down. Any help is appreciated. 
Any other information you need please do tell me: 2015-03-24 18:00:56,187 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220018516 with entries=143, filesize=134.70 MB; new WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220056140 2015-03-24 18:00:56,188 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Archiving hdfs://10.2.0.74:8020/hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 to hdfs://10.2.0.74:8020/hbase/oldWALs/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 2015-03-24 18:04:35,722 INFO [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2., current region memstore size 128.04 MB 2015-03-24 18:04:36,154 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server 10.2.0.73,16020,1427216382590: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2. 
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1999) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1770) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1702) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) at
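The short-index overflow the reporter describes can be reproduced in isolation. This is an illustrative stand-in, not the actual getMinimumMidpointArray code: a short loop index scanning a long run of identical bytes wraps past Short.MAX_VALUE. A bounds guard is added here so the sketch returns the wrapped value instead of throwing the ArrayIndexOutOfBoundsException with a negative index seen in the log above.

```java
public class ShortIndexOverflow {
    /** Length of the common prefix of a and b, tracked with a short index. */
    static short commonPrefixLength(byte[] a, byte[] b) {
        short i = 0;
        // Guard on i >= 0 so the wrap is observable; without it, a[i] with a
        // negative index throws ArrayIndexOutOfBoundsException, as in the log.
        while (i >= 0 && i < a.length && i < b.length && a[i] == b[i]) {
            i++; // 32767 + 1 wraps to -32768
        }
        return i;
    }
}
```

On two identical 40,000-byte arrays this returns -32768 instead of 40000; declaring the index as int removes the wrap, which matches the change the reporter proposes.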
[jira] [Updated] (HBASE-13876) Improving performance of HeapMemoryManager
[ https://issues.apache.org/jira/browse/HBASE-13876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13876: - Attachment: HBASE-13876.patch Improving performance of HeapMemoryManager -- Key: HBASE-13876 URL: https://issues.apache.org/jira/browse/HBASE-13876 Project: HBase Issue Type: Improvement Components: hbase, regionserver Affects Versions: 2.0.0, 1.0.1, 1.1.0, 1.1.1 Reporter: Abhilash Assignee: Abhilash Priority: Minor Attachments: HBASE-13876.patch I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The current checks under which the DefaultHeapMemoryTuner works are very rare so I am trying to weaken these checks to improve its performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13875) Clock skew between master and region server may render restored region without server address
[ https://issues.apache.org/jira/browse/HBASE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13875: --- Attachment: 13875-branch-1.txt Clock skew between master and region server may render restored region without server address - Key: HBASE-13875 URL: https://issues.apache.org/jira/browse/HBASE-13875 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Attachments: 13875-branch-1.txt We observed the following issue in cluster testing on a restored table (table_gwbh9rxyz3). {code} 2015-06-08 14:29:47,313|beaver.component.hbase|INFO|6196|140144585275136|MainThread| 'get 'table_gwbh9rxyz3','row1', {COLUMN = 'family1'}' ... 2015-06-08 14:31:38,203|beaver.machine|INFO|6196|140144585275136|MainThread|ERROR: No server address listed in hbase:meta for region table_gwbh9rxyz3,,1433773371699. 48652273628a291653d8c43aaa02179a. containing row row1 {code} Here was related log snippet from master - part for RestoreSnapshotHandler#handleTableOperation(): {code} 2015-06-08 14:28:41,968 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: starting restore 2015-06-08 14:28:41,969 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: get table regions: hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3 2015-06-08 14:28:41,984 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: found 1 regions for table=table_gwbh9rxyz3 2015-06-08 14:28:41,984 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: region to restore: 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:42,001 DEBUG [RestoreSnapshot-pool584-t1] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, 
file:hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66, to hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/archive/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [] 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 0 2015-06-08 14:28:42,014 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Verify snapshot=table_gwbh9rxyz3-ru-20150608 against=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Sentinel is not yet finished with restoring snapshot={ ss=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 type=FLUSH } 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 2 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Overwritten [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] {code} Here was log snippet from region server - corresponding to table being enabled after snapshot restore: {code} 2015-06-08 14:28:41,914 DEBUG [RS_OPEN_REGION-ip-172-31-46-239:51852-2] zookeeper.ZKAssign: regionserver:51852-0x24dd2833c34000b, 
quorum=ip-172-31-46-239.ec2.internal:2181,ip-172-31-46-241.ec2.internal:2181,ip-172-31-46-242.ec2.internal:2181, baseZNode=/services/slider/users/hbase/hbasesliderapp Attempting to retransition opening state of node 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:41,916 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] regionserver.HRegionServer: Post open deploy tasks for table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. 2015-06-08 14:28:41,920 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] hbase.MetaTableAccessor: Updated row table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. with server=ip-172-31-46-239.ec2.internal,51852,1433758173941 2015-06-08 14:28:41,920 DEBUG
[jira] [Commented] (HBASE-13875) Clock skew between master and region server may render restored region without server address
[ https://issues.apache.org/jira/browse/HBASE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579570#comment-14579570 ] Ted Yu commented on HBASE-13875: Patch uses master timestamp for the mutations so that clock skew doesn't affect the correctness of restored table(s). Clock skew between master and region server may render restored region without server address - Key: HBASE-13875 URL: https://issues.apache.org/jira/browse/HBASE-13875 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Attachments: 13875-branch-1.txt We observed the following issue in cluster testing on a restored table (table_gwbh9rxyz3). {code} 2015-06-08 14:29:47,313|beaver.component.hbase|INFO|6196|140144585275136|MainThread| 'get 'table_gwbh9rxyz3','row1', {COLUMN = 'family1'}' ... 2015-06-08 14:31:38,203|beaver.machine|INFO|6196|140144585275136|MainThread|ERROR: No server address listed in hbase:meta for region table_gwbh9rxyz3,,1433773371699. 48652273628a291653d8c43aaa02179a. 
containing row row1 {code} Here was related log snippet from master - part for RestoreSnapshotHandler#handleTableOperation(): {code} 2015-06-08 14:28:41,968 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: starting restore 2015-06-08 14:28:41,969 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: get table regions: hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3 2015-06-08 14:28:41,984 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: found 1 regions for table=table_gwbh9rxyz3 2015-06-08 14:28:41,984 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: region to restore: 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:42,001 DEBUG [RestoreSnapshot-pool584-t1] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, file:hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66, to hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/archive/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [] 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 0 2015-06-08 14:28:42,014 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] 
snapshot.SnapshotManager: Verify snapshot=table_gwbh9rxyz3-ru-20150608 against=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Sentinel is not yet finished with restoring snapshot={ ss=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 type=FLUSH } 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 2 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Overwritten [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] {code} Here was log snippet from region server - corresponding to table being enabled after snapshot restore: {code} 2015-06-08 14:28:41,914 DEBUG [RS_OPEN_REGION-ip-172-31-46-239:51852-2] zookeeper.ZKAssign: regionserver:51852-0x24dd2833c34000b, quorum=ip-172-31-46-239.ec2.internal:2181,ip-172-31-46-241.ec2.internal:2181,ip-172-31-46-242.ec2.internal:2181, baseZNode=/services/slider/users/hbase/hbasesliderapp Attempting to retransition opening state of node 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:41,916 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] regionserver.HRegionServer: Post open deploy tasks for table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. 2015-06-08 14:28:41,920 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] hbase.MetaTableAccessor: Updated row
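The logs above suggest a timestamp-masking problem: the master's meta Deleted/Overwritten mutations carry the master clock (14:28:42,0xx), while the region server's "Updated row ... with server=..." Put carries the region server clock (14:28:41,920), so with enough skew the server-address Put can sort below the master's delete marker and stay invisible. A minimal sketch of the visibility rule under assumed timestamps (illustrative only, not the actual HBase meta code):

```java
public class ClockSkewMasking {
    // HBase-style rule: a put on a cell is visible only if its timestamp
    // is strictly newer than the newest delete marker on that cell.
    static boolean putVisible(long putTs, long deleteTs) {
        return putTs > deleteTs;
    }

    public static void main(String[] args) {
        long masterClock = 1433773722038L;           // master deletes/overwrites the meta row
        long regionServerClock = masterClock - 5000; // RS clock skewed 5s behind

        // The RS writes the server address with its own (earlier) timestamp,
        // so the master's delete marker masks it: no server address visible.
        System.out.println(putVisible(regionServerClock, masterClock)); // false

        // With the patch's approach (master timestamp used for the restore
        // mutations), ordering no longer depends on the RS clock.
    }
}
```

Using the master's timestamp for all restore mutations makes the ordering deterministic regardless of region server clock skew.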
[jira] [Commented] (HBASE-13845) Expire of one region server carrying meta can bring down the master
[ https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579569#comment-14579569 ] Hudson commented on HBASE-13845: FAILURE: Integrated in HBase-TRUNK #6556 (See [https://builds.apache.org/job/HBase-TRUNK/6556/]) HBASE-13845 Expire of one region server carrying meta can bring down the master: test case (jerryjch: rev 14fe23254a78c52bdaef0da819268c8b405059cb) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMetaShutdownHandler.java Expire of one region server carrying meta can bring down the master --- Key: HBASE-13845 URL: https://issues.apache.org/jira/browse/HBASE-13845 Project: HBase Issue Type: Bug Components: master Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Jerry He Assignee: Jerry He Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13845-branch-1.1.patch, HBASE-13845-branch-1.patch, HBASE-13845-master-test-case-only.patch There seems to be a code bug that can cause expiration of one region server carrying meta to bring down the master under certain cases. Here is the sequence of events. a) The master detects the expiration of a region server on ZK, and starts to expire the region server. b) Since the failed region server carries meta, the shutdown handler will call verifyAndAssignMetaWithRetries() while processing the expired rs. c) In verifyAndAssignMeta(), there is logic to verify the meta region location: {code}
if (!server.getMetaTableLocator().verifyMetaRegionLocation(server.getConnection(),
    this.server.getZooKeeper(), timeout)) {
  this.services.getAssignmentManager().assignMeta(HRegionInfo.FIRST_META_REGIONINFO);
} else if (serverName.equals(server.getMetaTableLocator().getMetaRegionLocation(
    this.server.getZooKeeper()))) {
  throw new IOException("hbase:meta is onlined on the dead server " + serverName);
}
{code} If we see the meta region is still alive on the expired rs, we throw an exception. We do some retries (default 10x1000ms) for verifyAndAssignMeta.
If we still get the exception after retries, we abort the master. {code} 2015-05-27 06:58:30,156 FATAL [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] master.HMaster: Master server abort: loaded coprocessors are: [] 2015-05-27 06:58:30,156 FATAL [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] master.HMaster: verifyAndAssignMeta failed after10 times retries, aborting java.io.IOException: hbase:meta is onlined on the dead server bdvs1164.svl.ibm.com,16020,1432681743203 at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMeta(MetaServerShutdownHandler.java:162) at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMetaWithRetries(MetaServerShutdownHandler.java:184) at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:93) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-05-27 06:58:30,156 INFO [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] regionserver.HRegionServer: STOPPED: verifyAndAssignMeta failed after10 times retries, aborting {code} The problem happens when the expired region server is slow processing its own expiration or has a slow death, and is still able to respond to the master's meta verification in the meantime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
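The verifyAndAssignMetaWithRetries() flow described above boils down to a retry wrapper that rethrows, and thereby aborts the master, once all attempts fail. An illustrative sketch with hypothetical names (not the actual MetaServerShutdownHandler code):

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryThenAbort {
    // Run the task up to maxRetries times; if every attempt throws (e.g.
    // meta still appears online on the dead server because it is dying
    // slowly but still answering RPCs), rethrow the last failure so the
    // caller aborts.
    static <T> T withRetries(Callable<T> task, int maxRetries, long sleepMs)
            throws Exception {
        Exception last = null;
        for (int i = 0; i < maxRetries; i++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e;
                Thread.sleep(sleepMs);
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        try {
            // A dying-but-responsive region server keeps failing the check.
            withRetries(() -> {
                throw new IOException("hbase:meta is onlined on the dead server");
            }, 10, 1L);
        } catch (Exception e) {
            System.out.println("aborting: " + e.getMessage());
        }
    }
}
```

With the default 10 retries of 1000 ms each, a region server that stays responsive for just over ten seconds past its ZK expiration is enough to exhaust the retries and take the master down.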
[jira] [Updated] (HBASE-13875) Clock skew between master and region server may render restored region without server address
[ https://issues.apache.org/jira/browse/HBASE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13875: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-13876) Improving performance of HeapMemoryManager
[ https://issues.apache.org/jira/browse/HBASE-13876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13876: - Description: I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The current checks under which the DefaultHeapMemoryTuner works are very rare so I am trying to weaken these checks to improve its performance. Check current memstore size and current block cache size. If we are using less than 50% of currently available block cache size we say block cache is sufficient and same for memstore. This check will be very effective when cluster is load heavy or write heavy. Earlier version just waited to for number of evictions / number of flushes to be zero which is very rare to happen. Otherwise based on percent change in number of cache misses and number of flushes we increase / decrease memory provided for caching / memstore. After doing so, on next call of HeapMemoryTuner we verify that last change has indeed decreased number of evictions / flush ( combined). I am doing this analysis by comparing percent change (which is basically nothing but normalized of the derivative) of number of evictions and number of flushes in last two periods. The main motive for doing this was that if we have random reads then even after increasing block cache we wont be able to decrease number of cache misses and eventually we will not waste memory on block caches. was:I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The current checks under which the DefaultHeapMemoryTuner works are very rare so I am trying to weaken these checks to improve its performance. 
Improving performance of HeapMemoryManager -- Key: HBASE-13876 URL: https://issues.apache.org/jira/browse/HBASE-13876 Project: HBase Issue Type: Improvement Components: hbase, regionserver Affects Versions: 2.0.0, 1.0.1, 1.1.0, 1.1.1 Reporter: Abhilash Assignee: Abhilash Priority: Minor Attachments: HBASE-13876.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
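The heuristic in the description (compare the percent change, i.e. a normalized derivative, of cache misses and flushes over the last two periods, and shift memory toward the side under more pressure) can be sketched as follows. Names, thresholds, and the exact decision rule are illustrative, not the actual DefaultHeapMemoryTuner code:

```java
public class TunerSketch {
    enum Action { NO_OP, GROW_BLOCK_CACHE, GROW_MEMSTORE }

    // Percent change between two consecutive periods: a normalized
    // derivative of the raw counter.
    static double percentChange(long prev, long curr) {
        if (prev == 0) return curr == 0 ? 0.0 : 1.0;
        return (double) (curr - prev) / prev;
    }

    // Illustrative decision: give memory to whichever side's pressure
    // (cache misses for the block cache, flushes for the memstore) grew
    // faster over the last two periods. With random reads, growing the
    // block cache does not reduce the miss trend, so subsequent periods
    // stop awarding it more memory.
    static Action tune(long prevMisses, long currMisses,
                       long prevFlushes, long currFlushes) {
        double missTrend = percentChange(prevMisses, currMisses);
        double flushTrend = percentChange(prevFlushes, currFlushes);
        if (missTrend == flushTrend) return Action.NO_OP;
        return missTrend > flushTrend ? Action.GROW_BLOCK_CACHE
                                      : Action.GROW_MEMSTORE;
    }

    public static void main(String[] args) {
        // Cache misses doubled while flushes stayed flat: grow the cache.
        System.out.println(tune(100, 200, 50, 50)); // GROW_BLOCK_CACHE
    }
}
```

The point of comparing trends rather than absolute counts is exactly the feedback loop the description outlines: after a change, the next period verifies that the change actually reduced combined evictions and flushes before pushing further in the same direction.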
[jira] [Commented] (HBASE-13329) Memstore flush fails if data has always the same value, breaking the region
[ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579434#comment-14579434 ] Hadoop QA commented on HBASE-13329: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12738609/13329-v1.patch against master branch at commit 6cc42c8cd16d01cded9936bf53bf35e6e2ff5b66. ATTACHMENT ID: 12738609 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14345//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14345//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14345//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14345//console This message is automatically generated. Memstore flush fails if data has always the same value, breaking the region --- Key: HBASE-13329 URL: https://issues.apache.org/jira/browse/HBASE-13329 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.0.1 Environment: linux-debian-jessie ec2 - t2.micro instances Reporter: Ruben Aguiar Assignee: Ruben Aguiar Priority: Critical Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1 Attachments: 13329-v1.patch While trying to benchmark my opentsdb cluster, I've created a script that sends to hbase always the same value (in this case 1). After a few minutes, the whole region server crashes and the region itself becomes impossible to open again (cannot assign or unassign). After some investigation, what I saw on the logs is that when a Memstore flush is called on a large region (128mb) the process errors, killing the regionserver. On restart, replaying the edits generates the same error, making the region unavailable. Tried to manually unassign, assign or close_region. That didn't work because the code that reads/replays it crashes. From my investigation this seems to be an overflow issue. The logs show that the function getMinimumMidpointArray tried to access index -32743 of an array, extremely close to the minimum short value in Java. Upon investigation of the source code, it seems an index short is used, being incremented as long as the two vectors are the same, probably making it overflow on large vectors with equal data. Changing it to int should solve the problem. 
Here follows the hadoop logs of when the regionserver went down. Any help is appreciated. Any other information you need please do tell me: 2015-03-24 18:00:56,187 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220018516 with entries=143, filesize=134.70 MB; new WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220056140 2015-03-24 18:00:56,188 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Archiving hdfs://10.2.0.74:8020/hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 to hdfs://10.2.0.74:8020/hbase/oldWALs/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 2015-03-24
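The suspected overflow is easy to demonstrate in isolation. The sketch below is illustrative (not the actual getMinimumMidpointArray code): a short loop counter scanning two long identical arrays wraps past Short.MAX_VALUE to -32768, which then surfaces as a negative array index like the -32743 reported in the logs; an int counter does not have this problem until far larger sizes.

```java
public class ShortIndexOverflow {
    // Count the length of the common prefix of two arrays using a short
    // counter, mimicking the suspected bug: on identical arrays longer
    // than 32767 bytes, the counter wraps negative instead of reaching
    // the true prefix length.
    static short commonPrefixShort(byte[] a, byte[] b) {
        short i = 0;
        while (i >= 0 && i < a.length && i < b.length && a[i] == b[i]) {
            i++; // wraps to -32768 after 32767
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] x = new byte[40000]; // all zeros, so every byte matches
        byte[] y = new byte[40000];
        short result = commonPrefixShort(x, y);
        // The loop exited because i went negative, not because a
        // difference was found; an int counter would have returned 40000.
        System.out.println(result); // -32768
    }
}
```

This matches the report's scenario: a memstore full of cells with the same value produces long runs of identical bytes, so the midpoint computation walks far enough for the short index to overflow.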
[jira] [Commented] (HBASE-13848) Access InfoServer SSL passwords through Credential Provder API
[ https://issues.apache.org/jira/browse/HBASE-13848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579451#comment-14579451 ] Ted Yu commented on HBASE-13848: lgtm Access InfoServer SSL passwords through Credential Provder API -- Key: HBASE-13848 URL: https://issues.apache.org/jira/browse/HBASE-13848 Project: HBase Issue Type: Improvement Components: security Reporter: Sean Busbey Assignee: Sean Busbey Attachments: HBASE-13848.1.patch, HBASE-13848.1.patch HBASE-11810 took care of getting our SSL passwords out of the Hadoop Credential Provider API, but we also get several out of clear text configuration for the InfoServer class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13874) Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap
[ https://issues.apache.org/jira/browse/HBASE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579500#comment-14579500 ] Dave Latham commented on HBASE-13874: - I would definitely appreciate being able to use a config somehow besides recompiling as long as it is logically consistent, even if not recommended. Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap -- Key: HBASE-13874 URL: https://issues.apache.org/jira/browse/HBASE-13874 Project: HBase Issue Type: Task Reporter: stack Fix this in HBaseConfiguration: {code}
private static void checkForClusterFreeMemoryLimit(Configuration conf) {
  float globalMemstoreLimit = conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
  int gml = (int) (globalMemstoreLimit * CONVERT_TO_PERCENTAGE);
  float blockCacheUpperLimit =
      conf.getFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY,
          HConstants.HFILE_BLOCK_CACHE_SIZE_DEFAULT);
  int bcul = (int) (blockCacheUpperLimit * CONVERT_TO_PERCENTAGE);
  if (CONVERT_TO_PERCENTAGE - (gml + bcul)
      < (int) (CONVERT_TO_PERCENTAGE *
          HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD)) {
    throw new RuntimeException(
        "Current heap configuration for MemStore and BlockCache exceeds " +
        "the threshold required for successful cluster operation. " +
        "The combined value cannot exceed 0.8. Please check " +
        "the settings for hbase.regionserver.global.memstore.upperLimit and " +
        "hfile.block.cache.size in your configuration. " +
        "hbase.regionserver.global.memstore.upperLimit is " + globalMemstoreLimit +
        " hfile.block.cache.size is " + blockCacheUpperLimit);
  }
}
{code} Hardcoding 0.8 doesn't make much sense in a heap of 100G+ (that is 20G over for hbase itself -- more than enough). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
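As a worked example of the check's arithmetic (plain Java, not the HBase code itself): with the defaults, 100 - (40 + 40) = 20 percent of the heap stays reserved, exactly the hardcoded minimum. Because the reservation is a fixed fraction, it scales with heap size: 1.6 GB on an 8 GB heap, but a full 20 GB on a 100 GB heap, which is the complaint above.

```java
public class HeapCheckMath {
    static final int CONVERT_TO_PERCENTAGE = 100;
    // Mirrors HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD, the
    // hardcoded 1.0 - 0.8 reservation being discussed.
    static final float MINIMUM_FREE_THRESHOLD = 0.2f;

    // true if the memstore + block cache fractions leave at least the
    // minimum free fraction of the heap, following the same integer
    // percentage arithmetic as checkForClusterFreeMemoryLimit.
    static boolean configOk(float memstoreFrac, float blockCacheFrac) {
        int gml = (int) (memstoreFrac * CONVERT_TO_PERCENTAGE);
        int bcul = (int) (blockCacheFrac * CONVERT_TO_PERCENTAGE);
        return CONVERT_TO_PERCENTAGE - (gml + bcul)
            >= (int) (CONVERT_TO_PERCENTAGE * MINIMUM_FREE_THRESHOLD);
    }

    public static void main(String[] args) {
        System.out.println(configOk(0.4f, 0.4f)); // true: exactly 20% free
        System.out.println(configOk(0.5f, 0.4f)); // false: only 10% free
    }
}
```

Making the 0.2 threshold configurable (as the comment requests) would let operators of very large heaps trade some of that fixed-fraction reserve back to the memstore or block cache.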
[jira] [Updated] (HBASE-13875) Clock skew between master and region server may render restored region without server address
[ https://issues.apache.org/jira/browse/HBASE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13875: -- Fix Version/s: 1.1.1 1.2.0 2.0.0
[jira] [Updated] (HBASE-13876) Improving performance of HeapMemoryManager
[ https://issues.apache.org/jira/browse/HBASE-13876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13876: - Description: I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The current checks under which the DefaultHeapMemoryTuner works are very rare so I am trying to weaken these checks to improve its performance. Check current memstore size and current block cache size. If we are using less than 50% of currently available block cache size we say block cache is sufficient and same for memstore. This check will be very effective when server is either load heavy or write heavy. Earlier version just waited to for number of evictions / number of flushes to be zero which is very rare to happen. Otherwise based on percent change in number of cache misses and number of flushes we increase / decrease memory provided for caching / memstore. After doing so, on next call of HeapMemoryTuner we verify that last change has indeed decreased number of evictions / flush ( combined). I am doing this analysis by comparing percent change (which is basically nothing but normalized derivative) of number of evictions and number of flushes in last two periods. The main motive for doing this was that if we have random reads then even after increasing block cache we wont be able to decrease number of cache misses and eventually we will not waste memory on block caches. was: I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The current checks under which the DefaultHeapMemoryTuner works are very rare so I am trying to weaken these checks to improve its performance. Check current memstore size and current block cache size. If we are using less than 50% of currently available block cache size we say block cache is sufficient and same for memstore. This check will be very effective when server is either load heavy or write heavy. 
Earlier version just waited to for number of evictions / number of flushes to be zero which is very rare to happen. Otherwise based on percent change in number of cache misses and number of flushes we increase / decrease memory provided for caching / memstore. After doing so, on next call of HeapMemoryTuner we verify that last change has indeed decreased number of evictions / flush ( combined). I am doing this analysis by comparing percent change (which is basically nothing but normalized of the derivative) of number of evictions and number of flushes in last two periods. The main motive for doing this was that if we have random reads then even after increasing block cache we wont be able to decrease number of cache misses and eventually we will not waste memory on block caches. Improving performance of HeapMemoryManager -- Key: HBASE-13876 URL: https://issues.apache.org/jira/browse/HBASE-13876 Project: HBase Issue Type: Improvement Components: hbase, regionserver Affects Versions: 2.0.0, 1.0.1, 1.1.0, 1.1.1 Reporter: Abhilash Assignee: Abhilash Priority: Minor Attachments: HBASE-13876.patch I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The current checks under which the DefaultHeapMemoryTuner works are very rare so I am trying to weaken these checks to improve its performance. Check current memstore size and current block cache size. If we are using less than 50% of currently available block cache size we say block cache is sufficient and same for memstore. This check will be very effective when server is either load heavy or write heavy. Earlier version just waited to for number of evictions / number of flushes to be zero which is very rare to happen. Otherwise based on percent change in number of cache misses and number of flushes we increase / decrease memory provided for caching / memstore. 
After doing so, on next call of HeapMemoryTuner we verify that last change has indeed decreased number of evictions / flush ( combined). I am doing this analysis by comparing percent change (which is basically nothing but normalized derivative) of number of evictions and number of flushes in last two periods. The main motive for doing this was that if we have random reads then even after increasing block cache we wont be able to decrease number of cache misses and eventually we will not waste memory on block caches. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13875) Clock skew between master and region server may render restored region without server address
[ https://issues.apache.org/jira/browse/HBASE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13875: --- Attachment: (was: 13875-branch-1.txt) Clock skew between master and region server may render restored region without server address - Key: HBASE-13875 URL: https://issues.apache.org/jira/browse/HBASE-13875 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.1.1 We observed the following issue in cluster testing on a restored table (table_gwbh9rxyz3). {code} 2015-06-08 14:29:47,313|beaver.component.hbase|INFO|6196|140144585275136|MainThread| 'get 'table_gwbh9rxyz3','row1', {COLUMN = 'family1'}' ... 2015-06-08 14:31:38,203|beaver.machine|INFO|6196|140144585275136|MainThread|ERROR: No server address listed in hbase:meta for region table_gwbh9rxyz3,,1433773371699. 48652273628a291653d8c43aaa02179a. containing row row1 {code} Here was related log snippet from master - part for RestoreSnapshotHandler#handleTableOperation(): {code} 2015-06-08 14:28:41,968 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: starting restore 2015-06-08 14:28:41,969 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: get table regions: hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3 2015-06-08 14:28:41,984 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: found 1 regions for table=table_gwbh9rxyz3 2015-06-08 14:28:41,984 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: region to restore: 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:42,001 DEBUG [RestoreSnapshot-pool584-t1] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, 
file:hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66, to hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/archive/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [] 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 0 2015-06-08 14:28:42,014 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Verify snapshot=table_gwbh9rxyz3-ru-20150608 against=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Sentinel is not yet finished with restoring snapshot={ ss=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 type=FLUSH } 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 2 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Overwritten [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] {code} Here was log snippet from region server - corresponding to table being enabled after snapshot restore: {code} 2015-06-08 14:28:41,914 DEBUG [RS_OPEN_REGION-ip-172-31-46-239:51852-2] zookeeper.ZKAssign: regionserver:51852-0x24dd2833c34000b, 
quorum=ip-172-31-46-239.ec2.internal:2181,ip-172-31-46-241.ec2.internal:2181,ip-172-31-46-242.ec2.internal:2181, baseZNode=/services/slider/users/hbase/hbasesliderapp Attempting to retransition opening state of node 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:41,916 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] regionserver.HRegionServer: Post open deploy tasks for table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. 2015-06-08 14:28:41,920 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] hbase.MetaTableAccessor: Updated row table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. with server=ip-172-31-46-239.ec2.internal,51852,1433758173941 2015-06-08 14:28:41,920
[jira] [Updated] (HBASE-13876) Improving performance of HeapMemoryManager
[ https://issues.apache.org/jira/browse/HBASE-13876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13876: - Description: I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rarely met, so I am trying to relax these checks to improve its effectiveness. Check the current memstore size and current block cache size: if we are using less than 50% of the currently available block cache size, we say the block cache is sufficient, and the same for the memstore. This check will be very effective when the cluster is load heavy or write heavy. Earlier versions just waited for the number of evictions / flushes to be zero, which rarely happens. Otherwise, based on the percent change in the number of cache misses and flushes, we increase / decrease the memory provided for caching / the memstore. After doing so, on the next call of HeapMemoryTuner we verify that the last change has indeed decreased the combined number of evictions / flushes. I do this analysis by comparing the percent change (essentially a normalized derivative) of the number of evictions and flushes over the last two periods. The main motivation is that with random reads, even after increasing the block cache we won't be able to decrease the number of cache misses, so we avoid wasting memory on the block cache. was: I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rarely met, so I am trying to relax these checks to improve its effectiveness. Check the current memstore size and current block cache size: if we are using less than 50% of the currently available block cache size, we say the block cache is sufficient, and the same for the memstore. This check will be very effective when the cluster is load heavy or write heavy. 
Earlier versions just waited for the number of evictions / flushes to be zero, which rarely happens. Otherwise, based on the percent change in the number of cache misses and flushes, we increase / decrease the memory provided for caching / the memstore. After doing so, on the next call of HeapMemoryTuner we verify that the last change has indeed decreased the combined number of evictions / flushes. I do this analysis by comparing the percent change (essentially a normalized derivative) of the number of evictions and flushes over the last two periods. The main motivation is that with random reads, even after increasing the block cache we won't be able to decrease the number of cache misses, so we avoid wasting memory on the block cache. Improving performance of HeapMemoryManager -- Key: HBASE-13876 URL: https://issues.apache.org/jira/browse/HBASE-13876 Project: HBase Issue Type: Improvement Components: hbase, regionserver Affects Versions: 2.0.0, 1.0.1, 1.1.0, 1.1.1 Reporter: Abhilash Assignee: Abhilash Priority: Minor Attachments: HBASE-13876.patch I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rarely met, so I am trying to relax these checks to improve its effectiveness. Check the current memstore size and current block cache size: if we are using less than 50% of the currently available block cache size, we say the block cache is sufficient, and the same for the memstore. This check will be very effective when the cluster is load heavy or write heavy. Earlier versions just waited for the number of evictions / flushes to be zero, which rarely happens. Otherwise, based on the percent change in the number of cache misses and flushes, we increase / decrease the memory provided for caching / the memstore. 
After doing so, on the next call of HeapMemoryTuner we verify that the last change has indeed decreased the combined number of evictions / flushes. I do this analysis by comparing the percent change (essentially a normalized derivative) of the number of evictions and flushes over the last two periods. The main motivation is that with random reads, even after increasing the block cache we won't be able to decrease the number of cache misses, so we avoid wasting memory on the block cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13876) Improving performance of HeapMemoryManager
[ https://issues.apache.org/jira/browse/HBASE-13876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13876: - Description: I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rarely met, so I am trying to relax these checks to improve its effectiveness. Check the current memstore size and current block cache size: if we are using less than 50% of the currently available block cache size, we say the block cache is sufficient, and the same for the memstore. This check will be very effective when the server is either load heavy or write heavy. Earlier versions just waited for the number of evictions / flushes to be zero, which rarely happens. Otherwise, based on the percent change in the number of cache misses and flushes, we increase / decrease the memory provided for caching / the memstore. After doing so, on the next call of HeapMemoryTuner we verify that the last change has indeed decreased the combined number of evictions / flushes. I do this analysis by comparing the percent change (essentially a normalized derivative) of the number of evictions and flushes over the last two periods. The main motivation is that with random reads, even after increasing the block cache we won't be able to decrease the number of cache misses, so we avoid wasting memory on the block cache. was: I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rarely met, so I am trying to relax these checks to improve its effectiveness. Check the current memstore size and current block cache size: if we are using less than 50% of the currently available block cache size, we say the block cache is sufficient, and the same for the memstore. This check will be very effective when the cluster is load heavy or write heavy. 
Earlier versions just waited for the number of evictions / flushes to be zero, which rarely happens. Otherwise, based on the percent change in the number of cache misses and flushes, we increase / decrease the memory provided for caching / the memstore. After doing so, on the next call of HeapMemoryTuner we verify that the last change has indeed decreased the combined number of evictions / flushes. I do this analysis by comparing the percent change (essentially a normalized derivative) of the number of evictions and flushes over the last two periods. The main motivation is that with random reads, even after increasing the block cache we won't be able to decrease the number of cache misses, so we avoid wasting memory on the block cache. Improving performance of HeapMemoryManager -- Key: HBASE-13876 URL: https://issues.apache.org/jira/browse/HBASE-13876 Project: HBase Issue Type: Improvement Components: hbase, regionserver Affects Versions: 2.0.0, 1.0.1, 1.1.0, 1.1.1 Reporter: Abhilash Assignee: Abhilash Priority: Minor Attachments: HBASE-13876.patch I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rarely met, so I am trying to relax these checks to improve its effectiveness. Check the current memstore size and current block cache size: if we are using less than 50% of the currently available block cache size, we say the block cache is sufficient, and the same for the memstore. This check will be very effective when the server is either load heavy or write heavy. Earlier versions just waited for the number of evictions / flushes to be zero, which rarely happens. Otherwise, based on the percent change in the number of cache misses and flushes, we increase / decrease the memory provided for caching / the memstore. 
After doing so, on the next call of HeapMemoryTuner we verify that the last change has indeed decreased the combined number of evictions / flushes. I do this analysis by comparing the percent change (essentially a normalized derivative) of the number of evictions and flushes over the last two periods. The main motivation is that with random reads, even after increasing the block cache we won't be able to decrease the number of cache misses, so we avoid wasting memory on the block cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13874) Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap
[ https://issues.apache.org/jira/browse/HBASE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579494#comment-14579494 ] stack commented on HBASE-13874: --- As per Esteban, I could just make the 0.8 upper bound configurable, and include how to change the limit in the message we throw when we fail to start. WARNs are not always seen. Guessing an allowed minimum is always going to be unsatisfactory for someone (see Jimmy Lin running HBase on a Raspberry Pi, or this user I've been working with running 110G heaps). Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap -- Key: HBASE-13874 URL: https://issues.apache.org/jira/browse/HBASE-13874 Project: HBase Issue Type: Task Reporter: stack Fix this in HBaseConfiguration:
{code}
private static void checkForClusterFreeMemoryLimit(Configuration conf) {
  float globalMemstoreLimit = conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
  int gml = (int)(globalMemstoreLimit * CONVERT_TO_PERCENTAGE);
  float blockCacheUpperLimit =
    conf.getFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY,
      HConstants.HFILE_BLOCK_CACHE_SIZE_DEFAULT);
  int bcul = (int)(blockCacheUpperLimit * CONVERT_TO_PERCENTAGE);
  if (CONVERT_TO_PERCENTAGE - (gml + bcul)
      < (int)(CONVERT_TO_PERCENTAGE *
        HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD)) {
    throw new RuntimeException(
      "Current heap configuration for MemStore and BlockCache exceeds " +
      "the threshold required for successful cluster operation. " +
      "The combined value cannot exceed 0.8. Please check " +
      "the settings for hbase.regionserver.global.memstore.upperLimit and " +
      "hfile.block.cache.size in your configuration. " +
      "hbase.regionserver.global.memstore.upperLimit is " + globalMemstoreLimit +
      " hfile.block.cache.size is " + blockCacheUpperLimit);
  }
}
{code}
Hardcoding 0.8 doesn't make much sense in a heap of 100G+ (that is 20G over for hbase itself -- more than enough). 
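A minimal sketch of what the check might look like with the hardcoded bound made configurable follows. The `minFreeThreshold` parameter is hypothetical, standing in for whatever property the eventual patch introduces; the sketch keeps the original code's conversion to integer percentages, which sidesteps floating point noise (e.g. `0.4f + 0.4f` does not compare cleanly against `0.8`).

```java
// Sketch only: a configurable variant of the memstore + block cache sanity
// check. The minFreeThreshold parameter is hypothetical, not an actual
// HBase configuration property.
public class HeapLimitCheck {
    static final int CONVERT_TO_PERCENTAGE = 100;

    // Returns true when memstore + block cache leave at least
    // minFreeThreshold of the heap free. Integer percentages avoid
    // float rounding surprises.
    public static boolean isAcceptable(float memstoreLimit,
                                       float blockCacheLimit,
                                       float minFreeThreshold) {
        int gml = (int) (memstoreLimit * CONVERT_TO_PERCENTAGE);
        int bcul = (int) (blockCacheLimit * CONVERT_TO_PERCENTAGE);
        int minFree = (int) (minFreeThreshold * CONVERT_TO_PERCENTAGE);
        return CONVERT_TO_PERCENTAGE - (gml + bcul) >= minFree;
    }
}
```

With the default 0.4 + 0.4 split and a 0.2 reserve this passes; on a 110G heap an operator could lower the reserve instead of being stuck at the hardcoded 0.8 sum.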
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13876) Improving performance of HeapMemoryManager
[ https://issues.apache.org/jira/browse/HBASE-13876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13876: - Summary: Improving performance of HeapMemoryManager (was: Improving performance of HeapMemoryTunerManager) Improving performance of HeapMemoryManager -- Key: HBASE-13876 URL: https://issues.apache.org/jira/browse/HBASE-13876 Project: HBase Issue Type: Improvement Components: hbase, regionserver Affects Versions: 2.0.0, 1.0.1, 1.1.0, 1.1.1 Reporter: Abhilash Assignee: Abhilash Priority: Minor I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rarely met, so I am trying to relax these checks to improve its effectiveness. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13875) Clock skew between master and region server may render restored region without server address
[ https://issues.apache.org/jira/browse/HBASE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13875: --- Attachment: 13875-branch-1.txt Clock skew between master and region server may render restored region without server address - Key: HBASE-13875 URL: https://issues.apache.org/jira/browse/HBASE-13875 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: 13875-branch-1.txt We observed the following issue in cluster testing on a restored table (table_gwbh9rxyz3). {code} 2015-06-08 14:29:47,313|beaver.component.hbase|INFO|6196|140144585275136|MainThread| 'get 'table_gwbh9rxyz3','row1', {COLUMN = 'family1'}' ... 2015-06-08 14:31:38,203|beaver.machine|INFO|6196|140144585275136|MainThread|ERROR: No server address listed in hbase:meta for region table_gwbh9rxyz3,,1433773371699. 48652273628a291653d8c43aaa02179a. containing row row1 {code} Here was related log snippet from master - part for RestoreSnapshotHandler#handleTableOperation(): {code} 2015-06-08 14:28:41,968 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: starting restore 2015-06-08 14:28:41,969 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: get table regions: hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3 2015-06-08 14:28:41,984 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: found 1 regions for table=table_gwbh9rxyz3 2015-06-08 14:28:41,984 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: region to restore: 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:42,001 DEBUG [RestoreSnapshot-pool584-t1] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, 
file:hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66, to hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/archive/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [] 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 0 2015-06-08 14:28:42,014 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Verify snapshot=table_gwbh9rxyz3-ru-20150608 against=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Sentinel is not yet finished with restoring snapshot={ ss=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 type=FLUSH } 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 2 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Overwritten [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] {code} Here was log snippet from region server - corresponding to table being enabled after snapshot restore: {code} 2015-06-08 14:28:41,914 DEBUG [RS_OPEN_REGION-ip-172-31-46-239:51852-2] zookeeper.ZKAssign: regionserver:51852-0x24dd2833c34000b, 
quorum=ip-172-31-46-239.ec2.internal:2181,ip-172-31-46-241.ec2.internal:2181,ip-172-31-46-242.ec2.internal:2181, baseZNode=/services/slider/users/hbase/hbasesliderapp Attempting to retransition opening state of node 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:41,916 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] regionserver.HRegionServer: Post open deploy tasks for table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. 2015-06-08 14:28:41,920 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] hbase.MetaTableAccessor: Updated row table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. with
[jira] [Commented] (HBASE-13875) Clock skew between master and region server may render restored region without server address
[ https://issues.apache.org/jira/browse/HBASE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579576#comment-14579576 ] Ted Yu commented on HBASE-13875: Removes redundant Put in MetaTableAccessor#addRegionsToMeta(). Clock skew between master and region server may render restored region without server address - Key: HBASE-13875 URL: https://issues.apache.org/jira/browse/HBASE-13875 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: 13875-branch-1.txt We observed the following issue in cluster testing on a restored table (table_gwbh9rxyz3). {code} 2015-06-08 14:29:47,313|beaver.component.hbase|INFO|6196|140144585275136|MainThread| 'get 'table_gwbh9rxyz3','row1', {COLUMN = 'family1'}' ... 2015-06-08 14:31:38,203|beaver.machine|INFO|6196|140144585275136|MainThread|ERROR: No server address listed in hbase:meta for region table_gwbh9rxyz3,,1433773371699. 48652273628a291653d8c43aaa02179a. containing row row1 {code} Here was related log snippet from master - part for RestoreSnapshotHandler#handleTableOperation(): {code} 2015-06-08 14:28:41,968 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: starting restore 2015-06-08 14:28:41,969 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: get table regions: hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3 2015-06-08 14:28:41,984 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: found 1 regions for table=table_gwbh9rxyz3 2015-06-08 14:28:41,984 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] snapshot.RestoreSnapshotHelper: region to restore: 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:42,001 DEBUG [RestoreSnapshot-pool584-t1] backup.HFileArchiver: Finished archiving from class 
org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, file:hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66, to hdfs://ip-172-31-46-239.ec2.internal:8020/user/hbase/.slider/cluster/hbasesliderapp/database/archive/data/default/table_gwbh9rxyz3/48652273628a291653d8c43aaa02179a/family1/45aa3fb9e0404814b77a9cac91ebeb66 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [] 2015-06-08 14:28:42,002 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 0 2015-06-08 14:28:42,014 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Deleted [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Verify snapshot=table_gwbh9rxyz3-ru-20150608 against=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 2015-06-08 14:28:42,022 DEBUG [B.defaultRpcServer.handler=13,queue=1,port=54936] snapshot.SnapshotManager: Sentinel is not yet finished with restoring snapshot={ ss=table_gwbh9rxyz3-ru-20150608 table=table_gwbh9rxyz3 type=FLUSH } 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Added 2 2015-06-08 14:28:42,038 INFO [MASTER_TABLE_OPERATIONS-ip-172-31-46-243:54936-0] hbase.MetaTableAccessor: Overwritten [{ENCODED = 48652273628a291653d8c43aaa02179a, NAME = 'table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a.', STARTKEY = '', ENDKEY = ''}] {code} Here was log snippet from region server - corresponding to table being enabled after snapshot restore: {code} 2015-06-08 14:28:41,914 DEBUG [RS_OPEN_REGION-ip-172-31-46-239:51852-2] zookeeper.ZKAssign: 
regionserver:51852-0x24dd2833c34000b, quorum=ip-172-31-46-239.ec2.internal:2181,ip-172-31-46-241.ec2.internal:2181,ip-172-31-46-242.ec2.internal:2181, baseZNode=/services/slider/users/hbase/hbasesliderapp Attempting to retransition opening state of node 48652273628a291653d8c43aaa02179a 2015-06-08 14:28:41,916 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] regionserver.HRegionServer: Post open deploy tasks for table_gwbh9rxyz3,,1433773371699.48652273628a291653d8c43aaa02179a. 2015-06-08 14:28:41,920 INFO [PostOpenDeployTasks:48652273628a291653d8c43aaa02179a] hbase.MetaTableAccessor: Updated row
[jira] [Updated] (HBASE-13874) Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap
[ https://issues.apache.org/jira/browse/HBASE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez updated HBASE-13874: -- Attachment: 0001-HBASE-13874-Fix-0.8-being-hardcoded-sum-of-blockcach.patch Hey [~stack], let me know if this is what you were initially thinking about. Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap -- Key: HBASE-13874 URL: https://issues.apache.org/jira/browse/HBASE-13874 Project: HBase Issue Type: Task Reporter: stack Assignee: Esteban Gutierrez Attachments: 0001-HBASE-13874-Fix-0.8-being-hardcoded-sum-of-blockcach.patch Fix this in HBaseConfiguration:
{code}
private static void checkForClusterFreeMemoryLimit(Configuration conf) {
  float globalMemstoreLimit = conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
  int gml = (int)(globalMemstoreLimit * CONVERT_TO_PERCENTAGE);
  float blockCacheUpperLimit =
    conf.getFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY,
      HConstants.HFILE_BLOCK_CACHE_SIZE_DEFAULT);
  int bcul = (int)(blockCacheUpperLimit * CONVERT_TO_PERCENTAGE);
  if (CONVERT_TO_PERCENTAGE - (gml + bcul)
      < (int)(CONVERT_TO_PERCENTAGE *
        HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD)) {
    throw new RuntimeException(
      "Current heap configuration for MemStore and BlockCache exceeds " +
      "the threshold required for successful cluster operation. " +
      "The combined value cannot exceed 0.8. Please check " +
      "the settings for hbase.regionserver.global.memstore.upperLimit and " +
      "hfile.block.cache.size in your configuration. " +
      "hbase.regionserver.global.memstore.upperLimit is " + globalMemstoreLimit +
      " hfile.block.cache.size is " + blockCacheUpperLimit);
  }
}
{code}
Hardcoding 0.8 doesn't make much sense in a heap of 100G+ (that is 20G over for hbase itself -- more than enough). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13874) Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap
[ https://issues.apache.org/jira/browse/HBASE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579710#comment-14579710 ] Vladimir Rodionov commented on HBASE-13874: --- [~saint@gmail.com] asked: {quote} Vladimir Rodionov you think 1.5g is too low? {quote} Yes, we have observed OOME with heaps below 8GB while running M/R jobs (1.5-2GB reserved for HBase) with standard settings for block cache and memstore in the past (pre-0.98). Can't say for 0.98+ since heaps are larger now :). Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap -- Key: HBASE-13874 URL: https://issues.apache.org/jira/browse/HBASE-13874 Project: HBase Issue Type: Task Reporter: stack Assignee: Esteban Gutierrez Attachments: 0001-HBASE-13874-Fix-0.8-being-hardcoded-sum-of-blockcach.patch Fix this in HBaseConfiguration:
{code}
private static void checkForClusterFreeMemoryLimit(Configuration conf) {
  float globalMemstoreLimit = conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
  int gml = (int)(globalMemstoreLimit * CONVERT_TO_PERCENTAGE);
  float blockCacheUpperLimit =
    conf.getFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY,
      HConstants.HFILE_BLOCK_CACHE_SIZE_DEFAULT);
  int bcul = (int)(blockCacheUpperLimit * CONVERT_TO_PERCENTAGE);
  if (CONVERT_TO_PERCENTAGE - (gml + bcul)
      < (int)(CONVERT_TO_PERCENTAGE *
        HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD)) {
    throw new RuntimeException(
      "Current heap configuration for MemStore and BlockCache exceeds " +
      "the threshold required for successful cluster operation. " +
      "The combined value cannot exceed 0.8. Please check " +
      "the settings for hbase.regionserver.global.memstore.upperLimit and " +
      "hfile.block.cache.size in your configuration. " +
      "hbase.regionserver.global.memstore.upperLimit is " + globalMemstoreLimit +
      " hfile.block.cache.size is " + blockCacheUpperLimit);
  }
}
{code}
Hardcoding 0.8 doesn't make much sense in a heap of 100G+ (that is 20G over for hbase itself -- more than enough). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13874) Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap
[ https://issues.apache.org/jira/browse/HBASE-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579714#comment-14579714 ] Vladimir Rodionov commented on HBASE-13874: --- If block cache + memstore exceed (1 - hbase.regionserver.reserved.memory.threshold), we throw a RuntimeException, and we can control the reserved memory ratio. Looks right to me. Fix 0.8 being hardcoded sum of blockcache + memstore; doesn't make sense when big heap -- Key: HBASE-13874 URL: https://issues.apache.org/jira/browse/HBASE-13874 Project: HBase Issue Type: Task Reporter: stack Assignee: Esteban Gutierrez Attachments: 0001-HBASE-13874-Fix-0.8-being-hardcoded-sum-of-blockcach.patch Fix this in HBaseConfiguration:
{code}
private static void checkForClusterFreeMemoryLimit(Configuration conf) {
  float globalMemstoreLimit = conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
  int gml = (int)(globalMemstoreLimit * CONVERT_TO_PERCENTAGE);
  float blockCacheUpperLimit =
    conf.getFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY,
      HConstants.HFILE_BLOCK_CACHE_SIZE_DEFAULT);
  int bcul = (int)(blockCacheUpperLimit * CONVERT_TO_PERCENTAGE);
  if (CONVERT_TO_PERCENTAGE - (gml + bcul)
      < (int)(CONVERT_TO_PERCENTAGE *
        HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD)) {
    throw new RuntimeException(
      "Current heap configuration for MemStore and BlockCache exceeds " +
      "the threshold required for successful cluster operation. " +
      "The combined value cannot exceed 0.8. Please check " +
      "the settings for hbase.regionserver.global.memstore.upperLimit and " +
      "hfile.block.cache.size in your configuration. " +
      "hbase.regionserver.global.memstore.upperLimit is " + globalMemstoreLimit +
      " hfile.block.cache.size is " + blockCacheUpperLimit);
  }
}
{code}
Hardcoding 0.8 doesn't make much sense in a heap of 100G+ (that is 20G over for hbase itself -- more than enough). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13877) Interrupt to flush from TableFlushProcedure causes dataloss in ITBLL
[ https://issues.apache.org/jira/browse/HBASE-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13877: -- Status: Patch Available (was: Open) Interrupt to flush from TableFlushProcedure causes dataloss in ITBLL Key: HBASE-13877 URL: https://issues.apache.org/jira/browse/HBASE-13877 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: hbase-13877_v1.patch ITBLL with 1.25B rows failed for me (and for Stack, as reported in https://issues.apache.org/jira/browse/HBASE-13811?focusedCommentId=14577834page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14577834). HBASE-13811 and HBASE-13853 fixed an issue with WAL edit filtering, but the root cause this time seems to be different. It is due to the procedure-based flush interrupting the flush request when the procedure is cancelled from an exception elsewhere. This leaves the memstore snapshot intact without aborting the server. The next flush then flushes the previous memstore with the current seqId (as opposed to the seqId from the memstore snapshot). This creates an hfile with a larger seqId than its contents warrant. The previous behavior in 0.98 and 1.0 (I believe) was that an interruption / exception after flush prepare would cause an RS abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
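The invariant being violated here can be sketched as follows; all names are hypothetical illustrations, not HBase's actual flush code:

```java
// Illustrative sketch of the invariant described above: the seqId stamped on
// a flushed file must come from the memstore snapshot being flushed, not from
// the region's current (newer) seqId. All names here are hypothetical.
public class FlushSeqIdSketch {

    /**
     * Picks the seqId to stamp on the hfile produced by a flush. When an
     * interrupted flush left a snapshot behind, the next flush must reuse the
     * snapshot's seqId; stamping currentSeqId would claim the file contains
     * edits newer than it really holds -- the data-loss scenario above, since
     * WAL replay would then skip edits it believes are already persisted.
     */
    static long chooseFlushSeqId(long snapshotSeqId, long currentSeqId,
                                 boolean snapshotPending) {
        return snapshotPending ? snapshotSeqId : currentSeqId;
    }
}
```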
[jira] [Updated] (HBASE-13877) Interrupt to flush from TableFlushProcedure causes dataloss in ITBLL
[ https://issues.apache.org/jira/browse/HBASE-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13877: -- Attachment: hbase-13877_v1.patch How about this? Interrupt to flush from TableFlushProcedure causes dataloss in ITBLL Key: HBASE-13877 URL: https://issues.apache.org/jira/browse/HBASE-13877 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: hbase-13877_v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13876) Improving performance of HeapMemoryManager
[ https://issues.apache.org/jira/browse/HBASE-13876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13876: - Description: I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rare, so I am trying to weaken these checks to make it act more often. Check current memstore size and current block cache size: if we are using less than 50% of the currently available block cache size, we say the block cache is sufficient, and the same for the memstore. This check will be very effective when the server is either load heavy or write heavy. The earlier version just waited for the number of evictions / number of flushes to be zero, which is very rare. Otherwise, based on the percent change in the number of cache misses and number of flushes, we increase / decrease the memory provided for caching / memstore. After doing so, on the next call of HeapMemoryTuner we verify that the last change has indeed decreased the number of evictions / flushes (combined). I do this analysis by comparing the percent change (which is basically nothing but a normalized derivative) of the number of evictions and number of flushes during the last two periods. The main motive for doing this was that if we have random reads then we will have a lot of cache misses, but even after increasing the block cache we won't be able to decrease the number of cache misses, so we will revert back and eventually not waste memory on block caches. This will also help us ignore short-term random spikes in reads / writes. was: I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rare, so I am trying to weaken these checks to make it act more often. Check current memstore size and current block cache size: if we are using less than 50% of the currently available block cache size, we say the block cache is sufficient, and the same for the memstore. This check will be very effective when the server is either load heavy or write heavy. The earlier version just waited for the number of evictions / number of flushes to be zero, which is very rare to happen. Otherwise, based on the percent change in the number of cache misses and number of flushes, we increase / decrease the memory provided for caching / memstore. After doing so, on the next call of HeapMemoryTuner we verify that the last change has indeed decreased the number of evictions / flushes (combined). I do this analysis by comparing the percent change (which is basically nothing but a normalized derivative) of the number of evictions and number of flushes during the last two periods. The main motive for doing this was that if we have random reads then even after increasing the block cache we won't be able to decrease the number of cache misses, and eventually we will not waste memory on block caches. Improving performance of HeapMemoryManager -- Key: HBASE-13876 URL: https://issues.apache.org/jira/browse/HBASE-13876 Project: HBase Issue Type: Improvement Components: hbase, regionserver Affects Versions: 2.0.0, 1.0.1, 1.1.0, 1.1.1 Reporter: Abhilash Assignee: Abhilash Priority: Minor Attachments: HBASE-13876.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
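The percent-change (normalized-derivative) comparison described above can be sketched as follows; the class and method names are invented for illustration and this is not the actual DefaultHeapMemoryTuner code:

```java
// Illustrative sketch of the tuning heuristic described above: compare the
// percent change of combined evictions + flushes across the last two periods
// and revert the previous tuning step if it did not help. All names here are
// hypothetical, not HBase's actual tuner implementation.
public class TunerSketch {

    /** Normalized change between two consecutive period counts. */
    static double percentChange(long previous, long current) {
        if (previous == 0) {
            // Treat any growth from zero as a 100% increase.
            return current == 0 ? 0.0 : 1.0;
        }
        return (current - previous) / (double) previous;
    }

    /**
     * True when the last tuning step should be reverted: the combined
     * eviction + flush pressure did not decrease, so the extra memory given
     * to the cache (or memstore) was wasted. Reverting on no-improvement is
     * what lets the tuner ignore short-term random spikes.
     */
    static boolean shouldRevert(long prevEvictions, long currEvictions,
                                long prevFlushes, long currFlushes) {
        double combined = percentChange(prevEvictions + prevFlushes,
                                        currEvictions + currFlushes);
        return combined >= 0.0;
    }
}
```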
[jira] [Updated] (HBASE-13876) Improving performance of HeapMemoryManager
[ https://issues.apache.org/jira/browse/HBASE-13876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhilash updated HBASE-13876: - Description: I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rare, so I am trying to weaken these checks to make it act more often. Check current memstore size and current block cache size: if we are using less than 50% of the currently available block cache size, we say the block cache is sufficient, and the same for the memstore. This check will be very effective when the server is either load heavy or write heavy. The earlier version just waited for the number of evictions / number of flushes to be zero, which is very rare. Otherwise, based on the percent change in the number of cache misses and number of flushes, we increase / decrease the memory provided for caching / memstore. After doing so, on the next call of HeapMemoryTuner we verify that the last change has indeed decreased the number of evictions / flushes (combined). I do this analysis by comparing the percent change (which is basically nothing but a normalized derivative) of the number of evictions and number of flushes during the last two periods. The main motive for doing this was that if we have random reads then we will have a lot of cache misses, but even after increasing the block cache we won't be able to decrease the number of cache misses, so we will revert back and eventually not waste memory on block caches. This will also help us ignore random short-term spikes in reads / writes. was: I am trying to improve the performance of DefaultHeapMemoryTuner by introducing some more checks. The conditions under which the current DefaultHeapMemoryTuner acts are very rare, so I am trying to weaken these checks to make it act more often. Check current memstore size and current block cache size: if we are using less than 50% of the currently available block cache size, we say the block cache is sufficient, and the same for the memstore. This check will be very effective when the server is either load heavy or write heavy. The earlier version just waited for the number of evictions / number of flushes to be zero, which is very rare. Otherwise, based on the percent change in the number of cache misses and number of flushes, we increase / decrease the memory provided for caching / memstore. After doing so, on the next call of HeapMemoryTuner we verify that the last change has indeed decreased the number of evictions / flushes (combined). I do this analysis by comparing the percent change (which is basically nothing but a normalized derivative) of the number of evictions and number of flushes during the last two periods. The main motive for doing this was that if we have random reads then we will have a lot of cache misses, but even after increasing the block cache we won't be able to decrease the number of cache misses, so we will revert back and eventually not waste memory on block caches. This will also help us ignore short-term random spikes in reads / writes. Improving performance of HeapMemoryManager -- Key: HBASE-13876 URL: https://issues.apache.org/jira/browse/HBASE-13876 Project: HBase Issue Type: Improvement Components: hbase, regionserver Affects Versions: 2.0.0, 1.0.1, 1.1.0, 1.1.1 Reporter: Abhilash Assignee: Abhilash Priority: Minor Attachments: HBASE-13876.patch
[jira] [Commented] (HBASE-13877) Interrupt to flush from TableFlushProcedure causes dataloss in ITBLL
[ https://issues.apache.org/jira/browse/HBASE-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579723#comment-14579723 ] Enis Soztutar commented on HBASE-13877: --- [~jinghe] mind taking a look? I'll run ITBLL with this in the meantime. Interrupt to flush from TableFlushProcedure causes dataloss in ITBLL Key: HBASE-13877 URL: https://issues.apache.org/jira/browse/HBASE-13877 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: hbase-13877_v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)