[jira] [Commented] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs

2014-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865186#comment-13865186
 ] 

Hadoop QA commented on HBASE-9846:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621925/HBASE-9846_5.patch
  against trunk revision .
  ATTACHMENT ID: 12621925

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 34 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 3 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 4 release 
audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8360//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8360//console

This message is automatically generated.

 Integration test and LoadTestTool support for cell ACLs
 ---

 Key: HBASE-9846
 URL: https://issues.apache.org/jira/browse/HBASE-9846
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0

 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, 
 HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, 
 HBASE-9846_5.patch, HBASE-9846_5.patch


 Cell level ACLs should have an integration test and LoadTestTool support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-08 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865189#comment-13865189
 ] 

Anoop Sam John edited comment on HBASE-10292 at 1/8/14 8:14 AM:


Oh, sorry for the confusion.  I was not talking about retry in this test.
What I was referring to is CoprocessorHost#handleCoprocessorThrowable:
{code}
if (e instanceof IOException) {
  throw (IOException) e;
}
// If we got here, e is not an IOException. A loaded coprocessor has a
// fatal bug, and the server (master or regionserver) should remove the
// faulty coprocessor from its set of active coprocessors. Setting
// 'hbase.coprocessor.abortonerror' to true will cause abortServer(),
// which may be useful in development and testing environments where
// 'failing fast' for error analysis is desired.
if (env.getConfiguration().getBoolean("hbase.coprocessor.abortonerror", false)) {
  // server is configured to abort.
  abortServer(env, e);
} else {
  LOG.error("Removing coprocessor '" + env.toString() + "' from " +
      "environment because it threw: " + e, e);
  coprocessors.remove(env);
  try {
    shutdown(env);
  } catch (Exception x) {
    LOG.error("Uncaught exception when shutting down coprocessor '"
        + env.toString() + "'", x);
  }
  throw new DoNotRetryIOException("Coprocessor: '" + env.toString() +
      "' threw: '" + e + "' and has been removed from the active " +
      "coprocessor set.", e);
}
{code}
Was wondering why we are not throwing a DNRIOE back to the client when the RS
also aborts.  Maybe that is correct and intended.
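The alternative being wondered about can be sketched in a few lines. This is a hedged illustration only: `DoNotRetryIOException`, `abortServer`, and the config flag below are simplified stand-ins, not the actual HBase classes, and the logic shows just one possible way the host could both abort the server and still hand the client a do-not-retry exception.

```java
import java.io.IOException;

// Stand-in for the real HBase exception class of the same name.
class DoNotRetryIOException extends IOException {
  DoNotRetryIOException(String msg, Throwable cause) { super(msg, cause); }
}

class CoprocessorErrorSketch {
  static boolean abortOnError = true;  // stands in for 'hbase.coprocessor.abortonerror'
  static boolean aborted = false;

  static void abortServer(Throwable e) {
    aborted = true;  // a real server would begin shutting down here
  }

  // IOExceptions pass through untouched; anything else is a coprocessor bug,
  // so optionally abort, but always wrap it so the client stops retrying.
  static void handleThrowable(Throwable e) throws IOException {
    if (e instanceof IOException) {
      throw (IOException) e;
    }
    if (abortOnError) {
      abortServer(e);  // the server still aborts when configured to
    }
    throw new DoNotRetryIOException("coprocessor threw: " + e, e);
  }

  public static void main(String[] args) {
    try {
      handleThrowable(new NullPointerException("bug in a coprocessor"));
    } catch (IOException ex) {
      System.out.println(ex instanceof DoNotRetryIOException);  // true
      System.out.println(aborted);                              // true
    }
  }
}
```

With this shape the client would receive the do-not-retry signal in both branches, whether or not the server goes down.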





 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0

 Attachments: 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)




[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865191#comment-13865191
 ] 

ramkrishna.s.vasudevan commented on HBASE-10292:


LGTM. +1

 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0

 Attachments: 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10156) Fix up the HBASE-8755 slowdown when low contention

2014-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865216#comment-13865216
 ] 

Hadoop QA commented on HBASE-10156:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621934/10156v4.txt
  against trunk revision .
  ATTACHMENT ID: 12621934

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 4 release 
audit warnings (more than the trunk's current 0 warnings).

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+ * In this implementation, there is one HLog/WAL.  All edits for all Regions are entered first in the HLog/WAL. Each
+ * HRegion is identified by a unique long <code>int</code>. HRegions do not need to declare themselves before using the
+ * HLog/WAL; they simply include their HRegion-id in the <code>append</code> or <code>completeCacheFlush</code> calls.
+ * <p>This HLog/WAL implementation keeps multiple on-disk files kept in a chronological order. As data is flushed to
+ * other (better) on-disk structures (files sorted by key, hfiles), the log becomes obsolete. We can let go of all the
+ * log edits/entries for a given HRegion-id up to the most-recent CACHEFLUSH message from that HRegion.  A bunch of work
+ * in the below is done keeping account of these region sequence ids -- what is flushed out to hfiles, and what is yet
+ * <p>Its only practical to delete entire files. Thus, we delete an entire on-disk file <code>F</code> when all of the
+ * edits in <code>F</code> have a log-sequence-id that's older (smaller) than the most-recent CACHEFLUSH message for
+ * <p>This implementation performs logfile-rolling internal to the implementation, so external callers do not have to be

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster.testContendedLogRolling(TestLogRollingNoCluster.java:74)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//console

This message is automatically generated.

 Fix up the HBASE-8755 slowdown when low contention
 --

 Key: HBASE-10156
 URL: https://issues.apache.org/jira/browse/HBASE-10156
 Project: HBase
  Issue Type: Sub-task
  Components: wal
Reporter: stack
  

[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

2014-01-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865230#comment-13865230
 ] 

Steve Loughran commented on HBASE-10296:


One aspect of ZK that is worth remembering is that it lets other apps keep an
eye on what is going on.

 Replace ZK with a paxos running within master processes to provide better 
 master failover performance and state consistency
 ---

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor liveness 
 and store almost all of its state, such as region states, table info, 
 replication info and so on. ZK also serves as a channel for 
 master-regionserver communication (such as region assignment) and 
 client-regionserver communication (such as replication state/behavior changes). 
 But ZK as a communication channel is fragile due to its one-time watches and 
 asynchronous notification mechanism, which together can lead to missed 
 events (hence missed messages); for example, the master must rely on the 
 idempotence of the state-transition logic to keep the region-assignment state 
 machine correct. In fact, almost all of the trickiest inconsistency issues 
 trace their root cause back to the fragility of ZK as a communication channel.
 Replacing ZK with Paxos running within the master processes has the following benefits:
 1. Better master failover performance: all masters, whether active or 
 standby, hold the same latest state in memory (except lagging ones, which 
 can eventually catch up). Whenever the active master dies, the newly 
 elected active master can immediately take over without failover work such as 
 rebuilding its in-memory state by consulting the meta table and ZK.
 2. Better state consistency: the master's in-memory state is the single 
 source of truth about the system, which eliminates inconsistency from the 
 very beginning. And though the state is held by all masters, Paxos 
 guarantees it is identical at any time.
 3. A more direct and simple communication pattern: clients change state by 
 sending requests to the master; master and regionservers talk directly to 
 each other via requests and responses. None of them need a third-party 
 store like ZK, which can introduce more uncertainty, worse latency and 
 more complexity.
 4. ZK would only be used for liveness monitoring to determine whether a 
 regionserver is dead, and later on we could eliminate ZK entirely once we 
 build heartbeats between the master and regionservers.
 I know this might look like a very crazy re-architecture, but it deserves 
 deep thinking and serious discussion, right?
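The missed-event fragility described above can be shown with a tiny self-contained simulation. This is a sketch of the one-time-watch semantics only; it does not use the real ZooKeeper client, and the `Node` class below is purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates ZooKeeper-style one-time watches to show how updates can be
// missed between a watch firing and the client re-registering it.
class OneTimeWatchDemo {
  static class Node {
    int version = 0;
    Runnable watcher;  // one-time: cleared before it fires
    void setData() {
      version++;
      if (watcher != null) {
        Runnable w = watcher;
        watcher = null;  // ZK semantics: a watch fires at most once
        w.run();
      }
    }
  }

  static List<Integer> run() {
    Node node = new Node();
    List<Integer> observed = new ArrayList<>();
    Runnable watch = () -> observed.add(node.version);
    node.watcher = watch;
    node.setData();        // fires: client observes version 1
    node.setData();        // no watch currently registered -> event missed
    node.setData();        // missed again
    node.watcher = watch;  // client re-registers "late"
    node.setData();        // fires: client observes version 4
    return observed;       // versions 2 and 3 were never seen
  }

  public static void main(String[] args) {
    System.out.println(run());  // [1, 4]
  }
}
```

The client only ever sees versions 1 and 4, which is why state-transition logic built on such notifications has to be idempotent, as the description notes.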



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10297) LoadAndVerify Integration Test for cell visibility

2014-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865214#comment-13865214
 ] 

Hadoop QA commented on HBASE-10297:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621935/HBASE-10297_V2.patch
  against trunk revision .
  ATTACHMENT ID: 12621935

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 4 release 
audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster
  org.apache.hadoop.hbase.TestIOFencing

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8362//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8362//console

This message is automatically generated.

 LoadAndVerify Integration Test for cell visibility
 --

 Key: HBASE-10297
 URL: https://issues.apache.org/jira/browse/HBASE-10297
 Project: HBase
  Issue Type: Sub-task
  Components: test
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10297.patch, HBASE-10297_V2.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs

2014-01-08 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-9846:
--

Status: Open  (was: Patch Available)

 Integration test and LoadTestTool support for cell ACLs
 ---

 Key: HBASE-9846
 URL: https://issues.apache.org/jira/browse/HBASE-9846
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0

 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, 
 HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, 
 HBASE-9846_5.patch, HBASE-9846_5.patch


 Cell level ACLs should have an integration test and LoadTestTool support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-7386) Investigate providing some supervisor support for znode deletion

2014-01-08 Thread Samir Ahmic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samir Ahmic updated HBASE-7386:
---

Attachment: HBASE-7386-conf-v2.patch
HBASE-7386-bin-v2.patch

Here is a summary of the v2 patches:
* Unnecessary comments removed.
* graceful_stop.sh modified to support the case when supervisord is used (to 
reduce copy/paste); the script also had an issue with restoring balancer state 
that is now fixed.
* Added a clean_znode option in hbase-daemon.sh that calls cleanZNode(). This 
is used by the zk_cleaner.py listener script.
* Added the zk_cleaner.py supervisord event listener, which removes the znode 
when a regionserver crashes and sends a mail notification about that event. 
Sending email is optional.
* I have verified that the supervisor approach improves master failover: in my 
testing, failover takes ~7s when using supervisor versus ~40s with the 
standard scripts.
* Since we have 'autorestart=true' in the supervisord config, if any process 
fails unexpectedly, supervisor will restart it automatically.
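For concreteness, the supervisord wiring described above might look roughly like the fragment below. This is a sketch only: the program names, paths, and listener location are illustrative placeholders, not taken from the attached patches.

```ini
; Illustrative only -- paths and program names are placeholders.
[program:regionserver]
command=/opt/hbase/bin/hbase regionserver start
autorestart=true

; Event listener notified on process exits; a script like zk_cleaner.py
; can delete the crashed regionserver's znode and optionally send mail.
[eventlistener:zk_cleaner]
command=/opt/hbase/bin/zk_cleaner.py
events=PROCESS_STATE_EXITED
```

`PROCESS_STATE_EXITED` is the supervisord event emitted when a managed process exits, which is what lets the listener react to a regionserver crash.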
 
 

 Investigate providing some supervisor support for znode deletion
 

 Key: HBASE-7386
 URL: https://issues.apache.org/jira/browse/HBASE-7386
 Project: HBase
  Issue Type: Task
  Components: master, regionserver, scripts
Reporter: Gregory Chanan
Assignee: stack
Priority: Blocker
 Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, 
 HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, 
 HBASE-7386-v0.patch, supervisordconfigs-v0.patch


 There a couple of JIRAs for deleting the znode on a process failure:
 HBASE-5844 (RS)
 HBASE-5926 (Master)
 which are pretty neat; on process failure, they delete the znode of the 
 underlying process so HBase can recover faster.
 These JIRAs were implemented via the startup scripts; i.e. the script hangs 
 around and waits for the process to exit, then deletes the znode.
 There are a few problems associated with this approach, as listed in the 
 below JIRAs:
 1) Hides startup output in script
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 2) two hbase processes listed per launched daemon
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 3) Not run by a real supervisor
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 4) Weird output after kill -9 actual process in standalone mode
 https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
 5) Can kill existing RS if called again
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 6) Hides stdout/stderr[6]
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
 I suspect running in via something like supervisor.d can solve these issues 
 if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block

2014-01-08 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865267#comment-13865267
 ] 

Liang Xie commented on HBASE-10263:
---

There are two +1s already. If there are no new comments/objections, I'd like 
to commit trunk_v2 into trunk tomorrow.

 make LruBlockCache single/multi/in-memory ratio user-configurable and provide 
 preemptive mode for in-memory type block
 --

 Key: HBASE-10263
 URL: https://issues.apache.org/jira/browse/HBASE-10263
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch, 
 HBASE-10263-trunk_v2.patch


 Currently the single/multi/in-memory ratio in LruBlockCache is hardcoded to 
 1:2:1, which can lead to somewhat counter-intuitive behavior in some user 
 scenarios where an in-memory table's read performance is much worse than an 
 ordinary table's when the two tables' data sizes are almost equal and larger 
 than the regionserver's cache size (we ran such an experiment and verified 
 that in-memory table random read performance was two times worse than the 
 ordinary table's).
 This patch fixes the above issue and provides:
 1. Makes the single/multi/in-memory ratio user-configurable.
 2. Provides a configurable switch which can make in-memory blocks preemptive; 
 by preemptive we mean that when this switch is on, an in-memory block can 
 kick out any ordinary block to make room until no ordinary block remains. 
 When this switch is off (the default) the behavior is the same as before, 
 using the single/multi/in-memory ratio to determine eviction.
 By default, both changes are off and the behavior stays the same as before 
 applying this patch. It is the client/user's choice whether and which 
 behavior to use, by enabling one of these two enhancements.
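The two behaviors can be sketched as a small eviction-bucket chooser. This is a hedged illustration, not the actual LruBlockCache code: the names, structure, and the overflow heuristic are simplified stand-ins for the priority-based eviction the description talks about.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative sketch of user-configurable single/multi/in-memory ratios
// plus an optional "preemptive" mode for in-memory blocks.
class RatioEvictionSketch {
  enum Priority { SINGLE, MULTI, IN_MEMORY }

  // Picks which priority bucket to evict from, given current bucket sizes,
  // the configured target ratios, and whether preemptive mode applies to
  // the incoming block.
  static Priority evictFrom(Map<Priority, Integer> sizes,
                            Map<Priority, Double> ratios,
                            boolean preemptive, boolean incomingInMemory) {
    if (preemptive && incomingInMemory) {
      // in-memory blocks kick out ordinary blocks first, until none remain
      if (sizes.get(Priority.SINGLE) > 0) return Priority.SINGLE;
      if (sizes.get(Priority.MULTI) > 0) return Priority.MULTI;
      return Priority.IN_MEMORY;
    }
    // default behavior: evict from the bucket most over its configured share
    int total = sizes.values().stream().mapToInt(Integer::intValue).sum();
    Priority worst = Priority.SINGLE;
    double worstOverflow = Double.NEGATIVE_INFINITY;
    for (Priority p : Priority.values()) {
      double overflow = sizes.get(p) - ratios.get(p) * total;
      if (overflow > worstOverflow) { worstOverflow = overflow; worst = p; }
    }
    return worst;
  }

  public static void main(String[] args) {
    Map<Priority, Integer> sizes = new EnumMap<>(Priority.class);
    sizes.put(Priority.SINGLE, 10);
    sizes.put(Priority.MULTI, 60);
    sizes.put(Priority.IN_MEMORY, 30);
    Map<Priority, Double> ratios = new EnumMap<>(Priority.class);
    ratios.put(Priority.SINGLE, 0.25);   // the old hardcoded 1:2:1 split
    ratios.put(Priority.MULTI, 0.5);
    ratios.put(Priority.IN_MEMORY, 0.25);
    System.out.println(evictFrom(sizes, ratios, false, true));  // MULTI
    System.out.println(evictFrom(sizes, ratios, true, true));   // SINGLE
  }
}
```

With preemptive mode off, the MULTI bucket is chosen because it is furthest over its configured share; with it on, an incoming in-memory block evicts ordinary blocks first.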



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion

2014-01-08 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865272#comment-13865272
 ] 

Nicolas Liochon commented on HBASE-7386:


bq. I have verified that the supervisor approach improves master failover: in 
my testing, failover takes ~7s when using supervisor versus ~40s with the 
standard scripts.

This is strange. Do you know why? What is the test scenario?

 Investigate providing some supervisor support for znode deletion
 

 Key: HBASE-7386
 URL: https://issues.apache.org/jira/browse/HBASE-7386
 Project: HBase
  Issue Type: Task
  Components: master, regionserver, scripts
Reporter: Gregory Chanan
Assignee: stack
Priority: Blocker
 Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, 
 HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, 
 HBASE-7386-v0.patch, supervisordconfigs-v0.patch


 There a couple of JIRAs for deleting the znode on a process failure:
 HBASE-5844 (RS)
 HBASE-5926 (Master)
 which are pretty neat; on process failure, they delete the znode of the 
 underlying process so HBase can recover faster.
 These JIRAs were implemented via the startup scripts; i.e. the script hangs 
 around and waits for the process to exit, then deletes the znode.
 There are a few problems associated with this approach, as listed in the 
 below JIRAs:
 1) Hides startup output in script
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 2) two hbase processes listed per launched daemon
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 3) Not run by a real supervisor
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 4) Weird output after kill -9 actual process in standalone mode
 https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
 5) Can kill existing RS if called again
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 6) Hides stdout/stderr[6]
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
 I suspect running in via something like supervisor.d can solve these issues 
 if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865289#comment-13865289
 ] 

Hadoop QA commented on HBASE-10292:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621938/10292.patch
  against trunk revision .
  ATTACHMENT ID: 12621938

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 4 release 
audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8364//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8364//console

This message is automatically generated.

 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0

 Attachments: 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs

2014-01-08 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-9846:
--

Attachment: HBASE-9846_6.patch

This should be good to go.

 Integration test and LoadTestTool support for cell ACLs
 ---

 Key: HBASE-9846
 URL: https://issues.apache.org/jira/browse/HBASE-9846
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0

 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, 
 HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, 
 HBASE-9846_5.patch, HBASE-9846_5.patch, HBASE-9846_6.patch


 Cell level ACLs should have an integration test and LoadTestTool support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs

2014-01-08 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-9846:
--

Status: Patch Available  (was: Open)

 Integration test and LoadTestTool support for cell ACLs
 ---

 Key: HBASE-9846
 URL: https://issues.apache.org/jira/browse/HBASE-9846
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0

 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, 
 HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, 
 HBASE-9846_5.patch, HBASE-9846_5.patch, HBASE-9846_6.patch


 Cell level ACLs should have an integration test and LoadTestTool support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


graceful_stop.sh hung

2014-01-08 Thread hzwangxx
Hi, all
  I restarted a region server using graceful_stop.sh (bin/graceful_stop.sh 
--restart --reload --debug hostname). After running for a while, the process 
hangs as follows:

2014-01-08 18:40:48,150 [main] INFO  region_mover - Moving region 
78c953d53f6498664d9a067701a7e7d7 (42 of 340) to 
server=inspur255.deu.edu.cn,60020,1388056934052
2014-01-08 18:40:50,097 [main] INFO  region_mover - Moving region 
c621b3bf29262ca5248c03a8d6ebb41e (43 of 340) to 
server=inspur253.deu.edu.cn,60020,1388053123213
2014-01-08 18:40:51,652 [main] INFO  region_mover - Moving region 
4dad873a6af4d3a9809339281c3cb34c (44 of 340) to 
server=inspur254.deu.edu.cn,60020,1388054917364
2014-01-08 18:40:56,701 [main] INFO  region_mover - Moving region 
0e311941f5ff202bcefe57aa4079a188 (45 of 340) to 
server=inspur253.deu.edu.cn,60020,1388053123213
2014-01-08 18:40:58,632 [main] INFO  region_mover - Moving region 
3a99fb65caf32a05e18b8b6b93f8 (46 of 340) to 
server=inspur308.deu.edu.cn,60020,1388059770705
2014-01-08 18:41:02,127 [main] INFO  region_mover - Moving region 
34f3ef516fe6fba940eeb0902b9acd3d (47 of 340) to 
server=inspur254.deu.edu.cn,60020,1388054917364
2014-01-08 18:41:03,689 [main] INFO  region_mover - Moving region 
ac201a56d80f13ca5357d474578a91c2 (48 of 340) to 
server=inspur308.deu.edu.cn,60020,1388059770705
2014-01-08 18:41:05,669 [main] INFO  region_mover - Moving region 
812912be704946d24c5f1b5e3184b2f5 (49 of 340) to 
server=inspur253.deu.edu.cn,60020,1388053123213

  I ran the 'du' command to check the last region, which does not contain any data:
hadoop@inspur249:~/hbase$ hdfs dfs -du -s -h 
/hbase/test/812912be704946d24c5f1b5e3184b2f5/*
486  /hbase/test/812912be704946d24c5f1b5e3184b2f5/.regioninfo
0  /hbase/test/812912be704946d24c5f1b5e3184b2f5/body
0  /hbase/test/812912be704946d24c5f1b5e3184b2f5/meta

The Hadoop version is CDH 4.2.1 and HBase is 0.94.

Thanks!
Best Regards~
Xiyi



[jira] [Commented] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs

2014-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865408#comment-13865408
 ] 

Hadoop QA commented on HBASE-9846:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12621960/HBASE-9846_6.patch
  against trunk revision .
  ATTACHMENT ID: 12621960

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 40 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 4 release 
audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8365//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8365//console

This message is automatically generated.

 Integration test and LoadTestTool support for cell ACLs
 ---

 Key: HBASE-9846
 URL: https://issues.apache.org/jira/browse/HBASE-9846
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0

 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, 
 HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, 
 HBASE-9846_5.patch, HBASE-9846_5.patch, HBASE-9846_6.patch


 Cell level ACLs should have an integration test and LoadTestTool support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion

2014-01-08 Thread Samir Ahmic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865479#comment-13865479
 ] 

Samir Ahmic commented on HBASE-7386:


From what I could see, it is all about removing the master znode in ZooKeeper. In 
the supervisor scenario the master znode is deleted by autorestart, while in the 
standard scripts we don't delete the master znode. Is the master znode ephemeral? 
It should be gone when the master dies.
The test scenario is very simple:
* distributed cluster 0.96
* start the master and a backup master on different machines
* date; kill -9 the master and watch the logs on the backup master to see when it 
becomes active
* I have also used a Python-based script that watches the '/hbase/master' znode and 
detects changes
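A watcher script like the one described above could be sketched as follows. This is a hypothetical reconstruction, not the actual script; it assumes the third-party kazoo ZooKeeper client, which is imported lazily so the formatting helper works without it.

```python
import time


def format_change(znode, exists, ts):
    """Return a timestamped log line describing a change to the watched znode."""
    state = "created" if exists else "deleted"
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(ts))
    return "%s %s %s" % (stamp, znode, state)


def watch_master(hosts="localhost:2181", znode="/hbase/master"):
    """Watch the master znode and print a line on every change (sketch)."""
    # kazoo is a third-party ZooKeeper client library (an assumption here);
    # imported inside the function so the module loads without it.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts=hosts)
    zk.start()

    @zk.DataWatch(znode)
    def on_change(data, stat, event=None):
        # Fires on create/update/delete; stat is None when the znode is gone.
        print(format_change(znode, stat is not None, time.time()))

    while True:
        time.sleep(1)

# watch_master() would run until interrupted, logging master znode changes.
```

Measuring the interval between the "deleted" line and the backup master's activation in its log gives the failover time observed in the test above.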



 Investigate providing some supervisor support for znode deletion
 

 Key: HBASE-7386
 URL: https://issues.apache.org/jira/browse/HBASE-7386
 Project: HBase
  Issue Type: Task
  Components: master, regionserver, scripts
Reporter: Gregory Chanan
Assignee: stack
Priority: Blocker
 Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, 
 HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, 
 HBASE-7386-v0.patch, supervisordconfigs-v0.patch


 There are a couple of JIRAs for deleting the znode on a process failure:
 HBASE-5844 (RS)
 HBASE-5926 (Master)
 which are pretty neat; on process failure, they delete the znode of the 
 underlying process so HBase can recover faster.
 These JIRAs were implemented via the startup scripts; i.e. the script hangs 
 around and waits for the process to exit, then deletes the znode.
 There are a few problems associated with this approach, as listed in the 
 JIRAs below:
 1) Hides startup output in script
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 2) two hbase processes listed per launched daemon
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 3) Not run by a real supervisor
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 4) Weird output after kill -9 actual process in standalone mode
 https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
 5) Can kill existing RS if called again
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 6) Hides stdout/stderr[6]
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
 I suspect running via something like supervisord can solve these issues 
 if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion

2014-01-08 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865513#comment-13865513
 ] 

Nicolas Liochon commented on HBASE-7386:


bq. standard scripts we don't delete master znode
We should; that's what HBASE-5926 is about. It used to work, for sure.
It's better to delete it just after the server's death, as the restart may never 
happen...


 Investigate providing some supervisor support for znode deletion
 

 Key: HBASE-7386
 URL: https://issues.apache.org/jira/browse/HBASE-7386
 Project: HBase
  Issue Type: Task
  Components: master, regionserver, scripts
Reporter: Gregory Chanan
Assignee: stack
Priority: Blocker
 Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, 
 HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, 
 HBASE-7386-v0.patch, supervisordconfigs-v0.patch


 There are a couple of JIRAs for deleting the znode on a process failure:
 HBASE-5844 (RS)
 HBASE-5926 (Master)
 which are pretty neat; on process failure, they delete the znode of the 
 underlying process so HBase can recover faster.
 These JIRAs were implemented via the startup scripts; i.e. the script hangs 
 around and waits for the process to exit, then deletes the znode.
 There are a few problems associated with this approach, as listed in the 
 JIRAs below:
 1) Hides startup output in script
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 2) two hbase processes listed per launched daemon
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 3) Not run by a real supervisor
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 4) Weird output after kill -9 actual process in standalone mode
 https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
 5) Can kill existing RS if called again
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 6) Hides stdout/stderr[6]
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
 I suspect running via something like supervisord can solve these issues 
 if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion

2014-01-08 Thread Samir Ahmic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865571#comment-13865571
 ] 

Samir Ahmic commented on HBASE-7386:


bq. It's better to delete it just after the server death, as the restart may 
never happen...

Are you suggesting that I modify the 'zk_cleaner.py' listener script to delete the 
master znode when it detects that the master is in one of these states 
('PROCESS_STATE_STOPPING', 'PROCESS_STATE_EXITED', 'PROCESS_STATE_UNKNOWN')? 
I'm already doing this for regionservers, so it should only take a few lines of code.
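A listener along the lines of zk_cleaner.py might look like the sketch below. This is not the actual script: the state set, the payload fields, and the znode-deletion hook are assumptions based on the supervisord event-listener protocol (READY handshake, header line, fixed-length payload, RESULT reply).

```python
import sys

# States on which the dead process's znode would be removed (assumption
# following the discussion above).
FATAL_STATES = {
    "PROCESS_STATE_STOPPING",
    "PROCESS_STATE_EXITED",
    "PROCESS_STATE_UNKNOWN",
}


def parse_tokens(line):
    """Parse a supervisord 'key:value key:value' line into a dict,
    e.g. 'ver:3.0 eventname:PROCESS_STATE_EXITED len:54'."""
    return dict(tok.split(":", 1) for tok in line.split())


def should_delete_znode(eventname):
    """True when the event means the process is gone (or in an unknown
    state) and its znode should be cleaned up."""
    return eventname in FATAL_STATES


def listen(stdin=sys.stdin, stdout=sys.stdout, delete_znode=lambda name: None):
    """Minimal supervisord event-listener loop (sketch).

    delete_znode is a caller-supplied hook; a real script would remove
    the corresponding znode in ZooKeeper here.
    """
    while True:
        stdout.write("READY\n")          # tell supervisord we accept events
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:
            break
        headers = parse_tokens(header_line.strip())
        payload = stdin.read(int(headers["len"]))
        if should_delete_znode(headers["eventname"]):
            body = parse_tokens(payload)
            delete_znode(body.get("processname", ""))
        stdout.write("RESULT 2\nOK")     # acknowledge the event
        stdout.flush()
```

Extending an existing regionserver listener to the master would, as noted, mostly mean adding the master's process name and znode path to the same logic.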

 Investigate providing some supervisor support for znode deletion
 

 Key: HBASE-7386
 URL: https://issues.apache.org/jira/browse/HBASE-7386
 Project: HBase
  Issue Type: Task
  Components: master, regionserver, scripts
Reporter: Gregory Chanan
Assignee: stack
Priority: Blocker
 Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, 
 HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, 
 HBASE-7386-v0.patch, supervisordconfigs-v0.patch


 There are a couple of JIRAs for deleting the znode on a process failure:
 HBASE-5844 (RS)
 HBASE-5926 (Master)
 which are pretty neat; on process failure, they delete the znode of the 
 underlying process so HBase can recover faster.
 These JIRAs were implemented via the startup scripts; i.e. the script hangs 
 around and waits for the process to exit, then deletes the znode.
 There are a few problems associated with this approach, as listed in the 
 JIRAs below:
 1) Hides startup output in script
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 2) two hbase processes listed per launched daemon
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 3) Not run by a real supervisor
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 4) Weird output after kill -9 actual process in standalone mode
 https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
 5) Can kill existing RS if called again
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 6) Hides stdout/stderr[6]
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
 I suspect running via something like supervisord can solve these issues 
 if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion

2014-01-08 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865594#comment-13865594
 ] 

Nicolas Liochon commented on HBASE-7386:


Maybe ;-). Just that if the process exits, we can clean the ZK node 
immediately, ideally without relying on a separate watchdog. 
What's PROCESS_STATE_UNKNOWN? 

 Investigate providing some supervisor support for znode deletion
 

 Key: HBASE-7386
 URL: https://issues.apache.org/jira/browse/HBASE-7386
 Project: HBase
  Issue Type: Task
  Components: master, regionserver, scripts
Reporter: Gregory Chanan
Assignee: stack
Priority: Blocker
 Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, 
 HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, 
 HBASE-7386-v0.patch, supervisordconfigs-v0.patch


 There are a couple of JIRAs for deleting the znode on a process failure:
 HBASE-5844 (RS)
 HBASE-5926 (Master)
 which are pretty neat; on process failure, they delete the znode of the 
 underlying process so HBase can recover faster.
 These JIRAs were implemented via the startup scripts; i.e. the script hangs 
 around and waits for the process to exit, then deletes the znode.
 There are a few problems associated with this approach, as listed in the 
 JIRAs below:
 1) Hides startup output in script
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 2) two hbase processes listed per launched daemon
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 3) Not run by a real supervisor
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 4) Weird output after kill -9 actual process in standalone mode
 https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
 5) Can kill existing RS if called again
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 6) Hides stdout/stderr[6]
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
 I suspect running via something like supervisord can solve these issues 
 if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion

2014-01-08 Thread Samir Ahmic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865644#comment-13865644
 ] 

Samir Ahmic commented on HBASE-7386:


bq. What's the PROCESS_STATE_UNKNOWN?
According to the supervisor documentation [http://supervisord.org/subprocess.html], 
it means the process is in an unknown state (a supervisord programming error).

 Investigate providing some supervisor support for znode deletion
 

 Key: HBASE-7386
 URL: https://issues.apache.org/jira/browse/HBASE-7386
 Project: HBase
  Issue Type: Task
  Components: master, regionserver, scripts
Reporter: Gregory Chanan
Assignee: stack
Priority: Blocker
 Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, 
 HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, 
 HBASE-7386-v0.patch, supervisordconfigs-v0.patch


 There are a couple of JIRAs for deleting the znode on a process failure:
 HBASE-5844 (RS)
 HBASE-5926 (Master)
 which are pretty neat; on process failure, they delete the znode of the 
 underlying process so HBase can recover faster.
 These JIRAs were implemented via the startup scripts; i.e. the script hangs 
 around and waits for the process to exit, then deletes the znode.
 There are a few problems associated with this approach, as listed in the 
 JIRAs below:
 1) Hides startup output in script
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 2) two hbase processes listed per launched daemon
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 3) Not run by a real supervisor
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 4) Weird output after kill -9 actual process in standalone mode
 https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
 5) Can kill existing RS if called again
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 6) Hides stdout/stderr[6]
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
 I suspect running via something like supervisord can solve these issues 
 if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865672#comment-13865672
 ] 

Andrew Purtell commented on HBASE-10292:


bq. Was wondering why we are not throwing back DNRIOE to client when RS also 
aborts

Got it. I think the answer is this code predates DNRIOE.

 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0

 Attachments: 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10292:
---

   Resolution: Fixed
Fix Version/s: 0.99.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks for the reviews Stack, Anoop, and Ram. Committed to trunk and 0.98. 

I filed HBASE-10300 for followup.

 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10300) Insure a throw of DoNotRetryIOException when a regionserver aborts

2014-01-08 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-10300:
--

 Summary: Insure a throw of DoNotRetryIOException when a 
regionserver aborts
 Key: HBASE-10300
 URL: https://issues.apache.org/jira/browse/HBASE-10300
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
 Fix For: 0.98.0, 0.99.0


As discussed on HBASE-10292, we may not be throwing DoNotRetryIOExceptions back 
to the client when aborting the server, especially when handling fatal 
coprocessor exceptions.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-8889) TestIOFencing#testFencingAroundCompaction occasionally fails

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865688#comment-13865688
 ] 

Andrew Purtell commented on HBASE-8889:
---

The code may be incorrect as indicated, but isn't there a larger issue also? 
Above, a file under compaction has a missing block. Blocks don't normally go 
missing in the test, so missing blocks suggest file deletion. And further above 
there is a warning about a missing file in another case. Might want to 
investigate why/how files can be pulled out from underneath compaction.

 TestIOFencing#testFencingAroundCompaction occasionally fails
 

 Key: HBASE-8889
 URL: https://issues.apache.org/jira/browse/HBASE-8889
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor
 Attachments: TestIOFencing.tar.gz


 From 
 https://builds.apache.org/job/PreCommit-HBASE-Build/6232//testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/
  :
 {code}
 java.lang.AssertionError: Timed out waiting for new server to open region
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:269)
   at 
 org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:205)
 {code}
 {code}
 2013-07-06 23:13:53,120 INFO  [pool-1-thread-1] hbase.TestIOFencing(266): 
 Waiting for the new server to pick up the region 
 tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03.
 2013-07-06 23:13:54,120 INFO  [pool-1-thread-1] hbase.TestIOFencing(266): 
 Waiting for the new server to pick up the region 
 tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03.
 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] 
 hbase.TestIOFencing$CompactionBlockerRegion(102): allowing compactions
 2013-07-06 23:13:55,121 INFO  [pool-1-thread-1] 
 hbase.HBaseTestingUtility(911): Shutting down minicluster
 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] util.JVMClusterUtil(237): 
 Shutting down HBase Cluster
 2013-07-06 23:13:55,121 INFO  
 [RS:0;asf002:39065-smallCompactions-1373152134716] regionserver.HStore(951): 
 Starting compaction of 2 file(s) in family of 
 tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. into 
 tmpdir=hdfs://localhost:50140/user/jenkins/hbase/tabletest/6e62d3b24ea23160931362b60359ff03/.tmp,
  totalSize=108.4k
 ...
 2013-07-06 23:13:55,155 INFO  [RS:0;asf002:39065] 
 regionserver.HRegionServer(2476): Received CLOSE for the region: 
 6e62d3b24ea23160931362b60359ff03 ,which we are already trying to CLOSE
 2013-07-06 23:13:55,157 WARN  [RS:0;asf002:39065] 
 regionserver.HRegionServer(2414): Failed to close 
 tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. - ignoring and 
 continuing
 org.apache.hadoop.hbase.exceptions.NotServingRegionException: The region 
 6e62d3b24ea23160931362b60359ff03 was already closing. New CLOSE request is 
 ignored.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2479)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2409)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2011)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:903)
   at 
 org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:158)
   at 
 org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110)
   at 
 org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:142)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:337)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.util.Methods.call(Methods.java:41)
   at org.apache.hadoop.hbase.security.User.call(User.java:420)
   at org.apache.hadoop.hbase.security.User.access$300(User.java:51)
   at 
 org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:260)
   at 
 org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:140)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10298) [0.98] TestIOFencing fails occasionally

2014-01-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10298:
---

Attachment: 10298.patch

Attaching what I am going to commit to disable TestIOFencing for the 0.98.0 
release. Will commit the trivial test change using CTR shortly unless there is 
an objection. We should make one of the other related issues a blocker for 1.0.
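
For reference, disabling a JUnit 4 test class is conventionally done with a 
class-level @Ignore annotation. The sketch below is purely illustrative of that 
mechanism; the attached 10298.patch itself is not reproduced in this thread, so 
the committed change may differ:

```java
// Illustrative sketch only -- not the contents of 10298.patch.
// A class-level @Ignore makes JUnit 4 skip every test in the class.
import org.junit.Ignore;

@Ignore("Fails occasionally; disabled for the 0.98.0 release (HBASE-10298)")
public class TestIOFencing {
  // existing test methods left unchanged; JUnit reports them as skipped
}
```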

 [0.98] TestIOFencing fails occasionally
 ---

 Key: HBASE-10298
 URL: https://issues.apache.org/jira/browse/HBASE-10298
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0

 Attachments: 10298.patch


 I can reproduce this using JDK 6 on Ubuntu 13.10.
 {noformat}
 Running org.apache.hadoop.hbase.TestIOFencing
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec 
  FAILURE!
 {noformat}
 No failure trace captured yet. Fix or disable.





[jira] [Updated] (HBASE-10298) [0.98] TestIOFencing fails occasionally

2014-01-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10298:
---

 Priority: Blocker  (was: Major)
Affects Version/s: 0.96.1.1
Fix Version/s: (was: 0.98.0)
   1.0.0
   0.99.0
   0.98.1
 Assignee: (was: Andrew Purtell)

 [0.98] TestIOFencing fails occasionally
 ---

 Key: HBASE-10298
 URL: https://issues.apache.org/jira/browse/HBASE-10298
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1.1
Reporter: Andrew Purtell
Priority: Blocker
 Fix For: 0.98.1, 0.99.0, 1.0.0

 Attachments: 10298.patch







[jira] [Commented] (HBASE-10298) [0.98] TestIOFencing fails occasionally

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865701#comment-13865701
 ] 

Andrew Purtell commented on HBASE-10298:


Made this a blocker for 0.98.1 and 1.0.0.

 [0.98] TestIOFencing fails occasionally
 ---

 Key: HBASE-10298
 URL: https://issues.apache.org/jira/browse/HBASE-10298
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1.1
Reporter: Andrew Purtell
Priority: Blocker
 Fix For: 0.98.1, 0.99.0, 1.0.0

 Attachments: 10298.patch







[jira] [Updated] (HBASE-8889) TestIOFencing#testFencingAroundCompaction occasionally fails

2014-01-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-8889:
--

 Priority: Blocker  (was: Minor)
Fix Version/s: 1.0.0

As Andrew indicated in HBASE-10298, marking this a blocker for 1.0 release.

 TestIOFencing#testFencingAroundCompaction occasionally fails
 

 Key: HBASE-8889
 URL: https://issues.apache.org/jira/browse/HBASE-8889
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Priority: Blocker
 Fix For: 1.0.0

 Attachments: TestIOFencing.tar.gz


 From 
 https://builds.apache.org/job/PreCommit-HBASE-Build/6232//testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/
  :
 {code}
 java.lang.AssertionError: Timed out waiting for new server to open region
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:269)
   at 
 org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:205)
 {code}





[jira] [Updated] (HBASE-10298) TestIOFencing reveals an unhandled race

2014-01-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10298:
---

Summary: TestIOFencing reveals an unhandled race  (was: [0.98] 
TestIOFencing fails occasionally)

 TestIOFencing reveals an unhandled race
 ---

 Key: HBASE-10298
 URL: https://issues.apache.org/jira/browse/HBASE-10298
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1.1
Reporter: Andrew Purtell
Priority: Blocker
 Fix For: 0.98.1, 0.99.0, 1.0.0

 Attachments: 10298.patch







[jira] [Updated] (HBASE-10298) TestIOFencing occasionally fails

2014-01-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10298:
---

Summary: TestIOFencing occasionally fails  (was: TestIOFencing reveals an 
unhandled race)

 TestIOFencing occasionally fails
 

 Key: HBASE-10298
 URL: https://issues.apache.org/jira/browse/HBASE-10298
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1.1
Reporter: Andrew Purtell
Priority: Blocker
 Fix For: 0.98.1, 0.99.0, 1.0.0

 Attachments: 10298.patch







[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Gustavo Anatoly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gustavo Anatoly updated HBASE-9948:
---

Attachment: HBASE-9948.patch

Ted, could you please review the patch?

Thanks.

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948.patch


 I saw the following in test output for TestRestartCluster:
 {code}
 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): 
 Scheduling batch of logs to split
 2013-11-11 19:59:55,538 INFO  [M:0;kiyo:36213] master.SplitLogManager(329): 
 started splitting 1 logs in 
 [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting]
 2013-11-11 19:59:55,538 WARN  [M:0;kiyo:36213] master.SplitLogManager(1048): 
 Failure because two threads can't wait for the same task; 
 path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta
 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master 
 server abort: loaded coprocessors are: 
 [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): 
 Unhandled exception. Starting shutdown.
 java.io.IOException: duplicate log split scheduled for 
 hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta
 at 
 org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343)
 at 
 org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409)
 at 
 org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301)
 at 
 org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292)
 at 
 org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605)
 at java.lang.Thread.run(Thread.java:724)
 2013-11-11 19:59:55,539 INFO  [M:0;kiyo:36213] master.HMaster(2386): Aborting
 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping 
 service threads
 {code}
 HMaster should handle duplicate log split requests, instead of aborting.





[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Gustavo Anatoly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gustavo Anatoly updated HBASE-9948:
---

Status: Patch Available  (was: Open)

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948.patch







[jira] [Commented] (HBASE-10297) LoadAndVerify Integration Test for cell visibility

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865720#comment-13865720
 ] 

Andrew Purtell commented on HBASE-10297:


+1

I don't think the test failures are related, but perhaps run HadoopQA one more 
time. 

The Javadoc for the test needs updating to describe what additional things this 
new integration test checks for. Can be done at commit time. 

 LoadAndVerify Integration Test for cell visibility
 --

 Key: HBASE-10297
 URL: https://issues.apache.org/jira/browse/HBASE-10297
 Project: HBase
  Issue Type: Sub-task
  Components: test
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10297.patch, HBASE-10297_V2.patch








[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865722#comment-13865722
 ] 

Ted Yu commented on HBASE-9948:
---

Thanks for the patch.
{code}
+} catch (DuplicatedSplitLogException dsle) {
+  LOG.warn(dsle.getMessage());
+}
{code}
Should the method return in the catch block? There is no need to update metrics 
for a duplicate log split request.

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948.patch







[jira] [Commented] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865725#comment-13865725
 ] 

Andrew Purtell commented on HBASE-9846:
---

+1

Javadoc for the new integration test should be updated to describe what it 
checks for. Can be fixed up at commit time.

Thanks a lot Ram!

 Integration test and LoadTestTool support for cell ACLs
 ---

 Key: HBASE-9846
 URL: https://issues.apache.org/jira/browse/HBASE-9846
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.98.0

 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, 
 HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, 
 HBASE-9846_5.patch, HBASE-9846_5.patch, HBASE-9846_6.patch


 Cell level ACLs should have an integration test and LoadTestTool support.





[jira] [Commented] (HBASE-10293) Master and RS GC logs can conflict when run on same host

2014-01-08 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865755#comment-13865755
 ] 

Enis Soztutar commented on HBASE-10293:
---

I guess we can do a similar thing to what we do for the log4j logs: append the 
username and daemon name to the gc log path. 
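
For illustration, an hbase-env.sh fragment along these lines would keep the 
paths distinct. This is a sketch only, not the committed fix: the "master" 
daemon label and the identifier scheme are made up, and the log directory 
follows the path quoted in the issue description below.

```shell
# Illustrative sketch (not the actual fix): embed the user, a daemon label,
# and the launching shell's PID in the -Xloggc path so a Master and a
# RegionServer started in the same minute get different GC log files.
# "master" is a hypothetical placeholder for the daemon being started.
GC_LOG_IDENT="$(whoami)-master-$$"
export SERVER_GC_OPTS="-verbose:gc -Xloggc:/grid/0/var/log/hbase/gc.log-${GC_LOG_IDENT}-$(date +%Y%m%d%H%M)"
echo "$SERVER_GC_OPTS"
```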

 Master and RS GC logs can conflict when run on same host
 

 Key: HBASE-10293
 URL: https://issues.apache.org/jira/browse/HBASE-10293
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.96.1.1
Reporter: Nick Dimiduk

 My issue manifests when I uncomment the line {{export SERVER_GC_OPTS=...}} in 
 hbase-env.sh and start HBase. It's a single node in distributed mode, so both 
 a Master and RegionServer are started on that host. Both start commands are 
 run in the same minute, so only one gc.log-`date` file is created. `lsof` 
 indicates two processes are writing to that file and the output of `ps` 
 confirms they both received the same 
 {{-Xloggc:/grid/0/var/log/hbase/gc.log-201401071515}} argument.
 Presumably, the same will happen for folks running the thrift and rest 
 gateways on the same box (any java process itemized in the server_cmds array 
 in bin/hbase).
 Related (the reason I discovered this issue in the first place), stopping the 
 master process results in its gc.log being truncated.





[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Gustavo Anatoly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865765#comment-13865765
 ] 

Gustavo Anatoly commented on HBASE-9948:


Hi, [~yuzhih...@gmail.com]

You're right. I can change this block to:
{code}
+} catch (DuplicatedSplitLogException dsle) {
+  return;
+}
{code}

Thanks for review.

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948.patch







[jira] [Commented] (HBASE-8889) TestIOFencing#testFencingAroundCompaction occasionally fails

2014-01-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865767#comment-13865767
 ] 

Ted Yu commented on HBASE-8889:
---

I am looping the test with the following change so that I get a better idea of 
the timing of the failure.
{code}
Index: hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java
===
--- hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java 
(revision 1556059)
+++ hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java 
(working copy)
@@ -294,7 +298,8 @@
     // all the files we expect are still working when region is up in new location.
     FileSystem fs = newRegion.getFilesystem();
     for (String f: newRegion.getStoreFileList(new byte [][] {FAMILY})) {
-      assertTrue("After compaction, does not exist: " + f, fs.exists(new Path(f)));
+      assertTrue("After compaction, does not exist: " + f + " @ " + System.currentTimeMillis(),
+        fs.exists(new Path(f)));
     }
     // If we survive the split keep going...
     // Now we make sure that the region isn't totally confused.  Load up more rows.
{code}

 TestIOFencing#testFencingAroundCompaction occasionally fails
 

 Key: HBASE-8889
 URL: https://issues.apache.org/jira/browse/HBASE-8889
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Priority: Blocker
 Fix For: 1.0.0

 Attachments: TestIOFencing.tar.gz


 From 
 https://builds.apache.org/job/PreCommit-HBASE-Build/6232//testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/
  :
 {code}
 java.lang.AssertionError: Timed out waiting for new server to open region
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:269)
   at 
 org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:205)
 {code}

[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Gustavo Anatoly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gustavo Anatoly updated HBASE-9948:
---

Attachment: HBASE-9948-v2.patch

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948-v2.patch, HBASE-9948.patch





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers

2014-01-08 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865772#comment-13865772
 ] 

Enis Soztutar commented on HBASE-10274:
---

bq. I think it's better to fix it because our codebase is 0.94.
Do you mind backporting the patch for HBASE-6820? We cannot commit this to 0.94 
unless the backport is there. 

 MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
 ---

 Key: HBASE-10274
 URL: https://issues.apache.org/jira/browse/HBASE-10274
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Assignee: chendihao
Priority: Minor
 Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, 
 HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch


 HBASE-6820 points out the problem but does not fix it completely.
 killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() shut down 
 the ZooKeeperServer and need to close the ZKDatabase as well.
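A minimal sketch of the intended fix, using stand-in classes rather than the real org.apache.zookeeper.server API (getZKDatabase() and close() are real ZooKeeper method names, but the bodies below are illustrative only):

```java
// Stand-ins for org.apache.zookeeper.server.ZKDatabase / ZooKeeperServer,
// trimmed down to the shutdown behavior under discussion.
class ZKDatabase {
    boolean closed = false;
    void close() { closed = true; }  // the real close() releases txn log and snapshot files
}

class ZooKeeperServer {
    private final ZKDatabase zkDb = new ZKDatabase();
    ZKDatabase getZKDatabase() { return zkDb; }
    void shutdown() { /* stops request processors, but leaves zkDb open */ }
}

class MiniClusterSketch {
    // Shape of the proposed fix for killCurrentActiveZooKeeperServer() /
    // killOneBackupZooKeeperServer(): shut the server down AND close its database.
    static ZKDatabase killServer(ZooKeeperServer zks) {
        zks.shutdown();
        ZKDatabase db = zks.getZKDatabase();
        db.close();
        return db;
    }
}
```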





[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865797#comment-13865797
 ] 

Jeffrey Zhong commented on HBASE-9948:
--

We should first check why TestRestartCluster hits the duplicate log split 
request error in the first place.

The current patch doesn't work because SplitLog has to be a blocking call until 
the requested logs complete the splitting process; otherwise region assignment 
could happen before a log split completes, which would cause data loss.

I'd suggest the fix skip scheduling the duplicate log split but wait for the 
in-flight one to finish. 
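Skip-and-wait deduplication can be sketched like this (hypothetical names, not the actual SplitLogManager code): a duplicate request for the same log waits on the latch of the in-flight split instead of scheduling a second one.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

// Illustrative sketch of "skip scheduling the duplicate, but block until the
// original split finishes"; names are made up for this example.
class DedupSplitScheduler {
    private final ConcurrentMap<String, CountDownLatch> inFlight = new ConcurrentHashMap<>();

    public void splitLog(String logPath) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        CountDownLatch existing = inFlight.putIfAbsent(logPath, latch);
        if (existing != null) {
            // Duplicate request: don't reschedule, just wait for the original.
            existing.await();
            return;
        }
        try {
            doSplit(logPath);  // placeholder for the actual distributed split
        } finally {
            inFlight.remove(logPath);
            latch.countDown();  // release any duplicate callers
        }
    }

    protected void doSplit(String logPath) { /* stand-in for real work */ }
}
```

The blocking-call requirement is preserved: every caller, original or duplicate, returns only after the split has actually completed.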

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948-v2.patch, HBASE-9948.patch




[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865800#comment-13865800
 ] 

Hudson commented on HBASE-10292:


SUCCESS: Integrated in HBase-TRUNK #4797 (See 
[https://builds.apache.org/job/HBase-TRUNK/4797/])
HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails occasionally 
(apurtell: rev 1556586)
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java


 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.





[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865806#comment-13865806
 ] 

Hadoop QA commented on HBASE-9948:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12622005/HBASE-9948.patch
  against trunk revision .
  ATTACHMENT ID: 12622005

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to cause Findbugs 
(version 1.3.9) to fail.

{color:red}-1 release audit{color}.  The applied patch generated 4 release 
audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8366//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8366//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8366//console

This message is automatically generated.

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948-v2.patch, HBASE-9948.patch




[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Gustavo Anatoly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865848#comment-13865848
 ] 

Gustavo Anatoly commented on HBASE-9948:


Hi, [~jeffreyz].

I will follow your suggestions. To really avoid data loss, scheduling the log 
split request should be an atomic operation, so the best way forward is to 
investigate the root cause of the duplicate log split.

[~yuzhih...@gmail.com], how can I reproduce this scenario?

Thank you, [~jeffreyz].

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948-v2.patch, HBASE-9948.patch




[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-9948:
--

Status: Open  (was: Patch Available)

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948-v2.patch, HBASE-9948.patch




[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865851#comment-13865851
 ] 

Ted Yu commented on HBASE-9948:
---

Let me search my computer to see if I have the test output.

Meanwhile, you can loop TestRestartCluster and see if the duplicate message 
appears in test output.

Thanks
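A loop along these lines works for reproducing flaky messages (a generic sketch, not the attached runtest.sh):

```shell
# Run a command repeatedly until PATTERN appears in LOGFILE or MAX iterations
# pass; prints which iteration reproduced the message. Generic sketch, not the
# attached runtest.sh.
run_until_match() {
  cmd="$1"; logfile="$2"; pattern="$3"; max="$4"
  i=1
  while [ "$i" -le "$max" ]; do
    eval "$cmd" || true
    if grep -q "$pattern" $logfile 2>/dev/null; then
      echo "reproduced on iteration $i"
      return 0
    fi
    i=$((i + 1))
  done
  echo "not reproduced in $max iterations"
  return 1
}

# Intended use (illustrative paths):
#   run_until_match 'mvn test -Dtest=TestRestartCluster -pl hbase-server' \
#     'hbase-server/target/surefire-reports/*TestRestartCluster-output.txt' \
#     'duplicate log split scheduled' 50
```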

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948-v2.patch, HBASE-9948.patch




[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-9948:
--

Attachment: runtest.sh

Script I use for looping tests.

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948-v2.patch, HBASE-9948.patch, runtest.sh




[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Gustavo Anatoly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865858#comment-13865858
 ] 

Gustavo Anatoly commented on HBASE-9948:


Thanks, [~yuzhih...@gmail.com]

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948-v2.patch, HBASE-9948.patch, runtest.sh




[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails

2014-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865859#comment-13865859
 ] 

Hudson commented on HBASE-10298:


SUCCESS: Integrated in HBase-0.98 #63 (See 
[https://builds.apache.org/job/HBase-0.98/63/])
HBASE-10298. TestIOFencing occasionally fails

This flapping test produces low confidence results so temporarily
disable it while tracking down the cause. (apurtell: rev 1556597)
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java


 TestIOFencing occasionally fails
 

 Key: HBASE-10298
 URL: https://issues.apache.org/jira/browse/HBASE-10298
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1.1
Reporter: Andrew Purtell
Priority: Blocker
 Fix For: 0.98.1, 0.99.0, 1.0.0

 Attachments: 10298.patch


 I can reproduce this using JDK 6 on Ubuntu 13.10.
 {noformat}
 Running org.apache.hadoop.hbase.TestIOFencing
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec 
  FAILURE!
 {noformat}
 No failure trace captured yet. Fix or disable.





[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865860#comment-13865860
 ] 

Hudson commented on HBASE-10292:


SUCCESS: Integrated in HBase-0.98 #63 (See 
[https://builds.apache.org/job/HBase-0.98/63/])
HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails occasionally 
(apurtell: rev 1556590)
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java


 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.





[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865864#comment-13865864
 ] 

Ted Yu commented on HBASE-9948:
---

Please modify the following line in the script to look for 'duplicate log split 
scheduled':
{code}
grep NullPointerException 
hbase-server/target/surefire-reports/*${test[$j]%\#*}-output.txt
{code}
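Concretely, the change swaps the grep pattern for the duplicate-split message. A quick standalone check that the new pattern matches the exception line (the `${test[$j]%\#*}` expansion comes from the surrounding script, so the full modified line is shown only as a comment):

```shell
# The modified runtest.sh line -- grep for the duplicate-split message instead of
# NullPointerException ("${test[$j]%\#*}" expands inside the script's loop, so
# the full line is shown only as a comment here):
#   grep 'duplicate log split scheduled' \
#       hbase-server/target/surefire-reports/*${test[$j]%\#*}-output.txt
# Standalone check that the new pattern matches the exception line from the log:
echo 'java.io.IOException: duplicate log split scheduled for hdfs://...' \
  | grep -c 'duplicate log split scheduled'   # prints 1
```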

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948-v2.patch, HBASE-9948.patch, runtest.sh




[jira] [Commented] (HBASE-10300) Insure a throw of DoNotRetryIOException when a regionserver aborts

2014-01-08 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865866#comment-13865866
 ] 

Devaraj Das commented on HBASE-10300:
-

My understanding was that DoNotRetryIOException is thrown for cases where there 
is no point retrying at a global level (e.g., a table is disabled and the 
client shouldn't try to get data from its regions). When a regionserver aborts, 
shouldn't a client that was connected to it retry its operations against the 
failed-over regionserver(s)? Why should it be a DoNotRetryIOException?
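The distinction being drawn can be sketched as a retry loop (stand-in classes, not the actual HBase client code): DoNotRetryIOException aborts immediately, while any other IOException is retried in the expectation that the region comes up elsewhere.

```java
import java.io.IOException;

// Illustrative retry semantics; DoNotRetryIOException here is a stand-in for
// org.apache.hadoop.hbase.DoNotRetryIOException.
class RetrySemantics {
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg) { super(msg); }
    }

    interface Call<T> { T run() throws IOException; }

    static <T> T callWithRetries(Call<T> call, int maxAttempts) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (DoNotRetryIOException e) {
                throw e;   // global failure (e.g. disabled table): give up now
            } catch (IOException e) {
                last = e;  // transient failure (e.g. server died): retry, expecting
            }              // the region to be reassigned to another server
        }
        throw last;
    }
}
```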

 Insure a throw of DoNotRetryIOException when a regionserver aborts
 --

 Key: HBASE-10300
 URL: https://issues.apache.org/jira/browse/HBASE-10300
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
 Fix For: 0.98.0, 0.99.0


 As discussed on HBASE-10292, we may not be throwing DoNotRetryIOExceptions 
 back to the client when aborting the server, especially when handling fatal 
 coprocessor exceptions.





[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Gustavo Anatoly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865867#comment-13865867
 ] 

Gustavo Anatoly commented on HBASE-9948:


Okay

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948-v2.patch, HBASE-9948.patch, runtest.sh


 I saw the following in test output for TestRestartCluster:
 {code}
 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): 
 Scheduling batch of logs to split
 2013-11-11 19:59:55,538 INFO  [M:0;kiyo:36213] master.SplitLogManager(329): 
 started splitting 1 logs in 
 [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting]
 2013-11-11 19:59:55,538 WARN  [M:0;kiyo:36213] master.SplitLogManager(1048): 
 Failure because two threads can't wait for the same task; 
 path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta
 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master 
 server abort: loaded coprocessors are: 
 [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): 
 Unhandled exception. Starting shutdown.
 java.io.IOException: duplicate log split scheduled for 
 hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta
 at 
 org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343)
 at 
 org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409)
 at 
 org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301)
 at 
 org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292)
 at 
 org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605)
 at java.lang.Thread.run(Thread.java:724)
 2013-11-11 19:59:55,539 INFO  [M:0;kiyo:36213] master.HMaster(2386): Aborting
 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping 
 service threads
 {code}
 HMaster should handle duplicate log split requests, instead of aborting.
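One way to make scheduling idempotent is to key in-flight tasks by WAL path and let a duplicate request join the existing task instead of throwing. This is a standalone sketch with illustrative names, not the actual SplitLogManager code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of idempotent split scheduling: a second request for the same
// WAL path observes the in-flight task instead of aborting the master.
public class SplitScheduleSketch {
    private final ConcurrentMap<String, String> inFlight = new ConcurrentHashMap<>();

    /** @return true if this call installed the task, false if already scheduled. */
    boolean scheduleSplit(String walPath) {
        // putIfAbsent is atomic: only one caller wins; the loser does not
        // throw, it just learns a split for the same log is in flight.
        return inFlight.putIfAbsent(walPath, "SCHEDULED") == null;
    }

    void splitDone(String walPath) {
        inFlight.remove(walPath);
    }

    public static void main(String[] args) {
        SplitScheduleSketch m = new SplitScheduleSketch();
        String wal = "hdfs://localhost/hbase/WALs/rs1-splitting/log.meta";
        System.out.println(m.scheduleSplit(wal)); // true: newly scheduled
        System.out.println(m.scheduleSplit(wal)); // false: duplicate, no abort
    }
}
```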



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-8889) TestIOFencing#testFencingAroundCompaction occasionally fails

2014-01-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-8889:
--

Attachment: TestIOFencing-#8362.tar.gz

Log for another failure on QA build #8362

 TestIOFencing#testFencingAroundCompaction occasionally fails
 

 Key: HBASE-8889
 URL: https://issues.apache.org/jira/browse/HBASE-8889
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Priority: Blocker
 Fix For: 1.0.0

 Attachments: TestIOFencing-#8362.tar.gz, TestIOFencing.tar.gz


 From 
 https://builds.apache.org/job/PreCommit-HBASE-Build/6232//testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/
  :
 {code}
 java.lang.AssertionError: Timed out waiting for new server to open region
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:269)
   at 
 org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:205)
 {code}
 {code}
 2013-07-06 23:13:53,120 INFO  [pool-1-thread-1] hbase.TestIOFencing(266): 
 Waiting for the new server to pick up the region 
 tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03.
 2013-07-06 23:13:54,120 INFO  [pool-1-thread-1] hbase.TestIOFencing(266): 
 Waiting for the new server to pick up the region 
 tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03.
 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] 
 hbase.TestIOFencing$CompactionBlockerRegion(102): allowing compactions
 2013-07-06 23:13:55,121 INFO  [pool-1-thread-1] 
 hbase.HBaseTestingUtility(911): Shutting down minicluster
 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] util.JVMClusterUtil(237): 
 Shutting down HBase Cluster
 2013-07-06 23:13:55,121 INFO  
 [RS:0;asf002:39065-smallCompactions-1373152134716] regionserver.HStore(951): 
 Starting compaction of 2 file(s) in family of 
 tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. into 
 tmpdir=hdfs://localhost:50140/user/jenkins/hbase/tabletest/6e62d3b24ea23160931362b60359ff03/.tmp,
  totalSize=108.4k
 ...
 2013-07-06 23:13:55,155 INFO  [RS:0;asf002:39065] 
 regionserver.HRegionServer(2476): Received CLOSE for the region: 
 6e62d3b24ea23160931362b60359ff03 ,which we are already trying to CLOSE
 2013-07-06 23:13:55,157 WARN  [RS:0;asf002:39065] 
 regionserver.HRegionServer(2414): Failed to close 
 tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. - ignoring and 
 continuing
 org.apache.hadoop.hbase.exceptions.NotServingRegionException: The region 
 6e62d3b24ea23160931362b60359ff03 was already closing. New CLOSE request is 
 ignored.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2479)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2409)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2011)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:903)
   at 
 org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:158)
   at 
 org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110)
   at 
 org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:142)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:337)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.util.Methods.call(Methods.java:41)
   at org.apache.hadoop.hbase.security.User.call(User.java:420)
   at org.apache.hadoop.hbase.security.User.access$300(User.java:51)
   at 
 org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:260)
   at 
 org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:140)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9999) Add support for small reverse scan

2014-01-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865895#comment-13865895
 ] 

Ted Yu commented on HBASE-9999:
---

Similar to ClientSmallScanner.java, ClientSmallReverseScanner.java can be added 
to facilitate small reverse scan.
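The semantics such a scanner would implement can be modeled standalone: a small reverse scan walks rows in descending key order and returns one bounded batch rather than keeping a scanner open. This sketch uses a TreeMap as a stand-in table, not the HBase client API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Standalone model of a "small reverse scan": descending key order,
// results small enough to fetch in a single bounded batch.
public class SmallReverseScanSketch {
    static List<String> smallReverseScan(NavigableMap<String, String> table,
                                         String startRow, int limit) {
        List<String> rows = new ArrayList<>();
        // headMap(startRow, true).descendingKeySet() walks keys <= startRow
        // from highest to lowest, mirroring a reversed scan's direction.
        for (String row : table.headMap(startRow, true).descendingKeySet()) {
            if (rows.size() >= limit) break;
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String r : new String[] {"a", "b", "c", "d"}) table.put(r, "v-" + r);
        System.out.println(smallReverseScan(table, "c", 2)); // [c, b]
    }
}
```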

 Add support for small reverse scan
 --

 Key: HBASE-9999
 URL: https://issues.apache.org/jira/browse/HBASE-9999
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu

 HBASE-4811 adds the reverse scan feature. This JIRA adds support for small 
 reverse scans, activated when both the 'reversed' and 'small' attributes are 
 true on the Scan object.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865897#comment-13865897
 ] 

Hudson commented on HBASE-10292:


SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #57 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/57/])
HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails occasionally 
(apurtell: rev 1556590)
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java


 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails

2014-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865896#comment-13865896
 ] 

Hudson commented on HBASE-10298:


SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #57 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/57/])
HBASE-10298. TestIOFencing occasionally fails

This flapping test produces low confidence results so temporarily
disable it while tracking down the cause. (apurtell: rev 1556597)
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java


 TestIOFencing occasionally fails
 

 Key: HBASE-10298
 URL: https://issues.apache.org/jira/browse/HBASE-10298
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1.1
Reporter: Andrew Purtell
Priority: Blocker
 Fix For: 0.98.1, 0.99.0, 1.0.0

 Attachments: 10298.patch


 I can reproduce this using JDK 6 on Ubuntu 13.10.
 {noformat}
 Running org.apache.hadoop.hbase.TestIOFencing
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec 
  FAILURE!
 {noformat}
 No failure trace captured yet. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests

2014-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865910#comment-13865910
 ] 

Hadoop QA commented on HBASE-9948:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12622017/HBASE-9948-v2.patch
  against trunk revision .
  ATTACHMENT ID: 12622017

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 4 release 
audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8367//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8367//console

This message is automatically generated.

 HMaster should handle duplicate log split requests
 --

 Key: HBASE-9948
 URL: https://issues.apache.org/jira/browse/HBASE-9948
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
 Attachments: HBASE-9948-v2.patch, HBASE-9948.patch, runtest.sh



[jira] [Created] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently

2014-01-08 Thread Ted Yu (JIRA)
Ted Yu created HBASE-10301:
--

 Summary: TestAssignmentManagerOnCluster#testOpenCloseRacing fails 
intermittently
 Key: HBASE-10301
 URL: https://issues.apache.org/jira/browse/HBASE-10301
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


The test failure came from PreCommit build #8362
{code}
2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No 
previous transition plan found (or ignoring an existing plan) for 
testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
generated random 
plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., 
src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, 
available=4) available servers, forceNewPlan=false
...
2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): 
Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., 
it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670
org.apache.hadoop.hbase.NotServingRegionException: 
org.apache.hadoop.hbase.NotServingRegionException: The region 
c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is 
cancelled.
  at 
org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553)
  at 
org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725)
  at 
org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
  at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
  at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
  at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
  at java.lang.Thread.run(Thread.java:662)

  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
  at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
  at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280)
  at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594)
  at 
org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693)
  at 
org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672)
  at 
org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773)
  at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423)
  at 
org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622)
...
2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No 
previous transition plan found (or ignoring an existing plan) for 
testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
generated random 
plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., 
src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, 
available=4) available servers, forceNewPlan=true
{code}
The second call to getRegionPlan() returned the same server, thus leading to 
assertion failure:
{code}
  assertFalse("Region should assigned on a new region server",
oldServerName.equals(serverName));
{code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently

2014-01-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865946#comment-13865946
 ] 

Ted Yu commented on HBASE-10301:


Here is one way of making the test pass reliably: randomly choose a server 
different from oldServerName and use this method in AssignmentManager:
{code}
  boolean assign(final ServerName destination, final List<HRegionInfo> regions) {
{code}
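Ted's suggestion can be sketched standalone. The list of server names and the helper below stand in for what the test would get from the mini cluster; they are illustrative, not HBase test code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Sketch: choose a random destination server that is not the one the region
// was previously on, so the "new server" assertion cannot flap.
public class PickNewServerSketch {
    static String pickOtherServer(List<String> online, String oldServer, Random rnd) {
        List<String> candidates = new ArrayList<>(online);
        candidates.remove(oldServer);   // never re-pick the old server
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no alternative server available");
        }
        return candidates.get(rnd.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        List<String> online = Arrays.asList("rs1", "rs2", "rs3", "rs4");
        String dest = pickOtherServer(online, "rs2", new Random());
        System.out.println("picked " + dest);
    }
}
```

The chosen destination would then be handed to the `assign(destination, regions)` overload quoted above.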

 TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
 ---

 Key: HBASE-10301
 URL: https://issues.apache.org/jira/browse/HBASE-10301
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently

2014-01-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10301:
---

Attachment: testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html

 TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
 ---

 Key: HBASE-10301
 URL: https://issues.apache.org/jira/browse/HBASE-10301
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor
 Attachments: 
 testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails

2014-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865972#comment-13865972
 ] 

Hudson commented on HBASE-10298:


SUCCESS: Integrated in HBase-TRUNK #4798 (See 
[https://builds.apache.org/job/HBase-TRUNK/4798/])
HBASE-10298. TestIOFencing occasionally fails

This flapping test produces low confidence results so temporarily
disable it while tracking down the cause. (apurtell: rev 1556596)
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java


 TestIOFencing occasionally fails
 

 Key: HBASE-10298
 URL: https://issues.apache.org/jira/browse/HBASE-10298
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1.1
Reporter: Andrew Purtell
Priority: Blocker
 Fix For: 0.98.1, 0.99.0, 1.0.0

 Attachments: 10298.patch





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently

2014-01-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10301:
---

Attachment: 10301-v1.txt

Patch v1 selects a server other than oldServerName for reassignment.

Also corrected grammar in the assertion message.

TestAssignmentManagerOnCluster passes locally.

 TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
 ---

 Key: HBASE-10301
 URL: https://issues.apache.org/jira/browse/HBASE-10301
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor
 Attachments: 10301-v1.txt, 
 testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html


 The test failure came from PreCommit build #8362
 {code}
 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
  src=, dest=asf002.sp2.  ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=false
 ...
 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): 
 Offline 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's 
 not anymore on asf002.sp2.ygridcore.net,59479,1389170993670
 org.apache.hadoop.hbase.NotServingRegionException: 
 org.apache.hadoop.hbase.NotServingRegionException: The region 
 c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is 
 cancelled.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725)
   at 
 org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
   at java.lang.Thread.run(Thread.java:662)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594)
   at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423)
   at 
 org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622)
 ...
 2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
 src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=true
 {code}
 The second call to getRegionPlan() returned the same server, thus leading to 
 the assertion failure:
 {code}
   assertFalse("Region should be assigned on a new region server",
     oldServerName.equals(serverName));
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently

2014-01-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-10301:
--

Assignee: Ted Yu

 TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
 ---

 Key: HBASE-10301
 URL: https://issues.apache.org/jira/browse/HBASE-10301
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10301-v1.txt, 
 testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html


 The test failure came from PreCommit build #8362
 {code}
 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
 src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=false
 ...
 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): 
 Offline 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's 
 not anymore on asf002.sp2.ygridcore.net,59479,1389170993670
 org.apache.hadoop.hbase.NotServingRegionException: 
 org.apache.hadoop.hbase.NotServingRegionException: The region 
 c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is 
 cancelled.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725)
   at 
 org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
   at java.lang.Thread.run(Thread.java:662)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594)
   at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423)
   at 
 org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622)
 ...
 2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
 src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=true
 {code}
 The second call to getRegionPlan() returned the same server, thus leading to 
 the assertion failure:
 {code}
   assertFalse("Region should be assigned on a new region server",
     oldServerName.equals(serverName));
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently

2014-01-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10301:
---

Status: Patch Available  (was: Open)

 TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
 ---

 Key: HBASE-10301
 URL: https://issues.apache.org/jira/browse/HBASE-10301
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10301-v1.txt, 
 testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html


 The test failure came from PreCommit build #8362
 {code}
 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
 src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=false
 ...
 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): 
 Offline 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's 
 not anymore on asf002.sp2.ygridcore.net,59479,1389170993670
 org.apache.hadoop.hbase.NotServingRegionException: 
 org.apache.hadoop.hbase.NotServingRegionException: The region 
 c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is 
 cancelled.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725)
   at 
 org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
   at java.lang.Thread.run(Thread.java:662)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594)
   at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423)
   at 
 org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622)
 ...
 2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
 src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=true
 {code}
 The second call to getRegionPlan() returned the same server, thus leading to 
 the assertion failure:
 {code}
   assertFalse("Region should be assigned on a new region server",
     oldServerName.equals(serverName));
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10268) TestSplitLogWorker occasionally fails

2014-01-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10268:
---

Status: Patch Available  (was: Open)

 TestSplitLogWorker occasionally fails
 -

 Key: HBASE-10268
 URL: https://issues.apache.org/jira/browse/HBASE-10268
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10268.patch


 TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, 
 but only when using JDK 6 on Ubuntu 12.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10268) TestSplitLogWorker occasionally fails

2014-01-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10268:
---

Attachment: 10268.patch

Add default test case timeouts. When waiting internally, uniformly use a 
timeout of 10s. 
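The approach described above, a hard bound on internal waits of a uniform 10s, can be sketched with a self-contained polling helper. The names below (`waitForCounter`, `DEFAULT_TIMEOUT_MS`) are illustrative stand-ins, not the actual TestSplitLogWorker code:

```java
import java.util.concurrent.atomic.AtomicLong;

public class WaitForCounter {
    // Uniform internal wait bound, per the patch description (10s).
    static final long DEFAULT_TIMEOUT_MS = 10_000;

    // Poll a counter until it reaches the expected value or the timeout
    // expires, instead of waiting indefinitely (a common source of hangs
    // in flaky tests).
    static boolean waitForCounter(AtomicLong counter, long expected,
            long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (counter.get() != expected) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            Thread.sleep(10);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong counter = new AtomicLong(0);
        new Thread(() -> counter.set(5)).start();
        System.out.println(waitForCounter(counter, 5, DEFAULT_TIMEOUT_MS));
    }
}
```

A bounded wait like this turns a hung test into a quick, diagnosable failure instead of a build timeout.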

 TestSplitLogWorker occasionally fails
 -

 Key: HBASE-10268
 URL: https://issues.apache.org/jira/browse/HBASE-10268
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10268.patch


 TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, 
 but only when using JDK 6 on Ubuntu 12.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10268) TestSplitLogWorker occasionally fails

2014-01-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10268:
---

Attachment: 10268.patch

 TestSplitLogWorker occasionally fails
 -

 Key: HBASE-10268
 URL: https://issues.apache.org/jira/browse/HBASE-10268
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10268.patch


 TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, 
 but only when using JDK 6 on Ubuntu 12.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10268) TestSplitLogWorker occasionally fails

2014-01-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10268:
---

Attachment: (was: 10268.patch)

 TestSplitLogWorker occasionally fails
 -

 Key: HBASE-10268
 URL: https://issues.apache.org/jira/browse/HBASE-10268
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10268.patch


 TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, 
 but only when using JDK 6 on Ubuntu 12.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10302) Fix rat check issues in hbase-native-client.

2014-01-08 Thread Elliott Clark (JIRA)
Elliott Clark created HBASE-10302:
-

 Summary: Fix rat check issues in hbase-native-client.
 Key: HBASE-10302
 URL: https://issues.apache.org/jira/browse/HBASE-10302
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers

2014-01-08 Thread chendihao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866100#comment-13866100
 ] 

chendihao commented on HBASE-10274:
---

Backporting HBASE-6820 seems good for us. Thanks for considering, [~enis].
Let's open another issue to do that.

 MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
 ---

 Key: HBASE-10274
 URL: https://issues.apache.org/jira/browse/HBASE-10274
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Assignee: chendihao
Priority: Minor
 Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, 
 HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch


 HBASE-6820 points out the problem but does not fix it completely.
 killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() 
 shut down the ZooKeeperServer and need to close the ZKDatabase as well.
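The fix being described, shutting the server down and also closing its database, can be sketched with plain stand-in classes. The real ZooKeeperServer/ZKDatabase API differs; this only illustrates the missing close step:

```java
public class ZkShutdownSketch {
    // Stand-in for ZooKeeper's ZKDatabase: holds state that must be released.
    static class DatabaseStub implements AutoCloseable {
        boolean closed;
        @Override public void close() { closed = true; }
    }

    // Stand-in for a ZooKeeperServer owned by the mini cluster.
    static class ServerStub {
        final DatabaseStub db = new DatabaseStub();
        boolean running = true;
        void shutdown() { running = false; }
    }

    // killCurrentActiveZooKeeperServer()-style helper: stop serving AND
    // close the database, the step the issue says was missing.
    static void kill(ServerStub server) {
        server.shutdown();
        server.db.close();
    }

    public static void main(String[] args) {
        ServerStub server = new ServerStub();
        kill(server);
        System.out.println(!server.running && server.db.closed);
    }
}
```

Without the explicit close, the in-memory database (and any file handles it holds) outlives the server it belonged to.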



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866105#comment-13866105
 ] 

Andrew Purtell commented on HBASE-10301:


lgtm

Maybe query the minicluster for the limit of the index you are using for 
getRegionServer instead of assuming 4? 

Please commit to 0.98 as well as trunk if you like.
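The suggestion above, deriving the server count from the cluster rather than hardcoding 4, might look like the following. A plain list stands in for the minicluster's region server collection, so the names here are assumptions rather than the actual MiniHBaseCluster API:

```java
import java.util.Arrays;
import java.util.List;

public class PickNewServer {
    // Stand-in for querying the minicluster for its region servers; the
    // real test would ask the cluster instead of assuming exactly 4.
    static List<String> regionServers() {
        return Arrays.asList("rs0", "rs1", "rs2", "rs3");
    }

    // Iterate over however many servers the cluster reports and pick one
    // that differs from the region's old server.
    static String pickOther(String oldServer) {
        for (String rs : regionServers()) {
            if (!rs.equals(oldServer)) {
                return rs;
            }
        }
        throw new IllegalStateException("no other server available");
    }

    public static void main(String[] args) {
        System.out.println(pickOther("rs0")); // prints "rs1"
    }
}
```

Bounding the loop by the cluster's reported size keeps the test valid if the minicluster is ever started with a different number of region servers.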

 TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
 ---

 Key: HBASE-10301
 URL: https://issues.apache.org/jira/browse/HBASE-10301
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10301-v1.txt, 
 testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html


 The test failure came from PreCommit build #8362
 {code}
 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
 src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=false
 ...
 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): 
 Offline 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's 
 not anymore on asf002.sp2.ygridcore.net,59479,1389170993670
 org.apache.hadoop.hbase.NotServingRegionException: 
 org.apache.hadoop.hbase.NotServingRegionException: The region 
 c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is 
 cancelled.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725)
   at 
 org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
   at java.lang.Thread.run(Thread.java:662)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594)
   at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423)
   at 
 org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622)
 ...
 2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
 src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=true
 {code}
 The second call to getRegionPlan() returned the same server, thus leading to 
 the assertion failure:
 {code}
   assertFalse("Region should be assigned on a new region server",
     oldServerName.equals(serverName));
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866107#comment-13866107
 ] 

Andrew Purtell commented on HBASE-10298:


I see Ted made HBASE-8889 a blocker for 1.0 so I will change the scope of this 
JIRA for just the test disable change.

 TestIOFencing occasionally fails
 

 Key: HBASE-10298
 URL: https://issues.apache.org/jira/browse/HBASE-10298
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1.1
Reporter: Andrew Purtell
Priority: Blocker
 Fix For: 0.98.1, 0.99.0, 1.0.0

 Attachments: 10298.patch


 I can reproduce this using JDK 6 on Ubuntu 13.10.
 {noformat}
 Running org.apache.hadoop.hbase.TestIOFencing
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec 
  FAILURE!
 {noformat}
 No failure trace captured yet. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10300) Insure a throw of DoNotRetryIOException when a regionserver aborts

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866106#comment-13866106
 ] 

Andrew Purtell commented on HBASE-10300:


Ping [~anoop.hbase]

 Insure a throw of DoNotRetryIOException when a regionserver aborts
 --

 Key: HBASE-10300
 URL: https://issues.apache.org/jira/browse/HBASE-10300
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
 Fix For: 0.98.0, 0.99.0


 As discussed on HBASE-10292, we may not be throwing DoNotRetryIOExceptions 
 back to the client when aborting the server, especially when handling fatal 
 coprocessor exceptions.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10298) TestIOFencing occasionally fails

2014-01-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10298:
---

 Priority: Major  (was: Blocker)
Fix Version/s: (was: 1.0.0)
   (was: 0.98.1)
   0.98.0
 Assignee: Andrew Purtell

 TestIOFencing occasionally fails
 

 Key: HBASE-10298
 URL: https://issues.apache.org/jira/browse/HBASE-10298
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1.1
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10298.patch


 I can reproduce this using JDK 6 on Ubuntu 13.10.
 {noformat}
 Running org.apache.hadoop.hbase.TestIOFencing
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec 
  FAILURE!
 {noformat}
 No failure trace captured yet. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HBASE-10298) TestIOFencing occasionally fails

2014-01-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-10298.


Resolution: Fixed

Patch disabling this test for now committed to trunk and 0.98; resolving.

 TestIOFencing occasionally fails
 

 Key: HBASE-10298
 URL: https://issues.apache.org/jira/browse/HBASE-10298
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1.1
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10298.patch


 I can reproduce this using JDK 6 on Ubuntu 13.10.
 {noformat}
 Running org.apache.hadoop.hbase.TestIOFencing
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec 
  FAILURE!
 {noformat}
 No failure trace captured yet. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently

2014-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866110#comment-13866110
 ] 

Hadoop QA commented on HBASE-10301:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12622057/10301-v1.txt
  against trunk revision .
  ATTACHMENT ID: 12622057

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 4 release 
audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8368//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8368//console

This message is automatically generated.

 TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
 ---

 Key: HBASE-10301
 URL: https://issues.apache.org/jira/browse/HBASE-10301
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10301-v1.txt, 
 testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html


 The test failure came from PreCommit build #8362
 {code}
 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
 src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=false
 ...
 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): 
 Offline 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's 
 not anymore on asf002.sp2.ygridcore.net,59479,1389170993670
 org.apache.hadoop.hbase.NotServingRegionException: 
 org.apache.hadoop.hbase.NotServingRegionException: The region 
 c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is 
 cancelled.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725)
   at 
 

[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866125#comment-13866125
 ] 

Andrew Purtell commented on HBASE-10296:


Use of ZK has issues, but what we had before was much worse. We had 
heartbeating and partially desynchronized state in a bunch of places. Rather 
than implement our own consensus protocol, we used the specialist component 
ZK. Engineering distributed consensus protocols is a long-term endeavor, full 
of corner cases and hard-to-debug problems. It is worth consideration, but 
maybe only as a last resort. Does something about our use of ZK, or ZK 
itself, have fatal issues?

 Replace ZK with a paxos running within master processes to provide better 
 master failover performance and state consistency
 ---

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor 
 liveness, and store almost all of its state, such as region states, table 
 info, replication info, and so on. ZK also serves as a channel for 
 master-regionserver communication (such as region assignment) and 
 client-regionserver communication (such as replication state/behavior 
 changes). But ZK is fragile as a communication channel due to its one-time 
 watches and asynchronous notification mechanism, which together can lead to 
 missed events (hence missed messages); for example, the master must rely on 
 the idempotence of the state transition logic to keep the region assignment 
 state machine correct. In fact, almost all of the trickiest inconsistency 
 issues trace their root cause back to the fragility of ZK as a communication 
 channel.
 Replacing ZK with paxos running within the master processes has the 
 following benefits:
 1. Better master failover performance: all masters, whether active or 
 standby, hold the same latest state in memory (except lagging ones, which 
 can eventually catch up). Whenever the active master dies, the newly elected 
 active master can immediately play its role without failover work such as 
 rebuilding its in-memory state by consulting the meta table and ZK.
 2. Better state consistency: the master's in-memory state is the only truth 
 about the system, which eliminates inconsistency from the very beginning. 
 And though the state is held by all masters, paxos guarantees it is 
 identical at any time.
 3. A more direct and simple communication pattern: clients change state by 
 sending requests to the master, and the master and regionservers talk 
 directly to each other via request and response, without a third-party store 
 like ZK that can introduce more uncertainty, worse latency, and more 
 complexity.
 4. ZK would only be used for liveness monitoring to determine whether a 
 regionserver is dead, and later on we could eliminate ZK entirely once we 
 build heartbeating between master and regionserver.
 I know this might look like a very crazy re-architecture, but it deserves 
 deep thinking and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10293) Master and RS GC logs can conflict when run on same host

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866128#comment-13866128
 ] 

Andrew Purtell commented on HBASE-10293:


I run multiple servers on the same box. Each has its own config directory 
with its own hbase-env.sh. This is trivial to do with Puppet or your 
favorite configuration management tool. Kind of a non-issue?

 Master and RS GC logs can conflict when run on same host
 

 Key: HBASE-10293
 URL: https://issues.apache.org/jira/browse/HBASE-10293
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.96.1.1
Reporter: Nick Dimiduk

 My issue manifests when I uncomment the line {{export SERVER_GC_OPTS=...}} in 
 hbase-env.sh and start HBase. It's a single node in distributed mode, so both 
 a Master and RegionServer are started on that host. Both start commands are 
 run in the same minute, so only one gc.log-`date` file is created. `lsof` 
 indicates two processes are writing to that file and the output of `ps` 
 confirms they both received the same 
 {{-Xloggc:/grid/0/var/log/hbase/gc.log-201401071515}} argument.
 Presumably, the same will happen for folks running the thrift and rest 
 gateways on the same box (any java process itemized in the server_cmds array 
 in bin/hbase).
 Related (the reason I discovered this issue in the first place), stopping the 
 master process results in its gc.log being truncated.
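One hedged workaround sketch (my own suggestion, not a fix from this issue): make the GC log path unique per process by appending the launching shell's PID in hbase-env.sh. The GC flags follow the stock hbase-env.sh comments; the exact suffixing scheme is an assumption.

```shell
# Hypothetical hbase-env.sh tweak (illustrative, not the project's fix):
# append the launching shell's PID ($$) to the GC log name so a Master and
# a RegionServer started in the same minute on one host cannot end up
# sharing a single -Xloggc file.
GC_LOG_SUFFIX="$(date +%Y%m%d%H%M)-$$"
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:${HBASE_LOG_DIR:-/var/log/hbase}/gc.log-${GC_LOG_SUFFIX}"
echo "$SERVER_GC_OPTS"
```

Because each `bin/hbase-daemon.sh start` invocation sources hbase-env.sh in its own shell, the `$$` suffix differs per start command even within the same minute.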



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866132#comment-13866132
 ] 

Andrew Purtell commented on HBASE-10268:


I could get this test to fail within a few iterations on one box. Now 25 runs 
have passed in succession with the attached patch. Will continue out to 100 
iterations. If they all pass and HadoopQA provides a good result here, I am 
going to commit this test-only fix.

 TestSplitLogWorker occasionally fails
 -

 Key: HBASE-10268
 URL: https://issues.apache.org/jira/browse/HBASE-10268
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10268.patch


 TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, 
 but only when using JDK 6 on Ubuntu 12.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently

2014-01-08 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866135#comment-13866135
 ] 

Jeffrey Zhong commented on HBASE-10301:
---

The fix looks good to me. You may want to change
{code}
+  for (int i = 0; i < 4; i++) {
+    HRegionServer destServer = TEST_UTIL.getHBaseCluster().getRegionServer(i);
{code}

to the following, avoiding the hard-coded 4:

{code}
  for (RegionServerThread rst : cluster.getLiveRegionServerThreads()) {
HRegionServer hrs = rst.getRegionServer();
{code}

 TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
 ---

 Key: HBASE-10301
 URL: https://issues.apache.org/jira/browse/HBASE-10301
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10301-v1.txt, 
 testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html


 The test failure came from PreCommit build #8362
 {code}
 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
  src=, dest=asf002.sp2.  ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=false
 ...
 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): 
 Offline 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's 
 not anymore on asf002.sp2.ygridcore.net,59479,1389170993670
 org.apache.hadoop.hbase.NotServingRegionException: 
 org.apache.hadoop.hbase.NotServingRegionException: The region 
 c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is 
 cancelled.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725)
   at 
 org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
   at java.lang.Thread.run(Thread.java:662)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594)
   at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423)
   at 
 org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622)
 ...
 2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
  src=, dest=asf002.sp2.  ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=true
 {code}
 The second call to getRegionPlan() returned the same server, thus leading to 
 assertion failure:
 {code}
    assertFalse("Region should assigned on a new region server",
  oldServerName.equals(serverName));
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866150#comment-13866150
 ] 

Hudson commented on HBASE-10292:


SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #46 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/46/])
HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails occasionally 
(apurtell: rev 1556586)
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java


 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails

2014-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866149#comment-13866149
 ] 

Hudson commented on HBASE-10298:


SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #46 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/46/])
HBASE-10298. TestIOFencing occasionally fails

This flapping test produces low confidence results so temporarily
disable it while tracking down the cause. (apurtell: rev 1556596)
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java


 TestIOFencing occasionally fails
 

 Key: HBASE-10298
 URL: https://issues.apache.org/jira/browse/HBASE-10298
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.1.1
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10298.patch


 I can reproduce this using JDK 6 on Ubuntu 13.10.
 {noformat}
 Running org.apache.hadoop.hbase.TestIOFencing
 Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec 
  FAILURE!
 {noformat}
 No failure trace captured yet. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails

2014-01-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866173#comment-13866173
 ] 

Lars Hofhansl commented on HBASE-10268:
---

Should be good for 0.94 (and presumably 0.96) as well.

 TestSplitLogWorker occasionally fails
 -

 Key: HBASE-10268
 URL: https://issues.apache.org/jira/browse/HBASE-10268
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10268.patch


 TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, 
 but only when using JDK 6 on Ubuntu 12.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866174#comment-13866174
 ] 

Andrew Purtell commented on HBASE-10268:


The test itself timed out on run #56. Trying again with 120s timeouts per test. 
Beyond that, this is going to need a deeper look.

 TestSplitLogWorker occasionally fails
 -

 Key: HBASE-10268
 URL: https://issues.apache.org/jira/browse/HBASE-10268
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10268.patch


 TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, 
 but only when using JDK 6 on Ubuntu 12.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10303) Have snappy support properly documented would be helpful to hadoop and hbase users

2014-01-08 Thread Rural Hunter (JIRA)
Rural Hunter created HBASE-10303:


 Summary: Have snappy support properly documented would be helpful 
to hadoop and hbase users
 Key: HBASE-10303
 URL: https://issues.apache.org/jira/browse/HBASE-10303
 Project: HBase
  Issue Type: Task
  Components: documentation
Reporter: Rural Hunter


The current document for configuring snappy support 
(http://hbase.apache.org/book/snappy.compression.html) is incomplete and a bit 
obscure. IMO, several improvements could be made:
1. Describe the relationship among hadoop, hbase, and snappy. Is snappy actually 
needed by hadoop hdfs or by hbase itself? That would make clear whether you need 
to configure snappy support in hbase or in hadoop.
2. It doesn't mention that the default hadoop binary package is compiled without 
snappy support and that you need to compile it with the snappy option manually. 
Actually, it doesn't work with any native libs on a 64-bit OS, since the 
libhadoop.so in the binary package is built only for 32-bit OSes (this of course 
is a hadoop issue, not an hbase one, but it's good to mention).
3. In my experience, I actually needed to install both snappy and hadoop-snappy, 
so the doc lacks the steps for installing hadoop-snappy.
4. During my setup, I found a difference in where hadoop and hbase pick up the 
native lib files: hadoop picks them up from ./lib while hbase picks them up from 
./lib/[PLATFORM]. If that's correct, it could also be mentioned.
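Once snappy and the native libs are in place, the end result can be checked with HBase's bundled CompressionTest utility, which writes and reads a file with the given codec. A sketch; the target path is only an example, and any writable HDFS or local path works:

```shell
# Verify that this HBase install can actually load the snappy codec.
# A missing native library typically surfaces as an UnsatisfiedLinkError
# or a "native snappy library not available" message.
hbase org.apache.hadoop.hbase.util.CompressionTest \
  hdfs://namenode:8020/tmp/snappy-check snappy
```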



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10293) Master and RS GC logs can conflict when run on same host

2014-01-08 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866180#comment-13866180
 ] 

Nick Dimiduk commented on HBASE-10293:
--

That's probably true for someone with production-level automation 
infrastructure. I'm thinking of the fellow who downloads a tarball, follows the 
instructions in the hbase-env.sh comments, and is surprised by the outcome. 
FWIW, I don't think it's common to run each process out of its own config 
directory. Likewise, I don't think it's common to have each process log to 
its own log directory either. Rather, I tend to see /var/log/hbase containing 
all the HBase process logs for the machine.

*shrug* I don't feel strongly about the issue; it just surprised me while I was 
setting up some performance infra recently. If you'd prefer to defer this kind 
of concern to the puppeteers, I guess resolve as not a problem.

 Master and RS GC logs can conflict when run on same host
 

 Key: HBASE-10293
 URL: https://issues.apache.org/jira/browse/HBASE-10293
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.96.1.1
Reporter: Nick Dimiduk

 My issue manifests when I uncomment the line {{export SERVER_GC_OPTS=...}} in 
 hbase-env.sh and start HBase. It's a single node in distributed mode, so both 
 a Master and RegionServer are started on that host. Both start commands are 
 run in the same minute, so only one gc.log-`date` file is created. `lsof` 
 indicates two processes are writing to that file and the output of `ps` 
 confirms they both received the same 
 {{-Xloggc:/grid/0/var/log/hbase/gc.log-201401071515}} argument.
 Presumably, the same will happen for folks running the thrift and rest 
 gateways on the same box (any java process itemized in the server_cmds array 
 in bin/hbase).
 Related (the reason I discovered this issue in the first place), stopping the 
 master process results in its gc.log being truncated.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866179#comment-13866179
 ] 

Andrew Purtell commented on HBASE-10268:


bq. Should be good for 0.94 (and presumably 0.96) as well.

Sure [~lhofhansl]. And [~stack] made a comment up on dev@ about 
TestSplitLogWorker, I suspect he won't mind a test fix in 0.96 also. If I don't 
make things worse by introducing too short junit test timeouts (working on it) 
then I will commit this everywhere. 

 TestSplitLogWorker occasionally fails
 -

 Key: HBASE-10268
 URL: https://issues.apache.org/jira/browse/HBASE-10268
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10268.patch


 TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, 
 but only when using JDK 6 on Ubuntu 12.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently

2014-01-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10301:
---

Attachment: 10301-v2.txt

Patch v2 addresses Andy and Jeff's comments.

 TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
 ---

 Key: HBASE-10301
 URL: https://issues.apache.org/jira/browse/HBASE-10301
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10301-v1.txt, 10301-v2.txt, 
 testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html


 The test failure came from PreCommit build #8362
 {code}
 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
  src=, dest=asf002.sp2.  ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=false
 ...
 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): 
 Offline 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's 
 not anymore on asf002.sp2.ygridcore.net,59479,1389170993670
 org.apache.hadoop.hbase.NotServingRegionException: 
 org.apache.hadoop.hbase.NotServingRegionException: The region 
 c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is 
 cancelled.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725)
   at 
 org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
   at java.lang.Thread.run(Thread.java:662)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594)
   at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423)
   at 
 org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622)
 ...
 2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
  src=, dest=asf002.sp2.  ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=true
 {code}
 The second call to getRegionPlan() returned the same server, thus leading to 
 assertion failure:
 {code}
    assertFalse("Region should assigned on a new region server",
  oldServerName.equals(serverName));
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10293) Master and RS GC logs can conflict when run on same host

2014-01-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866183#comment-13866183
 ] 

Andrew Purtell commented on HBASE-10293:


bq. *shrug* I don't feel strongly about the issue

Same here. :-)

 Master and RS GC logs can conflict when run on same host
 

 Key: HBASE-10293
 URL: https://issues.apache.org/jira/browse/HBASE-10293
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.96.1.1
Reporter: Nick Dimiduk

 My issue manifests when I uncomment the line {{export SERVER_GC_OPTS=...}} in 
 hbase-env.sh and start HBase. It's a single node in distributed mode, so both 
 a Master and RegionServer are started on that host. Both start commands are 
 run in the same minute, so only one gc.log-`date` file is created. `lsof` 
 indicates two processes are writing to that file and the output of `ps` 
 confirms they both received the same 
 {{-Xloggc:/grid/0/var/log/hbase/gc.log-201401071515}} argument.
 Presumably, the same will happen for folks running the thrift and rest 
 gateways on the same box (any java process itemized in the server_cmds array 
 in bin/hbase).
 Related (the reason I discovered this issue in the first place), stopping the 
 master process results in its gc.log being truncated.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails

2014-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866187#comment-13866187
 ] 

Hadoop QA commented on HBASE-10268:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12622072/10268.patch
  against trunk revision .
  ATTACHMENT ID: 12622072

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 4 release 
audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8369//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8369//console

This message is automatically generated.

 TestSplitLogWorker occasionally fails
 -

 Key: HBASE-10268
 URL: https://issues.apache.org/jira/browse/HBASE-10268
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10268.patch


 TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, 
 but only when using JDK 6 on Ubuntu 12.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails

2014-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866211#comment-13866211
 ] 

Hadoop QA commented on HBASE-10268:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12622076/10268.patch
  against trunk revision .
  ATTACHMENT ID: 12622076

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 4 release 
audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8370//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8370//console

This message is automatically generated.

 TestSplitLogWorker occasionally fails
 -

 Key: HBASE-10268
 URL: https://issues.apache.org/jira/browse/HBASE-10268
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.99.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10268.patch


 TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, 
 but only when using JDK 6 on Ubuntu 12.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers

2014-01-08 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866226#comment-13866226
 ] 

Enis Soztutar commented on HBASE-10274:
---

bq. Do you mind backporting the patch for HBASE-6820
I meant do you want to do the backport : ) 

 MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
 ---

 Key: HBASE-10274
 URL: https://issues.apache.org/jira/browse/HBASE-10274
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Assignee: chendihao
Priority: Minor
 Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, 
 HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch


 HBASE-6820 points out the problem but does not fix it completely.
 killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() shut 
 down the ZooKeeperServer and need to close the ZKDatabase as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10302) Fix rat check issues in hbase-native-client.

2014-01-08 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866238#comment-13866238
 ] 

Liang Xie commented on HBASE-10302:
---

It was this, right? :)
 !? hbase-native-client/cmake_modules/FindGTest.cmake
 !? hbase-native-client/cmake_modules/FindLibEv.cmake
 !? hbase-native-client/README.md
 !? hbase-native-client/src/rpc/CMakeLists.txt
Lines that start with !? in the release audit report indicate files that do 
not have an Apache license header.


 Fix rat check issues in hbase-native-client.
 

 Key: HBASE-10302
 URL: https://issues.apache.org/jira/browse/HBASE-10302
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block

2014-01-08 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866239#comment-13866239
 ] 

Liang Xie commented on HBASE-10263:
---

Integrated into trunk. Thanks all for the review, and thanks for making the 
patch [~fenghh] :)
P.S. the release audit warnings are not related to the current jira; I just 
checked the new jira, it should be HBASE-10302 [~eclark]


 make LruBlockCache single/multi/in-memory ratio user-configurable and provide 
 preemptive mode for in-memory type block
 --

 Key: HBASE-10263
 URL: https://issues.apache.org/jira/browse/HBASE-10263
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch, 
 HBASE-10263-trunk_v2.patch


 Currently the single/multi/in-memory ratio in LruBlockCache is hardcoded to 
 1:2:1, which can lead to counter-intuitive behavior in scenarios where an 
 in-memory table's read performance is much worse than an ordinary table's, 
 even though the two tables' data sizes are almost equal and larger than the 
 regionserver's cache size (we ran such an experiment and verified that 
 random reads against the in-memory table were two times slower than against 
 the ordinary table).
 This patch fixes the above issue and provides:
 1. A user-configurable single/multi/in-memory ratio.
 2. A configurable switch that makes in-memory blocks preemptive: when the 
 switch is on, an in-memory block can evict any ordinary block to make room 
 until no ordinary block remains; when it is off (the default), the behavior 
 is unchanged, using the single/multi/in-memory ratio to determine eviction.
 By default both changes are off and the behavior is the same as before this 
 patch; it is the client's/user's choice whether to enable either of these 
 two enhancements.
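 The two knobs can be sketched as a simplified model (field names, the 
 eviction rule, and defaults below are illustrative, not the actual 
 LruBlockCache implementation or its configuration keys):
 {code}
// Sketch of the two enhancements: a configurable single/multi/in-memory
// split of the cache, plus a preemptive flag that lets in-memory blocks
// evict ordinary blocks first. Simplified model of LruBlockCache only.
public class LruRatioSketch {
    final long capacityBytes;
    final float singleRatio, multiRatio, inMemoryRatio; // default 0.25/0.50/0.25 (1:2:1)
    final boolean inMemoryPreemptive;                   // off by default

    LruRatioSketch(long capacityBytes, float single, float multi, float inMemory,
                   boolean preemptive) {
        if (Math.abs(single + multi + inMemory - 1.0f) > 1e-6) {
            throw new IllegalArgumentException("ratios must sum to 1.0");
        }
        this.capacityBytes = capacityBytes;
        this.singleRatio = single;
        this.multiRatio = multi;
        this.inMemoryRatio = inMemory;
        this.inMemoryPreemptive = preemptive;
    }

    // Bytes reserved for the in-memory bucket under the configured ratio.
    long inMemoryBytes() { return (long) (capacityBytes * inMemoryRatio); }

    // Preemptive mode: an incoming in-memory block may evict an ordinary
    // (single/multi) block whenever any ordinary block remains, regardless
    // of the per-bucket ratios.
    boolean inMemoryBlockMayEvictOrdinary(long ordinaryBytesCached) {
        return inMemoryPreemptive && ordinaryBytesCached > 0;
    }

    public static void main(String[] args) {
        LruRatioSketch c = new LruRatioSketch(1000, 0.25f, 0.5f, 0.25f, true);
        System.out.println(c.inMemoryBytes());  // prints 250
        System.out.println(c.inMemoryBlockMayEvictOrdinary(500));  // prints true
    }
}
 {code}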





[jira] [Updated] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block

2014-01-08 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HBASE-10263:
--

   Resolution: Fixed
Fix Version/s: 0.99.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 make LruBlockCache single/multi/in-memory ratio user-configurable and provide 
 preemptive mode for in-memory type block
 --

 Key: HBASE-10263
 URL: https://issues.apache.org/jira/browse/HBASE-10263
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: Feng Honghua
Assignee: Feng Honghua
 Fix For: 0.99.0

 Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch, 
 HBASE-10263-trunk_v2.patch


 Currently the single/multi/in-memory ratio in LruBlockCache is hardcoded to 
 1:2:1, which can lead to counter-intuitive behavior in scenarios where an 
 in-memory table's read performance is much worse than an ordinary table's, 
 even though the two tables' data sizes are almost equal and larger than the 
 regionserver's cache size (we ran such an experiment and verified that 
 random reads against the in-memory table were two times slower than against 
 the ordinary table).
 This patch fixes the above issue and provides:
 1. A user-configurable single/multi/in-memory ratio.
 2. A configurable switch that makes in-memory blocks preemptive: when the 
 switch is on, an in-memory block can evict any ordinary block to make room 
 until no ordinary block remains; when it is off (the default), the behavior 
 is unchanged, using the single/multi/in-memory ratio to determine eviction.
 By default both changes are off and the behavior is the same as before this 
 patch; it is the client's/user's choice whether to enable either of these 
 two enhancements.





[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently

2014-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866250#comment-13866250
 ] 

Hadoop QA commented on HBASE-10301:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12622082/10301-v2.txt
  against trunk revision .
  ATTACHMENT ID: 12622082

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 4 release 
audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8371//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8371//console

This message is automatically generated.

 TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
 ---

 Key: HBASE-10301
 URL: https://issues.apache.org/jira/browse/HBASE-10301
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10301-v1.txt, 10301-v2.txt, 
 testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html


 The test failure came from PreCommit build #8362
 {code}
 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
  src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=false
 ...
 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): 
 Offline 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's 
 not anymore on asf002.sp2.ygridcore.net,59479,1389170993670
 org.apache.hadoop.hbase.NotServingRegionException: 
 org.apache.hadoop.hbase.NotServingRegionException: The region 
 c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is 
 cancelled.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725)
   at 
 

[jira] [Updated] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block

2014-01-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10263:
---


Honghua:
Mind filling in the release notes?

 make LruBlockCache single/multi/in-memory ratio user-configurable and provide 
 preemptive mode for in-memory type block
 --

 Key: HBASE-10263
 URL: https://issues.apache.org/jira/browse/HBASE-10263
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: Feng Honghua
Assignee: Feng Honghua
 Fix For: 0.99.0

 Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch, 
 HBASE-10263-trunk_v2.patch


 Currently the single/multi/in-memory ratio in LruBlockCache is hardcoded to 
 1:2:1, which can lead to counter-intuitive behavior in scenarios where an 
 in-memory table's read performance is much worse than an ordinary table's, 
 even though the two tables' data sizes are almost equal and larger than the 
 regionserver's cache size (we ran such an experiment and verified that 
 random reads against the in-memory table were two times slower than against 
 the ordinary table).
 This patch fixes the above issue and provides:
 1. A user-configurable single/multi/in-memory ratio.
 2. A configurable switch that makes in-memory blocks preemptive: when the 
 switch is on, an in-memory block can evict any ordinary block to make room 
 until no ordinary block remains; when it is off (the default), the behavior 
 is unchanged, using the single/multi/in-memory ratio to determine eviction.
 By default both changes are off and the behavior is the same as before this 
 patch; it is the client's/user's choice whether to enable either of these 
 two enhancements.





[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently

2014-01-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866253#comment-13866253
 ] 

Ted Yu commented on HBASE-10301:


With patch v2, the test passed on QA:
{code} 
Running org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 42.036 sec
{code}

 TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
 ---

 Key: HBASE-10301
 URL: https://issues.apache.org/jira/browse/HBASE-10301
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: 10301-v1.txt, 10301-v2.txt, 
 testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html


 The test failure came from PreCommit build #8362
 {code}
 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
  src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=false
 ...
 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): 
 Offline 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's 
 not anymore on asf002.sp2.ygridcore.net,59479,1389170993670
 org.apache.hadoop.hbase.NotServingRegionException: 
 org.apache.hadoop.hbase.NotServingRegionException: The region 
 c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is 
 cancelled.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725)
   at 
 org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
   at 
 org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
   at java.lang.Thread.run(Thread.java:662)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594)
   at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423)
   at 
 org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622)
 ...
 2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No 
 previous transition plan found (or ignoring an existing plan) for 
 testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; 
 generated random 
 plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.,
  src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, 
 available=4) available servers, forceNewPlan=true
 {code}
 The second call to getRegionPlan() returned the same server, thus leading to 
 assertion failure:
 {code}
    assertFalse("Region should be assigned on a new region server",
  oldServerName.equals(serverName));
 {code}
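  The fix direction suggested by the failure can be sketched as: when 
  forceNewPlan is set, exclude the previous server from the candidates (a 
  simplified model; the names and selection rule are illustrative, not the 
  actual AssignmentManager code):
  {code}
import java.util.*;

// Simplified model of the flaky step: a region plan generated with
// forceNewPlan=true should not pick the server the region was just on.
// Server names and the selection rule are illustrative only.
public class RegionPlanSketch {
    static String chooseDest(List<String> onlineServers, String oldServer,
                             boolean forceNewPlan, Random rng) {
        List<String> candidates = new ArrayList<>(onlineServers);
        if (forceNewPlan && candidates.size() > 1) {
            candidates.remove(oldServer);  // avoid re-assigning to the same host
        }
        return candidates.get(rng.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("rs1", "rs2", "rs3", "rs4");
        String dest = chooseDest(servers, "rs2", true, new Random());
        System.out.println(dest);  // never "rs2" when forceNewPlan is set
    }
}
  {code}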




