[jira] [Commented] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs
[ https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865186#comment-13865186 ] Hadoop QA commented on HBASE-9846: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621925/HBASE-9846_5.patch against trunk revision . ATTACHMENT ID: 12621925 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 34 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//console This message is automatically generated. 
Integration test and LoadTestTool support for cell ACLs --- Key: HBASE-9846 URL: https://issues.apache.org/jira/browse/HBASE-9846 Project: HBase Issue Type: Sub-task Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, HBASE-9846_5.patch, HBASE-9846_5.patch Cell level ACLs should have an integration test and LoadTestTool support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865189#comment-13865189 ] Anoop Sam John edited comment on HBASE-10292 at 1/8/14 8:14 AM:

Oh, sorry for the confusion. I was not talking about retries in this test. What I was referring to is CoprocessorHost#handleCoprocessorThrowable:

{code}
if (e instanceof IOException) {
  throw (IOException) e;
}
// If we got here, e is not an IOException. A loaded coprocessor has a
// fatal bug, and the server (master or regionserver) should remove the
// faulty coprocessor from its set of active coprocessors. Setting
// 'hbase.coprocessor.abortonerror' to true will cause abortServer(),
// which may be useful in development and testing environments where
// 'failing fast' for error analysis is desired.
if (env.getConfiguration().getBoolean("hbase.coprocessor.abortonerror", false)) {
  // server is configured to abort.
  abortServer(env, e);
} else {
  LOG.error("Removing coprocessor '" + env.toString() + "' from " +
      "environment because it threw: " + e, e);
  coprocessors.remove(env);
  try {
    shutdown(env);
  } catch (Exception x) {
    LOG.error("Uncaught exception when shutting down coprocessor '" +
        env.toString() + "'", x);
  }
  throw new DoNotRetryIOException("Coprocessor: '" + env.toString() +
      "' threw: '" + e + "' and has been removed from the active " +
      "coprocessor set.", e);
}
{code}

I was wondering why we do not throw a DoNotRetryIOException back to the client when the RS also aborts. Maybe that is correct and intended.

was (Author: anoop.hbase): Oh, sorry for the confusion. I was not talking about retries in this test. What I was referring to is CoprocessorHost#handleCoprocessorThrowable (same code block as above). I was wondering why we do not throw a DoNotRetryIOException back to the client when the RS also aborts.

TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
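The client-side half of the contract discussed above can be sketched with toy classes. This is a minimal sketch, not the real HBase client: `DoNotRetryIOException` here is a stand-in for `org.apache.hadoop.hbase.DoNotRetryIOException` (which extends IOException), and `callWithRetries` is a hypothetical, simplified version of the client's retrying caller.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Toy stand-in for org.apache.hadoop.hbase.DoNotRetryIOException: an
// IOException subtype that tells the client-side retry loop to give up at once.
class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(String msg) { super(msg); }
}

public class RetryContractDemo {

    // Illustrative simplification of a client retrying caller: a plain
    // IOException is retried up to maxAttempts times, while a
    // DoNotRetryIOException is rethrown immediately.
    static <T> T callWithRetries(Callable<T> call, int maxAttempts) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (DoNotRetryIOException e) {
                throw e;                 // terminal: e.g. the server removed the coprocessor
            } catch (IOException e) {
                last = e;                // transient: try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] attempts = {0};
        try {
            callWithRetries(() -> { attempts[0]++; throw new IOException("transient"); }, 3);
        } catch (IOException e) {
            System.out.println("plain IOException: gave up after " + attempts[0] + " attempts"); // 3
        }
        attempts[0] = 0;
        try {
            callWithRetries(() -> { attempts[0]++; throw new DoNotRetryIOException("fatal"); }, 3);
        } catch (DoNotRetryIOException e) {
            System.out.println("DoNotRetryIOException: stopped after " + attempts[0] + " attempt"); // 1
        }
    }
}
```

The question in the comment is about the branch where the server aborts instead: in that path nothing like a DoNotRetryIOException reaches the client, so the client keeps retrying against a dying server.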
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865191#comment-13865191 ] ramkrishna.s.vasudevan commented on HBASE-10292: LGTM. +1 TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10156) Fix up the HBASE-8755 slowdown when low contention
[ https://issues.apache.org/jira/browse/HBASE-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865216#comment-13865216 ] Hadoop QA commented on HBASE-10156: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621934/10156v4.txt against trunk revision . ATTACHMENT ID: 12621934 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + * In this implementation, there is one HLog/WAL. All edits for all Regions are entered first in the HLog/WAL. Each + * HRegion is identified by a unique long <code>int</code>. HRegions do not need to declare themselves before using the + * HLog/WAL; they simply include their HRegion-id in the <code>append</code> or <code>completeCacheFlush</code> calls. + * <p>This HLog/WAL implementation keeps multiple on-disk files kept in a chronological order. As data is flushed to + * other (better) on-disk structures (files sorted by key, hfiles), the log becomes obsolete. We can let go of all the + * log edits/entries for a given HRegion-id up to the most-recent CACHEFLUSH message from that HRegion. A bunch of work + * in the below is done keeping account of these region sequence ids -- what is flushed out to hfiles, and what is yet + * <p>Its only practical to delete entire files. Thus, we delete an entire on-disk file <code>F</code> when all of the + * edits in <code>F</code> have a log-sequence-id that's older (smaller) than the most-recent CACHEFLUSH message for + * <p>This implementation performs logfile-rolling internal to the implementation, so external callers do not have to be {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster.testContendedLogRolling(TestLogRollingNoCluster.java:74) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//console This message is automatically generated. Fix up the HBASE-8755 slowdown when low contention -- Key: HBASE-10156 URL: https://issues.apache.org/jira/browse/HBASE-10156 Project: HBase Issue Type: Sub-task Components: wal Reporter: stack
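The WAL-cleanup rule described in the javadoc quoted by the QA bot above (delete an on-disk file <code>F</code> once every region's edits in <code>F</code> are older than that region's most recent CACHEFLUSH) can be sketched as follows. The class and method names are hypothetical, not HBase's actual FSHLog internals:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not HBase's actual WAL code) of the cleanup rule: a WAL
// file is deletable once every region with edits in it has flushed past the
// file's highest sequence id for that region.
public class WalCleanupDemo {

    // maxSeqIdPerRegionByFile: file name -> (region id -> highest seq id of
    // that region's edits in the file).
    // flushedSeqIdByRegion: region id -> seq id of its most recent CACHEFLUSH.
    static List<String> obsoleteFiles(Map<String, Map<Long, Long>> maxSeqIdPerRegionByFile,
                                      Map<Long, Long> flushedSeqIdByRegion) {
        List<String> deletable = new ArrayList<>();
        for (Map.Entry<String, Map<Long, Long>> file : maxSeqIdPerRegionByFile.entrySet()) {
            boolean allFlushed = true;
            for (Map.Entry<Long, Long> e : file.getValue().entrySet()) {
                long flushed = flushedSeqIdByRegion.getOrDefault(e.getKey(), -1L);
                if (e.getValue() > flushed) {   // region still has unflushed edits in this file
                    allFlushed = false;
                    break;
                }
            }
            if (allFlushed) deletable.add(file.getKey());
        }
        return deletable;
    }

    public static void main(String[] args) {
        Map<String, Map<Long, Long>> files = new LinkedHashMap<>();
        files.put("wal.1", Map.of(1L, 10L, 2L, 12L)); // older file
        files.put("wal.2", Map.of(1L, 25L));          // newer file
        Map<Long, Long> flushed = Map.of(1L, 20L, 2L, 15L);
        System.out.println(obsoleteFiles(files, flushed)); // [wal.1]
    }
}
```

In the example, wal.1 is deletable because both regions have flushed past its highest sequence ids (10 ≤ 20 and 12 ≤ 15), while wal.2 still holds region 1 edits newer than its last flush (25 > 20).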
[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency
[ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865230#comment-13865230 ] Steve Loughran commented on HBASE-10296: One aspect of ZK that is worth remembering is that it lets other apps keep an eye on what is going on.

Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency --- Key: HBASE-10296 URL: https://issues.apache.org/jira/browse/HBASE-10296 Project: HBase Issue Type: Brainstorming Components: master, Region Assignment, regionserver Reporter: Feng Honghua

Currently the master relies on ZK to elect the active master, monitor liveness, and store almost all of its state, such as region states, table info, replication info and so on. ZK also serves as a channel for master-regionserver communication (such as region assignment) and client-regionserver communication (such as replication state/behavior changes). But ZK as a communication channel is fragile due to its one-time watches and asynchronous notifications, which together can lead to missed events (hence missed messages); for example, the master must rely on the idempotence of its state transition logic to keep the region assignment state machine correct. Almost all of the trickiest inconsistency issues trace their root cause back to the fragility of ZK as a communication channel. Replacing ZK with Paxos running within the master processes has the following benefits:

1. Better master failover performance: all masters, active and standby, hold the same latest state in memory (except lagging ones, which can eventually catch up). Whenever the active master dies, the newly elected active master can immediately take over without failover work such as rebuilding its in-memory state by consulting the meta table and ZK.
2. Better state consistency: the masters' in-memory state is the only truth about the system, which eliminates inconsistency from the very beginning; and though the state is held by all masters, Paxos guarantees the copies are identical at any time.
3. A more direct and simple communication pattern: clients change state by sending requests to the master, and master and regionservers talk directly to each other via requests and responses, without going through a third-party store like ZK, which can introduce more uncertainty, worse latency and more complexity.
4. ZK would only be used for liveness monitoring to determine whether a regionserver is dead, and later on we could eliminate ZK entirely once we build heartbeats between master and regionservers.

I know this might look like a very crazy re-architecture, but it deserves deep thinking and serious discussion, right? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
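The one-time-watch fragility mentioned in the description can be illustrated with a toy model. This is deliberately not the ZooKeeper API; it only shows the window in which changes are lost between a watch firing and the client re-registering (all names here are invented for the illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a one-shot watch; NOT the ZooKeeper API.
interface Watcher { void process(String event); }

class ToyNode {
    private Watcher watch;                    // at most one one-shot watcher
    final List<String> history = new ArrayList<>();

    void register(Watcher w) { this.watch = w; }

    void setData(String value) {
        history.add(value);
        Watcher w = this.watch;
        this.watch = null;                    // one-time: consumed on delivery
        if (w != null) w.process(value);
    }
}

public class OneTimeWatchDemo {
    public static void main(String[] args) {
        ToyNode node = new ToyNode();
        List<String> seen = new ArrayList<>();
        node.register(seen::add);

        node.setData("v1");          // delivered; the watch is now consumed
        node.setData("v2");          // missed: no watch registered
        node.setData("v3");          // missed
        node.register(seen::add);    // client re-registers, too late for v2/v3
        node.setData("v4");          // delivered

        System.out.println("changes=" + node.history + " observed=" + seen);
        // changes=[v1, v2, v3, v4] observed=[v1, v4]
    }
}
```

Real ZooKeeper clients cope with this by re-reading the node's state when re-registering the watch, which is exactly why the description says correctness must lean on idempotent state transitions rather than on seeing every event.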
[jira] [Commented] (HBASE-10297) LoadAndVerify Integration Test for cell visibility
[ https://issues.apache.org/jira/browse/HBASE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865214#comment-13865214 ] Hadoop QA commented on HBASE-10297: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621935/HBASE-10297_V2.patch against trunk revision . ATTACHMENT ID: 12621935 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster org.apache.hadoop.hbase.TestIOFencing Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//console This message is automatically generated. 
LoadAndVerify Integration Test for cell visibility -- Key: HBASE-10297 URL: https://issues.apache.org/jira/browse/HBASE-10297 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10297.patch, HBASE-10297_V2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs
[ https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-9846: -- Status: Open (was: Patch Available) Integration test and LoadTestTool support for cell ACLs --- Key: HBASE-9846 URL: https://issues.apache.org/jira/browse/HBASE-9846 Project: HBase Issue Type: Sub-task Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, HBASE-9846_5.patch, HBASE-9846_5.patch Cell level ACLs should have an integration test and LoadTestTool support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samir Ahmic updated HBASE-7386: --- Attachment: HBASE-7386-conf-v2.patch HBASE-7386-bin-v2.patch Here is a summary of the v2 patches: * Removed unnecessary comments * Modified graceful_stop.sh to support the case when supervisord is used (to reduce copy/paste); the script also had an issue with restoring balancer state, which is now fixed * Added a clean_znode option to hbase-daemon.sh that calls cleanZNode(); this is used by the zk_cleaner.py listener script * Added the zk_cleaner.py supervisord event listener, which removes the znode when a regionserver crashes and sends a mail notification about that event (sending email is optional) * I have verified that the supervisor approach improves master failover: in my testing the failover time is ~7s when using supervisor, versus ~40s when using the standard scripts * Since we have 'autorestart=true' in the supervisord config, if any process fails unexpectedly supervisor will restart it automatically Investigate providing some supervisor support for znode deletion Key: HBASE-7386 URL: https://issues.apache.org/jira/browse/HBASE-7386 Project: HBase Issue Type: Task Components: master, regionserver, scripts Reporter: Gregory Chanan Assignee: stack Priority: Blocker Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, HBASE-7386-v0.patch, supervisordconfigs-v0.patch There are a couple of JIRAs for deleting the znode on a process failure: HBASE-5844 (RS) and HBASE-5926 (Master), which are pretty neat; on process failure, they delete the znode of the underlying process so HBase can recover faster. These JIRAs were implemented via the startup scripts; i.e. the script hangs around and waits for the process to exit, then deletes the znode. 
There are a few problems associated with this approach, as listed in the below JIRAs: 1) Hides startup output in script https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401 2) Two hbase processes listed per launched daemon https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409 3) Not run by a real supervisor https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409 4) Weird output after kill -9 of the actual process in standalone mode https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801 5) Can kill an existing RS if called again https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401 6) Hides stdout/stderr https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832 I suspect running via something like supervisord can solve these issues if we provide the right support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
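For concreteness, the supervisord wiring described in the patch summary might look roughly like the fragment below. This is a sketch under assumptions: the paths and program names are hypothetical; only zk_cleaner.py and autorestart=true come from the patch summary, and PROCESS_STATE_EXITED is supervisord's standard event class for a process exiting.

```ini
; Hypothetical supervisord fragment for running a regionserver under supervision.
[program:regionserver]
command=/usr/local/hbase/bin/hbase regionserver start   ; path is illustrative
autorestart=true          ; supervisor restarts the process if it exits unexpectedly
stopwaitsecs=30

; Event listener from the v2 patch summary: cleans up the RS znode (and
; optionally sends mail) when a supervised process exits.
[eventlistener:zk_cleaner]
command=/usr/local/hbase/bin/zk_cleaner.py              ; path is illustrative
events=PROCESS_STATE_EXITED
```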
[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865267#comment-13865267 ] Liang Xie commented on HBASE-10263: --- There are two +1s already. If there are no new comments or objections, I'd like to commit trunk_v2 into trunk tomorrow. make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block -- Key: HBASE-10263 URL: https://issues.apache.org/jira/browse/HBASE-10263 Project: HBase Issue Type: Improvement Components: io Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch, HBASE-10263-trunk_v2.patch Currently the single/multi/in-memory ratio in LruBlockCache is hardcoded at 1:2:1, which can lead to counter-intuitive behavior in some user scenarios: an in-memory table's read performance can be much worse than an ordinary table's when the two tables' data sizes are almost equal and larger than the regionserver's cache size (we did such an experiment and verified that in-memory table random read performance was two times worse than the ordinary table's). This patch fixes the above issue and provides: 1. making the single/multi/in-memory ratio user-configurable; 2. a configurable switch that makes in-memory blocks preemptive; by preemptive we mean that when this switch is on, an in-memory block can kick out any ordinary block to make room until no ordinary blocks remain, while when it is off (the default) the behavior is the same as before, using the single/multi/in-memory ratio to determine eviction. By default both of the above changes are off and the behavior stays the same as before applying this patch; it is the client/user's choice which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
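As a back-of-the-envelope illustration of what making the ratio configurable buys, here is a hedged sketch; the class and method names are hypothetical (not LruBlockCache's actual API), and only the 1:2:1 default split is taken from the issue description.

```java
public class LruRatioDemo {

    // Hypothetical helper: split a total cache size into single-access /
    // multi-access / in-memory byte budgets. The hardcoded 1:2:1 split from
    // the issue corresponds to 0.25 / 0.50 / 0.25.
    static long[] bucketSizes(long totalBytes, float single, float multi, float inMemory) {
        if (Math.abs(single + multi + inMemory - 1.0f) > 1e-6f) {
            throw new IllegalArgumentException("ratios must sum to 1.0");
        }
        return new long[] {
            (long) (totalBytes * single),
            (long) (totalBytes * multi),
            (long) (totalBytes * inMemory)
        };
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long[] hardcoded = bucketSizes(gb, 0.25f, 0.50f, 0.25f); // today's fixed 1:2:1
        long[] tuned = bucketSizes(gb, 0.20f, 0.30f, 0.50f);     // favours in-memory tables
        System.out.println("in-memory budget: " + hardcoded[2] + " -> " + tuned[2] + " bytes");
    }
}
```

With the fixed split, an in-memory table larger than a quarter of the cache thrashes its bucket no matter how idle the rest of the cache is; letting the user raise the in-memory share (or enable the preemptive switch) addresses exactly the scenario measured in the description.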
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865272#comment-13865272 ] Nicolas Liochon commented on HBASE-7386: bq. I have verified that the supervisor approach improves master failover: in my testing the failover time is ~7s when using supervisor, versus ~40s when using the standard scripts This is strange. Do you know why? What is the test scenario? Investigate providing some supervisor support for znode deletion Key: HBASE-7386 URL: https://issues.apache.org/jira/browse/HBASE-7386 Project: HBase Issue Type: Task Components: master, regionserver, scripts Reporter: Gregory Chanan Assignee: stack Priority: Blocker Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, HBASE-7386-v0.patch, supervisordconfigs-v0.patch There are a couple of JIRAs for deleting the znode on a process failure: HBASE-5844 (RS) and HBASE-5926 (Master), which are pretty neat; on process failure, they delete the znode of the underlying process so HBase can recover faster. These JIRAs were implemented via the startup scripts; i.e. the script hangs around and waits for the process to exit, then deletes the znode. 
There are a few problems associated with this approach, as listed in the below JIRAs: 1) Hides startup output in script https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401 2) Two hbase processes listed per launched daemon https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409 3) Not run by a real supervisor https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409 4) Weird output after kill -9 of the actual process in standalone mode https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801 5) Can kill an existing RS if called again https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401 6) Hides stdout/stderr https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832 I suspect running via something like supervisord can solve these issues if we provide the right support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865289#comment-13865289 ] Hadoop QA commented on HBASE-10292: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621938/10292.patch against trunk revision . ATTACHMENT ID: 12621938 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//console This message is automatically generated. 
TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs
[ https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-9846: -- Attachment: HBASE-9846_6.patch This should be good to go. Integration test and LoadTestTool support for cell ACLs --- Key: HBASE-9846 URL: https://issues.apache.org/jira/browse/HBASE-9846 Project: HBase Issue Type: Sub-task Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, HBASE-9846_5.patch, HBASE-9846_5.patch, HBASE-9846_6.patch Cell level ACLs should have an integration test and LoadTestTool support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs
[ https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-9846: -- Status: Patch Available (was: Open)
graceful_stop.sh hung
Hi all, I restarted a region server using graceful_stop.sh (bin/graceful_stop.sh --restart --reload --debug hostname). After running for a while, the process hangs as follows:
2014-01-08 18:40:48,150 [main] INFO region_mover - Moving region 78c953d53f6498664d9a067701a7e7d7 (42 of 340) to server=inspur255.deu.edu.cn,60020,1388056934052
2014-01-08 18:40:50,097 [main] INFO region_mover - Moving region c621b3bf29262ca5248c03a8d6ebb41e (43 of 340) to server=inspur253.deu.edu.cn,60020,1388053123213
2014-01-08 18:40:51,652 [main] INFO region_mover - Moving region 4dad873a6af4d3a9809339281c3cb34c (44 of 340) to server=inspur254.deu.edu.cn,60020,1388054917364
2014-01-08 18:40:56,701 [main] INFO region_mover - Moving region 0e311941f5ff202bcefe57aa4079a188 (45 of 340) to server=inspur253.deu.edu.cn,60020,1388053123213
2014-01-08 18:40:58,632 [main] INFO region_mover - Moving region 3a99fb65caf32a05e18b8b6b93f8 (46 of 340) to server=inspur308.deu.edu.cn,60020,1388059770705
2014-01-08 18:41:02,127 [main] INFO region_mover - Moving region 34f3ef516fe6fba940eeb0902b9acd3d (47 of 340) to server=inspur254.deu.edu.cn,60020,1388054917364
2014-01-08 18:41:03,689 [main] INFO region_mover - Moving region ac201a56d80f13ca5357d474578a91c2 (48 of 340) to server=inspur308.deu.edu.cn,60020,1388059770705
2014-01-08 18:41:05,669 [main] INFO region_mover - Moving region 812912be704946d24c5f1b5e3184b2f5 (49 of 340) to server=inspur253.deu.edu.cn,60020,1388053123213
I ran the 'du' command to check the last region, which does not contain any data:
hadoop@inspur249:~/hbase$ hdfs dfs -du -s -h /hbase/test/812912be704946d24c5f1b5e3184b2f5/*
486 /hbase/test/812912be704946d24c5f1b5e3184b2f5/.regioninfo
0 /hbase/test/812912be704946d24c5f1b5e3184b2f5/body
0 /hbase/test/812912be704946d24c5f1b5e3184b2f5/meta
The hadoop version is cdh4.2.1 and hbase is 0.94. Thanks! Best Regards~ Xiyi
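One way to see where the region mover got stuck is to parse its log. The sketch below is a hypothetical helper (not part of HBase); it only assumes the "Moving region <id> (<n> of <total>) to server=<host>,..." format visible in the log lines above:

```python
import re

# Hypothetical diagnostic: find the last region region_mover reported moving
# before it hung. The pattern matches the log format shown above; the helper
# name and usage are illustrative only.
MOVE_RE = re.compile(
    r"Moving region (?P<region>\w+) \((?P<n>\d+) of (?P<total>\d+)\) "
    r"to server=(?P<server>\S+)"
)

def last_move(log_lines):
    """Return (region_id, n, total, server) for the final 'Moving region' entry."""
    last = None
    for line in log_lines:
        m = MOVE_RE.search(line)
        if m:
            last = (m.group("region"), int(m.group("n")),
                    int(m.group("total")), m.group("server"))
    return last

log = [
    "2014-01-08 18:41:03,689 [main] INFO region_mover - Moving region ac201a56d80f13ca5357d474578a91c2 (48 of 340) to server=inspur308.deu.edu.cn,60020,1388059770705",
    "2014-01-08 18:41:05,669 [main] INFO region_mover - Moving region 812912be704946d24c5f1b5e3184b2f5 (49 of 340) to server=inspur253.deu.edu.cn,60020,1388053123213",
]
region, n, total, server = last_move(log)
print(region, f"{n}/{total}", server)
```

With the stuck region id in hand, one can then inspect its HDFS footprint with `hdfs dfs -du`, as shown in the report.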
[jira] [Commented] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs
[ https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865408#comment-13865408 ] Hadoop QA commented on HBASE-9846:
--
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12621960/HBASE-9846_6.patch
against trunk revision .
ATTACHMENT ID: 12621960
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 40 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings).
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//console
This message is automatically generated.
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865479#comment-13865479 ] Samir Ahmic commented on HBASE-7386: From what I could see, it is all about removing the master znode in ZooKeeper. In the supervisor scenario the master znode is deleted by autorestart, while in the standard scripts we don't delete the master znode. Is the master znode ephemeral? It should be gone when the master dies. The test scenario is very simple:
* distributed cluster 0.96
* start master and backup master on different machines
* date; kill -9 master, and watch the logs on the backup master to see when it becomes active
* I have also used a python based script that watches the '/hbase/master' znode and detects changes
Investigate providing some supervisor support for znode deletion Key: HBASE-7386 URL: https://issues.apache.org/jira/browse/HBASE-7386 Project: HBase Issue Type: Task Components: master, regionserver, scripts Reporter: Gregory Chanan Assignee: stack Priority: Blocker Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, HBASE-7386-v0.patch, supervisordconfigs-v0.patch There are a couple of JIRAs for deleting the znode on a process failure: HBASE-5844 (RS) and HBASE-5926 (Master), which are pretty neat; on process failure, they delete the znode of the underlying process so HBase can recover faster. These JIRAs were implemented via the startup scripts; i.e. the script hangs around and waits for the process to exit, then deletes the znode. 
There are a few problems associated with this approach, as listed in the below JIRAs:
1) Hides startup output in script https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
2) Two hbase processes listed per launched daemon https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
3) Not run by a real supervisor https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
4) Weird output after kill -9 of the actual process in standalone mode https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
5) Can kill an existing RS if called again https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
6) Hides stdout/stderr https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
I suspect running via something like supervisord can solve these issues if we provide the right support.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
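The python-based master-znode watcher mentioned in the comment above could look roughly like the sketch below. This is an illustrative reconstruction, not the actual script: it assumes the kazoo ZooKeeper client library and the default '/hbase/master' znode path, and the event-interpretation logic is factored into a plain function so it can be checked independently of the ZooKeeper wiring:

```python
import time

# Illustrative reconstruction of a '/hbase/master' znode watcher (assumes the
# kazoo client library; hosts and znode path are example values).

def classify_event(exists, previous_exists):
    """Pure helper: interpret a change in the master znode.

    Returns 'master_died' when the ephemeral znode disappears,
    'master_elected' when a new master registers, else 'no_change'.
    """
    if previous_exists and not exists:
        return "master_died"
    if not previous_exists and exists:
        return "master_elected"
    return "no_change"

def watch_master(hosts="localhost:2181", znode="/hbase/master"):
    # ZooKeeper wiring; needs a running ensemble, so it is not exercised here.
    from kazoo.client import KazooClient
    zk = KazooClient(hosts=hosts)
    zk.start()
    state = {"exists": zk.exists(znode) is not None}

    @zk.DataWatch(znode)
    def _on_change(data, stat):
        exists = stat is not None
        event = classify_event(exists, state["exists"])
        state["exists"] = exists
        if event != "no_change":
            print(time.strftime("%F %T"), event)

    while True:
        time.sleep(1)
```

Timing the gap between 'master_died' and 'master_elected' gives the failover latency the test scenario measures.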
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865513#comment-13865513 ] Nicolas Liochon commented on HBASE-7386: bq. standard scripts we don't delete master znode
We should; that's what HBASE-5926 is about. It used to work, for sure. It's better to delete it just after the server death, as the restart may never happen...
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865571#comment-13865571 ] Samir Ahmic commented on HBASE-7386: bq. It's better to delete it just after the server death, as the restart may never happen...
Are you suggesting that I modify the 'zk_cleaner.py' listener script to delete the master znode when it detects that the master is in one of these states ('PROCESS_STATE_STOPPING', 'PROCESS_STATE_EXITED', 'PROCESS_STATE_UNKNOWN')? I'm already doing this for regionservers, so it should only take a few lines of code.
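For context, a supervisord event listener in the spirit of the 'zk_cleaner.py' script discussed above follows the protocol from the supervisord docs: print READY, read a 'token:value' header line plus payload from stdin, decide, and acknowledge with RESULT. The sketch below is hypothetical (the actual znode deletion is stubbed out, and the 'hbase' process-name check is an example assumption):

```python
import sys

# Hypothetical sketch of a supervisord event listener, in the spirit of the
# 'zk_cleaner.py' script discussed above. Header/payload parsing follows the
# 'token:value' wire format documented at supervisord.org.

FATAL_STATES = {
    "PROCESS_STATE_EXITED",
    "PROCESS_STATE_STOPPED",
    "PROCESS_STATE_UNKNOWN",  # per the docs: a supervisord programming error
}

def parse_kv(line):
    """Parse a supervisord 'token:value token:value ...' line into a dict."""
    return dict(token.split(":", 1) for token in line.split())

def should_clean_znode(headers, payload):
    """Decide whether a process-death event warrants deleting its znode."""
    return (headers.get("eventname") in FATAL_STATES
            and payload.get("processname", "").startswith("hbase"))

def listen(stdin=sys.stdin, stdout=sys.stdout):
    # Event loop; only meaningful when run under supervisord.
    while True:
        stdout.write("READY\n"); stdout.flush()
        headers = parse_kv(stdin.readline())
        payload = parse_kv(stdin.read(int(headers["len"])))
        if should_clean_znode(headers, payload):
            pass  # e.g. delete /hbase/master or the RS znode via a ZK client
        stdout.write("RESULT 2\nOK"); stdout.flush()
```

Subscribing the listener to PROCESS_STATE events is done in the supervisord config ('events=PROCESS_STATE'); the decision logic above then picks out the fatal transitions.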
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865594#comment-13865594 ] Nicolas Liochon commented on HBASE-7386: May be ;-). Just that if the process exits, we can clean the ZK node immediately, ideally w/o relying on a separate watchdog. What's the PROCESS_STATE_UNKNOWN?
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865644#comment-13865644 ] Samir Ahmic commented on HBASE-7386: bq. What's the PROCESS_STATE_UNKNOWN?
According to the supervisor documentation [http://supervisord.org/subprocess.html]: "The process is in an unknown state (supervisord programming error)."
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865672#comment-13865672 ] Andrew Purtell commented on HBASE-10292: bq. Was wondering why we are not throwing back DNRIOE to client when RS also aborts
Got it. I think the answer is this code predates DNRIOE.
[jira] [Updated] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10292: --- Resolution: Fixed Fix Version/s: 0.99.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the reviews Stack, Anoop, and Ram. Committed to trunk and 0.98. I filed HBASE-10300 for followup.
[jira] [Created] (HBASE-10300) Insure a throw of DoNotRetryIOException when a regionserver aborts
Andrew Purtell created HBASE-10300: -- Summary: Insure a throw of DoNotRetryIOException when a regionserver aborts Key: HBASE-10300 URL: https://issues.apache.org/jira/browse/HBASE-10300 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Fix For: 0.98.0, 0.99.0 As discussed on HBASE-10292, we may not be throwing DoNotRetryIOExceptions back to the client when aborting the server, especially when handling fatal coprocessor exceptions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8889) TestIOFencing#testFencingAroundCompaction occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865688#comment-13865688 ] Andrew Purtell commented on HBASE-8889: --- The code may be incorrect as indicated, but isn't there a larger issue also? Above, a file under compaction has a missing block. Blocks don't normally go missing in the test, so missing blocks suggest file deletion. And further above there is a warning about a missing file in another case. Might want to investigate why/how files can be pulled out from underneath compaction. TestIOFencing#testFencingAroundCompaction occasionally fails Key: HBASE-8889 URL: https://issues.apache.org/jira/browse/HBASE-8889 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor Attachments: TestIOFencing.tar.gz From https://builds.apache.org/job/PreCommit-HBASE-Build/6232//testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/ :
{code}
java.lang.AssertionError: Timed out waiting for new server to open region
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:269)
at org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:205)
{code}
{code}
2013-07-06 23:13:53,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03.
2013-07-06 23:13:54,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03.
2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] hbase.TestIOFencing$CompactionBlockerRegion(102): allowing compactions
2013-07-06 23:13:55,121 INFO [pool-1-thread-1] hbase.HBaseTestingUtility(911): Shutting down minicluster
2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] util.JVMClusterUtil(237): Shutting down HBase Cluster
2013-07-06 23:13:55,121 INFO [RS:0;asf002:39065-smallCompactions-1373152134716] regionserver.HStore(951): Starting compaction of 2 file(s) in family of tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. into tmpdir=hdfs://localhost:50140/user/jenkins/hbase/tabletest/6e62d3b24ea23160931362b60359ff03/.tmp, totalSize=108.4k
...
2013-07-06 23:13:55,155 INFO [RS:0;asf002:39065] regionserver.HRegionServer(2476): Received CLOSE for the region: 6e62d3b24ea23160931362b60359ff03 ,which we are already trying to CLOSE
2013-07-06 23:13:55,157 WARN [RS:0;asf002:39065] regionserver.HRegionServer(2414): Failed to close tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. - ignoring and continuing
org.apache.hadoop.hbase.exceptions.NotServingRegionException: The region 6e62d3b24ea23160931362b60359ff03 was already closing. New CLOSE request is ignored.
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2479)
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2409)
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2011)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:903)
at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:158)
at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110)
at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:142)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:337)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.util.Methods.call(Methods.java:41)
at org.apache.hadoop.hbase.security.User.call(User.java:420)
at org.apache.hadoop.hbase.security.User.access$300(User.java:51)
at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:260)
at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:140)
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10298) [0.98] TestIOFencing fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10298: --- Attachment: 10298.patch Attaching what I am going to commit to disable TestIOFencing for the 0.98.0 release. Will commit trivial test change using CTR shortly unless objection. We should make one of the other related issues a blocker for 1.0. [0.98] TestIOFencing fails occasionally --- Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10298) [0.98] TestIOFencing fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10298: --- Priority: Blocker (was: Major) Affects Version/s: 0.96.1.1 Fix Version/s: (was: 0.98.0) 1.0.0 0.99.0 0.98.1 Assignee: (was: Andrew Purtell)
[jira] [Commented] (HBASE-10298) [0.98] TestIOFencing fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865701#comment-13865701 ] Andrew Purtell commented on HBASE-10298: Made this a blocker for 0.98.1 and 1.0.0.
[jira] [Updated] (HBASE-8889) TestIOFencing#testFencingAroundCompaction occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8889: -- Priority: Blocker (was: Minor) Fix Version/s: 1.0.0 As Andrew indicated in HBASE-10298, marking this a blocker for 1.0 release. TestIOFencing#testFencingAroundCompaction occasionally fails Key: HBASE-8889 URL: https://issues.apache.org/jira/browse/HBASE-8889 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Blocker Fix For: 1.0.0 Attachments: TestIOFencing.tar.gz From https://builds.apache.org/job/PreCommit-HBASE-Build/6232//testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/ : {code} java.lang.AssertionError: Timed out waiting for new server to open region at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:269) at org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:205) {code} {code} 2013-07-06 23:13:53,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:54,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] hbase.TestIOFencing$CompactionBlockerRegion(102): allowing compactions 2013-07-06 23:13:55,121 INFO [pool-1-thread-1] hbase.HBaseTestingUtility(911): Shutting down minicluster 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] util.JVMClusterUtil(237): Shutting down HBase Cluster 2013-07-06 23:13:55,121 INFO [RS:0;asf002:39065-smallCompactions-1373152134716] regionserver.HStore(951): Starting compaction of 2 file(s) in family of tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 
into tmpdir=hdfs://localhost:50140/user/jenkins/hbase/tabletest/6e62d3b24ea23160931362b60359ff03/.tmp, totalSize=108.4k ... 2013-07-06 23:13:55,155 INFO [RS:0;asf002:39065] regionserver.HRegionServer(2476): Received CLOSE for the region: 6e62d3b24ea23160931362b60359ff03 ,which we are already trying to CLOSE 2013-07-06 23:13:55,157 WARN [RS:0;asf002:39065] regionserver.HRegionServer(2414): Failed to close tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. - ignoring and continuing org.apache.hadoop.hbase.exceptions.NotServingRegionException: The region 6e62d3b24ea23160931362b60359ff03 was already closing. New CLOSE request is ignored. at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2479) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2409) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2011) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:903) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:158) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:142) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:337) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.util.Methods.call(Methods.java:41) at org.apache.hadoop.hbase.security.User.call(User.java:420) at 
org.apache.hadoop.hbase.security.User.access$300(User.java:51) at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:260) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:140) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10298) TestIOFencing reveals an unhandled race
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10298: --- Summary: TestIOFencing reveals an unhandled race (was: [0.98] TestIOFencing fails occasionally) TestIOFencing reveals an unhandled race --- Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Priority: Blocker Fix For: 0.98.1, 0.99.0, 1.0.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10298: --- Summary: TestIOFencing occasionally fails (was: TestIOFencing reveals an unhandled race) TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Priority: Blocker Fix For: 0.98.1, 0.99.0, 1.0.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gustavo Anatoly updated HBASE-9948: --- Attachment: HBASE-9948.patch Ted, could you please review the patch? Thanks. HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948.patch I saw the following in test output for TestRestartCluster: {code} 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split 2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting] 2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown. 
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605) at java.lang.Thread.run(Thread.java:724) 2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads {code} HMaster should handle duplicate log split requests, instead of aborting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gustavo Anatoly updated HBASE-9948: --- Status: Patch Available (was: Open) HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948.patch I saw the following in test output for TestRestartCluster: {code} 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split 2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting] 2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown. 
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605) at java.lang.Thread.run(Thread.java:724) 2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads {code} HMaster should handle duplicate log split requests, instead of aborting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10297) LoadAndVerify Integration Test for cell visibility
[ https://issues.apache.org/jira/browse/HBASE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865720#comment-13865720 ] Andrew Purtell commented on HBASE-10297: +1 I don't think the test failures are related, but perhaps run HadoopQA one more time. The Javadoc for the test needs updating to describe what additional things this new integration test checks for. Can be done at commit time. LoadAndVerify Integration Test for cell visibility -- Key: HBASE-10297 URL: https://issues.apache.org/jira/browse/HBASE-10297 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10297.patch, HBASE-10297_V2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865722#comment-13865722 ] Ted Yu commented on HBASE-9948: --- Thanks for the patch. {code} +} catch (DuplicatedSplitLogException dsle) { + LOG.warn(dsle.getMessage()); +} {code} Should the method return in the catch block ? There is no need to do metrics for the duplicate log split request. HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948.patch I saw the following in test output for TestRestartCluster: {code} 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split 2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting] 2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown. 
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605) at java.lang.Thread.run(Thread.java:724) 2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads {code} HMaster should handle duplicate log split requests, instead of aborting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs
[ https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865725#comment-13865725 ] Andrew Purtell commented on HBASE-9846: --- +1 Javadoc for the new integration test should be updated to describe what it checks for. Can be fixed up at commit time. Thanks a lot Ram! Integration test and LoadTestTool support for cell ACLs --- Key: HBASE-9846 URL: https://issues.apache.org/jira/browse/HBASE-9846 Project: HBase Issue Type: Sub-task Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, HBASE-9846_5.patch, HBASE-9846_5.patch, HBASE-9846_6.patch Cell level ACLs should have an integration test and LoadTestTool support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10293) Master and RS GC logs can conflict when run on same host
[ https://issues.apache.org/jira/browse/HBASE-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865755#comment-13865755 ] Enis Soztutar commented on HBASE-10293: --- I guess we can do a similar thing to the log4j logs, where we will append the username and daemon name to the gc log. Master and RS GC logs can conflict when run on same host Key: HBASE-10293 URL: https://issues.apache.org/jira/browse/HBASE-10293 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.1.1 Reporter: Nick Dimiduk My issue manifests when I uncomment the line {{export SERVER_GC_OPTS=...}} in hbase-env.sh and start HBase. It's a single node in distributed mode, so both a Master and RegionServer are started on that host. Both start commands are run in the same minute, so only one gc.log-`date` file is created. `lsof` indicates two processes are writing to that file and the output of `ps` confirms they both received the same {{-Xloggc:/grid/0/var/log/hbase/gc.log-201401071515}} argument. Presumably, the same will happen for folks running the thrift and rest gateways on the same box (any java process itemized in the server_cmds array in bin/hbase). Related (the reason I discovered this issue in the first place), stopping the master process results in its gc.log being truncated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865765#comment-13865765 ] Gustavo Anatoly commented on HBASE-9948: Hi, [~yuzhih...@gmail.com] You're right. I can change this block to: {code} +} catch (DuplicatedSplitLogException dsle) { + return; +} {code} Thanks for review. HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948.patch I saw the following in test output for TestRestartCluster: {code} 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split 2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting] 2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown. 
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605) at java.lang.Thread.run(Thread.java:724) 2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads {code} HMaster should handle duplicate log split requests, instead of aborting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8889) TestIOFencing#testFencingAroundCompaction occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865767#comment-13865767 ] Ted Yu commented on HBASE-8889: --- I am looping the test with the following change so that I get a better idea of the timing of the failure. {code} Index: hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java === --- hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java (revision 1556059) +++ hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java (working copy) @@ -294,7 +298,8 @@ // all the files we expect are still working when region is up in new location. FileSystem fs = newRegion.getFilesystem(); for (String f: newRegion.getStoreFileList(new byte [][] {FAMILY})) { -assertTrue("After compaction, does not exist: " + f, fs.exists(new Path(f))); +assertTrue("After compaction, does not exist: " + f + " @ " + System.currentTimeMillis(), + fs.exists(new Path(f))); } // If we survive the split keep going... // Now we make sure that the region isn't totally confused. Load up more rows. 
{code} TestIOFencing#testFencingAroundCompaction occasionally fails Key: HBASE-8889 URL: https://issues.apache.org/jira/browse/HBASE-8889 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Blocker Fix For: 1.0.0 Attachments: TestIOFencing.tar.gz From https://builds.apache.org/job/PreCommit-HBASE-Build/6232//testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/ : {code} java.lang.AssertionError: Timed out waiting for new server to open region at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:269) at org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:205) {code} {code} 2013-07-06 23:13:53,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:54,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] hbase.TestIOFencing$CompactionBlockerRegion(102): allowing compactions 2013-07-06 23:13:55,121 INFO [pool-1-thread-1] hbase.HBaseTestingUtility(911): Shutting down minicluster 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] util.JVMClusterUtil(237): Shutting down HBase Cluster 2013-07-06 23:13:55,121 INFO [RS:0;asf002:39065-smallCompactions-1373152134716] regionserver.HStore(951): Starting compaction of 2 file(s) in family of tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. into tmpdir=hdfs://localhost:50140/user/jenkins/hbase/tabletest/6e62d3b24ea23160931362b60359ff03/.tmp, totalSize=108.4k ... 
2013-07-06 23:13:55,155 INFO [RS:0;asf002:39065] regionserver.HRegionServer(2476): Received CLOSE for the region: 6e62d3b24ea23160931362b60359ff03 ,which we are already trying to CLOSE 2013-07-06 23:13:55,157 WARN [RS:0;asf002:39065] regionserver.HRegionServer(2414): Failed to close tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. - ignoring and continuing org.apache.hadoop.hbase.exceptions.NotServingRegionException: The region 6e62d3b24ea23160931362b60359ff03 was already closing. New CLOSE request is ignored. at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2479) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2409) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2011) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:903) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:158) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:142) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:337) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at
[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gustavo Anatoly updated HBASE-9948: --- Attachment: HBASE-9948-v2.patch HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948-v2.patch, HBASE-9948.patch I saw the following in test output for TestRestartCluster: {code} 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split 2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting] 2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown. 
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605) at java.lang.Thread.run(Thread.java:724) 2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads {code} HMaster should handle duplicate log split requests, instead of aborting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865772#comment-13865772 ] Enis Soztutar commented on HBASE-10274: --- bq. I think it's better to fix it because our codebase is 0.94. Do you mind backporting the patch for HBASE-6820? We cannot commit this to 0.94 unless the backport is there. MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers --- Key: HBASE-10274 URL: https://issues.apache.org/jira/browse/HBASE-10274 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Assignee: chendihao Priority: Minor Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch HBASE-6820 points out the problem but does not fix it completely. killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() shut down the ZooKeeperServer and need to close the ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865797#comment-13865797 ] Jeffrey Zhong commented on HBASE-9948: -- We should first check why TestRestartCluster issues duplicate log split requests in the first place. The current patch doesn't work because splitLog has to be a blocking call until the requested logs complete the log splitting process; otherwise region assignment could happen before a log split completes, which would cause data loss. I'd suggest the fix skip scheduling duplicate log splits but wait for them to finish. HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948-v2.patch, HBASE-9948.patch I saw the following in test output for TestRestartCluster: {code} 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split 2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting] 2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown. 
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605) at java.lang.Thread.run(Thread.java:724) 2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads {code} HMaster should handle duplicate log split requests, instead of aborting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
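The "skip the duplicate split but wait for it to finish" approach suggested above can be sketched independently of HBase's SplitLogManager. This is a minimal, self-contained illustration with hypothetical names (SplitScheduler, splitLog, doSplit are not the real HBase API): the first caller for a log path performs the split, while a concurrent duplicate request blocks on the in-flight split instead of being rescheduled or aborting the master.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: deduplicate concurrent split requests per log path,
// but keep the call blocking so region assignment cannot race a split.
public class SplitScheduler {
  // One latch per log path currently being split.
  private final Map<String, CountDownLatch> inFlight = new ConcurrentHashMap<>();

  // Returns true if this caller performed the split,
  // false if it waited on an already-scheduled duplicate.
  public boolean splitLog(String logPath, Runnable doSplit) {
    CountDownLatch latch = new CountDownLatch(1);
    CountDownLatch existing = inFlight.putIfAbsent(logPath, latch);
    if (existing != null) {
      // Duplicate request: do not schedule a second split, but block until
      // the original one completes, so the caller's guarantees still hold.
      try {
        existing.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return false;
    }
    try {
      doSplit.run();
    } finally {
      latch.countDown();
      inFlight.remove(logPath, latch);
    }
    return true;
  }

  public static void main(String[] args) {
    SplitScheduler scheduler = new SplitScheduler();
    scheduler.splitLog("WALs/server1-splitting/wal.1",
        () -> System.out.println("splitting once"));
  }
}
```

Because the in-flight entry is removed once the split finishes, only genuinely concurrent requests are treated as duplicates; a later request for the same path schedules a fresh split.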
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865800#comment-13865800 ] Hudson commented on HBASE-10292: SUCCESS: Integrated in HBase-TRUNK #4797 (See [https://builds.apache.org/job/HBase-TRUNK/4797/]) HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails occasionally (apurtell: rev 1556586) * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865806#comment-13865806 ] Hadoop QA commented on HBASE-9948: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622005/HBASE-9948.patch against trunk revision . ATTACHMENT ID: 12622005 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to cause Findbugs (version 1.3.9) to fail. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8366//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8366//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8366//console This message is automatically generated. 
HMaster should handle duplicate log split requests
--
Key: HBASE-9948
URL: https://issues.apache.org/jira/browse/HBASE-9948
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
Attachments: HBASE-9948-v2.patch, HBASE-9948.patch

I saw the following in test output for TestRestartCluster:
{code}
2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split
2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting]
2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta
2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown.
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta
	at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292)
	at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605)
	at java.lang.Thread.run(Thread.java:724)
2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting
2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads
{code}
HMaster should handle duplicate log split requests, instead of aborting.
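The abort happens because two callers schedule a split for the same WAL concurrently. One way to make the request idempotent, as the issue title asks, is to track in-flight splits and have a duplicate caller wait on the existing task instead of failing. The sketch below is plain Java with hypothetical names, not the actual SplitLogManager API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: deduplicate concurrent split requests so a second
// caller joins the in-flight task rather than triggering a master abort.
public class SplitRequestDedup {
    private final ConcurrentMap<String, CountDownLatch> inFlight = new ConcurrentHashMap<>();

    /** Returns true if this caller became the owner of the split task. */
    public boolean tryAcquire(String logPath) {
        return inFlight.putIfAbsent(logPath, new CountDownLatch(1)) == null;
    }

    /** A duplicate caller blocks until the owner finishes, instead of erroring out. */
    public void awaitCompletion(String logPath) throws InterruptedException {
        CountDownLatch latch = inFlight.get(logPath);
        if (latch != null) {
            latch.await();
        }
    }

    /** Called by the owner once the split is done; wakes any waiting duplicates. */
    public void complete(String logPath) {
        CountDownLatch latch = inFlight.remove(logPath);
        if (latch != null) {
            latch.countDown();
        }
    }
}
```

The key point is that `putIfAbsent` makes the ownership decision atomic, so the "two threads can't wait for the same task" race never surfaces as an IOException.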
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865848#comment-13865848 ] Gustavo Anatoly commented on HBASE-9948: Hi, [~jeffreyz]. I will follow your suggestions. To avoid data loss, scheduling a log split should be an atomic operation, so the best approach is to investigate the root cause of the duplicate split request. [~yuzhih...@gmail.com], how can I reproduce this scenario? Thank you, [~jeffreyz].
[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9948: -- Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865851#comment-13865851 ] Ted Yu commented on HBASE-9948: --- Let me search my computer to see if I have the test output. Meanwhile, you can loop TestRestartCluster and see if the duplicate message appears in test output. Thanks
[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9948: -- Attachment: runtest.sh Script I use for looping tests.
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865858#comment-13865858 ] Gustavo Anatoly commented on HBASE-9948: Thanks, [~yuzhih...@gmail.com]
[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865859#comment-13865859 ] Hudson commented on HBASE-10298: SUCCESS: Integrated in HBase-0.98 #63 (See [https://builds.apache.org/job/HBase-0.98/63/]) HBASE-10298. TestIOFencing occasionally fails This flapping test produces low confidence results so temporarily disable it while tracking down the cause. (apurtell: rev 1556597) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Priority: Blocker Fix For: 0.98.1, 0.99.0, 1.0.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865860#comment-13865860 ] Hudson commented on HBASE-10292: SUCCESS: Integrated in HBase-0.98 #63 (See [https://builds.apache.org/job/HBase-0.98/63/]) HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails occasionally (apurtell: rev 1556590) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865864#comment-13865864 ] Ted Yu commented on HBASE-9948: --- Please modify the following line in the script to look for 'duplicate log split scheduled' :
{code}
grep NullPointerException hbase-server/target/surefire-reports/*${test[$j]%\#*}-output.txt
{code}
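Ted's suggestion amounts to swapping the grep pattern in the looping script. A minimal sketch of the idea follows; the real runtest.sh is a JIRA attachment not reproduced here, so the sample file and path are illustrative only:

```shell
#!/bin/sh
# Illustrative sketch only: the actual runtest.sh is an attachment not shown
# in this thread. The change is to scan surefire output for the
# duplicate-split message instead of NullPointerException.
PATTERN='duplicate log split scheduled'
sample=$(mktemp)
printf '%s\n' 'java.io.IOException: duplicate log split scheduled for hdfs://...' > "$sample"
if grep -q "$PATTERN" "$sample"; then
  echo "found: $PATTERN"
fi
rm -f "$sample"
```

In the real script, `"$sample"` would be replaced by the surefire output glob from Ted's snippet (`hbase-server/target/surefire-reports/*...-output.txt`).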
[jira] [Commented] (HBASE-10300) Insure a throw of DoNotRetryIOException when a regionserver aborts
[ https://issues.apache.org/jira/browse/HBASE-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865866#comment-13865866 ] Devaraj Das commented on HBASE-10300: - My understanding was that DoNotRetryIOException is thrown for cases where there is no point retrying at a global level (e.g., a table is disabled and the client shouldn't try to get data from its regions). When a regionserver aborts, shouldn't a client that was connected to it retry its operations against the failed-over regionserver(s)? Why should it be a DoNotRetryIOException? Insure a throw of DoNotRetryIOException when a regionserver aborts -- Key: HBASE-10300 URL: https://issues.apache.org/jira/browse/HBASE-10300 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Fix For: 0.98.0, 0.99.0 As discussed on HBASE-10292, we may not be throwing DoNotRetryIOExceptions back to the client when aborting the server, especially when handling fatal coprocessor exceptions.
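The distinction under debate is whether an exception tells the client "retry elsewhere" or "stop retrying entirely". A simplified sketch of that decision, using a stand-in exception class rather than the real HBase client retry loop:

```java
import java.io.IOException;

// Simplified stand-ins for the exception hierarchy discussed above; this is
// not the actual HBase client code, only the retry decision it encodes.
public class RetryDecision {
    /** Stand-in for org.apache.hadoop.hbase.DoNotRetryIOException. */
    static class DoNotRetryIOException extends IOException {}

    /**
     * A client retries (e.g. against a failed-over regionserver) unless the
     * server signalled that retrying is pointless, such as a disabled table.
     */
    public static boolean shouldRetry(IOException e) {
        return !(e instanceof DoNotRetryIOException);
    }
}
```

Devaraj's point is that a regionserver abort is a retriable condition under this scheme, since the operation can succeed on whichever server picks up the regions.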
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865867#comment-13865867 ] Gustavo Anatoly commented on HBASE-9948: Okay
[jira] [Updated] (HBASE-8889) TestIOFencing#testFencingAroundCompaction occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8889: -- Attachment: TestIOFencing-#8362.tar.gz Log for another failure on QA build #8362 TestIOFencing#testFencingAroundCompaction occasionally fails Key: HBASE-8889 URL: https://issues.apache.org/jira/browse/HBASE-8889 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Blocker Fix For: 1.0.0 Attachments: TestIOFencing-#8362.tar.gz, TestIOFencing.tar.gz From https://builds.apache.org/job/PreCommit-HBASE-Build/6232//testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/ : {code} java.lang.AssertionError: Timed out waiting for new server to open region at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:269) at org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:205) {code} {code} 2013-07-06 23:13:53,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:54,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] hbase.TestIOFencing$CompactionBlockerRegion(102): allowing compactions 2013-07-06 23:13:55,121 INFO [pool-1-thread-1] hbase.HBaseTestingUtility(911): Shutting down minicluster 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] util.JVMClusterUtil(237): Shutting down HBase Cluster 2013-07-06 23:13:55,121 INFO [RS:0;asf002:39065-smallCompactions-1373152134716] regionserver.HStore(951): Starting compaction of 2 file(s) in family of tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 
into tmpdir=hdfs://localhost:50140/user/jenkins/hbase/tabletest/6e62d3b24ea23160931362b60359ff03/.tmp, totalSize=108.4k
...
2013-07-06 23:13:55,155 INFO [RS:0;asf002:39065] regionserver.HRegionServer(2476): Received CLOSE for the region: 6e62d3b24ea23160931362b60359ff03 ,which we are already trying to CLOSE
2013-07-06 23:13:55,157 WARN [RS:0;asf002:39065] regionserver.HRegionServer(2414): Failed to close tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. - ignoring and continuing
org.apache.hadoop.hbase.exceptions.NotServingRegionException: The region 6e62d3b24ea23160931362b60359ff03 was already closing. New CLOSE request is ignored.
	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2479)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2409)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2011)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:903)
	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:158)
	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110)
	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:142)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:337)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.util.Methods.call(Methods.java:41)
	at org.apache.hadoop.hbase.security.User.call(User.java:420)
	at org.apache.hadoop.hbase.security.User.access$300(User.java:51)
	at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:260)
	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:140)
{code}
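The reported assertion failure ("Timed out waiting for new server to open region") comes from a bounded poll loop in the test. A generic sketch of that pattern, with a hypothetical helper rather than the actual TestIOFencing code:

```java
import java.util.function.BooleanSupplier;

// Generic sketch of a bounded poll loop like the one behind "Timed out
// waiting for new server to open region"; not the actual TestIOFencing code.
public class WaitFor {
    /** Polls 'condition' every intervalMs until it is true or timeoutMs elapses. */
    public static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }
}
```

When such a loop times out on a loaded build machine, the usual suspects are an interval/timeout that is too tight for the CI host, or the condition genuinely never becoming true because of a race like the double-CLOSE seen in the log above.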
[jira] [Commented] (HBASE-9999) Add support for small reverse scan
[ https://issues.apache.org/jira/browse/HBASE-9999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865895#comment-13865895 ] Ted Yu commented on HBASE-9999: --- Similar to ClientSmallScanner.java, ClientSmallReverseScanner.java can be added to facilitate small reverse scan. Add support for small reverse scan -- Key: HBASE-9999 URL: https://issues.apache.org/jira/browse/HBASE-9999 Project: HBase Issue Type: Improvement Reporter: Ted Yu HBASE-4811 adds the reverse scan feature. This JIRA adds support for small reverse scan, which is activated when both the 'reversed' and 'small' attributes are true on the Scan object.
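To make the intended semantics concrete: a small reverse scan returns at most a handful of rows at or before a start row, walking the sorted key space backwards. The sketch below models this on a plain sorted map; it illustrates the behavior only, not the proposed ClientSmallReverseScanner API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Behavioral sketch of a small reverse scan over a sorted key space; the
// method name and signature here are illustrative, not HBase client API.
public class ReverseScanSketch {
    /** Return up to 'limit' row keys at or below 'startRow', in descending order. */
    public static List<String> smallReverseScan(NavigableMap<String, String> table,
                                                String startRow, int limit) {
        List<String> rows = new ArrayList<>();
        // headMap(startRow, true) keeps keys <= startRow; descendingKeySet
        // then yields them from startRow downwards.
        for (String key : table.headMap(startRow, true).descendingKeySet()) {
            if (rows.size() >= limit) {
                break;
            }
            rows.add(key);
        }
        return rows;
    }
}
```

The "small" attribute in HBase signals that the whole result is expected to fit in one RPC, which is why a dedicated small-scanner class (as Ted suggests) can skip the open/next/close scanner round trips.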
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865897#comment-13865897 ] Hudson commented on HBASE-10292: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #57 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/57/]) HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails occasionally (apurtell: rev 1556590) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865896#comment-13865896 ] Hudson commented on HBASE-10298: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #57 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/57/]) HBASE-10298. TestIOFencing occasionally fails This flapping test produces low confidence results so temporarily disable it while tracking down the cause. (apurtell: rev 1556597) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Priority: Blocker Fix For: 0.98.1, 0.99.0, 1.0.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865910#comment-13865910 ] Hadoop QA commented on HBASE-9948: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622017/HBASE-9948-v2.patch against trunk revision . ATTACHMENT ID: 12622017 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//console This message is automatically generated. 
HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948-v2.patch, HBASE-9948.patch, runtest.sh I saw the following in test output for TestRestartCluster:
{code}
2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split
2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting]
2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta
2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown.
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at
[jira] [Created] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
Ted Yu created HBASE-10301: -- Summary: TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor The test failure came from PreCommit build #8362
{code}
2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false
...
2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670
org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled.
  at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725)
  at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
  at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
  at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
  at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
  at java.lang.Thread.run(Thread.java:662)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
  at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
  at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280)
  at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594)
  at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693)
  at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672)
  at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773)
  at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423)
  at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622)
...
2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=true
{code}
The second call to getRegionPlan() returned the same server, thus leading to the assertion failure:
{code}
assertFalse("Region should assigned on a new region server", oldServerName.equals(serverName));
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865946#comment-13865946 ] Ted Yu commented on HBASE-10301: Here is one way of making the test pass reliably: randomly choose a server which is different from oldServerName, and utilize this method in AssignmentManager:
{code}
boolean assign(final ServerName destination, final List<HRegionInfo> regions) {
{code}
TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor -- This message was sent by Atlassian JIRA (v6.1.5#6160)
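The fix Ted Yu suggests can be sketched in isolation: pick a destination at random from the online servers while excluding the one the region was previously on, so the test's "region moved to a new server" assertion cannot fail by chance. The helper below is hypothetical (plain String server names instead of ServerName, an invented pickNewServer method); the real patch would go through AssignmentManager.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class PickServer {
    // Illustrative sketch of the suggested test fix: choose a destination
    // server at random from the online set, excluding oldServerName.
    public static String pickNewServer(List<String> online, String oldServerName) {
        List<String> candidates = new ArrayList<>(online);
        candidates.remove(oldServerName);
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no server other than " + oldServerName);
        }
        return candidates.get(new Random().nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        // The returned server is guaranteed to differ from "rs2".
        System.out.println(pickNewServer(Arrays.asList("rs1", "rs2", "rs3"), "rs2"));
    }
}
```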
[jira] [Updated] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10301: --- Attachment: testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor Attachments: testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865972#comment-13865972 ] Hudson commented on HBASE-10298: SUCCESS: Integrated in HBase-TRUNK #4798 (See [https://builds.apache.org/job/HBase-TRUNK/4798/]) HBASE-10298. TestIOFencing occasionally fails This flapping test produces low confidence results so temporarily disable it while tracking down the cause. (apurtell: rev 1556596) * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Priority: Blocker Fix For: 0.98.1, 0.99.0, 1.0.0 Attachments: 10298.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10301: --- Attachment: 10301-v1.txt Patch v1 selects server other than oldServerName for reassignment. Also corrected grammar in assertion message. TestAssignmentManagerOnCluster passes locally. TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor Attachments: 10301-v1.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-10301: -- Assignee: Ted Yu TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10301: --- Status: Patch Available (was: Open) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10268: --- Status: Patch Available (was: Open) TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10268: --- Attachment: 10268.patch Add default test case timeouts. When waiting internally, uniformly use a timeout of 10s. TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
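The patch summary above (default per-test timeouts, plus a uniform 10s bound on internal waits) can be sketched generically. The helper below is an illustration of the polling-wait pattern such test fixes typically use, not the actual patch code; the class and method names (`WaitUtil`, `waitFor`) are assumptions for the sketch.

```java
import java.util.function.BooleanSupplier;

public class WaitUtil {
    // Poll cond until it returns true or timeoutMs elapses.
    // Polling beats a fixed sleep: the wait ends as soon as the condition
    // holds, and the timeout bounds how long a flaky run can hang.
    public static boolean waitFor(BooleanSupplier cond, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (cond.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false; // timed out; caller decides whether to fail
            }
            try {
                Thread.sleep(10); // short poll interval, not the full timeout
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return cond.getAsBoolean(); // one last check before giving up
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(waitFor(() -> true, 10_000L));  // prints true
        System.out.println(waitFor(() -> false, 100L));    // prints false
    }
}
```

A 10s internal wait with this shape costs nothing in the common (fast) case while still catching a genuinely stuck split-log worker.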
[jira] [Updated] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10268: --- Attachment: 10268.patch TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10268: --- Attachment: (was: 10268.patch) TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10302) Fix rat check issues in hbase-native-client.
Elliott Clark created HBASE-10302: - Summary: Fix rat check issues in hbase-native-client. Key: HBASE-10302 URL: https://issues.apache.org/jira/browse/HBASE-10302 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866100#comment-13866100 ] chendihao commented on HBASE-10274: --- Backporting HBASE-6820 seems good for us. Thanks for considering, [~enis]. Let's open another issue to do that. MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers --- Key: HBASE-10274 URL: https://issues.apache.org/jira/browse/HBASE-10274 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Assignee: chendihao Priority: Minor Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch HBASE-6820 points out the problem but does not fix it completely. killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will shut down the ZooKeeperServer and need to close the ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866105#comment-13866105 ] Andrew Purtell commented on HBASE-10301: lgtm Maybe query the minicluster for the limit of the index you are using for getRegionServer instead of assuming 4? Please commit to 0.98 as well as trunk if you like. TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html The test failure came from PreCommit build #8362 {code} 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2. ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false ... 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670 org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled. 
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725) at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:662) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594) at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672) at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423) at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622) ... 
2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=true {code} The second call to getRegionPlan() returned the same server, leading to the assertion failure: {code} assertFalse("Region should be assigned on a new region server", oldServerName.equals(serverName)); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866107#comment-13866107 ] Andrew Purtell commented on HBASE-10298: I see Ted made HBASE-8889 a blocker for 1.0 so I will change the scope of this JIRA for just the test disable change. TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Priority: Blocker Fix For: 0.98.1, 0.99.0, 1.0.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10300) Insure a throw of DoNotRetryIOException when a regionserver aborts
[ https://issues.apache.org/jira/browse/HBASE-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866106#comment-13866106 ] Andrew Purtell commented on HBASE-10300: Ping [~anoop.hbase] Insure a throw of DoNotRetryIOException when a regionserver aborts -- Key: HBASE-10300 URL: https://issues.apache.org/jira/browse/HBASE-10300 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Fix For: 0.98.0, 0.99.0 As discussed on HBASE-10292, we may not be throwing DoNotRetryIOExceptions back to the client when aborting the server, especially when handling fatal coprocessor exceptions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10298: --- Priority: Major (was: Blocker) Fix Version/s: (was: 1.0.0) (was: 0.98.1) 0.98.0 Assignee: Andrew Purtell TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-10298. Resolution: Fixed Patch disabling this test for now committed to trunk and 0.98, resolving TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866110#comment-13866110 ] Hadoop QA commented on HBASE-10301: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622057/10301-v1.txt against trunk revision . ATTACHMENT ID: 12622057 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//console This message is automatically generated. 
TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html The test failure came from PreCommit build #8362 {code} 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2. ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false ... 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670 org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled. at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725) at
[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency
[ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866125#comment-13866125 ] Andrew Purtell commented on HBASE-10296: Use of ZK has issues but what we had before was much worse. We had heartbeating and partially desynchronized state in a bunch of places. Rather than implement our own consensus protocol we used the specialist component ZK. Engineering distributed consensus protocols is a long term endeavor full of corner cases and hard to debug problems. It is worth consideration, but maybe only as a last resort. Does something about our use of ZK, or ZK itself, have fatal issues? Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency --- Key: HBASE-10296 URL: https://issues.apache.org/jira/browse/HBASE-10296 Project: HBase Issue Type: Brainstorming Components: master, Region Assignment, regionserver Reporter: Feng Honghua Currently the master relies on ZK to elect the active master, monitor liveness, and store almost all of its state, such as region states, table info, replication info, and so on. ZK also serves as a channel for master-regionserver communication (such as region assignment) and client-regionserver communication (such as replication state/behavior changes). But ZK as a communication channel is fragile due to its one-time watches and asynchronous notification mechanism, which together can lead to missed events (hence missed messages); for example, the master must rely on the idempotence of its state transition logic to keep the region assignment state machine correct. Almost all of the trickiest inconsistency issues can trace their root cause back to the fragility of ZK as a communication channel. Replacing ZK with paxos running within the master processes has the following benefits: 1.
better master failover performance: all masters, whether active or standby, have the same latest state in memory (except lagging ones, which can eventually catch up). Whenever the active master dies, the newly elected active master can immediately play its role without failover work such as rebuilding its in-memory state by consulting the meta table and ZK. 2. better state consistency: the master's in-memory state is the only truth about the system, which eliminates inconsistency from the very beginning; and though the state is held by all masters, paxos guarantees the copies are identical at any time. 3. a more direct and simple communication pattern: clients change state by sending requests to the master, and master and regionservers talk directly to each other via request and response, without going through third-party storage like ZK, which can introduce more uncertainty, worse latency, and more complexity. 4. ZK would only be used for liveness monitoring to determine whether a regionserver is dead, and later on we could eliminate ZK entirely once we build heartbeats between master and regionservers. I know this might look like a very crazy re-architecture, but it deserves deep thought and serious discussion, right? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10293) Master and RS GC logs can conflict when run on same host
[ https://issues.apache.org/jira/browse/HBASE-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866128#comment-13866128 ] Andrew Purtell commented on HBASE-10293: I run multiple servers on the same box. Each has their own config directory with their own hbase-env.sh. This is trivial to do with Puppet or pick your favorite configuration management tool. Kind of a non issue? Master and RS GC logs can conflict when run on same host Key: HBASE-10293 URL: https://issues.apache.org/jira/browse/HBASE-10293 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.1.1 Reporter: Nick Dimiduk My issue manifests when I uncomment the line {{export SERVER_GC_OPTS=...}} in hbase-env.sh and start HBase. It's a single node in distributed mode, so both a Master and RegionServer are started on that host. Both start commands are run in the same minute, so only one gc.log-`date` file is created. `lsof` indicates two processes are writing to that file and the output of `ps` confirms they both received the same {{-Xloggc:/grid/0/var/log/hbase/gc.log-201401071515}} argument. Presumably, the same will happen for folks running the thrift and rest gateways on the same box (any java process itemized in the server_cmds array in bin/hbase). Related (the reason I discovered this issue in the first place), stopping the master process results in its gc.log being truncated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
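The collision described above comes from both daemons expanding the same minute-granularity `-Xloggc:` pattern. As a hedged local workaround sketch for hbase-env.sh (not an official fix; the variable values and paths are illustrative), embedding the launching shell's PID in the filename makes the log name unique per process:

```shell
# Illustrative hbase-env.sh fragment (paths/flags are assumptions):
# append $$ (the PID of the shell expanding the value) so a Master and
# a RegionServer started in the same minute get distinct GC log files.
HBASE_LOG_DIR=${HBASE_LOG_DIR:-/var/log/hbase}
SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -Xloggc:${HBASE_LOG_DIR}/gc.log-$(date +%Y%m%d%H%M)-$$"
echo "${SERVER_GC_OPTS}"
```

Each daemon's start script expands the value separately, so the PID suffix differs even when the timestamp does not.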
[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866132#comment-13866132 ] Andrew Purtell commented on HBASE-10268: I could get this test to fail within a few iterations on one box. Now 25 have passed in succession with the attached patch. Will continue out to 100 iterations. If they all pass and HadoopQA provides a good result here, I am going to commit this test only fix. TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866135#comment-13866135 ] Jeffrey Zhong commented on HBASE-10301: --- The fix looks good to me. You may want to change {code} + for (int i = 0; i < 4; i++) { + HRegionServer destServer = TEST_UTIL.getHBaseCluster().getRegionServer(i); {code} to the following, avoiding the hard-coded 4: {code} for (RegionServerThread rst : cluster.getLiveRegionServerThreads()) { HRegionServer hrs = rst.getRegionServer(); {code} TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html The test failure came from PreCommit build #8362 {code} 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false ... 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670 org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled. 
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725) at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:662) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594) at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672) at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423) at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622) ... 
2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=true {code} The second call to getRegionPlan() returned the same server, leading to the assertion failure: {code} assertFalse("Region should be assigned on a new region server", oldServerName.equals(serverName)); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866150#comment-13866150 ] Hudson commented on HBASE-10292: SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #46 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/46/]) HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails occasionally (apurtell: rev 1556586) * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866149#comment-13866149 ] Hudson commented on HBASE-10298: SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #46 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/46/]) HBASE-10298. TestIOFencing occasionally fails This flapping test produces low confidence results so temporarily disable it while tracking down the cause. (apurtell: rev 1556596) * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866173#comment-13866173 ] Lars Hofhansl commented on HBASE-10268: --- Should be good for 0.94 (and presumably 0.96) as well. TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866174#comment-13866174 ] Andrew Purtell commented on HBASE-10268: The test itself timed out on run #56. Trying again with 120s timeouts per test. Beyond that, this is going to need a deeper look. TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10303) Have snappy support properly documented would be helpful to hadoop and hbase users
Rural Hunter created HBASE-10303: Summary: Have snappy support properly documented would be helpful to hadoop and hbase users Key: HBASE-10303 URL: https://issues.apache.org/jira/browse/HBASE-10303 Project: HBase Issue Type: Task Components: documentation Reporter: Rural Hunter The current documentation for configuring snappy support (http://hbase.apache.org/book/snappy.compression.html) is incomplete and a bit obscure. IMO, there are several improvements that can be made: 1. Describe the relationship among hadoop, hbase, and snappy. Is snappy actually needed by hadoop hdfs or by hbase itself? That would make clear whether you need to configure snappy support in hbase or in hadoop. 2. It doesn't mention that the default hadoop binary package is compiled without snappy support and that you need to compile it with the snappy option manually. Actually, it doesn't work with any native libs on a 64-bit OS, as the libhadoop.so in the binary package is only for a 32-bit OS (this of course is a hadoop issue, not an hbase one, but it's good to mention). 3. In my experience, I actually needed to install both snappy and hadoop-snappy, so the doc also lacks the steps to install hadoop-snappy. 4. During my setup, I found a difference in where hadoop and hbase pick up the native lib files: hadoop picks them up from ./lib while hbase picks them up from ./lib/[PLATFORM]. If that's correct, it could also be mentioned. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
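Point 4 above is more actionable with the `[PLATFORM]` string spelled out. The sketch below is a guess at how to reconstruct it from JVM system properties, producing names in the style `Linux-amd64-64`; the exact mapping HBase uses may differ by version, so treat the scheme as an assumption to verify against your install.

```java
public class PlatformDir {
    // Build a platform string in the style "Linux-amd64-64" from JVM
    // system properties. Per the comment above, HBase searches for
    // native libs (e.g. libsnappy) under lib/native/<platform>, while
    // hadoop uses plain ./lib.
    public static String platformName() {
        String os = System.getProperty("os.name");
        String arch = System.getProperty("os.arch");
        // 32 or 64; default to 64 if the JVM does not expose the property.
        String bits = System.getProperty("sun.arch.data.model", "64");
        return os + "-" + arch + "-" + bits;
    }

    public static void main(String[] args) {
        System.out.println("HBase native lib dir: lib/native/" + platformName());
    }
}
```

Printing this from the same JVM that runs HBase tells you exactly which directory to drop the snappy native libs into.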
[jira] [Commented] (HBASE-10293) Master and RS GC logs can conflict when run on same host
[ https://issues.apache.org/jira/browse/HBASE-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866180#comment-13866180 ] Nick Dimiduk commented on HBASE-10293: -- That's probably true for someone with production-level automation infrastructure. I'm thinking of the fellow who downloads a tarball, follows the instructions in the hbase-env.sh comments and is surprised by the outcome. FWIW, I don't think it's common to run each process out of its own config directory. Likewise, I don't think it's common to set each process to log to its own log directory either. Rather I tend to see /var/log/hbase containing all the HBase process logs for the machine. *shrug* I don't feel strongly about the issue, it just surprised me while I was setting up some performance infra recently. If you'd prefer to defer this kind of concern to the puppetiers, I guess resolve as not a problem. Master and RS GC logs can conflict when run on same host Key: HBASE-10293 URL: https://issues.apache.org/jira/browse/HBASE-10293 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.1.1 Reporter: Nick Dimiduk My issue manifests when I uncomment the line {{export SERVER_GC_OPTS=...}} in hbase-env.sh and start HBase. It's a single node in distributed mode, so both a Master and RegionServer are started on that host. Both start commands are run in the same minute, so only one gc.log-`date` file is created. `lsof` indicates two processes are writing to that file and the output of `ps` confirms they both received the same {{-Xloggc:/grid/0/var/log/hbase/gc.log-201401071515}} argument. Presumably, the same will happen for folks running the thrift and rest gateways on the same box (any java process itemized in the server_cmds array in bin/hbase). Related (the reason I discovered this issue in the first place), stopping the master process results in its gc.log being truncated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
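One way to avoid the collision described above is to fold something process-unique into the GC log path, since a minute-granularity `date` suffix alone cannot distinguish two daemons started in the same minute. This is a workaround sketch of my own, not the project's fix; the variable names and the hardcoded `master` daemon name are assumptions:

```shell
#!/bin/sh
# Sketch: build a per-process GC log path so two JVMs started in the
# same minute on one host cannot share (and clobber) a single file.
HBASE_LOG_DIR="${HBASE_LOG_DIR:-/var/log/hbase}"
command_name="master"   # bin/hbase knows the daemon name; hardcoded for the sketch

# PID ($$) plus the daemon name keeps the path unique within one minute.
gc_log="${HBASE_LOG_DIR}/gc.log-${command_name}-$$-$(date +%Y%m%d%H%M)"
SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -Xloggc:${gc_log}"
echo "${SERVER_GC_OPTS}"
```

With each JVM owning a distinct file, concurrent daemons can no longer interleave writes into one gc.log, and stopping one process cannot disturb another's log.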
[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866179#comment-13866179 ] Andrew Purtell commented on HBASE-10268: bq. Should be good for 0.94 (and presumably 0.96) as well. Sure [~lhofhansl]. And [~stack] made a comment up on dev@ about TestSplitLogWorker, I suspect he won't mind a test fix in 0.96 also. If I don't make things worse by introducing too short junit test timeouts (working on it) then I will commit this everywhere. TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10301: --- Attachment: 10301-v2.txt Patch v2 addresses Andy and Jeff's comments. TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, 10301-v2.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html The test failure came from PreCommit build #8362 {code} 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false ... 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670 org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled. 
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725) at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:662) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594) at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672) at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423) at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622) ... 
2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=true {code} The second call to getRegionPlan() returned the same server, thus leading to assertion failure: {code} assertFalse("Region should assigned on a new region server", oldServerName.equals(serverName)); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10293) Master and RS GC logs can conflict when run on same host
[ https://issues.apache.org/jira/browse/HBASE-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866183#comment-13866183 ] Andrew Purtell commented on HBASE-10293: bq. *shrug* I don't feel strongly about the issue Same here. :-) Master and RS GC logs can conflict when run on same host Key: HBASE-10293 URL: https://issues.apache.org/jira/browse/HBASE-10293 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.1.1 Reporter: Nick Dimiduk My issue manifests when I uncomment the line {{export SERVER_GC_OPTS=...}} in hbase-env.sh and start HBase. It's a single node in distributed mode, so both a Master and RegionServer are started on that host. Both start commands are run in the same minute, so only one gc.log-`date` file is created. `lsof` indicates two processes are writing to that file and the output of `ps` confirms they both received the same {{-Xloggc:/grid/0/var/log/hbase/gc.log-201401071515}} argument. Presumably, the same will happen for folks running the thrift and rest gateways on the same box (any java process itemized in the server_cmds array in bin/hbase). Related (the reason I discovered this issue in the first place), stopping the master process results in its gc.log being truncated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866187#comment-13866187 ] Hadoop QA commented on HBASE-10268: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622072/10268.patch against trunk revision . ATTACHMENT ID: 12622072 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//console This message is automatically generated. 
TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866211#comment-13866211 ] Hadoop QA commented on HBASE-10268: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622076/10268.patch against trunk revision . ATTACHMENT ID: 12622076 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//console This message is automatically generated. 
TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866226#comment-13866226 ] Enis Soztutar commented on HBASE-10274: --- bq. Do you mind backporting the patch for HBASE-6820 I meant do you want to do the backport : ) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers --- Key: HBASE-10274 URL: https://issues.apache.org/jira/browse/HBASE-10274 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Assignee: chendihao Priority: Minor Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch HBASE-6820 points out the problem but does not fix it completely. killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() shut down the ZooKeeperServer and need to close the ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10302) Fix rat check issues in hbase-native-client.
[ https://issues.apache.org/jira/browse/HBASE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866238#comment-13866238 ] Liang Xie commented on HBASE-10302: --- It was this, right ? :) !? hbase-native-client/cmake_modules/FindGTest.cmake !? hbase-native-client/cmake_modules/FindLibEv.cmake !? hbase-native-client/README.md !? hbase-native-client/src/rpc/CMakeLists.txt Lines that start with ? in the release audit report indicate files that do not have an Apache license header. Fix rat check issues in hbase-native-client. Key: HBASE-10302 URL: https://issues.apache.org/jira/browse/HBASE-10302 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866239#comment-13866239 ] Liang Xie commented on HBASE-10263: --- Integrated into trunk. Thanks all for the review, and thanks for making the patch [~fenghh] :) P.S. the release audit failure is not related to the current jira; just checked, it should be the new jira HBASE-10302 [~eclark] make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block -- Key: HBASE-10263 URL: https://issues.apache.org/jira/browse/HBASE-10263 Project: HBase Issue Type: Improvement Components: io Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch, HBASE-10263-trunk_v2.patch currently the single/multi/in-memory ratio in LruBlockCache is hardcoded 1:2:1, which can lead to somewhat counter-intuition behavior for some user scenario where in-memory table's read performance is much worse than ordinary table when two tables' data size is almost equal and larger than regionserver's cache size (we ever did some such experiment and verified that in-memory table random read performance is two times worse than ordinary table). this patch fixes above issue and provides: 1. make single/multi/in-memory ratio user-configurable 2. provide a configurable switch which can make in-memory block preemptive, by preemptive means when this switch is on in-memory block can kick out any ordinary block to make room until no ordinary block, when this switch is off (by default) the behavior is the same as previous, using single/multi/in-memory ratio to determine evicting. by default, above two changes are both off and the behavior keeps the same as before applying this patch. it's client/user's choice to determine whether or which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
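Change 1 above (a user-configurable single/multi/in-memory ratio) would typically surface as hbase-site.xml properties. The property names below are my assumption of the patch's naming convention and should be verified against the committed code; the values shown simply reproduce the previously hardcoded 1:2:1 split:

```
<!-- Hypothetical hbase-site.xml fragment; verify the property names
     against the committed HBASE-10263 patch before relying on them. -->
<property>
  <name>hbase.lru.blockcache.single.percentage</name>
  <value>0.25</value> <!-- blocks accessed once -->
</property>
<property>
  <name>hbase.lru.blockcache.multi.percentage</name>
  <value>0.50</value> <!-- blocks accessed more than once -->
</property>
<property>
  <name>hbase.lru.blockcache.memory.percentage</name>
  <value>0.25</value> <!-- blocks from IN_MEMORY column families -->
</property>
```

Skewing the last value upward would favor in-memory tables, which is the scenario the issue description says performs poorly under the fixed 1:2:1 ratio.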
[jira] [Updated] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-10263: -- Resolution: Fixed Fix Version/s: 0.99.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block -- Key: HBASE-10263 URL: https://issues.apache.org/jira/browse/HBASE-10263 Project: HBase Issue Type: Improvement Components: io Reporter: Feng Honghua Assignee: Feng Honghua Fix For: 0.99.0 Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch, HBASE-10263-trunk_v2.patch currently the single/multi/in-memory ratio in LruBlockCache is hardcoded 1:2:1, which can lead to somewhat counter-intuition behavior for some user scenario where in-memory table's read performance is much worse than ordinary table when two tables' data size is almost equal and larger than regionserver's cache size (we ever did some such experiment and verified that in-memory table random read performance is two times worse than ordinary table). this patch fixes above issue and provides: 1. make single/multi/in-memory ratio user-configurable 2. provide a configurable switch which can make in-memory block preemptive, by preemptive means when this switch is on in-memory block can kick out any ordinary block to make room until no ordinary block, when this switch is off (by default) the behavior is the same as previous, using single/multi/in-memory ratio to determine evicting. by default, above two changes are both off and the behavior keeps the same as before applying this patch. it's client/user's choice to determine whether or which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866250#comment-13866250 ] Hadoop QA commented on HBASE-10301: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622082/10301-v2.txt against trunk revision . ATTACHMENT ID: 12622082 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.util.TestHBaseFsck Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//console This message is automatically generated. 
TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, 10301-v2.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html The test failure came from PreCommit build #8362 {code} 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false ... 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670 org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled. at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725) at
[jira] [Updated] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10263: --- Honghua: Minding filling in release notes ? make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block -- Key: HBASE-10263 URL: https://issues.apache.org/jira/browse/HBASE-10263 Project: HBase Issue Type: Improvement Components: io Reporter: Feng Honghua Assignee: Feng Honghua Fix For: 0.99.0 Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch, HBASE-10263-trunk_v2.patch currently the single/multi/in-memory ratio in LruBlockCache is hardcoded 1:2:1, which can lead to somewhat counter-intuition behavior for some user scenario where in-memory table's read performance is much worse than ordinary table when two tables' data size is almost equal and larger than regionserver's cache size (we ever did some such experiment and verified that in-memory table random read performance is two times worse than ordinary table). this patch fixes above issue and provides: 1. make single/multi/in-memory ratio user-configurable 2. provide a configurable switch which can make in-memory block preemptive, by preemptive means when this switch is on in-memory block can kick out any ordinary block to make room until no ordinary block, when this switch is off (by default) the behavior is the same as previous, using single/multi/in-memory ratio to determine evicting. by default, above two changes are both off and the behavior keeps the same as before applying this patch. it's client/user's choice to determine whether or which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866253#comment-13866253 ] Ted Yu commented on HBASE-10301: With patch v2, the test passed on QA: {code} Running org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 42.036 sec {code} TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, 10301-v2.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html The test failure came from PreCommit build #8362 {code} 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false ... 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670 org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled. 
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725) at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:662) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594) at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672) at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423) at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622) ... 
2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=true {code} The second call to getRegionPlan() returned the same server, thus leading to assertion failure: {code} assertFalse("Region should assigned on a new region server", oldServerName.equals(serverName)); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)