[jira] [Commented] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs
[ https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865186#comment-13865186 ] Hadoop QA commented on HBASE-9846: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621925/HBASE-9846_5.patch against trunk revision . ATTACHMENT ID: 12621925 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 34 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8360//console This message is automatically generated. 
Integration test and LoadTestTool support for cell ACLs --- Key: HBASE-9846 URL: https://issues.apache.org/jira/browse/HBASE-9846 Project: HBase Issue Type: Sub-task Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, HBASE-9846_5.patch, HBASE-9846_5.patch Cell level ACLs should have an integration test and LoadTestTool support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865189#comment-13865189 ] Anoop Sam John edited comment on HBASE-10292 at 1/8/14 8:14 AM:

Oh, sorry for the confusion. I was not talking about retries in this test. What I was referring to is CoprocessorHost#handleCoprocessorThrowable:

{code}
if (e instanceof IOException) {
  throw (IOException) e;
}
// If we got here, e is not an IOException. A loaded coprocessor has a
// fatal bug, and the server (master or regionserver) should remove the
// faulty coprocessor from its set of active coprocessors. Setting
// 'hbase.coprocessor.abortonerror' to true will cause abortServer(),
// which may be useful in development and testing environments where
// 'failing fast' for error analysis is desired.
if (env.getConfiguration().getBoolean("hbase.coprocessor.abortonerror", false)) {
  // server is configured to abort.
  abortServer(env, e);
} else {
  LOG.error("Removing coprocessor '" + env.toString() + "' from " +
      "environment because it threw: " + e, e);
  coprocessors.remove(env);
  try {
    shutdown(env);
  } catch (Exception x) {
    LOG.error("Uncaught exception when shutting down coprocessor '" +
        env.toString() + "'", x);
  }
  throw new DoNotRetryIOException("Coprocessor: '" + env.toString() +
      "' threw: '" + e + "' and has been removed from the active " +
      "coprocessor set.", e);
}
{code}

I was wondering why we do not throw a DoNotRetryIOException back to the client when the RS also aborts. Maybe that is correct and intended.

was (Author: anoop.hbase): Oh, sorry for the confusion. I was not talking about retries in this test. What I was referring to is CoprocessorHost#handleCoprocessorThrowable (same code block as above). I was wondering why we do not throw a DoNotRetryIOException back to the client when the RS also aborts.

TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
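The client-side half of the contract discussed above can be sketched with toy classes. This is a minimal sketch, not the real HBase client: `DoNotRetryIOException` here is a stand-in for `org.apache.hadoop.hbase.DoNotRetryIOException` (which extends IOException), and `callWithRetries` is a hypothetical, simplified version of the client's retrying caller.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Toy stand-in for org.apache.hadoop.hbase.DoNotRetryIOException: an
// IOException subtype that tells the client-side retry loop to give up at once.
class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(String msg) { super(msg); }
}

public class RetryContractDemo {

    // Illustrative simplification of a client retrying caller: a plain
    // IOException is retried up to maxAttempts times, while a
    // DoNotRetryIOException is rethrown immediately.
    static <T> T callWithRetries(Callable<T> call, int maxAttempts) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (DoNotRetryIOException e) {
                throw e;                 // terminal: e.g. the server removed the coprocessor
            } catch (IOException e) {
                last = e;                // transient: try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] attempts = {0};
        try {
            callWithRetries(() -> { attempts[0]++; throw new IOException("transient"); }, 3);
        } catch (IOException e) {
            System.out.println("plain IOException: gave up after " + attempts[0] + " attempts"); // 3
        }
        attempts[0] = 0;
        try {
            callWithRetries(() -> { attempts[0]++; throw new DoNotRetryIOException("fatal"); }, 3);
        } catch (DoNotRetryIOException e) {
            System.out.println("DoNotRetryIOException: stopped after " + attempts[0] + " attempt"); // 1
        }
    }
}
```

The question in the comment is about the branch where the server aborts instead: in that path nothing like a DoNotRetryIOException reaches the client, so the client keeps retrying against a dying server.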
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865191#comment-13865191 ] ramkrishna.s.vasudevan commented on HBASE-10292: LGTM. +1 TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10156) Fix up the HBASE-8755 slowdown when low contention
[ https://issues.apache.org/jira/browse/HBASE-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865216#comment-13865216 ] Hadoop QA commented on HBASE-10156: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621934/10156v4.txt against trunk revision . ATTACHMENT ID: 12621934 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + * In this implementation, there is one HLog/WAL. All edits for all Regions are entered first in the HLog/WAL. Each + * HRegion is identified by a unique long <code>int</code>. HRegions do not need to declare themselves before using the + * HLog/WAL; they simply include their HRegion-id in the <code>append</code> or <code>completeCacheFlush</code> calls. + * <p>This HLog/WAL implementation keeps multiple on-disk files kept in a chronological order. As data is flushed to + * other (better) on-disk structures (files sorted by key, hfiles), the log becomes obsolete. We can let go of all the + * log edits/entries for a given HRegion-id up to the most-recent CACHEFLUSH message from that HRegion. A bunch of work + * in the below is done keeping account of these region sequence ids -- what is flushed out to hfiles, and what is yet + * <p>Its only practical to delete entire files. Thus, we delete an entire on-disk file <code>F</code> when all of the + * edits in <code>F</code> have a log-sequence-id that's older (smaller) than the most-recent CACHEFLUSH message for + * <p>This implementation performs logfile-rolling internal to the implementation, so external callers do not have to be {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster.testContendedLogRolling(TestLogRollingNoCluster.java:74) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8363//console This message is automatically generated. Fix up the HBASE-8755 slowdown when low contention -- Key: HBASE-10156 URL: https://issues.apache.org/jira/browse/HBASE-10156 Project: HBase Issue Type: Sub-task Components: wal Reporter: stack
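The WAL-cleanup rule described in the javadoc quoted by the QA bot above (delete an on-disk file <code>F</code> once every region's edits in <code>F</code> are older than that region's most recent CACHEFLUSH) can be sketched as follows. The class and method names are hypothetical, not HBase's actual FSHLog internals:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not HBase's actual WAL code) of the cleanup rule: a WAL
// file is deletable once every region with edits in it has flushed past the
// file's highest sequence id for that region.
public class WalCleanupDemo {

    // maxSeqIdPerRegionByFile: file name -> (region id -> highest seq id of
    // that region's edits in the file).
    // flushedSeqIdByRegion: region id -> seq id of its most recent CACHEFLUSH.
    static List<String> obsoleteFiles(Map<String, Map<Long, Long>> maxSeqIdPerRegionByFile,
                                      Map<Long, Long> flushedSeqIdByRegion) {
        List<String> deletable = new ArrayList<>();
        for (Map.Entry<String, Map<Long, Long>> file : maxSeqIdPerRegionByFile.entrySet()) {
            boolean allFlushed = true;
            for (Map.Entry<Long, Long> e : file.getValue().entrySet()) {
                long flushed = flushedSeqIdByRegion.getOrDefault(e.getKey(), -1L);
                if (e.getValue() > flushed) {   // region still has unflushed edits in this file
                    allFlushed = false;
                    break;
                }
            }
            if (allFlushed) deletable.add(file.getKey());
        }
        return deletable;
    }

    public static void main(String[] args) {
        Map<String, Map<Long, Long>> files = new LinkedHashMap<>();
        files.put("wal.1", Map.of(1L, 10L, 2L, 12L)); // older file
        files.put("wal.2", Map.of(1L, 25L));          // newer file
        Map<Long, Long> flushed = Map.of(1L, 20L, 2L, 15L);
        System.out.println(obsoleteFiles(files, flushed)); // [wal.1]
    }
}
```

In the example, wal.1 is deletable because both regions have flushed past its highest sequence ids (10 ≤ 20 and 12 ≤ 15), while wal.2 still holds region 1 edits newer than its last flush (25 > 20).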
[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency
[ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865230#comment-13865230 ] Steve Loughran commented on HBASE-10296: One aspect of ZK that is worth remembering is that it lets other apps keep an eye on what is going on.

Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency --- Key: HBASE-10296 URL: https://issues.apache.org/jira/browse/HBASE-10296 Project: HBase Issue Type: Brainstorming Components: master, Region Assignment, regionserver Reporter: Feng Honghua

Currently the master relies on ZK to elect the active master, monitor liveness, and store almost all of its state, such as region states, table info, replication info and so on. ZK also serves as a channel for master-regionserver communication (such as region assignment) and client-regionserver communication (such as replication state/behavior changes). But ZK as a communication channel is fragile due to its one-time watches and asynchronous notifications, which together can lead to missed events (hence missed messages); for example, the master must rely on the idempotence of its state transition logic to keep the region assignment state machine correct. Almost all of the trickiest inconsistency issues trace their root cause back to the fragility of ZK as a communication channel. Replacing ZK with Paxos running within the master processes has the following benefits:

1. Better master failover performance: all masters, active and standby, hold the same latest state in memory (except lagging ones, which can eventually catch up). Whenever the active master dies, the newly elected active master can immediately take over without failover work such as rebuilding its in-memory state by consulting the meta table and ZK.
2. Better state consistency: the masters' in-memory state is the only truth about the system, which eliminates inconsistency from the very beginning; and though the state is held by all masters, Paxos guarantees the copies are identical at any time.
3. A more direct and simple communication pattern: clients change state by sending requests to the master, and master and regionservers talk directly to each other via requests and responses, without going through a third-party store like ZK, which can introduce more uncertainty, worse latency and more complexity.
4. ZK would only be used for liveness monitoring to determine whether a regionserver is dead, and later on we could eliminate ZK entirely once we build heartbeats between master and regionservers.

I know this might look like a very crazy re-architecture, but it deserves deep thinking and serious discussion, right? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
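The one-time-watch fragility mentioned in the description can be illustrated with a toy model. This is deliberately not the ZooKeeper API; it only shows the window in which changes are lost between a watch firing and the client re-registering (all names here are invented for the illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a one-shot watch; NOT the ZooKeeper API.
interface Watcher { void process(String event); }

class ToyNode {
    private Watcher watch;                    // at most one one-shot watcher
    final List<String> history = new ArrayList<>();

    void register(Watcher w) { this.watch = w; }

    void setData(String value) {
        history.add(value);
        Watcher w = this.watch;
        this.watch = null;                    // one-time: consumed on delivery
        if (w != null) w.process(value);
    }
}

public class OneTimeWatchDemo {
    public static void main(String[] args) {
        ToyNode node = new ToyNode();
        List<String> seen = new ArrayList<>();
        node.register(seen::add);

        node.setData("v1");          // delivered; the watch is now consumed
        node.setData("v2");          // missed: no watch registered
        node.setData("v3");          // missed
        node.register(seen::add);    // client re-registers, too late for v2/v3
        node.setData("v4");          // delivered

        System.out.println("changes=" + node.history + " observed=" + seen);
        // changes=[v1, v2, v3, v4] observed=[v1, v4]
    }
}
```

Real ZooKeeper clients cope with this by re-reading the node's state when re-registering the watch, which is exactly why the description says correctness must lean on idempotent state transitions rather than on seeing every event.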
[jira] [Commented] (HBASE-10297) LoadAndVerify Integration Test for cell visibility
[ https://issues.apache.org/jira/browse/HBASE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865214#comment-13865214 ] Hadoop QA commented on HBASE-10297: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621935/HBASE-10297_V2.patch against trunk revision . ATTACHMENT ID: 12621935 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster org.apache.hadoop.hbase.TestIOFencing Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8362//console This message is automatically generated. 
LoadAndVerify Integration Test for cell visibility -- Key: HBASE-10297 URL: https://issues.apache.org/jira/browse/HBASE-10297 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10297.patch, HBASE-10297_V2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs
[ https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-9846: -- Status: Open (was: Patch Available) Integration test and LoadTestTool support for cell ACLs --- Key: HBASE-9846 URL: https://issues.apache.org/jira/browse/HBASE-9846 Project: HBase Issue Type: Sub-task Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, HBASE-9846_5.patch, HBASE-9846_5.patch Cell level ACLs should have an integration test and LoadTestTool support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samir Ahmic updated HBASE-7386: --- Attachment: HBASE-7386-conf-v2.patch HBASE-7386-bin-v2.patch Here is a summary of the v2 patches: * Removed unnecessary comments * Modified graceful_stop.sh to support the case when supervisord is used (to reduce copy/paste); the script also had an issue with restoring balancer state, which is now fixed * Added a clean_znode option to hbase-daemon.sh that calls cleanZNode(); this is used by the zk_cleaner.py listener script * Added the zk_cleaner.py supervisord event listener, which removes the znode when a regionserver crashes and sends a mail notification about that event (sending email is optional) * I have verified that the supervisor approach improves master failover: in my testing the failover time is ~7s when using supervisor, versus ~40s when using the standard scripts * Since we have 'autorestart=true' in the supervisord config, if any process fails unexpectedly supervisor will restart it automatically Investigate providing some supervisor support for znode deletion Key: HBASE-7386 URL: https://issues.apache.org/jira/browse/HBASE-7386 Project: HBase Issue Type: Task Components: master, regionserver, scripts Reporter: Gregory Chanan Assignee: stack Priority: Blocker Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, HBASE-7386-v0.patch, supervisordconfigs-v0.patch There are a couple of JIRAs for deleting the znode on a process failure: HBASE-5844 (RS) and HBASE-5926 (Master), which are pretty neat; on process failure, they delete the znode of the underlying process so HBase can recover faster. These JIRAs were implemented via the startup scripts; i.e. the script hangs around and waits for the process to exit, then deletes the znode. 
There are a few problems associated with this approach, as listed in the below JIRAs: 1) Hides startup output in script https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401 2) Two hbase processes listed per launched daemon https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409 3) Not run by a real supervisor https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409 4) Weird output after kill -9 of the actual process in standalone mode https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801 5) Can kill an existing RS if called again https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401 6) Hides stdout/stderr https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832 I suspect running via something like supervisord can solve these issues if we provide the right support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
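For concreteness, the supervisord wiring described in the patch summary might look roughly like the fragment below. This is a sketch under assumptions: the paths and program names are hypothetical; only zk_cleaner.py and autorestart=true come from the patch summary, and PROCESS_STATE_EXITED is supervisord's standard event class for a process exiting.

```ini
; Hypothetical supervisord fragment for running a regionserver under supervision.
[program:regionserver]
command=/usr/local/hbase/bin/hbase regionserver start   ; path is illustrative
autorestart=true          ; supervisor restarts the process if it exits unexpectedly
stopwaitsecs=30

; Event listener from the v2 patch summary: cleans up the RS znode (and
; optionally sends mail) when a supervised process exits.
[eventlistener:zk_cleaner]
command=/usr/local/hbase/bin/zk_cleaner.py              ; path is illustrative
events=PROCESS_STATE_EXITED
```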
[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865267#comment-13865267 ] Liang Xie commented on HBASE-10263: --- There are two +1s already. If there are no new comments or objections, I'd like to commit trunk_v2 into trunk tomorrow. make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block -- Key: HBASE-10263 URL: https://issues.apache.org/jira/browse/HBASE-10263 Project: HBase Issue Type: Improvement Components: io Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch, HBASE-10263-trunk_v2.patch Currently the single/multi/in-memory ratio in LruBlockCache is hardcoded at 1:2:1, which can lead to counter-intuitive behavior in some user scenarios: an in-memory table's read performance can be much worse than an ordinary table's when the two tables' data sizes are almost equal and larger than the regionserver's cache size (we did such an experiment and verified that in-memory table random read performance was two times worse than the ordinary table's). This patch fixes the above issue and provides: 1. making the single/multi/in-memory ratio user-configurable; 2. a configurable switch that makes in-memory blocks preemptive; by preemptive we mean that when this switch is on, an in-memory block can kick out any ordinary block to make room until no ordinary blocks remain, while when it is off (the default) the behavior is the same as before, using the single/multi/in-memory ratio to determine eviction. By default both of the above changes are off and the behavior stays the same as before applying this patch; it is the client/user's choice which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
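As a back-of-the-envelope illustration of what making the ratio configurable buys, here is a hedged sketch; the class and method names are hypothetical (not LruBlockCache's actual API), and only the 1:2:1 default split is taken from the issue description.

```java
public class LruRatioDemo {

    // Hypothetical helper: split a total cache size into single-access /
    // multi-access / in-memory byte budgets. The hardcoded 1:2:1 split from
    // the issue corresponds to 0.25 / 0.50 / 0.25.
    static long[] bucketSizes(long totalBytes, float single, float multi, float inMemory) {
        if (Math.abs(single + multi + inMemory - 1.0f) > 1e-6f) {
            throw new IllegalArgumentException("ratios must sum to 1.0");
        }
        return new long[] {
            (long) (totalBytes * single),
            (long) (totalBytes * multi),
            (long) (totalBytes * inMemory)
        };
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long[] hardcoded = bucketSizes(gb, 0.25f, 0.50f, 0.25f); // today's fixed 1:2:1
        long[] tuned = bucketSizes(gb, 0.20f, 0.30f, 0.50f);     // favours in-memory tables
        System.out.println("in-memory budget: " + hardcoded[2] + " -> " + tuned[2] + " bytes");
    }
}
```

With the fixed split, an in-memory table larger than a quarter of the cache thrashes its bucket no matter how idle the rest of the cache is; letting the user raise the in-memory share (or enable the preemptive switch) addresses exactly the scenario measured in the description.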
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865272#comment-13865272 ] Nicolas Liochon commented on HBASE-7386: bq. I have verified that the supervisor approach improves master failover: in my testing the failover time is ~7s when using supervisor, versus ~40s when using the standard scripts This is strange. Do you know why? What is the test scenario? Investigate providing some supervisor support for znode deletion Key: HBASE-7386 URL: https://issues.apache.org/jira/browse/HBASE-7386 Project: HBase Issue Type: Task Components: master, regionserver, scripts Reporter: Gregory Chanan Assignee: stack Priority: Blocker Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, HBASE-7386-v0.patch, supervisordconfigs-v0.patch There are a couple of JIRAs for deleting the znode on a process failure: HBASE-5844 (RS) and HBASE-5926 (Master), which are pretty neat; on process failure, they delete the znode of the underlying process so HBase can recover faster. These JIRAs were implemented via the startup scripts; i.e. the script hangs around and waits for the process to exit, then deletes the znode. 
There are a few problems associated with this approach, as listed in the below JIRAs: 1) Hides startup output in script https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401 2) Two hbase processes listed per launched daemon https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409 3) Not run by a real supervisor https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409 4) Weird output after kill -9 of the actual process in standalone mode https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801 5) Can kill an existing RS if called again https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401 6) Hides stdout/stderr https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832 I suspect running via something like supervisord can solve these issues if we provide the right support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865289#comment-13865289 ] Hadoop QA commented on HBASE-10292: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621938/10292.patch against trunk revision . ATTACHMENT ID: 12621938 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8364//console This message is automatically generated. 
TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs
[ https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-9846: -- Attachment: HBASE-9846_6.patch This should be good to go. Integration test and LoadTestTool support for cell ACLs --- Key: HBASE-9846 URL: https://issues.apache.org/jira/browse/HBASE-9846 Project: HBase Issue Type: Sub-task Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, HBASE-9846_5.patch, HBASE-9846_5.patch, HBASE-9846_6.patch Cell level ACLs should have an integration test and LoadTestTool support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs
[ https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-9846: -- Status: Patch Available (was: Open)
graceful_stop.sh hung
Hi all, I restarted a region server using graceful_stop.sh (bin/graceful_stop.sh --restart --reload --debug hostname). After running for a while, the process hangs as follows:
2014-01-08 18:40:48,150 [main] INFO region_mover - Moving region 78c953d53f6498664d9a067701a7e7d7 (42 of 340) to server=inspur255.deu.edu.cn,60020,1388056934052
2014-01-08 18:40:50,097 [main] INFO region_mover - Moving region c621b3bf29262ca5248c03a8d6ebb41e (43 of 340) to server=inspur253.deu.edu.cn,60020,1388053123213
2014-01-08 18:40:51,652 [main] INFO region_mover - Moving region 4dad873a6af4d3a9809339281c3cb34c (44 of 340) to server=inspur254.deu.edu.cn,60020,1388054917364
2014-01-08 18:40:56,701 [main] INFO region_mover - Moving region 0e311941f5ff202bcefe57aa4079a188 (45 of 340) to server=inspur253.deu.edu.cn,60020,1388053123213
2014-01-08 18:40:58,632 [main] INFO region_mover - Moving region 3a99fb65caf32a05e18b8b6b93f8 (46 of 340) to server=inspur308.deu.edu.cn,60020,1388059770705
2014-01-08 18:41:02,127 [main] INFO region_mover - Moving region 34f3ef516fe6fba940eeb0902b9acd3d (47 of 340) to server=inspur254.deu.edu.cn,60020,1388054917364
2014-01-08 18:41:03,689 [main] INFO region_mover - Moving region ac201a56d80f13ca5357d474578a91c2 (48 of 340) to server=inspur308.deu.edu.cn,60020,1388059770705
2014-01-08 18:41:05,669 [main] INFO region_mover - Moving region 812912be704946d24c5f1b5e3184b2f5 (49 of 340) to server=inspur253.deu.edu.cn,60020,1388053123213
I ran the 'du' command to check the last region, which does not contain any data:
hadoop@inspur249:~/hbase$ hdfs dfs -du -s -h /hbase/test/812912be704946d24c5f1b5e3184b2f5/*
486 /hbase/test/812912be704946d24c5f1b5e3184b2f5/.regioninfo
0 /hbase/test/812912be704946d24c5f1b5e3184b2f5/body
0 /hbase/test/812912be704946d24c5f1b5e3184b2f5/meta
The hadoop version is cdh4.2.1 and hbase is 0.94. Thanks! Best Regards~ Xiyi
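One way to see where the region mover got stuck is to parse its log. The sketch below is a hypothetical helper (not part of HBase); it only assumes the "Moving region <id> (<n> of <total>) to server=<host>,..." format visible in the log lines above:

```python
import re

# Hypothetical diagnostic: find the last region region_mover reported moving
# before it hung. The pattern matches the log format shown above; the helper
# name and usage are illustrative only.
MOVE_RE = re.compile(
    r"Moving region (?P<region>\w+) \((?P<n>\d+) of (?P<total>\d+)\) "
    r"to server=(?P<server>\S+)"
)

def last_move(log_lines):
    """Return (region_id, n, total, server) for the final 'Moving region' entry."""
    last = None
    for line in log_lines:
        m = MOVE_RE.search(line)
        if m:
            last = (m.group("region"), int(m.group("n")),
                    int(m.group("total")), m.group("server"))
    return last

log = [
    "2014-01-08 18:41:03,689 [main] INFO region_mover - Moving region ac201a56d80f13ca5357d474578a91c2 (48 of 340) to server=inspur308.deu.edu.cn,60020,1388059770705",
    "2014-01-08 18:41:05,669 [main] INFO region_mover - Moving region 812912be704946d24c5f1b5e3184b2f5 (49 of 340) to server=inspur253.deu.edu.cn,60020,1388053123213",
]
region, n, total, server = last_move(log)
print(region, f"{n}/{total}", server)
```

With the stuck region id in hand, one can then inspect its HDFS footprint with `hdfs dfs -du`, as shown in the report.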
[jira] [Commented] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs
[ https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865408#comment-13865408 ] Hadoop QA commented on HBASE-9846:
--
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12621960/HBASE-9846_6.patch
against trunk revision .
ATTACHMENT ID: 12621960
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 40 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings).
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8365//console
This message is automatically generated.
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865479#comment-13865479 ] Samir Ahmic commented on HBASE-7386: From what I could see, it is all about removing the master znode in ZooKeeper. In the supervisor scenario the master znode is deleted by autorestart, while in the standard scripts we don't delete the master znode. Is the master znode ephemeral? It should be gone when the master dies. The test scenario is very simple:
* distributed cluster 0.96
* start master and backup master on different machines
* date; kill -9 master, and watch the logs on the backup master to see when it becomes active
* I have also used a python based script that watches the '/hbase/master' znode and detects changes
Investigate providing some supervisor support for znode deletion Key: HBASE-7386 URL: https://issues.apache.org/jira/browse/HBASE-7386 Project: HBase Issue Type: Task Components: master, regionserver, scripts Reporter: Gregory Chanan Assignee: stack Priority: Blocker Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, HBASE-7386-v0.patch, supervisordconfigs-v0.patch There are a couple of JIRAs for deleting the znode on a process failure: HBASE-5844 (RS) and HBASE-5926 (Master), which are pretty neat; on process failure, they delete the znode of the underlying process so HBase can recover faster. These JIRAs were implemented via the startup scripts; i.e. the script hangs around and waits for the process to exit, then deletes the znode. 
There are a few problems associated with this approach, as listed in the below JIRAs:
1) Hides startup output in script https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
2) Two hbase processes listed per launched daemon https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
3) Not run by a real supervisor https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
4) Weird output after kill -9 of the actual process in standalone mode https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
5) Can kill an existing RS if called again https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
6) Hides stdout/stderr https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
I suspect running via something like supervisord can solve these issues if we provide the right support.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
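The python-based master-znode watcher mentioned in the comment above could look roughly like the sketch below. This is an illustrative reconstruction, not the actual script: it assumes the kazoo ZooKeeper client library and the default '/hbase/master' znode path, and the event-interpretation logic is factored into a plain function so it can be checked independently of the ZooKeeper wiring:

```python
import time

# Illustrative reconstruction of a '/hbase/master' znode watcher (assumes the
# kazoo client library; hosts and znode path are example values).

def classify_event(exists, previous_exists):
    """Pure helper: interpret a change in the master znode.

    Returns 'master_died' when the ephemeral znode disappears,
    'master_elected' when a new master registers, else 'no_change'.
    """
    if previous_exists and not exists:
        return "master_died"
    if not previous_exists and exists:
        return "master_elected"
    return "no_change"

def watch_master(hosts="localhost:2181", znode="/hbase/master"):
    # ZooKeeper wiring; needs a running ensemble, so it is not exercised here.
    from kazoo.client import KazooClient
    zk = KazooClient(hosts=hosts)
    zk.start()
    state = {"exists": zk.exists(znode) is not None}

    @zk.DataWatch(znode)
    def _on_change(data, stat):
        exists = stat is not None
        event = classify_event(exists, state["exists"])
        state["exists"] = exists
        if event != "no_change":
            print(time.strftime("%F %T"), event)

    while True:
        time.sleep(1)
```

Timing the gap between 'master_died' and 'master_elected' gives the failover latency the test scenario measures.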
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865513#comment-13865513 ] Nicolas Liochon commented on HBASE-7386: bq. standard scripts we don't delete master znode
We should; that's what HBASE-5926 is about. It used to work, for sure. It's better to delete it just after the server death, as the restart may never happen...
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865571#comment-13865571 ] Samir Ahmic commented on HBASE-7386: bq. It's better to delete it just after the server death, as the restart may never happen...
Are you suggesting that I modify the 'zk_cleaner.py' listener script to delete the master znode when it detects that the master is in one of these states ('PROCESS_STATE_STOPPING', 'PROCESS_STATE_EXITED', 'PROCESS_STATE_UNKNOWN')? I'm already doing this for regionservers, so it should only take a few lines of code.
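For context, a supervisord event listener in the spirit of the 'zk_cleaner.py' script discussed above follows the protocol from the supervisord docs: print READY, read a 'token:value' header line plus payload from stdin, decide, and acknowledge with RESULT. The sketch below is hypothetical (the actual znode deletion is stubbed out, and the 'hbase' process-name check is an example assumption):

```python
import sys

# Hypothetical sketch of a supervisord event listener, in the spirit of the
# 'zk_cleaner.py' script discussed above. Header/payload parsing follows the
# 'token:value' wire format documented at supervisord.org.

FATAL_STATES = {
    "PROCESS_STATE_EXITED",
    "PROCESS_STATE_STOPPED",
    "PROCESS_STATE_UNKNOWN",  # per the docs: a supervisord programming error
}

def parse_kv(line):
    """Parse a supervisord 'token:value token:value ...' line into a dict."""
    return dict(token.split(":", 1) for token in line.split())

def should_clean_znode(headers, payload):
    """Decide whether a process-death event warrants deleting its znode."""
    return (headers.get("eventname") in FATAL_STATES
            and payload.get("processname", "").startswith("hbase"))

def listen(stdin=sys.stdin, stdout=sys.stdout):
    # Event loop; only meaningful when run under supervisord.
    while True:
        stdout.write("READY\n"); stdout.flush()
        headers = parse_kv(stdin.readline())
        payload = parse_kv(stdin.read(int(headers["len"])))
        if should_clean_znode(headers, payload):
            pass  # e.g. delete /hbase/master or the RS znode via a ZK client
        stdout.write("RESULT 2\nOK"); stdout.flush()
```

Subscribing the listener to PROCESS_STATE events is done in the supervisord config ('events=PROCESS_STATE'); the decision logic above then picks out the fatal transitions.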
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865594#comment-13865594 ] Nicolas Liochon commented on HBASE-7386: May be ;-). Just that if the process exits, we can clean the ZK node immediately, ideally w/o relying on a separate watchdog. What's the PROCESS_STATE_UNKNOWN?
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865644#comment-13865644 ] Samir Ahmic commented on HBASE-7386: bq. What's the PROCESS_STATE_UNKNOWN?
According to the supervisor documentation [http://supervisord.org/subprocess.html]: "The process is in an unknown state (supervisord programming error)."
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865672#comment-13865672 ] Andrew Purtell commented on HBASE-10292: bq. Was wondering why we are not throwing back DNRIOE to client when RS also aborts
Got it. I think the answer is this code predates DNRIOE.
[jira] [Updated] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10292: --- Resolution: Fixed Fix Version/s: 0.99.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the reviews Stack, Anoop, and Ram. Committed to trunk and 0.98. I filed HBASE-10300 for followup.
[jira] [Created] (HBASE-10300) Insure a throw of DoNotRetryIOException when a regionserver aborts
Andrew Purtell created HBASE-10300: -- Summary: Insure a throw of DoNotRetryIOException when a regionserver aborts Key: HBASE-10300 URL: https://issues.apache.org/jira/browse/HBASE-10300 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Fix For: 0.98.0, 0.99.0 As discussed on HBASE-10292, we may not be throwing DoNotRetryIOExceptions back to the client when aborting the server, especially when handling fatal coprocessor exceptions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8889) TestIOFencing#testFencingAroundCompaction occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865688#comment-13865688 ] Andrew Purtell commented on HBASE-8889: --- The code may be incorrect as indicated, but isn't there a larger issue also? Above, a file under compaction has a missing block. Blocks don't normally go missing in the test, so missing blocks suggest file deletion. And further above there is a warning about a missing file in another case. Might want to investigate why/how files can be pulled out from underneath compaction. TestIOFencing#testFencingAroundCompaction occasionally fails Key: HBASE-8889 URL: https://issues.apache.org/jira/browse/HBASE-8889 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor Attachments: TestIOFencing.tar.gz From https://builds.apache.org/job/PreCommit-HBASE-Build/6232//testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/ :
{code}
java.lang.AssertionError: Timed out waiting for new server to open region
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:269)
at org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:205)
{code}
{code}
2013-07-06 23:13:53,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03.
2013-07-06 23:13:54,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03.
2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] hbase.TestIOFencing$CompactionBlockerRegion(102): allowing compactions
2013-07-06 23:13:55,121 INFO [pool-1-thread-1] hbase.HBaseTestingUtility(911): Shutting down minicluster
2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] util.JVMClusterUtil(237): Shutting down HBase Cluster
2013-07-06 23:13:55,121 INFO [RS:0;asf002:39065-smallCompactions-1373152134716] regionserver.HStore(951): Starting compaction of 2 file(s) in family of tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. into tmpdir=hdfs://localhost:50140/user/jenkins/hbase/tabletest/6e62d3b24ea23160931362b60359ff03/.tmp, totalSize=108.4k
...
2013-07-06 23:13:55,155 INFO [RS:0;asf002:39065] regionserver.HRegionServer(2476): Received CLOSE for the region: 6e62d3b24ea23160931362b60359ff03 ,which we are already trying to CLOSE
2013-07-06 23:13:55,157 WARN [RS:0;asf002:39065] regionserver.HRegionServer(2414): Failed to close tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. - ignoring and continuing
org.apache.hadoop.hbase.exceptions.NotServingRegionException: The region 6e62d3b24ea23160931362b60359ff03 was already closing. New CLOSE request is ignored.
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2479)
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2409)
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2011)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:903)
at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:158)
at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110)
at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:142)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:337)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.util.Methods.call(Methods.java:41)
at org.apache.hadoop.hbase.security.User.call(User.java:420)
at org.apache.hadoop.hbase.security.User.access$300(User.java:51)
at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:260)
at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:140)
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10298) [0.98] TestIOFencing fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10298: --- Attachment: 10298.patch Attaching what I am going to commit to disable TestIOFencing for the 0.98.0 release. Will commit trivial test change using CTR shortly unless objection. We should make one of the other related issues a blocker for 1.0. [0.98] TestIOFencing fails occasionally --- Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10298) [0.98] TestIOFencing fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10298: --- Priority: Blocker (was: Major) Affects Version/s: 0.96.1.1 Fix Version/s: (was: 0.98.0) 1.0.0 0.99.0 0.98.1 Assignee: (was: Andrew Purtell)
[jira] [Commented] (HBASE-10298) [0.98] TestIOFencing fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865701#comment-13865701 ] Andrew Purtell commented on HBASE-10298: Made this a blocker for 0.98.1 and 1.0.0.
[jira] [Updated] (HBASE-8889) TestIOFencing#testFencingAroundCompaction occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8889: -- Priority: Blocker (was: Minor) Fix Version/s: 1.0.0 As Andrew indicated in HBASE-10298, marking this a blocker for 1.0 release. TestIOFencing#testFencingAroundCompaction occasionally fails Key: HBASE-8889 URL: https://issues.apache.org/jira/browse/HBASE-8889 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Blocker Fix For: 1.0.0 Attachments: TestIOFencing.tar.gz From https://builds.apache.org/job/PreCommit-HBASE-Build/6232//testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/ : {code} java.lang.AssertionError: Timed out waiting for new server to open region at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:269) at org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:205) {code} {code} 2013-07-06 23:13:53,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:54,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] hbase.TestIOFencing$CompactionBlockerRegion(102): allowing compactions 2013-07-06 23:13:55,121 INFO [pool-1-thread-1] hbase.HBaseTestingUtility(911): Shutting down minicluster 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] util.JVMClusterUtil(237): Shutting down HBase Cluster 2013-07-06 23:13:55,121 INFO [RS:0;asf002:39065-smallCompactions-1373152134716] regionserver.HStore(951): Starting compaction of 2 file(s) in family of tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 
into tmpdir=hdfs://localhost:50140/user/jenkins/hbase/tabletest/6e62d3b24ea23160931362b60359ff03/.tmp, totalSize=108.4k ... 2013-07-06 23:13:55,155 INFO [RS:0;asf002:39065] regionserver.HRegionServer(2476): Received CLOSE for the region: 6e62d3b24ea23160931362b60359ff03 ,which we are already trying to CLOSE 2013-07-06 23:13:55,157 WARN [RS:0;asf002:39065] regionserver.HRegionServer(2414): Failed to close tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. - ignoring and continuing org.apache.hadoop.hbase.exceptions.NotServingRegionException: The region 6e62d3b24ea23160931362b60359ff03 was already closing. New CLOSE request is ignored. at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2479) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2409) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2011) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:903) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:158) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:142) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:337) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.util.Methods.call(Methods.java:41) at org.apache.hadoop.hbase.security.User.call(User.java:420) at 
org.apache.hadoop.hbase.security.User.access$300(User.java:51) at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:260) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:140) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10298) TestIOFencing reveals an unhandled race
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10298: --- Summary: TestIOFencing reveals an unhandled race (was: [0.98] TestIOFencing fails occasionally) TestIOFencing reveals an unhandled race --- Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Priority: Blocker Fix For: 0.98.1, 0.99.0, 1.0.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10298: --- Summary: TestIOFencing occasionally fails (was: TestIOFencing reveals an unhandled race) TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Priority: Blocker Fix For: 0.98.1, 0.99.0, 1.0.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gustavo Anatoly updated HBASE-9948: --- Attachment: HBASE-9948.patch Ted, could you please review the patch? Thanks. HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948.patch I saw the following in test output for TestRestartCluster: {code} 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split 2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting] 2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown. 
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605) at java.lang.Thread.run(Thread.java:724) 2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads {code} HMaster should handle duplicate log split requests, instead of aborting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gustavo Anatoly updated HBASE-9948: --- Status: Patch Available (was: Open) HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948.patch I saw the following in test output for TestRestartCluster: {code} 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split 2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting] 2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown. 
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605) at java.lang.Thread.run(Thread.java:724) 2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads {code} HMaster should handle duplicate log split requests, instead of aborting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10297) LoadAndVerify Integration Test for cell visibility
[ https://issues.apache.org/jira/browse/HBASE-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865720#comment-13865720 ] Andrew Purtell commented on HBASE-10297: +1 I don't think the test failures are related, but perhaps run HadoopQA one more time. The Javadoc for the test needs updating to describe what additional things this new integration test checks for. Can be done at commit time. LoadAndVerify Integration Test for cell visibility -- Key: HBASE-10297 URL: https://issues.apache.org/jira/browse/HBASE-10297 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10297.patch, HBASE-10297_V2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865722#comment-13865722 ] Ted Yu commented on HBASE-9948: --- Thanks for the patch. {code} +} catch (DuplicatedSplitLogException dsle) { + LOG.warn(dsle.getMessage()); +} {code} Should the method return in the catch block ? There is no need to do metrics for the duplicate log split request. HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948.patch I saw the following in test output for TestRestartCluster: {code} 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split 2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting] 2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown. 
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605) at java.lang.Thread.run(Thread.java:724) 2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads {code} HMaster should handle duplicate log split requests, instead of aborting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9846) Integration test and LoadTestTool support for cell ACLs
[ https://issues.apache.org/jira/browse/HBASE-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865725#comment-13865725 ] Andrew Purtell commented on HBASE-9846: --- +1 Javadoc for the new integration test should be updated to describe what it checks for. Can be fixed up at commit time. Thanks a lot Ram! Integration test and LoadTestTool support for cell ACLs --- Key: HBASE-9846 URL: https://issues.apache.org/jira/browse/HBASE-9846 Project: HBase Issue Type: Sub-task Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0 Attachments: HBASE-9846.patch, HBASE-9846_1.patch, HBASE-9846_2.patch, HBASE-9846_3.patch, HBASE-9846_4.patch, HBASE-9846_5.patch, HBASE-9846_5.patch, HBASE-9846_6.patch Cell level ACLs should have an integration test and LoadTestTool support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10293) Master and RS GC logs can conflict when run on same host
[ https://issues.apache.org/jira/browse/HBASE-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865755#comment-13865755 ] Enis Soztutar commented on HBASE-10293: --- I guess we can do a similar thing to the log4j logs, where we will append the username and daemon name to the gc log. Master and RS GC logs can conflict when run on same host Key: HBASE-10293 URL: https://issues.apache.org/jira/browse/HBASE-10293 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.1.1 Reporter: Nick Dimiduk My issue manifests when I uncomment the line {{export SERVER_GC_OPTS=...}} in hbase-env.sh and start HBase. It's a single node in distributed mode, so both a Master and RegionServer are started on that host. Both start commands are run in the same minute, so only one gc.log-`date` file is created. `lsof` indicates two processes are writing to that file and the output of `ps` confirms they both received the same {{-Xloggc:/grid/0/var/log/hbase/gc.log-201401071515}} argument. Presumably, the same will happen for folks running the thrift and rest gateways on the same box (any java process itemized in the server_cmds array in bin/hbase). Related (the reason I discovered this issue in the first place), stopping the master process results in its gc.log being truncated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865765#comment-13865765 ] Gustavo Anatoly commented on HBASE-9948: Hi, [~yuzhih...@gmail.com] You're right. I can change this block to: {code} +} catch (DuplicatedSplitLogException dsle) { + return; +} {code} Thanks for review. HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948.patch I saw the following in test output for TestRestartCluster: {code} 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split 2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting] 2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown. 
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605) at java.lang.Thread.run(Thread.java:724) 2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads {code} HMaster should handle duplicate log split requests, instead of aborting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8889) TestIOFencing#testFencingAroundCompaction occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865767#comment-13865767 ] Ted Yu commented on HBASE-8889: --- I am looping the test with the following change so that I get a better idea of the timing of the failure. {code} Index: hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java === --- hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java (revision 1556059) +++ hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java (working copy) @@ -294,7 +298,8 @@ // all the files we expect are still working when region is up in new location. FileSystem fs = newRegion.getFilesystem(); for (String f: newRegion.getStoreFileList(new byte [][] {FAMILY})) { -assertTrue("After compaction, does not exist: " + f, fs.exists(new Path(f))); +assertTrue("After compaction, does not exist: " + f + " @ " + System.currentTimeMillis(), + fs.exists(new Path(f))); } // If we survive the split keep going... // Now we make sure that the region isn't totally confused. Load up more rows. 
{code} TestIOFencing#testFencingAroundCompaction occasionally fails Key: HBASE-8889 URL: https://issues.apache.org/jira/browse/HBASE-8889 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Blocker Fix For: 1.0.0 Attachments: TestIOFencing.tar.gz From https://builds.apache.org/job/PreCommit-HBASE-Build/6232//testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/ : {code} java.lang.AssertionError: Timed out waiting for new server to open region at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:269) at org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:205) {code} {code} 2013-07-06 23:13:53,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:54,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] hbase.TestIOFencing$CompactionBlockerRegion(102): allowing compactions 2013-07-06 23:13:55,121 INFO [pool-1-thread-1] hbase.HBaseTestingUtility(911): Shutting down minicluster 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] util.JVMClusterUtil(237): Shutting down HBase Cluster 2013-07-06 23:13:55,121 INFO [RS:0;asf002:39065-smallCompactions-1373152134716] regionserver.HStore(951): Starting compaction of 2 file(s) in family of tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. into tmpdir=hdfs://localhost:50140/user/jenkins/hbase/tabletest/6e62d3b24ea23160931362b60359ff03/.tmp, totalSize=108.4k ... 
2013-07-06 23:13:55,155 INFO [RS:0;asf002:39065] regionserver.HRegionServer(2476): Received CLOSE for the region: 6e62d3b24ea23160931362b60359ff03 ,which we are already trying to CLOSE 2013-07-06 23:13:55,157 WARN [RS:0;asf002:39065] regionserver.HRegionServer(2414): Failed to close tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. - ignoring and continuing org.apache.hadoop.hbase.exceptions.NotServingRegionException: The region 6e62d3b24ea23160931362b60359ff03 was already closing. New CLOSE request is ignored. at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2479) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2409) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2011) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:903) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:158) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:142) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:337) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at
[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gustavo Anatoly updated HBASE-9948: --- Attachment: HBASE-9948-v2.patch HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948-v2.patch, HBASE-9948.patch I saw the following in test output for TestRestartCluster: {code} 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split 2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting] 2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown. 
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605) at java.lang.Thread.run(Thread.java:724) 2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads {code} HMaster should handle duplicate log split requests, instead of aborting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865772#comment-13865772 ] Enis Soztutar commented on HBASE-10274: --- bq. I think it's better to fix it because our codebase is 0.94. Do you mind backporting the patch for HBASE-6820? We cannot commit this to 0.94 unless the backport is there. MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers --- Key: HBASE-10274 URL: https://issues.apache.org/jira/browse/HBASE-10274 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Assignee: chendihao Priority: Minor Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch HBASE-6820 points out the problem but does not fix it completely. killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() shut down the ZooKeeperServer and need to close the ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865797#comment-13865797 ] Jeffrey Zhong commented on HBASE-9948: -- We should first check why TestRestartCluster issues duplicate log split requests in the first place. The current patch doesn't work because splitLog has to be a blocking call until the requested logs complete the log splitting process; otherwise region assignment could happen before a log split completes, which would cause data loss. I'd suggest the fix skip scheduling duplicate log splits but wait for them to finish. HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948-v2.patch, HBASE-9948.patch I saw the following in test output for TestRestartCluster: {code} 2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split 2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting] 2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown. 
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301) at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605) at java.lang.Thread.run(Thread.java:724) 2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting 2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads {code} HMaster should handle duplicate log split requests, instead of aborting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
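The "skip the duplicate split but wait for it to finish" approach suggested above can be sketched independently of HBase's SplitLogManager. This is a minimal, self-contained illustration with hypothetical names (SplitScheduler, splitLog, doSplit are not the real HBase API): the first caller for a log path performs the split, while a concurrent duplicate request blocks on the in-flight split instead of being rescheduled or aborting the master.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: deduplicate concurrent split requests per log path,
// but keep the call blocking so region assignment cannot race a split.
public class SplitScheduler {
  // One latch per log path currently being split.
  private final Map<String, CountDownLatch> inFlight = new ConcurrentHashMap<>();

  // Returns true if this caller performed the split,
  // false if it waited on an already-scheduled duplicate.
  public boolean splitLog(String logPath, Runnable doSplit) {
    CountDownLatch latch = new CountDownLatch(1);
    CountDownLatch existing = inFlight.putIfAbsent(logPath, latch);
    if (existing != null) {
      // Duplicate request: do not schedule a second split, but block until
      // the original one completes, so the caller's guarantees still hold.
      try {
        existing.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return false;
    }
    try {
      doSplit.run();
    } finally {
      latch.countDown();
      inFlight.remove(logPath, latch);
    }
    return true;
  }

  public static void main(String[] args) {
    SplitScheduler scheduler = new SplitScheduler();
    scheduler.splitLog("WALs/server1-splitting/wal.1",
        () -> System.out.println("splitting once"));
  }
}
```

Because the in-flight entry is removed once the split finishes, only genuinely concurrent requests are treated as duplicates; a later request for the same path schedules a fresh split.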
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865800#comment-13865800 ] Hudson commented on HBASE-10292: SUCCESS: Integrated in HBase-TRUNK #4797 (See [https://builds.apache.org/job/HBase-TRUNK/4797/]) HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails occasionally (apurtell: rev 1556586) * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865806#comment-13865806 ] Hadoop QA commented on HBASE-9948: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622005/HBASE-9948.patch against trunk revision . ATTACHMENT ID: 12622005 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to cause Findbugs (version 1.3.9) to fail. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8366//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8366//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8366//console This message is automatically generated. 
HMaster should handle duplicate log split requests
--
Key: HBASE-9948
URL: https://issues.apache.org/jira/browse/HBASE-9948
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
Assignee: Gustavo Anatoly
Attachments: HBASE-9948-v2.patch, HBASE-9948.patch

I saw the following in test output for TestRestartCluster:
{code}
2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split
2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting]
2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta
2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown.
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta
	at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:343)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:409)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292)
	at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1038)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:868)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:605)
	at java.lang.Thread.run(Thread.java:724)
2013-11-11 19:59:55,539 INFO [M:0;kiyo:36213] master.HMaster(2386): Aborting
2013-11-11 19:59:55,539 DEBUG [M:0;kiyo:36213] master.HMaster(1234): Stopping service threads
{code}
HMaster should handle duplicate log split requests, instead of aborting.
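The abort happens because two callers schedule a split for the same WAL concurrently. One way to make the request idempotent, as the issue title asks, is to track in-flight splits and have a duplicate caller wait on the existing task instead of failing. The sketch below is plain Java with hypothetical names, not the actual SplitLogManager API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: deduplicate concurrent split requests so a second
// caller joins the in-flight task rather than triggering a master abort.
public class SplitRequestDedup {
    private final ConcurrentMap<String, CountDownLatch> inFlight = new ConcurrentHashMap<>();

    /** Returns true if this caller became the owner of the split task. */
    public boolean tryAcquire(String logPath) {
        return inFlight.putIfAbsent(logPath, new CountDownLatch(1)) == null;
    }

    /** A duplicate caller blocks until the owner finishes, instead of erroring out. */
    public void awaitCompletion(String logPath) throws InterruptedException {
        CountDownLatch latch = inFlight.get(logPath);
        if (latch != null) {
            latch.await();
        }
    }

    /** Called by the owner once the split is done; wakes any waiting duplicates. */
    public void complete(String logPath) {
        CountDownLatch latch = inFlight.remove(logPath);
        if (latch != null) {
            latch.countDown();
        }
    }
}
```

The key point is that `putIfAbsent` makes the ownership decision atomic, so the "two threads can't wait for the same task" race never surfaces as an IOException.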
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865848#comment-13865848 ] Gustavo Anatoly commented on HBASE-9948: Hi, [~jeffreyz]. I will follow your suggestions. To avoid data loss, scheduling a log split should be an atomic operation, so the best approach is to investigate the root cause of the duplicate split request. [~yuzhih...@gmail.com], how can I reproduce this scenario? Thank you, [~jeffreyz].
[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9948: -- Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865851#comment-13865851 ] Ted Yu commented on HBASE-9948: --- Let me search my computer to see if I have the test output. Meanwhile, you can loop TestRestartCluster and see if the duplicate message appears in test output. Thanks
[jira] [Updated] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9948: -- Attachment: runtest.sh Script I use for looping tests.
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865858#comment-13865858 ] Gustavo Anatoly commented on HBASE-9948: Thanks, [~yuzhih...@gmail.com]
[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865859#comment-13865859 ] Hudson commented on HBASE-10298: SUCCESS: Integrated in HBase-0.98 #63 (See [https://builds.apache.org/job/HBase-0.98/63/]) HBASE-10298. TestIOFencing occasionally fails This flapping test produces low confidence results so temporarily disable it while tracking down the cause. (apurtell: rev 1556597) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Priority: Blocker Fix For: 0.98.1, 0.99.0, 1.0.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865860#comment-13865860 ] Hudson commented on HBASE-10292: SUCCESS: Integrated in HBase-0.98 #63 (See [https://builds.apache.org/job/HBase-0.98/63/]) HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails occasionally (apurtell: rev 1556590) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865864#comment-13865864 ] Ted Yu commented on HBASE-9948: --- Please modify the following line in the script to look for 'duplicate log split scheduled' :
{code}
grep NullPointerException hbase-server/target/surefire-reports/*${test[$j]%\#*}-output.txt
{code}
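Ted's suggestion amounts to swapping the grep pattern in the looping script. A minimal sketch of the idea follows; the real runtest.sh is a JIRA attachment not reproduced here, so the sample file and path are illustrative only:

```shell
#!/bin/sh
# Illustrative sketch only: the actual runtest.sh is an attachment not shown
# in this thread. The change is to scan surefire output for the
# duplicate-split message instead of NullPointerException.
PATTERN='duplicate log split scheduled'
sample=$(mktemp)
printf '%s\n' 'java.io.IOException: duplicate log split scheduled for hdfs://...' > "$sample"
if grep -q "$PATTERN" "$sample"; then
  echo "found: $PATTERN"
fi
rm -f "$sample"
```

In the real script, `"$sample"` would be replaced by the surefire output glob from Ted's snippet (`hbase-server/target/surefire-reports/*...-output.txt`).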
[jira] [Commented] (HBASE-10300) Insure a throw of DoNotRetryIOException when a regionserver aborts
[ https://issues.apache.org/jira/browse/HBASE-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865866#comment-13865866 ] Devaraj Das commented on HBASE-10300: - My understanding was that DoNotRetryIOException is thrown for cases where there is no point retrying at a global level (e.g., a table is disabled and the client shouldn't try to get data from its regions). When a regionserver aborts, shouldn't a client that was connected to it retry its operations against the failed-over regionserver(s)? Why should it be a DoNotRetryIOException? Insure a throw of DoNotRetryIOException when a regionserver aborts -- Key: HBASE-10300 URL: https://issues.apache.org/jira/browse/HBASE-10300 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Fix For: 0.98.0, 0.99.0 As discussed on HBASE-10292, we may not be throwing DoNotRetryIOExceptions back to the client when aborting the server, especially when handling fatal coprocessor exceptions.
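The distinction under debate is whether an exception tells the client "retry elsewhere" or "stop retrying entirely". A simplified sketch of that decision, using a stand-in exception class rather than the real HBase client retry loop:

```java
import java.io.IOException;

// Simplified stand-ins for the exception hierarchy discussed above; this is
// not the actual HBase client code, only the retry decision it encodes.
public class RetryDecision {
    /** Stand-in for org.apache.hadoop.hbase.DoNotRetryIOException. */
    static class DoNotRetryIOException extends IOException {}

    /**
     * A client retries (e.g. against a failed-over regionserver) unless the
     * server signalled that retrying is pointless, such as a disabled table.
     */
    public static boolean shouldRetry(IOException e) {
        return !(e instanceof DoNotRetryIOException);
    }
}
```

Devaraj's point is that a regionserver abort is a retriable condition under this scheme, since the operation can succeed on whichever server picks up the regions.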
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865867#comment-13865867 ] Gustavo Anatoly commented on HBASE-9948: Okay
[jira] [Updated] (HBASE-8889) TestIOFencing#testFencingAroundCompaction occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8889: -- Attachment: TestIOFencing-#8362.tar.gz Log for another failure on QA build #8362 TestIOFencing#testFencingAroundCompaction occasionally fails Key: HBASE-8889 URL: https://issues.apache.org/jira/browse/HBASE-8889 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Blocker Fix For: 1.0.0 Attachments: TestIOFencing-#8362.tar.gz, TestIOFencing.tar.gz From https://builds.apache.org/job/PreCommit-HBASE-Build/6232//testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/ : {code} java.lang.AssertionError: Timed out waiting for new server to open region at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:269) at org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:205) {code} {code} 2013-07-06 23:13:53,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:54,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] hbase.TestIOFencing$CompactionBlockerRegion(102): allowing compactions 2013-07-06 23:13:55,121 INFO [pool-1-thread-1] hbase.HBaseTestingUtility(911): Shutting down minicluster 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] util.JVMClusterUtil(237): Shutting down HBase Cluster 2013-07-06 23:13:55,121 INFO [RS:0;asf002:39065-smallCompactions-1373152134716] regionserver.HStore(951): Starting compaction of 2 file(s) in family of tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 
into tmpdir=hdfs://localhost:50140/user/jenkins/hbase/tabletest/6e62d3b24ea23160931362b60359ff03/.tmp, totalSize=108.4k
...
2013-07-06 23:13:55,155 INFO [RS:0;asf002:39065] regionserver.HRegionServer(2476): Received CLOSE for the region: 6e62d3b24ea23160931362b60359ff03 ,which we are already trying to CLOSE
2013-07-06 23:13:55,157 WARN [RS:0;asf002:39065] regionserver.HRegionServer(2414): Failed to close tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. - ignoring and continuing
org.apache.hadoop.hbase.exceptions.NotServingRegionException: The region 6e62d3b24ea23160931362b60359ff03 was already closing. New CLOSE request is ignored.
	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2479)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2409)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2011)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:903)
	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:158)
	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110)
	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:142)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:337)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.util.Methods.call(Methods.java:41)
	at org.apache.hadoop.hbase.security.User.call(User.java:420)
	at org.apache.hadoop.hbase.security.User.access$300(User.java:51)
	at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:260)
	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:140)
{code}
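The reported assertion failure ("Timed out waiting for new server to open region") comes from a bounded poll loop in the test. A generic sketch of that pattern, with a hypothetical helper rather than the actual TestIOFencing code:

```java
import java.util.function.BooleanSupplier;

// Generic sketch of a bounded poll loop like the one behind "Timed out
// waiting for new server to open region"; not the actual TestIOFencing code.
public class WaitFor {
    /** Polls 'condition' every intervalMs until it is true or timeoutMs elapses. */
    public static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }
}
```

When such a loop times out on a loaded build machine, the usual suspects are an interval/timeout that is too tight for the CI host, or the condition genuinely never becoming true because of a race like the double-CLOSE seen in the log above.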
[jira] [Commented] (HBASE-9999) Add support for small reverse scan
[ https://issues.apache.org/jira/browse/HBASE-9999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865895#comment-13865895 ] Ted Yu commented on HBASE-9999: --- Similar to ClientSmallScanner.java, ClientSmallReverseScanner.java can be added to facilitate small reverse scan. Add support for small reverse scan -- Key: HBASE-9999 URL: https://issues.apache.org/jira/browse/HBASE-9999 Project: HBase Issue Type: Improvement Reporter: Ted Yu HBASE-4811 adds the reverse scan feature. This JIRA adds support for small reverse scan, which is activated when both the 'reversed' and 'small' attributes are true on the Scan object.
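To make the intended semantics concrete: a small reverse scan returns at most a handful of rows at or before a start row, walking the sorted key space backwards. The sketch below models this on a plain sorted map; it illustrates the behavior only, not the proposed ClientSmallReverseScanner API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Behavioral sketch of a small reverse scan over a sorted key space; the
// method name and signature here are illustrative, not HBase client API.
public class ReverseScanSketch {
    /** Return up to 'limit' row keys at or below 'startRow', in descending order. */
    public static List<String> smallReverseScan(NavigableMap<String, String> table,
                                                String startRow, int limit) {
        List<String> rows = new ArrayList<>();
        // headMap(startRow, true) keeps keys <= startRow; descendingKeySet
        // then yields them from startRow downwards.
        for (String key : table.headMap(startRow, true).descendingKeySet()) {
            if (rows.size() >= limit) {
                break;
            }
            rows.add(key);
        }
        return rows;
    }
}
```

The "small" attribute in HBase signals that the whole result is expected to fit in one RPC, which is why a dedicated small-scanner class (as Ted suggests) can skip the open/next/close scanner round trips.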
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865897#comment-13865897 ] Hudson commented on HBASE-10292: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #57 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/57/]) HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails occasionally (apurtell: rev 1556590) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865896#comment-13865896 ] Hudson commented on HBASE-10298: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #57 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/57/]) HBASE-10298. TestIOFencing occasionally fails This flapping test produces low confidence results so temporarily disable it while tracking down the cause. (apurtell: rev 1556597) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Priority: Blocker Fix For: 0.98.1, 0.99.0, 1.0.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9948) HMaster should handle duplicate log split requests
[ https://issues.apache.org/jira/browse/HBASE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865910#comment-13865910 ] Hadoop QA commented on HBASE-9948: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622017/HBASE-9948-v2.patch against trunk revision . ATTACHMENT ID: 12622017 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8367//console This message is automatically generated. 
HMaster should handle duplicate log split requests -- Key: HBASE-9948 URL: https://issues.apache.org/jira/browse/HBASE-9948 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Gustavo Anatoly Attachments: HBASE-9948-v2.patch, HBASE-9948.patch, runtest.sh I saw the following in test output for TestRestartCluster:
{code}
2013-11-11 19:59:55,538 DEBUG [M:0;kiyo:36213] master.SplitLogManager(327): Scheduling batch of logs to split
2013-11-11 19:59:55,538 INFO [M:0;kiyo:36213] master.SplitLogManager(329): started splitting 1 logs in [hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting]
2013-11-11 19:59:55,538 WARN [M:0;kiyo:36213] master.SplitLogManager(1048): Failure because two threads can't wait for the same task; path=/hbase/splitWAL/WALs%2Fkk%2C44962%2C138410193-splitting%2Fkk%252C44962%252C138410193.138413702.meta
2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2188): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
2013-11-11 19:59:55,538 FATAL [M:0;kiyo:36213] master.HMaster(2193): Unhandled exception. Starting shutdown.
java.io.IOException: duplicate log split scheduled for hdfs://localhost:46376/user/hortonzy/hbase/WALs/kk,44962,138410193-splitting/kk%2C44962%2C138410193.138413702.meta at
[jira] [Created] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
Ted Yu created HBASE-10301: -- Summary: TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor The test failure came from PreCommit build #8362
{code}
2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false
...
2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670
org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled.
  at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725)
  at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
  at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
  at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
  at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
  at java.lang.Thread.run(Thread.java:662)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
  at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
  at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280)
  at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594)
  at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693)
  at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672)
  at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773)
  at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423)
  at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622)
...
2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=true
{code}
The second call to getRegionPlan() returned the same server, thus leading to the assertion failure:
{code}
assertFalse("Region should assigned on a new region server", oldServerName.equals(serverName));
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865946#comment-13865946 ] Ted Yu commented on HBASE-10301: Here is one way of making the test pass reliably: randomly choose a server which is different from oldServerName, and utilize this method in AssignmentManager:
{code}
boolean assign(final ServerName destination, final List<HRegionInfo> regions) {
{code}
TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor -- This message was sent by Atlassian JIRA (v6.1.5#6160)
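The fix Ted Yu suggests can be sketched in isolation: pick a destination at random from the online servers while excluding the one the region was previously on, so the test's "region moved to a new server" assertion cannot fail by chance. The helper below is hypothetical (plain String server names instead of ServerName, an invented pickNewServer method); the real patch would go through AssignmentManager.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class PickServer {
    // Illustrative sketch of the suggested test fix: choose a destination
    // server at random from the online set, excluding oldServerName.
    public static String pickNewServer(List<String> online, String oldServerName) {
        List<String> candidates = new ArrayList<>(online);
        candidates.remove(oldServerName);
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no server other than " + oldServerName);
        }
        return candidates.get(new Random().nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        // The returned server is guaranteed to differ from "rs2".
        System.out.println(pickNewServer(Arrays.asList("rs1", "rs2", "rs3"), "rs2"));
    }
}
```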
[jira] [Updated] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10301: --- Attachment: testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor Attachments: testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865972#comment-13865972 ] Hudson commented on HBASE-10298: SUCCESS: Integrated in HBase-TRUNK #4798 (See [https://builds.apache.org/job/HBase-TRUNK/4798/]) HBASE-10298. TestIOFencing occasionally fails This flapping test produces low confidence results so temporarily disable it while tracking down the cause. (apurtell: rev 1556596) * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Priority: Blocker Fix For: 0.98.1, 0.99.0, 1.0.0 Attachments: 10298.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10301: --- Attachment: 10301-v1.txt Patch v1 selects server other than oldServerName for reassignment. Also corrected grammar in assertion message. TestAssignmentManagerOnCluster passes locally. TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor Attachments: 10301-v1.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-10301: -- Assignee: Ted Yu TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10301: --- Status: Patch Available (was: Open) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10268: --- Status: Patch Available (was: Open) TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10268: --- Attachment: 10268.patch Add default test case timeouts. When waiting internally, uniformly use a timeout of 10s. TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
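The patch summary above (default per-test timeouts, plus a uniform 10s bound on internal waits) can be sketched generically. The helper below is an illustration of the polling-wait pattern such test fixes typically use, not the actual patch code; the class and method names (`WaitUtil`, `waitFor`) are assumptions for the sketch.

```java
import java.util.function.BooleanSupplier;

public class WaitUtil {
    // Poll cond until it returns true or timeoutMs elapses.
    // Polling beats a fixed sleep: the wait ends as soon as the condition
    // holds, and the timeout bounds how long a flaky run can hang.
    public static boolean waitFor(BooleanSupplier cond, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (cond.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false; // timed out; caller decides whether to fail
            }
            try {
                Thread.sleep(10); // short poll interval, not the full timeout
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return cond.getAsBoolean(); // one last check before giving up
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(waitFor(() -> true, 10_000L));  // prints true
        System.out.println(waitFor(() -> false, 100L));    // prints false
    }
}
```

A 10s internal wait with this shape costs nothing in the common (fast) case while still catching a genuinely stuck split-log worker.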
[jira] [Updated] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10268: --- Attachment: 10268.patch TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10268: --- Attachment: (was: 10268.patch) TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10302) Fix rat check issues in hbase-native-client.
Elliott Clark created HBASE-10302: - Summary: Fix rat check issues in hbase-native-client. Key: HBASE-10302 URL: https://issues.apache.org/jira/browse/HBASE-10302 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866100#comment-13866100 ] chendihao commented on HBASE-10274: --- Backporting HBASE-6820 seems good for us. Thanks for considering, [~enis]. Let's open another issue to do that. MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers --- Key: HBASE-10274 URL: https://issues.apache.org/jira/browse/HBASE-10274 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Assignee: chendihao Priority: Minor Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch HBASE-6820 points out the problem but does not fix it completely. killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will shut down the ZooKeeperServer and need to close the ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866105#comment-13866105 ] Andrew Purtell commented on HBASE-10301: lgtm Maybe query the minicluster for the limit of the index you are using for getRegionServer instead of assuming 4? Please commit to 0.98 as well as trunk if you like. TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html The test failure came from PreCommit build #8362 {code} 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2. ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false ... 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670 org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled. 
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725) at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:662) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594) at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672) at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423) at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622) ... 
2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=true {code} The second call to getRegionPlan() returned the same server, leading to the assertion failure: {code} assertFalse("Region should be assigned on a new region server", oldServerName.equals(serverName)); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866107#comment-13866107 ] Andrew Purtell commented on HBASE-10298: I see Ted made HBASE-8889 a blocker for 1.0 so I will change the scope of this JIRA for just the test disable change. TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Priority: Blocker Fix For: 0.98.1, 0.99.0, 1.0.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10300) Insure a throw of DoNotRetryIOException when a regionserver aborts
[ https://issues.apache.org/jira/browse/HBASE-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866106#comment-13866106 ] Andrew Purtell commented on HBASE-10300: Ping [~anoop.hbase] Insure a throw of DoNotRetryIOException when a regionserver aborts -- Key: HBASE-10300 URL: https://issues.apache.org/jira/browse/HBASE-10300 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Fix For: 0.98.0, 0.99.0 As discussed on HBASE-10292, we may not be throwing DoNotRetryIOExceptions back to the client when aborting the server, especially when handling fatal coprocessor exceptions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10298: --- Priority: Major (was: Blocker) Fix Version/s: (was: 1.0.0) (was: 0.98.1) 0.98.0 Assignee: Andrew Purtell TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-10298. Resolution: Fixed Patch disabling this test for now committed to trunk and 0.98, resolving TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866110#comment-13866110 ] Hadoop QA commented on HBASE-10301: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622057/10301-v1.txt against trunk revision . ATTACHMENT ID: 12622057 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8368//console This message is automatically generated. 
TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html The test failure came from PreCommit build #8362 {code} 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2. ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false ... 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670 org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled. at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725) at
[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency
[ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866125#comment-13866125 ] Andrew Purtell commented on HBASE-10296: Use of ZK has issues but what we had before was much worse. We had heartbeating and partially desynchronized state in a bunch of places. Rather than implement our own consensus protocol we used the specialist component ZK. Engineering distributed consensus protocols is a long term endeavor full of corner cases and hard to debug problems. It is worth consideration, but maybe only as a last resort. Does something about our use of ZK, or ZK itself, have fatal issues? Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency --- Key: HBASE-10296 URL: https://issues.apache.org/jira/browse/HBASE-10296 Project: HBase Issue Type: Brainstorming Components: master, Region Assignment, regionserver Reporter: Feng Honghua Currently the master relies on ZK to elect the active master, monitor liveness, and store almost all of its state, such as region states, table info, replication info, and so on. ZK also serves as a channel for master-regionserver communication (such as region assignment) and client-regionserver communication (such as replication state/behavior changes). But ZK as a communication channel is fragile due to its one-time watches and asynchronous notification mechanism, which together can lead to missed events (hence missed messages); for example, the master must rely on the idempotence of its state transition logic to keep the region assignment state machine correct. Almost all of the trickiest inconsistency issues can trace their root cause back to the fragility of ZK as a communication channel. Replacing ZK with paxos running within the master processes has the following benefits: 1.
better master failover performance: all masters, whether active or standby, have the same latest state in memory (except lagging ones, which can eventually catch up). Whenever the active master dies, the newly elected active master can immediately play its role without failover work such as rebuilding its in-memory state by consulting the meta table and ZK. 2. better state consistency: the master's in-memory state is the only truth about the system, which eliminates inconsistency from the very beginning; and though the state is held by all masters, paxos guarantees the copies are identical at any time. 3. a more direct and simple communication pattern: clients change state by sending requests to the master, and master and regionservers talk directly to each other via request and response, without going through third-party storage like ZK, which can introduce more uncertainty, worse latency, and more complexity. 4. ZK would only be used for liveness monitoring to determine whether a regionserver is dead, and later on we could eliminate ZK entirely once we build heartbeats between master and regionservers. I know this might look like a very crazy re-architecture, but it deserves deep thought and serious discussion, right? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10293) Master and RS GC logs can conflict when run on same host
[ https://issues.apache.org/jira/browse/HBASE-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866128#comment-13866128 ] Andrew Purtell commented on HBASE-10293: I run multiple servers on the same box. Each has their own config directory with their own hbase-env.sh. This is trivial to do with Puppet or pick your favorite configuration management tool. Kind of a non issue? Master and RS GC logs can conflict when run on same host Key: HBASE-10293 URL: https://issues.apache.org/jira/browse/HBASE-10293 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.1.1 Reporter: Nick Dimiduk My issue manifests when I uncomment the line {{export SERVER_GC_OPTS=...}} in hbase-env.sh and start HBase. It's a single node in distributed mode, so both a Master and RegionServer are started on that host. Both start commands are run in the same minute, so only one gc.log-`date` file is created. `lsof` indicates two processes are writing to that file and the output of `ps` confirms they both received the same {{-Xloggc:/grid/0/var/log/hbase/gc.log-201401071515}} argument. Presumably, the same will happen for folks running the thrift and rest gateways on the same box (any java process itemized in the server_cmds array in bin/hbase). Related (the reason I discovered this issue in the first place), stopping the master process results in its gc.log being truncated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
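The collision described above comes from both daemons expanding the same minute-granularity `-Xloggc:` pattern. As a hedged local workaround sketch for hbase-env.sh (not an official fix; the variable values and paths are illustrative), embedding the launching shell's PID in the filename makes the log name unique per process:

```shell
# Illustrative hbase-env.sh fragment (paths/flags are assumptions):
# append $$ (the PID of the shell expanding the value) so a Master and
# a RegionServer started in the same minute get distinct GC log files.
HBASE_LOG_DIR=${HBASE_LOG_DIR:-/var/log/hbase}
SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -Xloggc:${HBASE_LOG_DIR}/gc.log-$(date +%Y%m%d%H%M)-$$"
echo "${SERVER_GC_OPTS}"
```

Each daemon's start script expands the value separately, so the PID suffix differs even when the timestamp does not.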
[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866132#comment-13866132 ] Andrew Purtell commented on HBASE-10268: I could get this test to fail within a few iterations on one box. Now 25 have passed in succession with the attached patch. Will continue out to 100 iterations. If they all pass and HadoopQA provides a good result here, I am going to commit this test only fix. TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866135#comment-13866135 ] Jeffrey Zhong commented on HBASE-10301: --- The fix looks good to me. You may want to change {code} + for (int i = 0; i < 4; i++) { + HRegionServer destServer = TEST_UTIL.getHBaseCluster().getRegionServer(i); {code} to the following, avoiding the hard-coded 4: {code} for (RegionServerThread rst : cluster.getLiveRegionServerThreads()) { HRegionServer hrs = rst.getRegionServer(); {code} TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html The test failure came from PreCommit build #8362 {code} 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false ... 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670 org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled. 
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725) at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:662) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594) at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672) at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423) at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622) ... 
2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=true {code} The second call to getRegionPlan() returned the same server, leading to the assertion failure: {code} assertFalse("Region should be assigned on a new region server", oldServerName.equals(serverName)); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866150#comment-13866150 ] Hudson commented on HBASE-10292: SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #46 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/46/]) HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails occasionally (apurtell: rev 1556586) * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java TestRegionServerCoprocessorExceptionWithAbort fails occasionally Key: HBASE-10292 URL: https://issues.apache.org/jira/browse/HBASE-10292 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10292.patch, 10292.patch TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a very long time now. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10298) TestIOFencing occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866149#comment-13866149 ] Hudson commented on HBASE-10298: SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #46 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/46/]) HBASE-10298. TestIOFencing occasionally fails This flapping test produces low confidence results so temporarily disable it while tracking down the cause. (apurtell: rev 1556596) * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java TestIOFencing occasionally fails Key: HBASE-10298 URL: https://issues.apache.org/jira/browse/HBASE-10298 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1.1 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10298.patch I can reproduce this using JDK 6 on Ubuntu 13.10. {noformat} Running org.apache.hadoop.hbase.TestIOFencing Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 111.55 sec FAILURE! {noformat} No failure trace captured yet. Fix or disable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866173#comment-13866173 ] Lars Hofhansl commented on HBASE-10268: --- Should be good for 0.94 (and presumably 0.96) as well. TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866174#comment-13866174 ] Andrew Purtell commented on HBASE-10268: The test itself timed out on run #56. Trying again with 120s timeouts per test. Beyond that, this is going to need a deeper look. TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10303) Have snappy support properly documented would be helpful to hadoop and hbase users
Rural Hunter created HBASE-10303: Summary: Have snappy support properly documented would be helpful to hadoop and hbase users Key: HBASE-10303 URL: https://issues.apache.org/jira/browse/HBASE-10303 Project: HBase Issue Type: Task Components: documentation Reporter: Rural Hunter The current documentation for configuring snappy support (http://hbase.apache.org/book/snappy.compression.html) is incomplete and a bit obscure. IMO, there are several improvements that can be made: 1. Describe the relationship among hadoop, hbase, and snappy. Is snappy actually needed by hadoop hdfs or by hbase itself? That would make clear whether you need to configure snappy support in hbase or in hadoop. 2. It doesn't mention that the default hadoop binary package is compiled without snappy support and that you need to compile it with the snappy option manually. Actually, it doesn't work with any native libs on a 64-bit OS, as the libhadoop.so in the binary package is only for a 32-bit OS (this of course is a hadoop issue, not an hbase one, but it's good to mention). 3. In my experience, I actually needed to install both snappy and hadoop-snappy, so the doc also lacks the steps to install hadoop-snappy. 4. During my setup, I found a difference in where hadoop and hbase pick up the native lib files: hadoop picks them up from ./lib while hbase picks them up from ./lib/[PLATFORM]. If that's correct, it could also be mentioned. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
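Point 4 above is more actionable with the `[PLATFORM]` string spelled out. The sketch below is a guess at how to reconstruct it from JVM system properties, producing names in the style `Linux-amd64-64`; the exact mapping HBase uses may differ by version, so treat the scheme as an assumption to verify against your install.

```java
public class PlatformDir {
    // Build a platform string in the style "Linux-amd64-64" from JVM
    // system properties. Per the comment above, HBase searches for
    // native libs (e.g. libsnappy) under lib/native/<platform>, while
    // hadoop uses plain ./lib.
    public static String platformName() {
        String os = System.getProperty("os.name");
        String arch = System.getProperty("os.arch");
        // 32 or 64; default to 64 if the JVM does not expose the property.
        String bits = System.getProperty("sun.arch.data.model", "64");
        return os + "-" + arch + "-" + bits;
    }

    public static void main(String[] args) {
        System.out.println("HBase native lib dir: lib/native/" + platformName());
    }
}
```

Printing this from the same JVM that runs HBase tells you exactly which directory to drop the snappy native libs into.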
[jira] [Commented] (HBASE-10293) Master and RS GC logs can conflict when run on same host
[ https://issues.apache.org/jira/browse/HBASE-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866180#comment-13866180 ] Nick Dimiduk commented on HBASE-10293: -- That's probably true for someone with production-level automation infrastructure. I'm thinking of the fellow who downloads a tarball, follows the instructions in the hbase-env.sh comments and is surprised by the outcome. FWIW, I don't think it's common to run each process out of its own config directory. Likewise, I don't think it's common to set each process to log to its own log directory either. Rather I tend to see /var/log/hbase containing all the HBase process logs for the machine. *shrug* I don't feel strongly about the issue, it just surprised me while I was setting up some performance infra recently. If you'd prefer to defer this kind of concern to the puppetiers, I guess resolve as not a problem. Master and RS GC logs can conflict when run on same host Key: HBASE-10293 URL: https://issues.apache.org/jira/browse/HBASE-10293 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.1.1 Reporter: Nick Dimiduk My issue manifests when I uncomment the line {{export SERVER_GC_OPTS=...}} in hbase-env.sh and start HBase. It's a single node in distributed mode, so both a Master and RegionServer are started on that host. Both start commands are run in the same minute, so only one gc.log-`date` file is created. `lsof` indicates two processes are writing to that file and the output of `ps` confirms they both received the same {{-Xloggc:/grid/0/var/log/hbase/gc.log-201401071515}} argument. Presumably, the same will happen for folks running the thrift and rest gateways on the same box (any java process itemized in the server_cmds array in bin/hbase). Related (the reason I discovered this issue in the first place), stopping the master process results in its gc.log being truncated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
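One way to avoid the collision described above is to fold something process-unique into the GC log path, since a minute-granularity `date` suffix alone cannot distinguish two daemons started in the same minute. This is a workaround sketch of my own, not the project's fix; the variable names and the hardcoded `master` daemon name are assumptions:

```shell
#!/bin/sh
# Sketch: build a per-process GC log path so two JVMs started in the
# same minute on one host cannot share (and clobber) a single file.
HBASE_LOG_DIR="${HBASE_LOG_DIR:-/var/log/hbase}"
command_name="master"   # bin/hbase knows the daemon name; hardcoded for the sketch

# PID ($$) plus the daemon name keeps the path unique within one minute.
gc_log="${HBASE_LOG_DIR}/gc.log-${command_name}-$$-$(date +%Y%m%d%H%M)"
SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -Xloggc:${gc_log}"
echo "${SERVER_GC_OPTS}"
```

With each JVM owning a distinct file, concurrent daemons can no longer interleave writes into one gc.log, and stopping one process cannot disturb another's log.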
[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866179#comment-13866179 ] Andrew Purtell commented on HBASE-10268: bq. Should be good for 0.94 (and presumably 0.96) as well. Sure [~lhofhansl]. And [~stack] made a comment up on dev@ about TestSplitLogWorker, I suspect he won't mind a test fix in 0.96 also. If I don't make things worse by introducing too short junit test timeouts (working on it) then I will commit this everywhere. TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10301: --- Attachment: 10301-v2.txt Patch v2 addresses Andy and Jeff's comments. TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, 10301-v2.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html The test failure came from PreCommit build #8362 {code} 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false ... 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670 org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled. 
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725) at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:662) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594) at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672) at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423) at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622) ... 
2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=true {code} The second call to getRegionPlan() returned the same server, thus leading to assertion failure: {code} assertFalse("Region should assigned on a new region server", oldServerName.equals(serverName)); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10293) Master and RS GC logs can conflict when run on same host
[ https://issues.apache.org/jira/browse/HBASE-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866183#comment-13866183 ] Andrew Purtell commented on HBASE-10293: bq. *shrug* I don't feel strongly about the issue Same here. :-) Master and RS GC logs can conflict when run on same host Key: HBASE-10293 URL: https://issues.apache.org/jira/browse/HBASE-10293 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.1.1 Reporter: Nick Dimiduk My issue manifests when I uncomment the line {{export SERVER_GC_OPTS=...}} in hbase-env.sh and start HBase. It's a single node in distributed mode, so both a Master and RegionServer are started on that host. Both start commands are run in the same minute, so only one gc.log-`date` file is created. `lsof` indicates two processes are writing to that file and the output of `ps` confirms they both received the same {{-Xloggc:/grid/0/var/log/hbase/gc.log-201401071515}} argument. Presumably, the same will happen for folks running the thrift and rest gateways on the same box (any java process itemized in the server_cmds array in bin/hbase). Related (the reason I discovered this issue in the first place), stopping the master process results in its gc.log being truncated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866187#comment-13866187 ] Hadoop QA commented on HBASE-10268: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622072/10268.patch against trunk revision . ATTACHMENT ID: 12622072 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8369//console This message is automatically generated. 
TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10268) TestSplitLogWorker occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866211#comment-13866211 ] Hadoop QA commented on HBASE-10268: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622076/10268.patch against trunk revision . ATTACHMENT ID: 12622076 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8370//console This message is automatically generated. 
TestSplitLogWorker occasionally fails - Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10268.patch TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866226#comment-13866226 ] Enis Soztutar commented on HBASE-10274: --- bq. Do you mind backporting the patch for HBASE-6820 I meant do you want to do the backport : ) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers --- Key: HBASE-10274 URL: https://issues.apache.org/jira/browse/HBASE-10274 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Assignee: chendihao Priority: Minor Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch HBASE-6820 points out the problem but does not fix it completely. killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() shut down the ZooKeeperServer and need to close the ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10302) Fix rat check issues in hbase-native-client.
[ https://issues.apache.org/jira/browse/HBASE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866238#comment-13866238 ] Liang Xie commented on HBASE-10302: --- It was this, right ? :) !? hbase-native-client/cmake_modules/FindGTest.cmake !? hbase-native-client/cmake_modules/FindLibEv.cmake !? hbase-native-client/README.md !? hbase-native-client/src/rpc/CMakeLists.txt Lines that start with ? in the release audit report indicate files that do not have an Apache license header. Fix rat check issues in hbase-native-client. Key: HBASE-10302 URL: https://issues.apache.org/jira/browse/HBASE-10302 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866239#comment-13866239 ] Liang Xie commented on HBASE-10263: --- Integrated into trunk. Thanks all for the review, and thanks for making the patch [~fenghh] :) P.S. the release audit failure is not related to the current jira; just checked, it should be the new jira HBASE-10302 [~eclark] make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block -- Key: HBASE-10263 URL: https://issues.apache.org/jira/browse/HBASE-10263 Project: HBase Issue Type: Improvement Components: io Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch, HBASE-10263-trunk_v2.patch currently the single/multi/in-memory ratio in LruBlockCache is hardcoded 1:2:1, which can lead to somewhat counter-intuition behavior for some user scenario where in-memory table's read performance is much worse than ordinary table when two tables' data size is almost equal and larger than regionserver's cache size (we ever did some such experiment and verified that in-memory table random read performance is two times worse than ordinary table). this patch fixes above issue and provides: 1. make single/multi/in-memory ratio user-configurable 2. provide a configurable switch which can make in-memory block preemptive, by preemptive means when this switch is on in-memory block can kick out any ordinary block to make room until no ordinary block, when this switch is off (by default) the behavior is the same as previous, using single/multi/in-memory ratio to determine evicting. by default, above two changes are both off and the behavior keeps the same as before applying this patch. it's client/user's choice to determine whether or which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
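Change 1 above (a user-configurable single/multi/in-memory ratio) would typically surface as hbase-site.xml properties. The property names below are my assumption of the patch's naming convention and should be verified against the committed code; the values shown simply reproduce the previously hardcoded 1:2:1 split:

```
<!-- Hypothetical hbase-site.xml fragment; verify the property names
     against the committed HBASE-10263 patch before relying on them. -->
<property>
  <name>hbase.lru.blockcache.single.percentage</name>
  <value>0.25</value> <!-- blocks accessed once -->
</property>
<property>
  <name>hbase.lru.blockcache.multi.percentage</name>
  <value>0.50</value> <!-- blocks accessed more than once -->
</property>
<property>
  <name>hbase.lru.blockcache.memory.percentage</name>
  <value>0.25</value> <!-- blocks from IN_MEMORY column families -->
</property>
```

Skewing the last value upward would favor in-memory tables, which is the scenario the issue description says performs poorly under the fixed 1:2:1 ratio.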
[jira] [Updated] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-10263: -- Resolution: Fixed Fix Version/s: 0.99.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block -- Key: HBASE-10263 URL: https://issues.apache.org/jira/browse/HBASE-10263 Project: HBase Issue Type: Improvement Components: io Reporter: Feng Honghua Assignee: Feng Honghua Fix For: 0.99.0 Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch, HBASE-10263-trunk_v2.patch currently the single/multi/in-memory ratio in LruBlockCache is hardcoded 1:2:1, which can lead to somewhat counter-intuition behavior for some user scenario where in-memory table's read performance is much worse than ordinary table when two tables' data size is almost equal and larger than regionserver's cache size (we ever did some such experiment and verified that in-memory table random read performance is two times worse than ordinary table). this patch fixes above issue and provides: 1. make single/multi/in-memory ratio user-configurable 2. provide a configurable switch which can make in-memory block preemptive, by preemptive means when this switch is on in-memory block can kick out any ordinary block to make room until no ordinary block, when this switch is off (by default) the behavior is the same as previous, using single/multi/in-memory ratio to determine evicting. by default, above two changes are both off and the behavior keeps the same as before applying this patch. it's client/user's choice to determine whether or which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866250#comment-13866250 ] Hadoop QA commented on HBASE-10301: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622082/10301-v2.txt against trunk revision . ATTACHMENT ID: 12622082 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.util.TestHBaseFsck Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8371//console This message is automatically generated. 
TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, 10301-v2.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html The test failure came from PreCommit build #8362 {code} 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false ... 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670 org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled. at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725) at
[jira] [Updated] (HBASE-10263) make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block
[ https://issues.apache.org/jira/browse/HBASE-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10263: --- Honghua: Minding filling in release notes ? make LruBlockCache single/multi/in-memory ratio user-configurable and provide preemptive mode for in-memory type block -- Key: HBASE-10263 URL: https://issues.apache.org/jira/browse/HBASE-10263 Project: HBase Issue Type: Improvement Components: io Reporter: Feng Honghua Assignee: Feng Honghua Fix For: 0.99.0 Attachments: HBASE-10263-trunk_v0.patch, HBASE-10263-trunk_v1.patch, HBASE-10263-trunk_v2.patch currently the single/multi/in-memory ratio in LruBlockCache is hardcoded 1:2:1, which can lead to somewhat counter-intuition behavior for some user scenario where in-memory table's read performance is much worse than ordinary table when two tables' data size is almost equal and larger than regionserver's cache size (we ever did some such experiment and verified that in-memory table random read performance is two times worse than ordinary table). this patch fixes above issue and provides: 1. make single/multi/in-memory ratio user-configurable 2. provide a configurable switch which can make in-memory block preemptive, by preemptive means when this switch is on in-memory block can kick out any ordinary block to make room until no ordinary block, when this switch is off (by default) the behavior is the same as previous, using single/multi/in-memory ratio to determine evicting. by default, above two changes are both off and the behavior keeps the same as before applying this patch. it's client/user's choice to determine whether or which behavior to use by enabling one of these two enhancements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10301) TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866253#comment-13866253 ] Ted Yu commented on HBASE-10301: With patch v2, the test passed on QA: {code} Running org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 42.036 sec {code} TestAssignmentManagerOnCluster#testOpenCloseRacing fails intermittently --- Key: HBASE-10301 URL: https://issues.apache.org/jira/browse/HBASE-10301 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10301-v1.txt, 10301-v2.txt, testAssignmentManagerOnCluster.testOpenCloseRacing-8362.html The test failure came from PreCommit build #8362 {code} 2014-01-08 08:50:01,584 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=false ... 2014-01-08 08:50:01,908 DEBUG [Thread-415] master.AssignmentManager(1694): Offline testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., it's not anymore on asf002.sp2.ygridcore.net,59479,1389170993670 org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: The region c18ad6dfb0055258336e96a299f57263 was opening but not yet served. Opening is cancelled. 
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2553) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:3725) at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19797) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:662) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:280) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1594) at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1672) at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1773) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423) at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testOpenCloseRacing(TestAssignmentManagerOnCluster.java:622) ... 
2014-01-08 08:50:01,919 DEBUG [Thread-415] master.AssignmentManager(2181): No previous transition plan found (or ignoring an existing plan) for testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263.; generated random plan=hri=testOpenCloseRacing,A,1389171001573.c18ad6dfb0055258336e96a299f57263., src=, dest=asf002.sp2.ygridcore.net,59479,1389170993670; 4 (online=4, available=4) available servers, forceNewPlan=true {code} The second call to getRegionPlan() returned the same server, thus leading to assertion failure: {code} assertFalse("Region should assigned on a new region server", oldServerName.equals(serverName)); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)