[jira] [Commented] (HBASE-21332) HBase scan with PageFilter cannot get all rows, non-edge region skiped
[ https://issues.apache.org/jira/browse/HBASE-21332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661820#comment-16661820 ]

pddNick commented on HBASE-21332:
---------------------------------

Thank you so much, it helps me a lot.

'+PageFilter isn't a global filter which consider the cross regions case, it only consider the region level+'. This is the thing that matters. But it causes a problem only when the scanner crosses regions.

Say the scanner is now in [111, 222) and more than PAGE_SIZE rows remain to be retrieved. The scanner gets PAGE_SIZE rows and returns them to the client; everything works just fine.

But when there are not enough rows left in [111, 222), the scanner combines all rows of the parallel scanners in regions [222, 333) and [333, 444) and returns them to the client. That means the client gets (RowsLeftInFirstRegion + PAGE_SIZE + PAGE_SIZE) rows. In this way the client skips region [222, 333), as the last row key is in [333, 444). I added some logging to the UT and the output is exactly what I expected.

So the solution is either to count out PAGE_SIZE rows ourselves every time we scan, or to limit the scan range to a single region (this is what I do in my real-life project).

> HBase scan with PageFilter cannot get all rows, non-edge region skiped
> ----------------------------------------------------------------------
>
> Key: HBASE-21332
> URL: https://issues.apache.org/jira/browse/HBASE-21332
> Project: HBase
> Issue Type: Bug
> Components: regionserver, scan
> Affects Versions: 1.1.2
> Environment: * Server version:1.1.2.2.6.5.0-292, revision=897822d4dd5956ca186974c10382e9094683fa29
> * 2 region servers
> * 4 regions
> * HBase client:1.3.1
> Reporter: pddNick
> Assignee: Zheng Hu
> Priority: Minor
> Attachments: HBaseTest.java, image-2018-10-17-21-14-25-354.png, image-2018-10-17-21-15-23-439.png, image-2018-10-23-17-37-22-028.png
>
> When using a scan with PageFilter to get data from HBase, the scanner will skip 'non-edge' regions. The code I use comes from the book _HBase: The Definitive Guide, Example 4.8, PageFilter example_. The difference is that I use a scan with startRow and stopRow.
> Say I have regions with start and end keys like {'111', '222', '333', '444'}, which means I have 3 regions {111, 222}, {222, 333}, {333, 444}, and they are on different region servers. When scanning with startRow '111' and stopRow '444', most data in region {222, 333} is skipped and is not returned by ResultScanner. Regions {111,222} and {333,444} work just fine. Because region {222,333} contains neither the start row key nor the stop row key, I call it a non-edge region.
> Below is some explanation with log:
>
> {code:java}
> // Here the scanner works just fine in region {111,222}: it gets exactly {pageSize} rows each time, which is 1000
> ...
> 2018-10-17 21:25:57.810 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [213971861069] to [2179067497952422], sum [1000 : 64000], cost: [77ms]
> 2018-10-17 21:25:57.885 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [2179098921079755] to [21c2879280113661], sum [1000 : 65000], cost: [75ms]
> 2018-10-17 21:25:57.962 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [21c2899018774688] to [2203180876471552], sum [1000 : 66000], cost: [77ms]
> // Here the scanner goes from region {111,222} to {222,333}. As you can see, the scanner gets 2405 rows with stopRow '3373621463365126'. The scanner moves to region {333,444} too early and most data in {222,333} is skipped.
> 2018-10-17 21:25:58.321 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [2203223414254308] to [3373621463365126], sum [2405 : 68405], cost: [359ms]
> // Now the scanner is in region {333,444}, everything works just fine
> 2018-10-17 21:25:58.396 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [3373764408525604] to [33b3849714659525], sum [1000 : 69405], cost: [74ms]
> 2018-10-17 21:25:58.467 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [33b3882378177107] to [33f5221377695765], sum [1000 : 70405], cost: [71ms]
> ...{code}
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
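The first workaround from the comment above (count out PAGE_SIZE rows on the client) can be illustrated with a small, self-contained toy model. This is only a sketch: it does not use the HBase client API, and `PageScanSketch`, `scanOnce`, `readAll`, `sampleRows`, and the sample row keys are all hypothetical names and data made up for illustration. `scanOnce` imitates the behaviour described in the comment, not HBase's actual implementation: if the starting region alone fills the page, the scan stops there; otherwise every later region contributes up to one page of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of a paged scan over three regions; not the HBase client API. */
class PageScanSketch {
    // Region boundaries from the report: [111,222), [222,333), [333,444).
    static final String[] BOUNDS = {"111", "222", "333", "444"};

    /** Made-up data: 4 rows in the first region, 7 in each of the other two. */
    static TreeSet<String> sampleRows() {
        TreeSet<String> rows = new TreeSet<>();
        int[] perRegion = {4, 7, 7};
        for (int r = 0; r < perRegion.length; r++) {
            for (int i = 0; i < perRegion[r]; i++) {
                rows.add(String.format("%d5%02d", r + 1, i)); // e.g. "1500", "2503"
            }
        }
        return rows;
    }

    /** One scan resuming after lastSeen; the page limit is applied per region only. */
    static List<String> scanOnce(TreeSet<String> rows, String lastSeen, int pageSize) {
        List<String> out = new ArrayList<>();
        boolean startRegion = true;
        for (int r = 0; r + 1 < BOUNDS.length; r++) {
            String lo = BOUNDS[r];
            String hi = BOUNDS[r + 1];
            if (lastSeen != null && lastSeen.compareTo(hi) >= 0) {
                continue; // region lies entirely before the resume point
            }
            int taken = 0;
            for (String row : rows.subSet(lo, true, hi, false)) {
                if (lastSeen != null && row.compareTo(lastSeen) <= 0) continue;
                if (taken == pageSize) break; // per-region counter is exhausted
                out.add(row);
                taken++;
            }
            if (startRegion && taken == pageSize) {
                return out; // the well-behaved case: exactly one page
            }
            startRegion = false; // page not filled: later regions add their own pages
        }
        return out;
    }

    /** Reads everything page by page, resuming after the last row the client kept. */
    static List<String> readAll(TreeSet<String> rows, int pageSize, boolean capOnClient) {
        List<String> seen = new ArrayList<>();
        String lastSeen = null;
        while (true) {
            List<String> page = scanOnce(rows, lastSeen, pageSize);
            if (page.isEmpty()) return seen;
            if (capOnClient && page.size() > pageSize) {
                page = page.subList(0, pageSize); // keep one page, drop the surplus
            }
            seen.addAll(page);
            lastSeen = page.get(page.size() - 1);
        }
    }

    public static void main(String[] args) {
        TreeSet<String> rows = sampleRows();
        System.out.println("uncapped client saw " + readAll(rows, 3, false).size() + " of " + rows.size());
        System.out.println("capped client saw " + readAll(rows, 3, true).size() + " of " + rows.size());
    }
}
```

In this model the uncapped client resumes from a row key in the last region and never returns to the middle one, while the capped client resumes from the last row it actually kept and so sees every row. Against a real cluster the same idea would mean stopping the iteration over the scanner's results after PAGE_SIZE rows rather than trusting the filter alone.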
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661819#comment-16661819 ]

Hadoop QA commented on HBASE-21363:
-----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 5m 16s | master passed |
| +1 | compile | 0m 22s | master passed |
| +1 | checkstyle | 0m 15s | master passed |
| +1 | shadedjars | 4m 15s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 0m 27s | master passed |
| +1 | javadoc | 0m 15s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 5m 17s | the patch passed |
| +1 | compile | 0m 23s | the patch passed |
| +1 | javac | 0m 23s | the patch passed |
| +1 | checkstyle | 0m 16s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 20s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 11m 1s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | findbugs | 0m 34s | the patch passed |
| +1 | javadoc | 0m 13s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 16s | hbase-procedure in the patch passed. |
| +1 | asflicense | 0m 10s | The patch does not generate ASF License warnings. |
| | | 37m 1s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21363 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945343/HBASE-21363-v5.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 5615fc1d5f9b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 1f437ac221 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14841/testReport/ |
| Max. process+thread count | 272 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14841/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> Rewrit
[jira] [Commented] (HBASE-21215) Figure how to invoke hbck2; make it easy to find
[ https://issues.apache.org/jira/browse/HBASE-21215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661814#comment-16661814 ]

Duo Zhang commented on HBASE-21215:
-----------------------------------

This is the only one left for 2.1.1.

> Figure how to invoke hbck2; make it easy to find
> ------------------------------------------------
>
> Key: HBASE-21215
> URL: https://issues.apache.org/jira/browse/HBASE-21215
> Project: HBase
> Issue Type: Sub-task
> Components: amv2, hbck2
> Reporter: stack
> Assignee: Sean Busbey
> Priority: Major
> Fix For: 2.1.1
>
> In https://docs.google.com/document/d/1Oun4G3M5fyrM0OxXcCKYF8td0KD7gJQjnU9Ad-2t-uk/edit#, the doc on hbck2 'form', one item to figure is how to invoke hbck2. Related, how to make it easy to find? [~busbey] has some ideas (posted in doc). This issue is for implementation.
[jira] [Updated] (HBASE-20540) [umbrella] Hadoop 3 compatibility
[ https://issues.apache.org/jira/browse/HBASE-20540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-20540:
------------------------------
Fix Version/s: 2.0.3
               2.2.0

> [umbrella] Hadoop 3 compatibility
> ---------------------------------
>
> Key: HBASE-20540
> URL: https://issues.apache.org/jira/browse/HBASE-20540
> Project: HBase
> Issue Type: Umbrella
> Reporter: Duo Zhang
> Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> There are known issues about the hadoop 3 compatibility for hbase 2. But hadoop 3 is still not production ready. So we will link the issues here and once there is a production ready hadoop 3 release, we will fix these issues soon and upgrade our dependencies on hadoop, and also update the support matrix.
[jira] [Resolved] (HBASE-20540) [umbrella] Hadoop 3 compatibility
[ https://issues.apache.org/jira/browse/HBASE-20540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-20540.
-------------------------------
Resolution: Fixed

All sub tasks are done. Let's resolve.

> [umbrella] Hadoop 3 compatibility
> ---------------------------------
>
> Key: HBASE-20540
> URL: https://issues.apache.org/jira/browse/HBASE-20540
> Project: HBase
> Issue Type: Umbrella
> Reporter: Duo Zhang
> Priority: Major
> Fix For: 3.0.0, 2.1.1
>
> There are known issues about the hadoop 3 compatibility for hbase 2. But hadoop 3 is still not production ready. So we will link the issues here and once there is a production ready hadoop 3 release, we will fix these issues soon and upgrade our dependencies on hadoop, and also update the support matrix.
[jira] [Updated] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-21363:
------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)

Pushed to branch-2.0+. Thanks [~allan163] for reviewing.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> ------------------------------------------------------------------
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363-v5.patch, HBASE-21363.patch
>
[jira] [Updated] (HBASE-21351) The force update thread may have race with PE worker when the procedure is rolling back
[ https://issues.apache.org/jira/browse/HBASE-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-21351:
------------------------------
Fix Version/s: (was: 2.0.3)
               (was: 2.1.1)

> The force update thread may have race with PE worker when the procedure is rolling back
> ---------------------------------------------------------------------------------------
>
> Key: HBASE-21351
> URL: https://issues.apache.org/jira/browse/HBASE-21351
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> We acquire the procExecutionLock for a procedure when force updating its state, to prevent a race with the PE worker, but this does not work when the procedure is rolling back.
> If a procedure fails, we mark the root procedure stack as FAILED and then start to roll back the whole procedure stack. We pop every procedure in the stack and try to roll it back. So we may change the state of a procedure without holding its procExecutionLock when rolling back.
> This means we may persist an intermediate state of a procedure and cause corruption when loading procedures.
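The race described above can be pictured with a toy model in which the rollback path takes the same per-procedure lock as the force update. This is a sketch under stated assumptions, not the patch for this issue and not HBase code: `RollbackLockSketch`, `Proc`, `execLock`, `setStateLocked`, and the `persisted` field are hypothetical stand-ins, with `execLock` playing the role of the procExecutionLock named in the description.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Toy model; the names below are illustrative and not taken from the HBase source. */
class RollbackLockSketch {
    enum State { RUNNABLE, FAILED, ROLLEDBACK }

    static class Proc {
        // Stand-in for the per-procedure procExecutionLock mentioned in the issue.
        final ReentrantLock execLock = new ReentrantLock();
        State state = State.RUNNABLE;     // in-memory state
        State persisted = State.RUNNABLE; // what a procedure store would have written
    }

    /** Every writer goes through here, so the state and its persisted copy change together. */
    static void setStateLocked(Proc p, State next) {
        p.execLock.lock();
        try {
            p.state = next;
            p.persisted = next; // stands in for writing to the store
        } finally {
            p.execLock.unlock();
        }
    }

    /** The force-update path, which already holds the lock according to the description. */
    static void forceUpdate(Proc p, State next) {
        setStateLocked(p, next);
    }

    /** The rollback path, here taking the same lock instead of writing unguarded. */
    static void rollbackStep(Proc p) {
        setStateLocked(p, State.ROLLEDBACK);
    }

    public static void main(String[] args) throws InterruptedException {
        Proc p = new Proc();
        setStateLocked(p, State.FAILED);
        Thread worker = new Thread(() -> rollbackStep(p));
        Thread updater = new Thread(() -> forceUpdate(p, State.FAILED));
        worker.start();
        updater.start();
        worker.join();
        updater.join();
        // Whichever thread ran last wins, but both fields always agree.
        System.out.println("in memory=" + p.state + ", persisted=" + p.persisted);
    }
}
```

The point of the model is only that an in-memory change and its persisted copy cannot be interleaved with another writer once both paths share one lock; how the real procedure executor arranges this is not stated in the message.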
[jira] [Updated] (HBASE-21365) Throw exception when user put data with skip wal to a table which may be replicated
[ https://issues.apache.org/jira/browse/HBASE-21365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang updated HBASE-21365:
-----------------------------------
Fix Version/s: 3.0.0

> Throw exception when user put data with skip wal to a table which may be replicated
> -----------------------------------------------------------------------------------
>
> Key: HBASE-21365
> URL: https://issues.apache.org/jira/browse/HBASE-21365
> Project: HBase
> Issue Type: Improvement
> Components: Client
> Affects Versions: 3.0.0
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Minor
> Fix For: 3.0.0
> Attachments: HBASE-21365.master.001.patch
>
> This is a real problem from our production cluster. A user pointed out that his table's data couldn't be replicated to the peer cluster, so we started to debug the reason. We checked the replication scope, the replication WAL entry filter, and the namespace and table-cfs config, but didn't find any problem. We then enabled the RS's debug log. Finally, we found that the user used put with skip WAL to write data, but finding this took a long time... Our replication uses the WAL to replicate data, so the data can't be replicated to the peer cluster. I think throwing an exception may be better for the user if the table's replication scope is not 0 (0 means not replicated).
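The proposal in the description (reject a skip-WAL put when the table's replication scope is not 0) could look roughly like the following self-contained sketch. It is not the attached patch: `SkipWalCheckSketch`, `checkPut`, the local `Durability` enum, and the choice of `IllegalArgumentException` are assumptions made for illustration, and the column families are modelled as a plain map from family name to replication scope rather than with HBase descriptor classes.

```java
import java.util.Collections;
import java.util.Map;

/** Sketch of the proposed client-side check; stand-in types, not the HBase API. */
class SkipWalCheckSketch {
    /** Local stand-in for a put's durability setting; only the values needed here. */
    enum Durability { USE_DEFAULT, SKIP_WAL }

    /** Scope 0 means "not replicated", as stated in the issue description. */
    static final int SCOPE_NOT_REPLICATED = 0;

    /**
     * Rejects a write that skips the WAL if any column family it touches is replicated.
     * familyScopes maps column family name to its replication scope.
     */
    static void checkPut(Durability durability, Map<String, Integer> familyScopes) {
        if (durability != Durability.SKIP_WAL) {
            return; // the write goes through the WAL, so replication can see it
        }
        for (Map.Entry<String, Integer> e : familyScopes.entrySet()) {
            if (e.getValue() != SCOPE_NOT_REPLICATED) {
                // The exception type is an assumption; the issue only says "throw exception".
                throw new IllegalArgumentException(
                    "SKIP_WAL put to replicated column family '" + e.getKey()
                        + "' (scope " + e.getValue() + ") would never reach the peer cluster");
            }
        }
    }

    public static void main(String[] args) {
        checkPut(Durability.SKIP_WAL, Collections.singletonMap("cf", 0)); // accepted
        try {
            checkPut(Durability.SKIP_WAL, Collections.singletonMap("cf", 1));
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

Failing fast at write time trades a behaviour change for existing callers (the issue is later flagged as an incompatible change) against the long debugging session the description recounts.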
[jira] [Updated] (HBASE-21365) Throw exception when user put data with skip wal to a table which may be replicated
[ https://issues.apache.org/jira/browse/HBASE-21365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang updated HBASE-21365:
-----------------------------------
Component/s: Client

> Throw exception when user put data with skip wal to a table which may be replicated
> -----------------------------------------------------------------------------------
>
> Key: HBASE-21365
> URL: https://issues.apache.org/jira/browse/HBASE-21365
> Project: HBase
> Issue Type: Improvement
> Components: Client
> Affects Versions: 3.0.0
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Minor
> Fix For: 3.0.0
> Attachments: HBASE-21365.master.001.patch
>
> This is a real problem from our production cluster. A user pointed out that his table's data couldn't be replicated to the peer cluster, so we started to debug the reason. We checked the replication scope, the replication WAL entry filter, and the namespace and table-cfs config, but didn't find any problem. We then enabled the RS's debug log. Finally, we found that the user used put with skip WAL to write data, but finding this took a long time... Our replication uses the WAL to replicate data, so the data can't be replicated to the peer cluster. I think throwing an exception may be better for the user if the table's replication scope is not 0 (0 means not replicated).
[jira] [Updated] (HBASE-21365) Throw exception when user put data with skip wal to a table which may be replicated
[ https://issues.apache.org/jira/browse/HBASE-21365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang updated HBASE-21365:
-----------------------------------
Affects Version/s: 3.0.0

> Throw exception when user put data with skip wal to a table which may be replicated
> -----------------------------------------------------------------------------------
>
> Key: HBASE-21365
> URL: https://issues.apache.org/jira/browse/HBASE-21365
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 3.0.0
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Minor
> Attachments: HBASE-21365.master.001.patch
>
> This is a real problem from our production cluster. A user pointed out that his table's data couldn't be replicated to the peer cluster, so we started to debug the reason. We checked the replication scope, the replication WAL entry filter, and the namespace and table-cfs config, but didn't find any problem. We then enabled the RS's debug log. Finally, we found that the user used put with skip WAL to write data, but finding this took a long time... Our replication uses the WAL to replicate data, so the data can't be replicated to the peer cluster. I think throwing an exception may be better for the user if the table's replication scope is not 0 (0 means not replicated).
[jira] [Commented] (HBASE-21376) Add some verbose log to MasterProcedureScheduler
[ https://issues.apache.org/jira/browse/HBASE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661791#comment-16661791 ]

Hadoop QA commented on HBASE-21376:
-----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -0 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || branch-2.0 Compile Tests ||
| +1 | mvninstall | 2m 56s | branch-2.0 passed |
| +1 | compile | 1m 42s | branch-2.0 passed |
| +1 | checkstyle | 1m 12s | branch-2.0 passed |
| +1 | shadedjars | 3m 57s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 14s | branch-2.0 passed |
| +1 | javadoc | 0m 29s | branch-2.0 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 2m 40s | the patch passed |
| +1 | compile | 1m 40s | the patch passed |
| +1 | javac | 1m 41s | the patch passed |
| +1 | checkstyle | 1m 9s | hbase-server: The patch generated 0 new + 7 unchanged - 1 fixed = 7 total (was 8) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 4s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 9m 0s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | findbugs | 2m 31s | the patch passed |
| +1 | javadoc | 0m 30s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 162m 51s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 198m 14s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21376 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945328/HBASE-21376.branch-2.0.001.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 8714fdd00ecf 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / 169e3bafc8 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14838/testReport/ |
| Max. process+thread count | 4544 (vs. ulimit of 1) |
| mod
[jira] [Updated] (HBASE-21365) Throw exception when user put data with skip wal to a table which may be replicated
[ https://issues.apache.org/jira/browse/HBASE-21365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang updated HBASE-21365:
-----------------------------------
Attachment: HBASE-21365.master.001.patch

> Throw exception when user put data with skip wal to a table which may be replicated
> -----------------------------------------------------------------------------------
>
> Key: HBASE-21365
> URL: https://issues.apache.org/jira/browse/HBASE-21365
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 3.0.0
> Reporter: Guanghao Zhang
> Priority: Minor
> Attachments: HBASE-21365.master.001.patch
>
> This is a real problem from our production cluster. A user pointed out that his table's data couldn't be replicated to the peer cluster, so we started to debug the reason. We checked the replication scope, the replication WAL entry filter, and the namespace and table-cfs config, but didn't find any problem. We then enabled the RS's debug log. Finally, we found that the user used put with skip WAL to write data, but finding this took a long time... Our replication uses the WAL to replicate data, so the data can't be replicated to the peer cluster. I think throwing an exception may be better for the user if the table's replication scope is not 0 (0 means not replicated).
[jira] [Updated] (HBASE-21365) Throw exception when user put data with skip wal to a table which may be replicated
[ https://issues.apache.org/jira/browse/HBASE-21365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang updated HBASE-21365:
-----------------------------------
Assignee: Guanghao Zhang
Status: Patch Available (was: Open)

> Throw exception when user put data with skip wal to a table which may be replicated
> -----------------------------------------------------------------------------------
>
> Key: HBASE-21365
> URL: https://issues.apache.org/jira/browse/HBASE-21365
> Project: HBase
> Issue Type: Improvement
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Minor
> Attachments: HBASE-21365.master.001.patch
>
> This is a real problem from our production cluster. A user pointed out that his table's data couldn't be replicated to the peer cluster, so we started to debug the reason. We checked the replication scope, the replication WAL entry filter, and the namespace and table-cfs config, but didn't find any problem. We then enabled the RS's debug log. Finally, we found that the user used put with skip WAL to write data, but finding this took a long time... Our replication uses the WAL to replicate data, so the data can't be replicated to the peer cluster. I think throwing an exception may be better for the user if the table's replication scope is not 0 (0 means not replicated).
[jira] [Updated] (HBASE-21365) Throw exception when user put data with skip wal to a table which may be replicated
[ https://issues.apache.org/jira/browse/HBASE-21365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang updated HBASE-21365:
-----------------------------------
Hadoop Flags: Incompatible change

> Throw exception when user put data with skip wal to a table which may be replicated
> -----------------------------------------------------------------------------------
>
> Key: HBASE-21365
> URL: https://issues.apache.org/jira/browse/HBASE-21365
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 3.0.0
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Minor
> Attachments: HBASE-21365.master.001.patch
>
> This is a real problem from our production cluster. A user pointed out that his table's data couldn't be replicated to the peer cluster, so we started to debug the reason. We checked the replication scope, the replication WAL entry filter, and the namespace and table-cfs config, but didn't find any problem. We then enabled the RS's debug log. Finally, we found that the user used put with skip WAL to write data, but finding this took a long time... Our replication uses the WAL to replicate data, so the data can't be replicated to the peer cluster. I think throwing an exception may be better for the user if the table's replication scope is not 0 (0 means not replicated).
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661788#comment-16661788 ]

Hadoop QA commented on HBASE-21344:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || branch-2.0 Compile Tests ||
| +1 | mvninstall | 5m 15s | branch-2.0 passed |
| +1 | compile | 1m 59s | branch-2.0 passed |
| +1 | checkstyle | 1m 22s | branch-2.0 passed |
| +1 | shadedjars | 4m 3s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 21s | branch-2.0 passed |
| +1 | javadoc | 0m 38s | branch-2.0 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 2m 48s | the patch passed |
| +1 | compile | 1m 52s | the patch passed |
| +1 | javac | 1m 52s | the patch passed |
| +1 | checkstyle | 1m 18s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 4s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 8m 44s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | findbugs | 2m 31s | the patch passed |
| +1 | javadoc | 0m 31s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 119m 36s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 158m 9s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21344 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945333/HBASE-21344.branch-2.0.003.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux df63925f8666 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / 169e3bafc8 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14839/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14839/testReport/ |
| Max. process+thread count | 4057
[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely
[ https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661783#comment-16661783 ] Hudson commented on HBASE-21349: Results for branch branch-2 [build #1435 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Cluster is going down but CatalogJanitor and Normalizer try to run and fail > noisely > --- > > Key: HBASE-21349 > URL: https://issues.apache.org/jira/browse/HBASE-21349 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: stack >Assignee: Xu Cang >Priority: Minor > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21349.master.002.patch, > HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, > HBASE-22349.master.001.patch > > > Shutting down can take a while. Meantime catalog janitor and or normalizer > (etc?) try to run and when they can't, they fail noisely. 
Looks bad: > {code} > 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: > Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; > onlineServers=51 > 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: > Failed scan of catalog table > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137) > at > org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243) > at > org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116) > at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at 
java.lang.Thread.run(Thread.java:748) > 2018-10-19 21:25:54,507 ERROR > org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to > normalize regions. > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690) > at > org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240) > at > org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189) > at > org.apache.hadoop.hbase.master.HMaster.normalizeRegion
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661782#comment-16661782 ] Hudson commented on HBASE-21342: Results for branch branch-2 [build #1435 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If > Two secure bulkload calls from the same UGI into two different regions and > one region finishes earlier, it will close the bulk load fs, and the other > region will fail. > > Another case would be more serious. 
The FileSystem.close() function needs two > synchronized variables: CACHE and deleteOnExit. If one region calls > FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while > another region is trying to close srcFS (in > SecureBulkLoadListener.closeSrcFs), a deadlock can occur. > > I have written a UT for this and fixed it using a reference counter. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
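The reference-counting idea mentioned in the HBASE-21342 description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: the class RefCountedResources, its acquire/release methods, and the key type are hypothetical names, and a plain java.io.Closeable stands in for the shared bulk load FileSystem. The point it shows is that the resource is closed only when the last user releases it, and that close() runs after the helper's own lock has been dropped.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: one shared resource per key, closed only by the last release. */
final class RefCountedResources<K, R extends Closeable> {

  /** A resource plus the number of callers currently using it. */
  private static final class Entry<R> {
    final R resource;
    int refs;

    Entry(R resource) {
      this.resource = resource;
    }
  }

  private final Map<K, Entry<R>> entries = new HashMap<>();

  /** Returns the shared resource for the key, opening it on first use. */
  synchronized R acquire(K key, Supplier<R> opener) {
    Entry<R> entry = entries.get(key);
    if (entry == null) {
      entry = new Entry<>(opener.get());
      entries.put(key, entry);
    }
    entry.refs++;
    return entry.resource;
  }

  /** Drops one reference; returns true if this call closed the resource. */
  boolean release(K key) {
    R toClose;
    synchronized (this) {
      Entry<R> entry = entries.get(key);
      if (entry == null) {
        throw new IllegalStateException("release without acquire: " + key);
      }
      if (--entry.refs > 0) {
        return false; // another caller is still using the resource
      }
      entries.remove(key);
      toClose = entry.resource;
    }
    // Close after leaving the synchronized block, so this helper's lock is
    // never held while close() takes whatever locks it needs internally.
    try {
      toClose.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return true;
  }
}
```

With a helper like this, two bulk load calls keyed by the same user would each call acquire at the start and release during cleanup; the call that finishes first only decrements the count, so the other call keeps a usable resource.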
[jira] [Commented] (HBASE-21338) [balancer] If balancer is an ill-fit for cluster size, it gives little indication
[ https://issues.apache.org/jira/browse/HBASE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661781#comment-16661781 ] Hudson commented on HBASE-21338: Results for branch branch-2 [build #1435 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > [balancer] If balancer is an ill-fit for cluster size, it gives little > indication > - > > Key: HBASE-21338 > URL: https://issues.apache.org/jira/browse/HBASE-21338 > Project: HBase > Issue Type: Sub-task > Components: Balancer, Operability >Reporter: stack >Assignee: Xu Cang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21338.master.001.patch > > > See parent issue. Running balancer on a cluster where the max steps was way > inadequate, the balancer gave little to no indication that it was > ill-configured. In fact, it only logged its starting and then that there was > nothing to do though the cluster was obviously out-of-whack. 
> Ideally the balancer would complain when, say, the maxSteps limit is a small > fraction of the cluster's calculated max steps, or it would notice > that it is making little progress on an imbalanced cluster and > shout. Can we set balancer configs w/o having to restart Master?
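The first suggestion in the HBASE-21338 description (complain when the maxSteps limit is a small fraction of the calculated max steps) could look something like the sketch below. It is a hypothetical illustration, not the balancer's actual code or the attached patch: the class BalancerStepCheck, the method names, and the 10% threshold in MIN_FRACTION are all assumptions chosen for the example.

```java
import java.util.Optional;

/** Hypothetical check; not taken from the HBase balancer implementation. */
final class BalancerStepCheck {

  /** Assumed threshold: warn when the cap is below 10% of the calculated need. */
  static final double MIN_FRACTION = 0.1;

  /** True when the configured cap is a small fraction of the calculated steps. */
  static boolean isCapTooSmall(long configuredMaxSteps, long calculatedMaxSteps) {
    if (calculatedMaxSteps <= 0) {
      return false; // nothing to compare against
    }
    return configuredMaxSteps < MIN_FRACTION * calculatedMaxSteps;
  }

  /** Builds a warning message for the log, or empty if the cap looks adequate. */
  static Optional<String> warning(long configuredMaxSteps, long calculatedMaxSteps) {
    if (!isCapTooSmall(configuredMaxSteps, calculatedMaxSteps)) {
      return Optional.empty();
    }
    return Optional.of("maxSteps limit " + configuredMaxSteps
        + " is a small fraction of the calculated max steps " + calculatedMaxSteps
        + "; the balancer may be under-configured for this cluster size");
  }
}
```

A caller would log the returned message at WARN level when it is present, so an operator sees why a visibly imbalanced cluster reports nothing to do.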
[jira] [Updated] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21363: -- Attachment: HBASE-21363-v5.patch > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363-v5.patch, > HBASE-21363.patch
[jira] [Commented] (HBASE-20952) Re-visit the WAL API
[ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661751#comment-16661751 ] Hudson commented on HBASE-20952: Results for branch HBASE-20952 [build #27 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Re-visit the WAL API > > > Key: HBASE-20952 > URL: https://issues.apache.org/jira/browse/HBASE-20952 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Josh Elser >Priority: Major > Attachments: 20952.v1.txt > > > Take a step back from the current WAL implementations and think about what an > HBase WAL API should look like. What are the primitive calls that we require > to guarantee durability of writes with a high degree of performance? > The API needs to take the current implementations into consideration. We > should also have a mind for what is happening in the Ratis LogService (but > the LogService should not dictate what HBase's WAL API looks like RATIS-272). > Other "systems" inside of HBase that use WALs are replication and > backup&restore. Replication has the use-case for "tail"'ing the WAL which we > should provide via our new API. 
B&R doesn't do anything fancy (IIRC). We > should make sure all consumers are generally going to be OK with the API we > create. > The API may be "OK" (or OK in part). We also need to consider other methods > which were "bolted" on, such as {{AbstractFSWAL}} and > {{WALFileLengthProvider}}. Other corners of "WAL use" (like the > {{WALSplitter}}) should also be looked at, so that they use WAL APIs only. > We also need to make sure that adequate interface audience and stability > annotations are chosen.
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661744#comment-16661744 ] Hudson commented on HBASE-21342: Results for branch branch-2.0 [build #1004 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If > Two secure bulkload calls from the same UGI into two different regions and > one region finishes earlier, it will close the bulk load fs, and the other > region will fail. > > Another case would be more serious. The FileSystem.close() function needs two > synchronized variables : CACHE and deleteOnExit. 
If one region calls > FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while > another region is trying to close srcFS (in > SecureBulkLoadListener.closeSrcFs), a deadlock can occur. > > I have written a UT for this and fixed it using a reference counter.
[jira] [Commented] (HBASE-21338) [balancer] If balancer is an ill-fit for cluster size, it gives little indication
[ https://issues.apache.org/jira/browse/HBASE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661742#comment-16661742 ] Hudson commented on HBASE-21338: Results for branch branch-2.0 [build #1004 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > [balancer] If balancer is an ill-fit for cluster size, it gives little > indication > - > > Key: HBASE-21338 > URL: https://issues.apache.org/jira/browse/HBASE-21338 > Project: HBase > Issue Type: Sub-task > Components: Balancer, Operability >Reporter: stack >Assignee: Xu Cang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21338.master.001.patch > > > See parent issue. Running balancer on a cluster where the max steps was way > inadequate, the balancer gave little to no indication that it was > ill-configured. In fact, it only logged its starting and then that there was > nothing to do though the cluster was obviously out-of-whack. > Ideally the balancer would complain when say the maxSteps limit is a small > fraction of what the cluster's calculated max steps are, or it would notice > that the balancer is making little progress on an imbalanced cluster and > shout. 
Can we set balancer configs w/o having to restart Master?
[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely
[ https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661743#comment-16661743 ] Hudson commented on HBASE-21349: Results for branch branch-2.0 [build #1004 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Cluster is going down but CatalogJanitor and Normalizer try to run and fail > noisely > --- > > Key: HBASE-21349 > URL: https://issues.apache.org/jira/browse/HBASE-21349 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: stack >Assignee: Xu Cang >Priority: Minor > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21349.master.002.patch, > HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, > HBASE-22349.master.001.patch > > > Shutting down can take a while. Meantime catalog janitor and or normalizer > (etc?) try to run and when they can't, they fail noisely. 
Looks bad: > {code} > 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: > Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; > onlineServers=51 > 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: > Failed scan of catalog table > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137) > at > org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243) > at > org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116) > at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at 
java.lang.Thread.run(Thread.java:748) > 2018-10-19 21:25:54,507 ERROR > org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to > normalize regions. > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690) > at > org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240) > at > org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189) > at > org.apache.hadoop.hbase.master.HMaster.normalizeRegions(HMaster.java:1718) > at > org.ap
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661729#comment-16661729 ] Hadoop QA commented on HBASE-21363: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 6s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 33s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 32s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 13s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 36s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 41m 33s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21363 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945335/HBASE-21363-v4.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux b207abd0da91 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh | | git revision | master / 1f437ac221 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14840/testReport/ | | Max. process+thread count | 273 (vs. ulimit of 1) | | modules | C: hbase-procedure U: hbase-procedure | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14840/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (HBASE-21325) Force to terminate regionserver when abort hang in somewhere
[ https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661720#comment-16661720 ] Hadoop QA commented on HBASE-21325: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 16s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 26s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 5s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 32s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 32s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 12s{color} | {color:red} hbase-server: The patch generated 3 new + 76 unchanged - 0 fixed = 79 total (was 76) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 15s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 26s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}133m 55s{color} | {color:red} hbase-server in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}174m 13s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.replication.TestSyncReplicationRemoveRemoteWAL | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21325 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945315/HBASE-21325.master.003.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 3106f256c7e3 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 1f437ac221 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | compile | https://builds.apache.org/job/PreCommit-HBASE-Build/14836/artifact/patchprocess/patch-compile-hbase-server.txt | | javac | https://builds.apache.org/job/PreCommit-HBASE-Build/14836/artifact/patchproces
[jira] [Commented] (HBASE-21338) [balancer] If balancer is an ill-fit for cluster size, it gives little indication
[ https://issues.apache.org/jira/browse/HBASE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661716#comment-16661716 ] Hudson commented on HBASE-21338: Results for branch branch-2.1 [build #522 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > [balancer] If balancer is an ill-fit for cluster size, it gives little > indication > - > > Key: HBASE-21338 > URL: https://issues.apache.org/jira/browse/HBASE-21338 > Project: HBase > Issue Type: Sub-task > Components: Balancer, Operability >Reporter: stack >Assignee: Xu Cang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21338.master.001.patch > > > See parent issue. Running balancer on a cluster where the max steps was way > inadequate, the balancer gave little to no indication that it was > ill-configured. In fact, it only logged its starting and then that there was > nothing to do though the cluster was obviously out-of-whack. 
> Ideally the balancer would complain when say the maxSteps limit is a small > fraction of what the cluster's calculated max steps are, or it would notice > that the balancer is making little progress on an imbalanced cluster and > shout. Can we set balancer configs w/o having to restart Master? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
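The check suggested in the description (complain when the configured maxSteps is a small fraction of the calculated max steps) could look roughly like the sketch below. It is only an illustration: the class name, method names and the `minFraction` threshold are hypothetical and are not taken from the HBase balancer code or from the attached patch.

```java
// Hypothetical sketch; none of these names come from the HBase balancer.
final class BalancerStepCheck {

    /**
     * Returns true when the configured step limit is smaller than
     * minFraction of the steps calculated for the cluster.
     */
    static boolean isStepLimitTooSmall(long configuredMaxSteps,
                                       long calculatedMaxSteps,
                                       double minFraction) {
        if (calculatedMaxSteps <= 0) {
            return false; // nothing to compare against, nothing to complain about
        }
        return configuredMaxSteps < calculatedMaxSteps * minFraction;
    }

    /** Builds the message a balancer could log instead of staying silent. */
    static String describe(long configuredMaxSteps,
                           long calculatedMaxSteps,
                           double minFraction) {
        if (!isStepLimitTooSmall(configuredMaxSteps, calculatedMaxSteps, minFraction)) {
            return "step limit looks adequate";
        }
        return "configured maxSteps=" + configuredMaxSteps
            + " is below " + minFraction + " of calculated max steps="
            + calculatedMaxSteps + "; balancer is likely ill-configured";
    }

    public static void main(String[] args) {
        System.out.println(describe(1_000L, 1_000_000L, 0.1));
    }
}
```

Whether a fixed fraction is the right trigger, as opposed to watching for lack of progress on an imbalanced cluster, is left open in the issue text.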
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661718#comment-16661718 ] Hudson commented on HBASE-21342: Results for branch branch-2.1 [build #522 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If > Two secure bulkload calls from the same UGI into two different regions and > one region finishes earlier, it will close the bulk load fs, and the other > region will fail. > > Another case would be more serious. 
The FileSystem.close() function needs two > synchronized variables: CACHE and deleteOnExit. If one region calls > FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while > another region is trying to close srcFS (in > SecureBulkLoadListener.closeSrcFs), a deadlock can occur. > > I have written a UT for this and fixed it using a reference counter.
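As a rough illustration of the reference-counter idea mentioned above, the sketch below closes a shared resource only when the last user releases it. It is not the code from the HBASE-21342 patches: `RefCountedHandle` and its `closer` callback are hypothetical names, and the real fix may track counts per UGI rather than per handle.

```java
// Hypothetical sketch of reference counting; not the HBASE-21342 patch.
final class RefCountedHandle {
    private final Runnable closer; // performs the real close, e.g. of a shared FileSystem
    private int refs;              // guarded by "this"

    RefCountedHandle(Runnable closer) {
        this.closer = closer;
    }

    /** Called by each bulk load call before it starts using the resource. */
    synchronized void acquire() {
        refs++;
    }

    /**
     * Called by each bulk load call when it is done.
     * Returns true if this call released the last reference and ran the closer.
     */
    synchronized boolean release() {
        if (refs == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        refs--;
        if (refs == 0) {
            closer.run(); // only the last user actually closes
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        RefCountedHandle handle = new RefCountedHandle(() -> System.out.println("closed"));
        handle.acquire(); // region A starts
        handle.acquire(); // region B starts
        handle.release(); // region A finishes early, nothing is closed
        handle.release(); // region B finishes, the closer runs
    }
}
```

With this shape, the region that finishes first no longer closes a file system that the other region is still using.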
[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely
[ https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661717#comment-16661717 ] Hudson commented on HBASE-21349: Results for branch branch-2.1 [build #522 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Cluster is going down but CatalogJanitor and Normalizer try to run and fail > noisely > --- > > Key: HBASE-21349 > URL: https://issues.apache.org/jira/browse/HBASE-21349 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: stack >Assignee: Xu Cang >Priority: Minor > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21349.master.002.patch, > HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, > HBASE-22349.master.001.patch > > > Shutting down can take a while. Meanwhile, the catalog janitor and/or normalizer > (etc.?) try to run and, when they can't, they fail noisily. 
Looks bad: > {code} > 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: > Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; > onlineServers=51 > 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: > Failed scan of catalog table > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137) > at > org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243) > at > org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116) > at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at 
java.lang.Thread.run(Thread.java:748) > 2018-10-19 21:25:54,507 ERROR > org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to > normalize regions. > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690) > at > org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240) > at > org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189) > at > org.apache.hadoop.hbase.master.HMaster.normal
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661708#comment-16661708 ] Allan Yang commented on HBASE-21363: {code} +this.partial = resetDelete ? false : other.partial; {code} Add a comment for this one The v4 patch looks great, +1 for it. You can add a comment while committing > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX
[ https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661707#comment-16661707 ] Hadoop QA commented on HBASE-21372: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 38s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 52s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} branch-2.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 35s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 12s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}176m 23s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}216m 59s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 | | JIRA Issue | HBASE-21372 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945308/HBASE-21372.branch-2.1.001.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux d021c3937513 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | branch-2.1 / d35f65f396 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14833/testReport/ | | Max. process+thread count | 4249 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | | Console output | https://bui
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661695#comment-16661695 ] Duo Zhang commented on HBASE-21363: --- Add a UT to confirm that we will reset all the deleted flags when building holdingCleanupTracker even if the original tracker is partial. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21363: -- Attachment: HBASE-21363-v4.patch > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal updated HBASE-21344: -- Attachment: HBASE-21344.branch-2.0.003.patch > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 2.0.3 > > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch, > HBASE-21344.branch-2.0.003.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.po
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661668#comment-16661668 ] Duo Zhang commented on HBASE-21363: --- Talked with [~allan163] offline, there are still problems. I'm writing a UT now. Will be back soon. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661667#comment-16661667 ] stack commented on HBASE-21364: --- Thanks boys. > Procedure holds the lock should put to front of the queue after restart > --- > > Key: HBASE-21364 > URL: https://issues.apache.org/jira/browse/HBASE-21364 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Blocker > Fix For: 2.1.1, 2.0.3 > > Attachments: HBASE-21364.branch-2.0.001.patch, > HBASE-21364.branch-2.0.002.patch > > > After restoring the procedures from the Procedure WALs, we put the runnable > procedures back into the queue to execute. The order was not a problem before > HBASE-20846, since the first one to execute would acquire the lock itself. But > locks are restored after HBASE-20846, so if we execute a procedure > without the lock before a procedure with the lock in the same queue, > there is a race condition in which we may not be able to execute any of the procedures > in that queue. > The race condition is: > 1. A procedure that needs to take the table's exclusive lock is put into the > table's queue, but the table's shared lock is held by a Region Procedure. > Since no one holds the exclusive lock, the queue is put on the run queue to > execute. But soon the worker thread sees that the procedure can't execute because > it doesn't hold the lock, so it stops executing and removes the queue from > the run queue. > 2. At the same time, the Region Procedure which holds the table's shared lock > and the region's exclusive lock is put into the table's queue. But since the > queue was already added to the run queue, it is not added again. > 3. As a result of step 1, the table's queue is removed from the run queue. > 4. Then no one puts the table's queue back, so no worker will execute > the procedures inside. > A test case in the patch shows how.
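The ordering named in the issue title (procedures that already hold a lock go to the front of the queue after restart) can be sketched as below. This is a simplified illustration under stated assumptions: `Proc`, `holdsLock` and `enqueueAfterRestart` are hypothetical names, and the real scheduler works with per-table and per-region queues rather than a single deque.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; not the MasterProcedureScheduler implementation.
final class RestartOrdering {

    /** Minimal stand-in for a restored procedure. */
    static final class Proc {
        final long id;
        final boolean holdsLock; // true if the lock was restored for this procedure

        Proc(long id, boolean holdsLock) {
            this.id = id;
            this.holdsLock = holdsLock;
        }
    }

    /**
     * Builds the queue so that lock holders run first, keeping the original
     * relative order inside each group.
     */
    static Deque<Proc> enqueueAfterRestart(List<Proc> restored) {
        Deque<Proc> queue = new ArrayDeque<>();
        for (Proc p : restored) {
            if (p.holdsLock) {
                queue.addLast(p); // first pass: procedures that already hold a lock
            }
        }
        for (Proc p : restored) {
            if (!p.holdsLock) {
                queue.addLast(p); // second pass: everything else
            }
        }
        return queue;
    }

    public static void main(String[] args) {
        List<Proc> restored = Arrays.asList(
            new Proc(1, false), new Proc(2, true), new Proc(3, false));
        for (Proc p : enqueueAfterRestart(restored)) {
            System.out.println("pid=" + p.id + " holdsLock=" + p.holdsLock);
        }
    }
}
```

Running the lock holder first means the worker never picks up a procedure that cannot proceed while the procedure that could unblock it is still waiting behind it.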
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661666#comment-16661666 ] stack commented on HBASE-21344: --- You need to add .branch-2.0. into the name of your patch [~an...@apache.org] Also, this is an important patch because the left-over start will prevent us getting to the wait-on-meta holding pattern. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 2.0.3 > > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) >
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661663#comment-16661663 ] Duo Zhang commented on HBASE-21254: --- Can do this later. > Need to find a way to limit the number of proc wal files > > > Key: HBASE-21254 > URL: https://issues.apache.org/jira/browse/HBASE-21254 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Critical > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, > HBASE-21254-v3.patch, HBASE-21254.patch > > > For the regionserver, we have a max wal file limit; if we reach the > limit, we trigger a flush on specific regions so that we can delete > old wal files. But for proc wals we do not have this mechanism, and it will > be worse after HBASE-21233: if there is an old procedure which cannot > make progress and does not persist its state, we need to keep the old proc wal > file forever...
[jira] [Updated] (HBASE-21357) RS should abort if OOM in Reader thread
[ https://issues.apache.org/jira/browse/HBASE-21357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yang updated HBASE-21357: --- Resolution: Fixed Fix Version/s: 1.4.9 Status: Resolved (was: Patch Available) > RS should abort if OOM in Reader thread > --- > > Key: HBASE-21357 > URL: https://issues.apache.org/jira/browse/HBASE-21357 > Project: HBase > Issue Type: Bug >Affects Versions: 1.4.8 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 1.4.9 > > Attachments: HBASE-21357.branch-1.001.patch, > HBASE-21357.branch-1.001.patch > > > It is a bit strange: we abort the RS on OOM in the Listener thread, > Responder thread and CallRunner thread, but not in the Reader thread... > We should abort the RS if OOM happens in the Reader thread, too. If not, the reader > thread exits because of OOM, and the selector closes. Later connections > selected for this reader will be ignored: > {code} > try { > if (key.isValid()) { > if (key.isAcceptable()) > doAccept(key); > } > } catch (IOException ignored) { > if (LOG.isTraceEnabled()) LOG.trace("ignored", ignored); > } > {code} > This leaves the call from the client (or Master and other RS) waiting until SocketTimeout.
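The behaviour asked for in HBASE-21357 (abort instead of letting the Reader thread die quietly on OOM) might be sketched as follows. This is an assumption-laden illustration, not the committed patch: `ReaderLoopSketch`, its nested `Abortable` interface and the `selectOnce` callback are hypothetical stand-ins for the RPC server's reader loop and the region server's abort hook.

```java
// Hypothetical sketch; not the RpcServer Reader code from HBASE-21357.
final class ReaderLoopSketch implements Runnable {

    /** Stand-in for whatever lets the reader abort the whole server. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    private final Runnable selectOnce; // one iteration of select-and-read work
    private final Abortable server;
    private volatile boolean running = true;

    ReaderLoopSketch(Runnable selectOnce, Abortable server) {
        this.selectOnce = selectOnce;
        this.server = server;
    }

    @Override
    public void run() {
        while (running) {
            try {
                selectOnce.run();
            } catch (OutOfMemoryError oom) {
                // Do not let the thread exit silently: tell the server to abort,
                // so that clients fail fast instead of waiting for a socket timeout.
                server.abort("OOM in reader thread", oom);
                return;
            } catch (RuntimeException e) {
                // Non-fatal problem in a single iteration; keep serving.
            }
        }
    }

    void stop() {
        running = false;
    }

    public static void main(String[] args) {
        ReaderLoopSketch loop = new ReaderLoopSketch(
            () -> { throw new OutOfMemoryError("simulated"); },
            (why, cause) -> System.out.println("abort requested: " + why));
        loop.run();
    }
}
```

The point of the design is that an OOM in any RPC thread is treated the same way, so a dead reader can no longer leave connections hanging.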
[jira] [Commented] (HBASE-21357) RS should abort if OOM in Reader thread
[ https://issues.apache.org/jira/browse/HBASE-21357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661659#comment-16661659 ] Allan Yang commented on HBASE-21357: Pushed to branch-1, thanks [~stack] for reviewing. > RS should abort if OOM in Reader thread > --- > > Key: HBASE-21357 > URL: https://issues.apache.org/jira/browse/HBASE-21357 > Project: HBase > Issue Type: Bug >Affects Versions: 1.4.8 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21357.branch-1.001.patch, > HBASE-21357.branch-1.001.patch > > > It is a bit strange: we abort the RS on OOM in the Listener thread, > Responder thread and CallRunner thread, but not in the Reader thread... > We should abort the RS if OOM happens in the Reader thread, too. If not, the reader > thread exits because of OOM, and the selector closes. Later connections > selected for this reader will be ignored: > {code} > try { > if (key.isValid()) { > if (key.isAcceptable()) > doAccept(key); > } > } catch (IOException ignored) { > if (LOG.isTraceEnabled()) LOG.trace("ignored", ignored); > } > {code} > This leaves the call from the client (or Master and other RS) waiting until SocketTimeout.
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661658#comment-16661658 ] Hadoop QA commented on HBASE-21344: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HBASE-21344 does not apply to branch-2. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HBASE-21344 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945321/HBASE-21344-branch-2.0_v3.patch | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14837/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 2.0.3 > > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. 
We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. > 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAs
[jira] [Updated] (HBASE-21376) Add some verbose log to MasterProcedureScheduler
[ https://issues.apache.org/jira/browse/HBASE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yang updated HBASE-21376: --- Status: Patch Available (was: Open) > Add some verbose log to MasterProcedureScheduler > > > Key: HBASE-21376 > URL: https://issues.apache.org/jira/browse/HBASE-21376 > Project: HBase > Issue Type: Sub-task >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21376.branch-2.0.001.patch > > > As discussed in HBASE-21364, we divided the patch in HBASE-21364 into two. The > critical one has already been submitted in HBASE-21364 to branch-2.0 and > branch-2.1, but I also added some useful logs which need to be committed to all > branches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21376) Add some verbose log to MasterProcedureScheduler
[ https://issues.apache.org/jira/browse/HBASE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yang updated HBASE-21376: --- Attachment: HBASE-21376.branch-2.0.001.patch > Add some verbose log to MasterProcedureScheduler > > > Key: HBASE-21376 > URL: https://issues.apache.org/jira/browse/HBASE-21376 > Project: HBase > Issue Type: Sub-task >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21376.branch-2.0.001.patch > > > As discussed in HBASE-21364, we divided the patch in HBASE-21364 into two. The > critical one has already been submitted in HBASE-21364 to branch-2.0 and > branch-2.1, but I also added some useful logs which need to be committed to all > branches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21344: -- Fix Version/s: 2.0.3 > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 2.0.3 > > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.C
[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21344: -- Status: Patch Available (was: Open) > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFutur
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661655#comment-16661655 ] stack commented on HBASE-21344: --- The change in HBaseTestingUtility is just formatting? Otherwise, patch seems good. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 2.0.3 > > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > j
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661653#comment-16661653 ] Allan Yang commented on HBASE-21254: But we can use public Entry tryLockEntry(long id, long time) instead, we can still block the forceUpdateExecutor thread if some procedure is stuck. > Need to find a way to limit the number of proc wal files > > > Key: HBASE-21254 > URL: https://issues.apache.org/jira/browse/HBASE-21254 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Critical > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, > HBASE-21254-v3.patch, HBASE-21254.patch > > > For the regionserver, we have a max wal file limit; if we reach the > limit, we trigger a flush on specific regions so that we can delete > old wal files. But for proc wals, we do not have this mechanism, and it will > be worse after HBASE-21233, because if there is an old procedure which cannot > make progress and does not persist its state, we need to keep the old proc wal > file forever... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
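The difference between waiting forever on the execution lock and a timed attempt, as raised in this comment, can be sketched as below. This is only an illustration under loud assumptions: it uses java.util.concurrent's ReentrantLock as a stand-in for HBase's IdLock, and `TimedLockSketch` and `forceUpdateWithTimeout` are hypothetical names, not code from any HBASE-21254 patch.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Sketch: give up on a stuck procedure's lock after a timeout instead of blocking forever. */
final class TimedLockSketch {
    /**
     * Runs the update while holding procLock (a stand-in for the per-procedure
     * execution lock). Returns false if the lock could not be obtained in time.
     */
    static boolean forceUpdateWithTimeout(ReentrantLock procLock, long timeoutMs, Runnable update) {
        boolean locked;
        try {
            locked = procLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
        if (!locked) {
            return false; // the holder looks stuck; the caller may retry on a later roll
        }
        try {
            update.run();
            return true;
        } finally {
            procLock.unlock();
        }
    }
}
```

A caller that gets `false` back keeps its thread free for other procedures, which is the property the timed variant is meant to buy.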
[jira] [Comment Edited] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661646#comment-16661646 ] Allan Yang edited comment on HBASE-21364 at 10/24/18 2:58 AM: -- The patch without verbose log has already been committed to branch-2.0 and branch-2.1, thanks! Opened HBASE-21376 to commit the verbose log. was (Author: allan163): The patch without verbose log has already been committed to branch-2.0 and branch-2.1, thanks! > Procedure holds the lock should put to front of the queue after restart > --- > > Key: HBASE-21364 > URL: https://issues.apache.org/jira/browse/HBASE-21364 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Blocker > Fix For: 2.1.1, 2.0.3 > > Attachments: HBASE-21364.branch-2.0.001.patch, > HBASE-21364.branch-2.0.002.patch > > > After restoring the procedures from the Procedure WALs, we put the runnable > procedures back into the queue to execute. The order was not a problem before > HBASE-20846, since the first one to execute would acquire the lock itself. But > the locks are restored after HBASE-20846. If we execute a procedure > without the lock before a procedure with the lock in the same queue, > there is a race condition in which we may not be able to execute any of the procedures > in the same queue at all. > The race condition is: > 1. A procedure that needs to take the table's exclusive lock was put into the > table's queue, but the table's shared lock was held by a Region Procedure. > Since no one holds the exclusive lock, the queue is put into the run queue to > execute. But soon the worker thread sees that the procedure can't execute because > it doesn't hold the lock, so it stops executing and removes the queue from the > run queue. > 2. At the same time, the Region procedure which holds the table's shared lock > and the region's exclusive lock is put into the table's queue. 
But since the > queue was already added to the run queue, it won't be added again. > 3. Because of step 1, the table's queue was removed from the run queue. > 4. Then no one will put the table's queue back, thus no worker will execute > the procedures inside. > A test case in the patch shows how. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
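The ordering idea in the issue title (procedures that already hold a lock go to the front of the queue after restart) can be illustrated with a small sketch. It is not the committed patch and does not use MasterProcedureScheduler: `RestoreOrderSketch`, `Proc` and `rebuildQueue` are hypothetical names, and the queue is a plain ArrayDeque.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Sketch: rebuild a queue after restart so that lock holders run first. */
final class RestoreOrderSketch {
    /** Hypothetical, minimal view of a restored procedure. */
    static final class Proc {
        final long id;
        final boolean holdsLock;

        Proc(long id, boolean holdsLock) {
            this.id = id;
            this.holdsLock = holdsLock;
        }
    }

    /** Two passes keep the original relative order inside each group. */
    static Deque<Proc> rebuildQueue(List<Proc> restored) {
        Deque<Proc> queue = new ArrayDeque<>();
        for (Proc p : restored) {
            if (p.holdsLock) {
                queue.addLast(p); // lock holders first: they can always make progress
            }
        }
        for (Proc p : restored) {
            if (!p.holdsLock) {
                queue.addLast(p); // waiters follow, so they cannot starve the holder
            }
        }
        return queue;
    }
}
```

With the lock holder at the head, a worker that polls the queue first meets a procedure it can actually run, so the queue is not dropped from the run queue with runnable work still inside.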
[jira] [Updated] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yang updated HBASE-21364: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Procedure holds the lock should put to front of the queue after restart > --- > > Key: HBASE-21364 > URL: https://issues.apache.org/jira/browse/HBASE-21364 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Blocker > Fix For: 2.1.1, 2.0.3 > > Attachments: HBASE-21364.branch-2.0.001.patch, > HBASE-21364.branch-2.0.002.patch > > > After restoring the procedures from the Procedure WALs, we put the runnable > procedures back into the queue to execute. The order was not a problem before > HBASE-20846, since the first one to execute would acquire the lock itself. But > the locks are restored after HBASE-20846. If we execute a procedure > without the lock before a procedure with the lock in the same queue, > there is a race condition in which we may not be able to execute any of the procedures > in the same queue at all. > The race condition is: > 1. A procedure that needs to take the table's exclusive lock was put into the > table's queue, but the table's shared lock was held by a Region Procedure. > Since no one holds the exclusive lock, the queue is put into the run queue to > execute. But soon the worker thread sees that the procedure can't execute because > it doesn't hold the lock, so it stops executing and removes the queue from the > run queue. > 2. At the same time, the Region procedure which holds the table's shared lock > and the region's exclusive lock is put into the table's queue. But since the > queue was already added to the run queue, it won't be added again. > 3. Because of step 1, the table's queue was removed from the run queue. > 4. Then no one will put the table's queue back, thus no worker will execute > the procedures inside. 
> A test case in the patch shows how. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661646#comment-16661646 ] Allan Yang commented on HBASE-21364: The patch without verbose log has already been committed to branch-2.0 and branch-2.1, thanks! > Procedure holds the lock should put to front of the queue after restart > --- > > Key: HBASE-21364 > URL: https://issues.apache.org/jira/browse/HBASE-21364 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Blocker > Fix For: 2.1.1, 2.0.3 > > Attachments: HBASE-21364.branch-2.0.001.patch, > HBASE-21364.branch-2.0.002.patch > > > After restoring the procedures from the Procedure WALs, we put the runnable > procedures back into the queue to execute. The order was not a problem before > HBASE-20846, since the first one to execute would acquire the lock itself. But > the locks are restored after HBASE-20846. If we execute a procedure > without the lock before a procedure with the lock in the same queue, > there is a race condition in which we may not be able to execute any of the procedures > in the same queue at all. > The race condition is: > 1. A procedure that needs to take the table's exclusive lock was put into the > table's queue, but the table's shared lock was held by a Region Procedure. > Since no one holds the exclusive lock, the queue is put into the run queue to > execute. But soon the worker thread sees that the procedure can't execute because > it doesn't hold the lock, so it stops executing and removes the queue from the > run queue. > 2. At the same time, the Region procedure which holds the table's shared lock > and the region's exclusive lock is put into the table's queue. But since the > queue was already added to the run queue, it won't be added again. > 3. Because of step 1, the table's queue was removed from the run queue. > 4. 
Then no one will put the table's queue back, thus no worker will execute > the procedures inside. > A test case in the patch shows how. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21376) Add some verbose log to MasterProcedureScheduler
Allan Yang created HBASE-21376: -- Summary: Add some verbose log to MasterProcedureScheduler Key: HBASE-21376 URL: https://issues.apache.org/jira/browse/HBASE-21376 Project: HBase Issue Type: Sub-task Reporter: Allan Yang Assignee: Allan Yang As discussed in HBASE-21364, we divided the patch in HBASE-21364 into two. The critical one has already been submitted in HBASE-21364 to branch-2.0 and branch-2.1, but I also added some useful logs which need to be committed to all branches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661644#comment-16661644 ] Duo Zhang commented on HBASE-21254: --- But we do need to review the implementation again. As I found, the actual deletion for a root procedure is done later in CompletedProcedureCleaner, so a successful procedure could also block us from removing a file... > Need to find a way to limit the number of proc wal files > > > Key: HBASE-21254 > URL: https://issues.apache.org/jira/browse/HBASE-21254 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Critical > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, > HBASE-21254-v3.patch, HBASE-21254.patch > > > For the regionserver, we have a max wal file limit; if we reach the > limit, we trigger a flush on specific regions so that we can delete > old wal files. But for proc wals, we do not have this mechanism, and it will > be worse after HBASE-21233, because if there is an old procedure which cannot > make progress and does not persist its state, we need to keep the old proc wal > file forever... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661643#comment-16661643 ] Duo Zhang commented on HBASE-21254: --- No, it will be executed in a separate thread. In the log rolling thread we will just schedule a task into the forceUpdateExecutor. > Need to find a way to limit the number of proc wal files > > > Key: HBASE-21254 > URL: https://issues.apache.org/jira/browse/HBASE-21254 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Critical > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, > HBASE-21254-v3.patch, HBASE-21254.patch > > > For the regionserver, we have a max wal file limit; if we reach the > limit, we trigger a flush on specific regions so that we can delete > old wal files. But for proc wals, we do not have this mechanism, and it will > be worse after HBASE-21233, because if there is an old procedure which cannot > make progress and does not persist its state, we need to keep the old proc wal > file forever... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661642#comment-16661642 ]

Allan Yang commented on HBASE-21254:

In forceUpdateProcedure, we acquire the execution lock before the force update:
{code}
  private void forceUpdateProcedure(long procId) throws IOException {
    IdLock.Entry lockEntry = procExecutionLock.getLockEntry(procId);
    try {
{code}
We will wait forever here if the procedure is stuck. And IIRC, forceUpdateProcedure is executed in the thread that rolls the procedure log, so that thread will get stuck too.

> Need to find a way to limit the number of proc wal files
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, HBASE-21254-v3.patch, HBASE-21254.patch
>
> For regionserver, we have a max wal file limitation, if we reach the
> limitation, we will trigger a flush on specific regions so that we can delete
> old wal files. But for proc wals, we do not have this mechanism, and it will
> be worse after HBASE-21233, as if there is an old procedure which can not
> make progress and do not persist its state, we need to keep the old proc wal
> file for ever...
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661628#comment-16661628 ]

Duo Zhang commented on HBASE-21254:
---

Which lock? I do not get your point about the 'wait for the lock when rolling log'...

> Need to find a way to limit the number of proc wal files
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, HBASE-21254-v3.patch, HBASE-21254.patch
>
> For regionserver, we have a max wal file limitation, if we reach the
> limitation, we will trigger a flush on specific regions so that we can delete
> old wal files. But for proc wals, we do not have this mechanism, and it will
> be worse after HBASE-21233, as if there is an old procedure which can not
> make progress and do not persist its state, we need to keep the old proc wal
> file for ever...
[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files
[ https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661625#comment-16661625 ]

Allan Yang commented on HBASE-21254:

I have a question about this one: what if a procedure is stuck and we can't get its lock? Will we get stuck waiting for the lock when rolling the log?

> Need to find a way to limit the number of proc wal files
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, HBASE-21254-v3.patch, HBASE-21254.patch
>
> For regionserver, we have a max wal file limitation, if we reach the
> limitation, we will trigger a flush on specific regions so that we can delete
> old wal files. But for proc wals, we do not have this mechanism, and it will
> be worse after HBASE-21233, as if there is an old procedure which can not
> make progress and do not persist its state, we need to keep the old proc wal
> file for ever...
[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal updated HBASE-21344: -- Attachment: HBASE-21344-branch-2.0_v3.patch > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.conc
[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX
[ https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661612#comment-16661612 ] Duo Zhang commented on HBASE-21372: --- TRSP uses the same config to decide whether to give up retrying. > Set hbase.assignment.maximum.attempts to Long.MAX > - > > Key: HBASE-21372 > URL: https://issues.apache.org/jira/browse/HBASE-21372 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Reporter: stack >Assignee: stack >Priority: Major > Attachments: HBASE-21372.branch-2.1.001.patch, > HBASE-21372.branch-2.1.001.patch > > > From parent issue, [~allan163] suggests that we not give up on assign unless > there a change -- an SCP triggers failure -- or at the extreme, an operator > intervenes. This jibes w/ how we're thinking about assign (or to put it > another way, we have no handling for the case where we exhaust retries). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
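As a rough illustration of what a maximum-attempts setting decides, the sketch below retries a hypothetical assignment until it succeeds or the limit is used up. The class `AssignRetrySketch` and its methods are made up for this example; how HBase actually reads and applies hbase.assignment.maximum.attempts is not shown in this thread, and a real retry loop would also back off between attempts.

```java
import java.util.function.BooleanSupplier;

public class AssignRetrySketch {
  /**
   * Retries tryAssign until it succeeds or maxAttempts is used up.
   * Returns the 1-based attempt that succeeded, or -1 if we gave up.
   */
  static long assignWithRetries(long maxAttempts, BooleanSupplier tryAssign) {
    long attempt = 0;
    while (attempt < maxAttempts) { // with Long.MAX_VALUE this bound is never reached in practice
      attempt++;
      if (tryAssign.getAsBoolean()) {
        return attempt;
      }
    }
    return -1; // retries exhausted: the caller needs some handling for this case
  }

  /** Hypothetical assignment that fails the first `failures` times, then succeeds. */
  static BooleanSupplier failingFirst(int failures) {
    int[] calls = {0};
    return () -> ++calls[0] > failures;
  }

  public static void main(String[] args) {
    System.out.println(assignWithRetries(10, failingFirst(11)));
    System.out.println(assignWithRetries(Long.MAX_VALUE, failingFirst(11)));
  }
}
```

With a small limit the loop can give up before a transient failure clears, which is the case the issue description says has no handling. With a very large limit the same call keeps going until the assignment succeeds.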
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661613#comment-16661613 ] Duo Zhang commented on HBASE-21363: --- Any other concerns? [~allan163]. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661611#comment-16661611 ]

Hadoop QA commented on HBASE-21363:
---

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 6m 11s | master passed |
| +1 | compile | 0m 23s | master passed |
| +1 | checkstyle | 0m 14s | master passed |
| +1 | shadedjars | 4m 17s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 0m 27s | master passed |
| +1 | javadoc | 0m 13s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 5m 7s | the patch passed |
| +1 | compile | 0m 21s | the patch passed |
| +1 | javac | 0m 21s | the patch passed |
| +1 | checkstyle | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 15s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 10m 30s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | findbugs | 0m 32s | the patch passed |
| +1 | javadoc | 0m 12s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 8s | hbase-procedure in the patch passed. |
| +1 | asflicense | 0m 10s | The patch does not generate ASF License warnings. |
| | | 37m 5s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21363 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945314/HBASE-21363-v3.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 1194f80ac5e4 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 1f437ac221 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14835/testReport/ |
| Max. process+thread count | 278 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14835/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> Rewri
[jira] [Comment Edited] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX
[ https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661606#comment-16661606 ]

Allan Yang edited comment on HBASE-21372 at 10/24/18 2:06 AM:
--

+1 for it. TransitRegionStateProcedure was introduced in branch-2+ by [~Apache9]; can [~Apache9] confirm that in branch-2+ we already retry forever for assignment?

was (Author: allan163):
+1 for it, TransitRegionStateProcedure was introduced in branch-2+ by [~Apache9]; can [~Apache9] confirm that in branch-2+ we already retry forever for assignment?

> Set hbase.assignment.maximum.attempts to Long.MAX
> -
>
> Key: HBASE-21372
> URL: https://issues.apache.org/jira/browse/HBASE-21372
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Reporter: stack
> Assignee: stack
> Priority: Major
> Attachments: HBASE-21372.branch-2.1.001.patch, HBASE-21372.branch-2.1.001.patch
>
> From parent issue, [~allan163] suggests that we not give up on assign unless
> there a change -- an SCP triggers failure -- or at the extreme, an operator
> intervenes. This jibes w/ how we're thinking about assign (or to put it
> another way, we have no handling for the case where we exhaust retries).
[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX
[ https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661606#comment-16661606 ]

Allan Yang commented on HBASE-21372:

+1 for it, TransitRegionStateProcedure was introduced in branch-2+ by [~Apache9]; can [~Apache9] confirm that in branch-2+ we already retry forever for assignment?

> Set hbase.assignment.maximum.attempts to Long.MAX
> -
>
> Key: HBASE-21372
> URL: https://issues.apache.org/jira/browse/HBASE-21372
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Reporter: stack
> Assignee: stack
> Priority: Major
> Attachments: HBASE-21372.branch-2.1.001.patch, HBASE-21372.branch-2.1.001.patch
>
> From parent issue, [~allan163] suggests that we not give up on assign unless
> there a change -- an SCP triggers failure -- or at the extreme, an operator
> intervenes. This jibes w/ how we're thinking about assign (or to put it
> another way, we have no handling for the case where we exhaust retries).
[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661593#comment-16661593 ]

Allan Yang commented on HBASE-21364:

The checkstyle error is not valid: we added so many comments to the method that it exceeds 150 lines. I will commit the actual fix to branch-2.1 and branch-2.0, and open another issue to commit the verbose code to all branches, as [~Apache9] said.

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0, 2.0.2
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, HBASE-21364.branch-2.0.002.patch
>
> After restoring the procedures from the Procedure WALs, we put the runnable
> procedures back into the queue to execute. The order was not a problem before
> HBASE-20846, since the first one to execute would acquire the lock itself. But
> the locks are restored after HBASE-20846. If we execute a procedure without
> the lock before a procedure with the lock in the same queue, there is a race
> condition in which we may never execute any of the procedures in that queue.
> The race condition is:
> 1. A procedure that needs the table's exclusive lock was put into the
> table's queue, but the table's shared lock was held by a Region Procedure.
> Since no one holds the exclusive lock, the queue is put into the run queue to
> execute. But soon the worker thread sees that the procedure can't execute
> because it doesn't hold the lock, so it stops executing and removes the queue
> from the run queue.
> 2. At the same time, the Region procedure, which holds the table's shared lock
> and the region's exclusive lock, is put into the table's queue. But since the
> queue was already added to the run queue, it won't be added again.
> 3. Because of step 1, the table's queue is removed from the run queue.
> 4. Then no one puts the table's queue back, so no worker will execute
> the procedures inside.
> A test case in the patch shows how.
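The ordering problem described in the steps above can be replayed with a small single-threaded model. This is a simplified sketch under stated assumptions, not the MasterProcedureScheduler: the class `RestartQueueSketch`, its `inRunQueue` flag, and the rule that any procedure not holding the lock is blocked (because the restored Region procedure holds it) are all hypothetical simplifications.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RestartQueueSketch {
  /** A restored procedure; holdsLock is true if its lock was restored with it. */
  static final class Proc {
    final String name;
    final boolean holdsLock;
    Proc(String name, boolean holdsLock) {
      this.name = name;
      this.holdsLock = holdsLock;
    }
  }

  private final Deque<Proc> tableQueue = new ArrayDeque<>();
  private boolean inRunQueue = false;      // models membership of the table queue in the run queue
  private final boolean lockHoldersFirst;  // true models the proposed ordering

  RestartQueueSketch(boolean lockHoldersFirst) {
    this.lockHoldersFirst = lockHoldersFirst;
  }

  /** Puts a restored procedure back into the table queue. */
  void add(Proc p) {
    if (lockHoldersFirst && p.holdsLock) {
      tableQueue.addFirst(p);
    } else {
      tableQueue.addLast(p);
    }
    if (!inRunQueue) {
      inRunQueue = true; // an already scheduled queue is not scheduled again
    }
  }

  /** What a worker does: run the head if it can, otherwise drop the whole queue. */
  Proc poll() {
    if (!inRunQueue || tableQueue.isEmpty()) {
      return null;
    }
    if (!tableQueue.peekFirst().holdsLock) {
      inRunQueue = false; // head is blocked by a lock held elsewhere
      return null;
    }
    return tableQueue.pollFirst();
  }

  /** Replays the scenario and returns the name of the first procedure a worker can run. */
  static String firstRunnable(boolean lockHoldersFirst) {
    RestartQueueSketch s = new RestartQueueSketch(lockHoldersFirst);
    s.add(new Proc("table-proc", false));  // needs the exclusive lock, does not hold it
    s.add(new Proc("region-proc", true));  // holds the restored shared lock
    Proc p = s.poll();
    return p == null ? null : p.name;
  }

  public static void main(String[] args) {
    System.out.println("plain order: " + firstRunnable(false));
    System.out.println("lock holders first: " + firstRunnable(true));
  }
}
```

With plain insertion order the blocked procedure sits at the head, the queue leaves the run queue, and nothing in this model puts it back. Putting the lock holder at the front, as the issue title proposes, lets the worker pick it up first.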
[jira] [Commented] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"
[ https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661584#comment-16661584 ] Hadoop QA commented on HBASE-21373: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} branch-1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 0s{color} | {color:green} branch-1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} branch-1 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 49s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 22s{color} | {color:red} hbase-server: The patch generated 2 new + 
37 unchanged - 0 fixed = 39 total (was 37) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 45s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 1m 44s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}130m 40s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}150m 47s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.procedure.TestFailedProcCleanup | | | hadoop.hbase.mapreduce.TestLoadIncrementalHFiles | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:61288f8 | | JIRA Issue | HBASE-21373 | | JIRA Patch URL | https://issues.apache.org/jira
[jira] [Created] (HBASE-21375) Forward port "HBASE-21364 Procedure holds the lock should put to front of the queue after restart"
Duo Zhang created HBASE-21375: - Summary: Forward port "HBASE-21364 Procedure holds the lock should put to front of the queue after restart" Key: HBASE-21375 URL: https://issues.apache.org/jira/browse/HBASE-21375 Project: HBase Issue Type: Sub-task Components: proc-v2 Reporter: Duo Zhang Assignee: Duo Zhang Fix For: 3.0.0, 2.2.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661580#comment-16661580 ] stack commented on HBASE-21344: --- Make new patch [~an...@apache.org]? Thanks. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util
[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661576#comment-16661576 ]

Hadoop QA commented on HBASE-21371:
---

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -0 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 5m 31s | master passed |
| +1 | javadoc | 0m 10s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 5m 21s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | javadoc | 0m 10s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 9s | hbase-resource-bundle in the patch passed. |
| +1 | asflicense | 0m 9s | The patch does not generate ASF License warnings. |
| | | 12m 0s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21371 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945311/HBASE-21371.master.001.patch |
| Optional Tests | dupname asflicense javac javadoc unit xml |
| uname | Linux 92319fb5b492 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 1f437ac221 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14834/testReport/ |
| Max. process+thread count | 87 (vs. ulimit of 1) |
| modules | C: hbase-resource-bundle U: hbase-resource-bundle |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14834/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license > error > -- > > Key: HBASE-21371 > URL: https://issues.apache.org/jira/browse/HBASE-21371 > Project: HBase > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HBASE-21371.master.001.patch > > > Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional > licenses that break HBase's license check plugin. > CDDL/GPLv2+CE license > {quote}This product includes JavaBeans Activation Framework API jar licensed > under the CDDL/GPLv2+CE. > CDDL or GPL version 2 plus the Classpath Exception > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > javax.activation > javax.activation-api > 1.2.0 > maven central search > g:javax.activation AND a:javax.activation-api AND v:1.2.0 > project website > [http://java.net/all/javax.activation-api/] > project source > [https://github.com/javaee/activation/javax.activation-api] > {quo
[jira] [Comment Edited] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661554#comment-16661554 ]

Ankit Singhal edited comment on HBASE-21344 at 10/24/18 1:36 AM:
-

{quote}What you doing here from patch?{quote}
I'm just removing the duplicate tableStateManager.start() call (and keeping the tableStateManager.start() that runs after checking that meta is actually online). The test in the patch checks that SCP can still succeed even when meta is in the OPENING state, so we don't need to change the state of the meta znode to offline during FAILED_OPEN of the assign procedure (this meta znode state will in turn also help in avoiding IMP if meta is in transition due to a server crash). Anyway, a sub-task to increase the maximum number of assign attempts will keep the call from going down the FAILED_OPEN path (I think).

{quote}
Optional optProc = this.procedureExecutor.getProcedures().stream()
.filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o)
.filter(s -> s.hasMetaTableRegion()).findAny();
You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches?
{quote}
Yes, my bad, we don't need this particular change.

was (Author: an...@apache.org): {quote}What you doing here from patch?{quote} I'm just removing duplicate tableStateManager.start() (and keeping the tableStateManager.start() after checking meta is actually online). And the test in the patch is to check if we have OPENING state for meta also, still SCP can succeed , so we don't need to change state of meta znode to offline during FAILED_OPEN of assign procedure(this meta znode state will intern also help in avoiding IMP if meta is in transition due to Server crash). Anyways, a sub-task to increase the no. of Assign procedure , will not let the call go in FAILED_OPEN path(I think). 
{quote} 1096 Optional optProc = this.procedureExecutor.getProcedures().stream() 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o) 1098 .filter(s -> s.hasMetaTableRegion()).findAny(); You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches? {quote} Yes, My bad, we don't need this particular change. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashP
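The stream pipeline quoted in the review exchange above can be tried in isolation. What follows is a minimal sketch, not code from the patch: Procedure, OtherProcedure and the ServerCrashProcedure class shown here are hypothetical stand-ins for the HBase types, and only the hasMetaTableRegion() name comes from the quoted snippet. It contrasts a meta-only search with a more generic search that takes a predicate, which is one possible reading of the reviewer's question.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class ScpSearchSketch {
    /** Hypothetical stand-in for a generic procedure; not the HBase class. */
    interface Procedure {
        long getProcId();
    }

    /** Hypothetical non-SCP procedure, present only so the filter has something to skip. */
    static final class OtherProcedure implements Procedure {
        private final long procId;
        OtherProcedure(long procId) { this.procId = procId; }
        public long getProcId() { return procId; }
    }

    /** Hypothetical stand-in for ServerCrashProcedure. */
    static final class ServerCrashProcedure implements Procedure {
        private final long procId;
        private final boolean carryingMeta;
        ServerCrashProcedure(long procId, boolean carryingMeta) {
            this.procId = procId;
            this.carryingMeta = carryingMeta;
        }
        public long getProcId() { return procId; }
        boolean hasMetaTableRegion() { return carryingMeta; }
    }

    /** Generic search: any SCP that matches the caller-supplied condition. */
    static Optional<ServerCrashProcedure> findScp(
            List<? extends Procedure> procedures, Predicate<ServerCrashProcedure> condition) {
        return procedures.stream()
            .filter(p -> p instanceof ServerCrashProcedure)
            .map(p -> (ServerCrashProcedure) p)
            .filter(condition)
            .findAny();
    }

    /** Meta-only search, equivalent in shape to the quoted pipeline. */
    static Optional<ServerCrashProcedure> findScpCarryingMeta(List<? extends Procedure> procedures) {
        return findScp(procedures, ServerCrashProcedure::hasMetaTableRegion);
    }
}
```

Passing `s -> true` as the condition reports every ServerCrashProcedure, while the meta-only variant narrows the result to those carrying hbase:meta.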
[jira] [Updated] (HBASE-21325) Force to terminate regionserver when abort hang in somewhere
[ https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang updated HBASE-21325:
---
Attachment: HBASE-21325.master.003.patch

> Force to terminate regionserver when abort hang in somewhere
>
> Key: HBASE-21325
> URL: https://issues.apache.org/jira/browse/HBASE-21325
> Project: HBase
> Issue Type: Improvement
> Reporter: Duo Zhang
> Assignee: Guanghao Zhang
> Priority: Major
> Attachments: HBASE-21325.master.001.patch, HBASE-21325.master.001.patch, HBASE-21325.master.002.patch, HBASE-21325.master.003.patch
>
> When testing sync replication, I found that if I transition the remote cluster to DA while the local cluster is still in A, the region server hangs on shutdown. Because the fsOk flag only tests the local cluster (which is reasonable), we enter waitOnAllRegionsToClose, and since the WAL is broken (the remote WAL directory is gone), closing the regions never succeeds. This leads to an infinite wait inside waitOnAllRegionsToClose.
> So I think we should have an upper bound on the wait time in the waitOnAllRegionsToClose method.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
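The idea of putting an upper bound on the wait can be sketched generically. The snippet below is only an illustration under stated assumptions, not the change in the attached patch: BoundedWaitSketch, waitWithUpperBound, and the allRegionsClosed supplier are hypothetical names, and the real waitOnAllRegionsToClose method is not reproduced here. It polls a condition until it holds or a deadline passes, and tells the caller which of the two happened so that the caller can decide to force termination.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class BoundedWaitSketch {
    /**
     * Polls the condition until it becomes true or timeoutMillis has elapsed.
     * Returns true if the condition was met, false on timeout or interruption.
     * All names here are hypothetical; this is not the HBase implementation.
     */
    static boolean waitWithUpperBound(BooleanSupplier allRegionsClosed,
                                      long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!allRegionsClosed.getAsBoolean()) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // upper bound reached; the caller may force termination
            }
            long sleepMillis = Math.max(1L,
                Math.min(pollMillis, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

A caller could treat a false return as the signal to stop waiting and terminate the process, instead of looping forever on a close that cannot succeed.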
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661570#comment-16661570 ] Duo Zhang commented on HBASE-21363: --- Add simple comments for the ProcedureWALFormat.load method. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363.patch > >
[jira] [Comment Edited] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661554#comment-16661554 ]

Ankit Singhal edited comment on HBASE-21344 at 10/24/18 1:30 AM:
-

{quote}What you doing here from patch?{quote}
I'm just removing the duplicate tableStateManager.start() call (and keeping the tableStateManager.start() that runs after checking that meta is actually online). The test in the patch checks that SCP can still succeed even when meta is in the OPENING state, so we don't need to change the state of the meta znode to offline during FAILED_OPEN of the assign procedure (this meta znode state will in turn also help in avoiding IMP if meta is in transition due to a server crash). Anyway, a sub-task to increase the number of assign procedure attempts will keep the call from going down the FAILED_OPEN path (I think).

{quote}
Optional optProc = this.procedureExecutor.getProcedures().stream()
.filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o)
.filter(s -> s.hasMetaTableRegion()).findAny();
You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches?
{quote}
Yes, my bad, we don't need this particular change.

was (Author: an...@apache.org): {quote}What you doing here from patch?{quote} I'm just removing duplicate tableStateManager.start() (and keeping the tableStateManager.start() after checking meta is actually online). 
And the test in the patch is to check if we have OPENING state for meta also, still SCP can succeed , so we don't need to change state of meta znode to offline during FAILED_OPEN of assign procedure(this meta znode state will intern also help in avoiding IMP if meta is in transition due to Server crash) {quote} 1096 Optional optProc = this.procedureExecutor.getProcedures().stream() 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o) 1098 .filter(s -> s.hasMetaTableRegion()).findAny(); You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches? {quote} Yes, My bad, we don't need this particular change. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.ja
[jira] [Updated] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21363: -- Attachment: HBASE-21363-v3.patch > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363-v3.patch, HBASE-21363.patch > >
[jira] [Comment Edited] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661554#comment-16661554 ]

Ankit Singhal edited comment on HBASE-21344 at 10/24/18 1:24 AM:
-

{quote}What you doing here from patch?{quote}
I'm just removing the duplicate tableStateManager.start() call (and keeping the tableStateManager.start() that runs after checking that meta is actually online). The test in the patch checks that SCP can still succeed even when meta is in the OPENING state, so we don't need to change the state of the meta znode to offline during FAILED_OPEN of the assign procedure (this meta znode state will in turn also help in avoiding IMP if meta is in transition due to a server crash).

{quote}
Optional optProc = this.procedureExecutor.getProcedures().stream()
.filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o)
.filter(s -> s.hasMetaTableRegion()).findAny();
You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches?
{quote}
Yes, my bad, we don't need this particular change.

was (Author: an...@apache.org): {quote}1096 Optional optProc = this.procedureExecutor.getProcedures().stream() 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o) 1098 .filter(s -> s.hasMetaTableRegion()).findAny(); You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches? {quote} Yes, My bad, we don't need this particular change. 
> hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThre
[jira] [Updated] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HBASE-21371: Attachment: (was: HBASE-21371.001.patch) > Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license > error > -- > > Key: HBASE-21371 > URL: https://issues.apache.org/jira/browse/HBASE-21371 > Project: HBase > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HBASE-21371.master.001.patch > > > Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional > licenses that break HBase's license check plugin. > CDDL/GPLv2+CE license > {quote}This product includes JavaBeans Activation Framework API jar licensed > under the CDDL/GPLv2+CE. > CDDL or GPL version 2 plus the Classpath Exception > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > javax.activation > javax.activation-api > 1.2.0 > maven central search > g:javax.activation AND a:javax.activation-api AND v:1.2.0 > project website > [http://java.net/all/javax.activation-api/] > project source > [https://github.com/javaee/activation/javax.activation-api] > {quote} > Bouncy Castle License > {quote}– > This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, > and CRMF APIs licensed under the Bouncy Castle Licence. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. 
> More info on the dependency: > org.bouncycastle > bcpkix-jdk15on > 1.60 > maven central search > g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60 > project website > [http://www.bouncycastle.org/java.html] > project source > [https://github.com/bcgit/bc-java] > – > {quote} > > And a long list of "Apache Software License - Version 2.0" licensed Jetty > dependencies like this: > {quote} > This product includes Jetty :: Servlet Annotations licensed under the Apache > Software License - Version 2.0. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > org.eclipse.jetty > jetty-annotations > 9.3.19.v20170502 > maven central search > g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502 > project website > [http://www.eclipse.org/jetty] > project source > [https://github.com/eclipse/jetty.project/jetty-annotations] > {quote}
[jira] [Updated] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HBASE-21371: Status: Patch Available (was: Open) > Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license > error > -- > > Key: HBASE-21371 > URL: https://issues.apache.org/jira/browse/HBASE-21371 > Project: HBase > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HBASE-21371.master.001.patch > > > Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional > licenses that break HBase's license check plugin. > CDDL/GPLv2+CE license > {quote}This product includes JavaBeans Activation Framework API jar licensed > under the CDDL/GPLv2+CE. > CDDL or GPL version 2 plus the Classpath Exception > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > javax.activation > javax.activation-api > 1.2.0 > maven central search > g:javax.activation AND a:javax.activation-api AND v:1.2.0 > project website > [http://java.net/all/javax.activation-api/] > project source > [https://github.com/javaee/activation/javax.activation-api] > {quote} > Bouncy Castle License > {quote}– > This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, > and CRMF APIs licensed under the Bouncy Castle Licence. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. 
> More info on the dependency: > org.bouncycastle > bcpkix-jdk15on > 1.60 > maven central search > g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60 > project website > [http://www.bouncycastle.org/java.html] > project source > [https://github.com/bcgit/bc-java] > – > {quote} > > And a long list of "Apache Software License - Version 2.0" licensed Jetty > dependencies like this: > {quote} > This product includes Jetty :: Servlet Annotations licensed under the Apache > Software License - Version 2.0. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > org.eclipse.jetty > jetty-annotations > 9.3.19.v20170502 > maven central search > g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502 > project website > [http://www.eclipse.org/jetty] > project source > [https://github.com/eclipse/jetty.project/jetty-annotations] > {quote}
[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661560#comment-16661560 ]

Wei-Chiu Chuang commented on HBASE-21371:
-

Yes, this will need to be done wherever we ship the relevant jar. If we already have bouncycastle as a dependency, shouldn't we have an entry for it though?

I've checked: the so-called "Bouncy Castle License" is literally the same as the MIT License, character by character. Interestingly, I found that LICENSE.vm contains this hard-coded license text:
{quote}Bouncycastle is released under the MIT license (available above), and is Copyright (c) 2000 - 2006 The Legion Of The Bouncy Castle. {quote}
Maybe there's a bug in the license checker plugin and it didn't find Bouncycastle before, so you had to manually add this license text?

> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license > error > -- > > Key: HBASE-21371 > URL: https://issues.apache.org/jira/browse/HBASE-21371 > Project: HBase > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HBASE-21371.master.001.patch > > > Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional > licenses that break HBase's license check plugin. > CDDL/GPLv2+CE license > {quote}This product includes JavaBeans Activation Framework API jar licensed > under the CDDL/GPLv2+CE. > CDDL or GPL version 2 plus the Classpath Exception > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. 
> More info on the dependency: > javax.activation > javax.activation-api > 1.2.0 > maven central search > g:javax.activation AND a:javax.activation-api AND v:1.2.0 > project website > [http://java.net/all/javax.activation-api/] > project source > [https://github.com/javaee/activation/javax.activation-api] > {quote} > Bouncy Castle License > {quote}– > This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, > and CRMF APIs licensed under the Bouncy Castle Licence. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > org.bouncycastle > bcpkix-jdk15on > 1.60 > maven central search > g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60 > project website > [http://www.bouncycastle.org/java.html] > project source > [https://github.com/bcgit/bc-java] > – > {quote} > > And a long list of "Apache Software License - Version 2.0" licensed Jetty > dependencies like this: > {quote} > This product includes Jetty :: Servlet Annotations licensed under the Apache > Software License - Version 2.0. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > org.eclipse.jetty > jetty-annotations > 9.3.19.v20170502 > maven central search > g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502 > project website > [http://www.eclipse.org/jetty] > project source > [https://github.com/eclipse/jetty.project/jetty-annotations] > {quote}
[jira] [Updated] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HBASE-21371: Attachment: HBASE-21371.master.001.patch > Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license > error > -- > > Key: HBASE-21371 > URL: https://issues.apache.org/jira/browse/HBASE-21371 > Project: HBase > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HBASE-21371.001.patch, HBASE-21371.master.001.patch > > > Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional > licenses that break HBase's license check plugin. > CDDL/GPLv2+CE license > {quote}This product includes JavaBeans Activation Framework API jar licensed > under the CDDL/GPLv2+CE. > CDDL or GPL version 2 plus the Classpath Exception > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > javax.activation > javax.activation-api > 1.2.0 > maven central search > g:javax.activation AND a:javax.activation-api AND v:1.2.0 > project website > [http://java.net/all/javax.activation-api/] > project source > [https://github.com/javaee/activation/javax.activation-api] > {quote} > Bouncy Castle License > {quote}– > This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, > and CRMF APIs licensed under the Bouncy Castle Licence. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. 
> More info on the dependency: > org.bouncycastle > bcpkix-jdk15on > 1.60 > maven central search > g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60 > project website > [http://www.bouncycastle.org/java.html] > project source > [https://github.com/bcgit/bc-java] > – > {quote} > > And a long list of "Apache Software License - Version 2.0" licensed Jetty > dependencies like this: > {quote} > This product includes Jetty :: Servlet Annotations licensed under the Apache > Software License - Version 2.0. > ERROR: Please check this License for acceptability here: > [https://www.apache.org/legal/resolved] > If it is okay, then update the list named 'non_aggregate_fine' in the > LICENSE.vm file. > If it isn't okay, then revert the change that added the dependency. > More info on the dependency: > org.eclipse.jetty > jetty-annotations > 9.3.19.v20170502 > maven central search > g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502 > project website > [http://www.eclipse.org/jetty] > project source > [https://github.com/eclipse/jetty.project/jetty-annotations] > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error
[ https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661558#comment-16661558 ]

Wei-Chiu Chuang commented on HBASE-21371:
-----------------------------------------

{code:java}
$ mvn dependency:tree -Dhadoop.profile=3.0 -Dhadoop-three.version=3.3.0-SNAPSHOT
{code}
javax.activation is included indirectly from hadoop-common:
{quote}
[INFO] +- org.apache.hadoop:hadoop-common:jar:3.3.0-SNAPSHOT:compile
[INFO] | +- javax.activation:javax.activation-api:jar:1.2.0:runtime
{quote}
bouncycastle is included in test jars only:
{quote}
[INFO] +- org.apache.hadoop:hadoop-minicluster:jar:3.3.0-SNAPSHOT:test
[INFO] | - org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:3.3.0-SNAPSHOT:test
[INFO] | +- org.bouncycastle:bcprov-jdk15on:jar:1.60:test
[INFO] | - org.bouncycastle:bcpkix-jdk15on:jar:1.60:test
{quote}
The new jetty dependencies are included in test jars only, too:
{quote}
[INFO] +- org.apache.hadoop:hadoop-minicluster:jar:3.3.0-SNAPSHOT:test
[INFO] | +- org.apache.hadoop:hadoop-yarn-server-tests:test-jar:tests:3.3.0-SNAPSHOT:test
[INFO] | | +- org.apache.hadoop:hadoop-yarn-server-nodemanager:jar:3.3.0-SNAPSHOT:test
[INFO] | | | +- org.eclipse.jetty.websocket:javax-websocket-server-impl:jar:9.3.19.v20170502:test
{quote}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661554#comment-16661554 ] Ankit Singhal commented on HBASE-21344: --- {quote}1096 Optional optProc = this.procedureExecutor.getProcedures().stream() 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o) 1098 .filter(s -> s.hasMetaTableRegion()).findAny(); You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches? {quote} Yes, My bad, we don't need this particular change. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$ge
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661555#comment-16661555 ] Duo Zhang commented on HBASE-21363: --- OK, it is in ProcedureWALFormat... Let me update the patch. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661553#comment-16661553 ] Duo Zhang commented on HBASE-21363: --- [~allan163] Where is the code where we call resetModified on the tracker in ProcedureWALFormatReader? I can't find it. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661548#comment-16661548 ]

stack commented on HBASE-21344:
-------------------------------

bq. Actually, I'm working against branch-2.0 only, here you can see tableStateManager is started 2 times,

My bad. Indeed, 2.0 has this. 2.1 does not.

What are you doing here in the patch?

  Optional optProc = this.procedureExecutor.getProcedures().stream()
      .filter(p -> p instanceof ServerCrashProcedure).map(o -> (ServerCrashProcedure) o)
      .filter(s -> s.hasMetaTableRegion()).findAny();

You are reporting SCPs only if they have meta on them? Isn't this method more generic than just meta searches?
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661544#comment-16661544 ]

Ankit Singhal commented on HBASE-21344:
---------------------------------------

bq. You don't seem to be working against the tip of branch-2.0 or branch-2.1. You seem to be working in your own branch? Is that so? If so, startup has changed pretty radically since 2.0.0.

Actually, I'm working against branch-2.0 only. Here you can see that tableStateManager is started twice. At this point we only wait for IMP (which is OK during the first start after deploy), but not when there are SCPs:
https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L929

TableStateManager is started after meta is actually online (which is correct):
https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L958
[jira] [Commented] (HBASE-21224) Handle compaction queue duplication
[ https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661536#comment-16661536 ]

Hadoop QA commented on HBASE-21224:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 4m 33s | master passed |
| +1 | compile | 1m 47s | master passed |
| +1 | checkstyle | 1m 5s | master passed |
| +1 | shadedjars | 3m 57s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 1s | master passed |
| +1 | javadoc | 0m 27s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 4m 48s | the patch passed |
| +1 | compile | 1m 45s | the patch passed |
| +1 | javac | 1m 45s | the patch passed |
| +1 | checkstyle | 1m 3s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 1s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 10m 3s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | findbugs | 2m 6s | the patch passed |
| +1 | javadoc | 0m 30s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 285m 41s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 325m 6s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestFromClientSide |
| | hadoop.hbase.namespace.TestNamespaceAuditor |
| | hadoop.hbase.client.TestSnapshotTemporaryDirectoryWithRegionReplicas |
| | hadoop.hbase.replication.TestReplicationKillSlaveRS |
| | hadoop.hbase.client.TestSnapshotDFSTemporaryDirectory |
| | hadoop.hbase.quotas.TestSpaceQuotas |
| | hadoop.hbase.regionserver.TestRegionReplicaFailover |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21224 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945266/HBASE-21224-master.004.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 9607297f8ebf 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 3b68e5393e |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1
[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661533#comment-16661533 ]

stack commented on HBASE-21364:
-------------------------------

Ok. Thanks. Yeah, want to cut an RC0 if I can. Thanks for working on this.

> Procedure holds the lock should put to front of the queue after restart
> -----------------------------------------------------------------------
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0, 2.0.2
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, HBASE-21364.branch-2.0.002.patch
>
>
> After restoring the procedures from the Procedure WALs, we put the runnable
> procedures back into the queue to execute. The order was not a problem before
> HBASE-20846, since the first procedure to execute would acquire the lock itself.
> But since HBASE-20846 the locks are restored as well. If a procedure without the
> lock is executed before a procedure holding the lock in the same queue, there is
> a race condition in which we may not be able to execute any of the procedures in
> that queue at all.
> The race condition is:
> 1. A procedure that needs the table's exclusive lock is put into the table's
> queue, while the table's shared lock is held by a Region Procedure. Since no one
> holds the exclusive lock, the queue is put into the run queue to execute. But the
> worker thread soon sees that the procedure cannot execute because it does not
> hold the lock, so it stops executing and removes the queue from the run queue.
> 2. At the same time, the Region Procedure, which holds the table's shared lock
> and the region's exclusive lock, is put into the table's queue. But since the
> queue has already been added to the run queue, it is not added again.
> 3. Because of step 1, the table's queue is removed from the run queue.
> 4. After that, no one puts the table's queue back, so no worker executes the
> procedures inside it.
> A test case in the patch shows how.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
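The race in the HBASE-21364 description above can be sketched with a small single-threaded model. This is only an illustration under stated assumptions, not HBase's actual scheduler: the class `RestoreOrderSketch`, the `Proc` type and the `drain` method are hypothetical names, the worker is modelled as a plain loop, and "removing the queue from the run queue" is reduced to a boolean flag that nobody sets again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class RestoreOrderSketch {

    /** Hypothetical stand-in for a procedure restored from the procedure WAL. */
    public static final class Proc {
        final String name;
        // true if the table's shared lock was restored together with this procedure
        final boolean holdsSharedLock;

        public Proc(String name, boolean holdsSharedLock) {
            this.name = name;
            this.holdsSharedLock = holdsSharedLock;
        }
    }

    /**
     * Simulates one worker draining one table queue in the given restore order.
     * Returns the names of the procedures that actually got to run.
     */
    public static List<String> drain(List<Proc> restoreOrder) {
        Deque<Proc> tableQueue = new ArrayDeque<>(restoreOrder);
        int sharedHolders = 0;
        for (Proc p : restoreOrder) {
            if (p.holdsSharedLock) {
                sharedHolders++;
            }
        }
        boolean inRunQueue = true; // the queue is added to the run queue once
        List<String> executed = new ArrayList<>();
        while (inRunQueue && !tableQueue.isEmpty()) {
            Proc head = tableQueue.peek();
            if (head.holdsSharedLock) {
                // Already holds its lock: run it, then release the shared lock.
                tableQueue.poll();
                executed.add(head.name);
                sharedHolders--;
            } else if (sharedHolders == 0) {
                // Exclusive lock is free: acquire it, run, release.
                tableQueue.poll();
                executed.add(head.name);
            } else {
                // Head needs the exclusive lock but a shared lock is still held.
                // The queue is dropped from the run queue and, in this model,
                // nothing ever puts it back (steps 3 and 4 of the description).
                inRunQueue = false;
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        Proc region = new Proc("region-proc", true);
        Proc table = new Proc("table-proc", false);
        // Lock holder restored behind the waiter: the queue stalls.
        System.out.println(drain(Arrays.asList(table, region))); // prints []
        // Lock holder restored at the front: both procedures run.
        System.out.println(drain(Arrays.asList(region, table))); // prints [region-proc, table-proc]
    }
}
```

In this model the only difference between the two runs is the restore order, which is the point of the issue title: a procedure that already holds the lock should go to the front of the queue after restart.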
[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661528#comment-16661528 ]

Duo Zhang commented on HBASE-21364:
-----------------------------------

This is a critical problem, so I have marked it as a blocker for 2.1.1.

As for the patch, I suggest we split it into two pieces. The verbose-related code can be done in a separate issue and committed to all branches. The code that fixes the actual problem should be committed to branch-2.1 and branch-2.0, and that should be done ASAP since we want to push out 2.1.1 now.

Ping [~stack].

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-21364:
------------------------------
    Priority: Blocker (was: Major)

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart
[ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21364: -- Fix Version/s: 2.0.3 2.1.1 > Procedure holds the lock should put to front of the queue after restart > --- > > Key: HBASE-21364 > URL: https://issues.apache.org/jira/browse/HBASE-21364 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Fix For: 2.1.1, 2.0.3 > > Attachments: HBASE-21364.branch-2.0.001.patch, > HBASE-21364.branch-2.0.002.patch > > > After restoring the procedures from the Procedure WALs, we put the runnable > procedures back into the queue to execute. The order was not a problem before > HBASE-20846, since the first one to execute would acquire the lock itself. But > since HBASE-20846 the locks are restored as well. If we execute a procedure > without the lock before a procedure with the lock in the same queue, > there is a race condition in which we may not be able to execute any of the procedures > in that queue at all. > The race condition is: > 1. A procedure that needs to take the table's exclusive lock is put into the > table's queue, but the table's shared lock is held by a Region Procedure. > Since no one holds the exclusive lock, the queue is put on the run queue to > execute. But soon the worker thread sees that the procedure can't execute because > it doesn't hold the lock, so it stops executing and removes the queue from the > run queue. > 2. At the same time, the Region Procedure, which holds the table's shared lock > and the region's exclusive lock, is put into the table's queue. But since the > queue has already been added to the run queue, it is not added again. > 3. Because of step 1, the table's queue is removed from the run queue. > 4. Then no one puts the table's queue back, so no worker will execute > the procedures inside. > A test case in the patch shows how. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
[ https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661525#comment-16661525 ] Duo Zhang commented on HBASE-21363: --- Oh shit. Let me check the code. I think this should be in 2.1. The patch is almost there. Thanks. > Rewrite the buildingHoldCleanupTracker method in WALProcedureStore > -- > > Key: HBASE-21363 > URL: https://issues.apache.org/jira/browse/HBASE-21363 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, > HBASE-21363.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation
[ https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661524#comment-16661524 ] stack commented on HBASE-20828: --- I need to write up what is in here. The subtasks have changed AMv2 for the better. Stuff like HBASE-21278, where we no longer try to roll back successful procedures and instead the parent needs to schedule new, compensatory Procedures, needs evangelizing. Ditto the background task that is trying to limit our backlog of master proc wals TODO. > Finish-up AMv2 Design/List of Tenets/Specification of operation > --- > > Key: HBASE-20828 > URL: https://issues.apache.org/jira/browse/HBASE-20828 > Project: HBase > Issue Type: Umbrella > Components: amv2 >Reporter: stack >Priority: Major > > AMv2 is missing specification. There are still too many grey areas. Also > missing is a concise listing of the tenets of AMv2 operation. Here are some > examples: > * HBASE-19529 "Handle null states in AM": Asks how we should treat null > state in hbase:meta. What does it 'mean'? We seem to treat it differently > depending on context. Needs clarification. [~Apache9] recently asked something similar > about the meaning of OFFLINE. > * Logging needs to have a particular form to help trace Procedure progress; > needs a write-up. > Let's fill in items to address in this umbrella issue. Can address in > subissues and produce specification doc too. We have the below but these are > mostly (incomplete) descriptions for devs on pv2 and amv2; the specification > is missing: > http://hbase.apache.org/book.html#pv2 > http://hbase.apache.org/book.html#amv2 > (Other areas include addressing what is up w/ rollback -- when, how much, and > when it is not appropriate -- as well as recommendations on Procedure > coarseness, locking -- is it ok to lock table in alter table procedure for > the life of the procedure? -- and so on). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX
[ https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21372: -- Attachment: HBASE-21372.branch-2.1.001.patch > Set hbase.assignment.maximum.attempts to Long.MAX > - > > Key: HBASE-21372 > URL: https://issues.apache.org/jira/browse/HBASE-21372 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Reporter: stack >Assignee: stack >Priority: Major > Attachments: HBASE-21372.branch-2.1.001.patch, > HBASE-21372.branch-2.1.001.patch > > > From parent issue, [~allan163] suggests that we not give up on assign unless > there is a change -- an SCP triggers failure -- or at the extreme, an operator > intervenes. This jibes w/ how we're thinking about assign (or to put it > another way, we have no handling for the case where we exhaust retries). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
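To make the reasoning in HBASE-21372 concrete, here is a minimal plain-Java sketch of a bounded retry loop and of what happens when the bound is effectively unlimited. Everything in it is an assumption for illustration: `AssignRetrySketch`, `Outcome`, and `assignWithRetries` are hypothetical names, and the loop does not use HBase's configuration API or show how `hbase.assignment.maximum.attempts` is actually read. It only shows that with a small bound the caller has to handle a "gave up" outcome, while with `Long.MAX_VALUE` the loop ends only on success or on an external stop signal (standing in for an SCP or an operator).

```java
import java.util.function.BooleanSupplier;

/** Toy retry loop; a sketch of the policy discussed in the issue, not HBase code. */
class AssignRetrySketch {

  /** Possible results of the hypothetical assign loop. */
  enum Outcome { ASSIGNED, GAVE_UP, STOPPED }

  /**
   * Tries to assign until it succeeds, the attempt budget runs out, or an
   * external party (for example a crash handler or an operator) asks to stop.
   */
  static Outcome assignWithRetries(long maxAttempts,
                                   BooleanSupplier tryAssign,
                                   BooleanSupplier externalStop) {
    for (long attempt = 0; attempt < maxAttempts; attempt++) {
      if (externalStop.getAsBoolean()) {
        return Outcome.STOPPED;
      }
      if (tryAssign.getAsBoolean()) {
        return Outcome.ASSIGNED;
      }
    }
    // Reached only when the budget is exhausted; this is the case the issue
    // says has no handling, which is the argument for an unlimited budget.
    return Outcome.GAVE_UP;
  }
}
```

With a budget of 10 and an assign that would succeed on the eleventh try, the loop returns `GAVE_UP`; with `Long.MAX_VALUE` the same assign returns `ASSIGNED`.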
[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX
[ https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661515#comment-16661515 ] Hadoop QA commented on HBASE-21372: ---
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 31s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -0 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || branch-2.1 Compile Tests ||
| +1 | mvninstall | 5m 15s | branch-2.1 passed |
| +1 | compile | 2m 13s | branch-2.1 passed |
| +1 | checkstyle | 1m 20s | branch-2.1 passed |
| +1 | shadedjars | 4m 58s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 19s | branch-2.1 passed |
| +1 | javadoc | 0m 48s | branch-2.1 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 6m 12s | the patch passed |
| +1 | compile | 2m 22s | the patch passed |
| +1 | javac | 2m 22s | the patch passed |
| +1 | checkstyle | 1m 27s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 5m 16s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 13m 20s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | findbugs | 2m 35s | the patch passed |
| +1 | javadoc | 0m 41s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 210m 22s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 43s | The patch does not generate ASF License warnings. |
| | | 261m 12s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-21372 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945268/HBASE-21372.branch-2.1.001.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux cd3b360c9c27 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.1 / e29ce9f937 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14830/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14830/testReport/ |
| Max. process+thread cou
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661513#comment-16661513 ] stack commented on HBASE-21344: --- [~an...@apache.org] You don't seem to be working against the tip of branch-2.0 or branch-2.1. You seem to be working in your own branch? Is that so? If so, startup has changed pretty radically since 2.0.0. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenCo
[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely
[ https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661486#comment-16661486 ] Hadoop QA commented on HBASE-21349: ---
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -0 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 6m 1s | master passed |
| +1 | compile | 1m 45s | master passed |
| +1 | checkstyle | 1m 15s | master passed |
| +1 | shadedjars | 4m 19s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 2s | master passed |
| +1 | javadoc | 0m 30s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 5m 19s | the patch passed |
| +1 | compile | 1m 48s | the patch passed |
| +1 | javac | 1m 48s | the patch passed |
| +1 | checkstyle | 1m 17s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 29s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 10m 29s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | findbugs | 2m 8s | the patch passed |
| +1 | javadoc | 0m 30s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 130m 40s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 173m 43s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestBlockEvictionFromClient |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21349 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945273/HBASE-21349.master.002.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux fc32ee6e94ae 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 1e9d998727 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14831/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/jo
[jira] [Work stopped] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"
[ https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-21373 stopped by Xu Cang. --- > Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for > cluster size, it gives little indication" > - > > Key: HBASE-21373 > URL: https://issues.apache.org/jira/browse/HBASE-21373 > Project: HBase > Issue Type: Bug > Components: Operability >Reporter: stack >Assignee: Xu Cang >Priority: Major > Attachments: HBASE-21373.branch-1.001.patch > > > Issue to backport to branch-1. Hope you don't mind my assigning it to you Xu > Cang. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work started] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"
[ https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-21373 started by Xu Cang. --- > Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for > cluster size, it gives little indication" > - > > Key: HBASE-21373 > URL: https://issues.apache.org/jira/browse/HBASE-21373 > Project: HBase > Issue Type: Bug > Components: Operability >Reporter: stack >Assignee: Xu Cang >Priority: Major > Attachments: HBASE-21373.branch-1.001.patch > > > Issue to backport to branch-1. Hope you don't mind my assigning it to you Xu > Cang. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"
[ https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-21373: Attachment: HBASE-21373.branch-1.001.patch Status: Patch Available (was: Open) > Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for > cluster size, it gives little indication" > - > > Key: HBASE-21373 > URL: https://issues.apache.org/jira/browse/HBASE-21373 > Project: HBase > Issue Type: Bug > Components: Operability >Reporter: stack >Assignee: Xu Cang >Priority: Major > Attachments: HBASE-21373.branch-1.001.patch > > > Issue to backport to branch-1. Hope you don't mind my assigning it to you Xu > Cang. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21073) "Maintenance mode" master
[ https://issues.apache.org/jira/browse/HBASE-21073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661421#comment-16661421 ] Hudson commented on HBASE-21073: Results for branch branch-2.0 [build #1002 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > "Maintenance mode" master > - > > Key: HBASE-21073 > URL: https://issues.apache.org/jira/browse/HBASE-21073 > Project: HBase > Issue Type: Sub-task > Components: amv2, hbck2, master >Reporter: stack >Assignee: Mike Drob >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21073.branch-2.001.patch, > HBASE-21073.branch-2.1.001.patch, HBASE-21073.branch-2.1.002.patch, > HBASE-21073.master.001.patch, HBASE-21073.master.002.patch, > HBASE-21073.master.003.patch, HBASE-21073.master.004.patch, > HBASE-21073.master.005.patch, HBASE-21073.master.006.patch, > HBASE-21073.master.007.patch, HBASE-21073.master.008.patch, > HBASE-21073.master.009.patch, HBASE-21073.master.010.patch, > HBASE-21073.master.011.patch > > > Make it so we can bring up a Master in "maintenance mode". This is parse of > master wal procs but not taking on regionservers. It would be in a state > where "repair" Procedures could run; e.g. 
a Procedure that could recover meta > by looking for meta WALs, splitting them, dropping recovered.edits, and even > making it so meta is readable. See parent issue for why needed (disaster > recovery). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661395#comment-16661395 ] Ankit Singhal commented on HBASE-21344: --- bq. This should be happening already. We wait on meta assign. If SCPs, they'll run and recover meta if one of them was holding it. If no assign for meta in the procedure store, then something untoward and at least for now, operator needs to figure what happened until we fix the bug. Operator can schedule an assign with hbck2 bq. branch-2.0 will go into a holding pattern if hbase:meta is not assigned (ditto if hbase:namespace is not assigned) waiting on operator intervention to clear the lack-of-assign. Thanks [~stack] for the pointer. I didn't go down, as the problem started when we were starting tableStateManager without waiting for meta assignment by SCPs. I think we can just remove this from here, as we are already starting it after waiting for meta to get online (patch attached for the same). {code} if (initMetaProc != null) { initMetaProc.await(); } -tableStateManager.start(); {code} bq. That said, I see some value in this patch. In particular the bit around resetting hbase:meta state if failure. We shouldn't offline meta if we are failing the assignment, as that will start the InitMetaProcedure (which we don't want, as the SCP needs to take care of recovering meta). > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. 
meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. > 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7
[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal updated HBASE-21344: -- Attachment: HBASE-21344-branch-2.0_v2.patch > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch, > HBASE-21344-branch-2.0_v2.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] procedure.MasterProcedureScheduler: pid=48, ppid=47, state=FAILED:REGION_TRANSITION_QUEUE, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=47, state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; ServerCrashProcedure server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled state=SERVER_CRASH_GET_REGIONS
>   at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state OPENING, details=row 'backup:system' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, exception=java.io.IOException: Meta region is in state OPENING
>   at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
>   at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>   at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>   at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>   at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
>   at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
>   at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>   at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>   at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>   at java.util.concurrent.CompletableFuture.complete
[jira] [Resolved] (HBASE-21353) TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to HBCK2#checkHBCKSupport
[ https://issues.apache.org/jira/browse/HBASE-21353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-21353.
---
    Resolution: Fixed
      Assignee: stack
 Fix Version/s: hbck2-1.0.0

Pushed fix over on hbase-operator-tools/hbase-hbck2.

> TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to HBCK2#checkHBCKSupport
> ---
>
>                 Key: HBASE-21353
>                 URL: https://issues.apache.org/jira/browse/HBASE-21353
>             Project: HBase
>          Issue Type: Test
>          Components: hbase-operator-tools, hbck2
>            Reporter: Ted Yu
>            Assignee: stack
>            Priority: Major
>             Fix For: hbck2-1.0.0
>
>
> I noticed the following when running TestHBCKCommandLineParsing#testCommandWithOptions :
> {code}
> "main" #1 prio=5 os_prio=31 tid=0x7f851c80 nid=0x1703 waiting on condition [0x70216000]
>    java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for <0x00076d3055d8> (a java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>   at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:564)
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:297)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:229)
>   at org.apache.hadoop.hbase.client.ConnectionFactory$$Lambda$11/502838712.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>   at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:347)
>   at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:227)
>   at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:127)
>   at org.apache.hbase.HBCK2.checkHBCKSupport(HBCK2.java:93)
>   at org.apache.hbase.HBCK2.run(HBCK2.java:352)
>   at org.apache.hbase.TestHBCKCommandLineParsing.testCommandWithOptions(TestHBCKCommandLineParsing.java:62)
> {code}
> The test doesn't spin up an HBase cluster.
> Hence the call to check hbck support hangs.
> In HBCK2#run, we can refactor the code such that argument parsing is done prior to calling HBCK2#checkHBCKSupport .
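
The refactoring proposed in the issue (parse arguments first, touch the cluster only afterwards) could look roughly like the sketch below. This is a minimal illustration under assumptions, not the actual HBCK2 code or the fix that was pushed: the class name `ParseBeforeConnectSketch`, the exit codes, the option set, and the `supportCheck` parameter (a stand-in for HBCK2#checkHBCKSupport) are all hypothetical, and no HBase classes are used so that the example stays self-contained.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for HBCK2#run; illustrates ordering only. */
final class ParseBeforeConnectSketch {
  // Hypothetical exit codes, chosen for this sketch.
  static final int EXIT_SUCCESS = 0;
  static final int EXIT_USAGE = 1;
  static final int EXIT_UNSUPPORTED = 2;

  // Hypothetical option set; the real tool's options may differ.
  private static final Set<String> KNOWN_OPTIONS =
      new HashSet<>(Arrays.asList("-d", "--debug", "-h", "--help"));

  /**
   * @param args         command-line arguments
   * @param supportCheck stand-in for the cluster-side support check; in the
   *                     real tool this is where a connection would be opened
   */
  static int run(String[] args, BooleanSupplier supportCheck) {
    List<String> argList = Arrays.asList(args);

    // Step 1: pure argument parsing. No cluster access happens here, so a
    // test without a running cluster cannot block on this part.
    for (String arg : argList) {
      if (arg.startsWith("-") && !KNOWN_OPTIONS.contains(arg)) {
        return EXIT_USAGE;
      }
    }
    if (argList.contains("-h") || argList.contains("--help")) {
      return EXIT_SUCCESS; // usage would be printed; still no cluster access
    }

    // Step 2: only after parsing succeeded, run the check that needs a cluster.
    if (!supportCheck.getAsBoolean()) {
      return EXIT_UNSUPPORTED;
    }
    return EXIT_SUCCESS;
  }
}
```

With this ordering, a command-line parsing test can pass a check that is never expected to run (for unknown options or `--help`) and never waits on a connection.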