[jira] [Updated] (HBASE-21164) reportForDuty to spew less log if master is initializing
[ https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HBASE-21164:
----------------------------------
    Attachment: HBASE-21164.008.patch

> reportForDuty to spew less log if master is initializing
> --------------------------------------------------------
>
>                 Key: HBASE-21164
>                 URL: https://issues.apache.org/jira/browse/HBASE-21164
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: stack
>            Assignee: Mingliang Liu
>            Priority: Minor
>         Attachments: HBASE-21164.005.patch, HBASE-21164.006.patch,
>                      HBASE-21164.007.patch, HBASE-21164.008.patch,
>                      HBASE-21164.branch-2.1.001.patch, HBASE-21164.branch-2.1.002.patch,
>                      HBASE-21164.branch-2.1.003.patch, HBASE-21164.branch-2.1.004.patch
>
> RegionServers do reportForDuty on startup to tell the Master they are available.
> If the Master is initializing, and especially on a big cluster where that can
> take a while (particularly if something is amiss), logging every three seconds
> is annoying and not useful. We should make those logs less noisy. Here is an
> example:
> {code:java}
> 2018-09-06 14:01:39,312 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001, startcode=1536266763109
> 2018-09-06 14:01:39,312 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> {code}
> For example, I am looking at a large cluster now that had a backlog of
> procedure WALs. It is taking a couple of hours to recreate the procedure
> state because there are millions of procedures outstanding. Meantime, the
> Master log is just full of the above message, every three seconds...

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HBASE-21164) reportForDuty to spew less log if master is initializing
[ https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614388#comment-16614388 ] Hadoop QA commented on HBASE-21164: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 47s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 26s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 11s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 48s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} hbase-common: The patch generated 0 new + 2 unchanged - 1 fixed = 2 total (was 3) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 8s{color} | {color:red} hbase-server: The patch generated 3 new + 230 unchanged - 0 fixed = 233 total (was 230) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 7s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 5m 53s{color} | {color:red} The patch causes 10 errors with Hadoop v3.0.0. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 43s{color} | {color:green} hbase-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}217m 24s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 43s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}263m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21164 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939636/HBASE-21164.006.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux cc00e5c51b84 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Updated] (HBASE-21164) reportForDuty to spew less log if master is initializing
[ https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HBASE-21164:
----------------------------------
    Description:
RegionServers do reportForDuty on startup to tell the Master they are available. If the Master is initializing, and especially on a big cluster where that can take a while (particularly if something is amiss), logging every three seconds is annoying and not useful. We should make those logs less noisy. Here is an example:
{code:java}
2018-09-06 14:01:39,312 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001, startcode=1536266763109
2018-09-06 14:01:39,312 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
{code}
For example, I am looking at a large cluster now that had a backlog of procedure WALs. It is taking a couple of hours to recreate the procedure state because there are millions of procedures outstanding. Meantime, the Master log is just full of the above message, every three seconds...

    was:
RegionServers do reportForDuty on startup to tell the Master they are available. If the Master is initializing, and especially on a big cluster where that can take a while (particularly if something is amiss), logging every three seconds is annoying and not useful. Do backoff on failure, up to a reasonable maximum period. Here is an example:
{code}
2018-09-06 14:01:39,312 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001, startcode=1536266763109
2018-09-06 14:01:39,312 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
{code}
For example, I am looking at a large cluster now that had a backlog of procedure WALs. It is taking a couple of hours to recreate the procedure state because there are millions of procedures outstanding. Meantime, the Master log is just full of the above message, every three seconds...
[jira] [Updated] (HBASE-21164) reportForDuty to spew less log if master is initializing
[ https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HBASE-21164:
----------------------------------
    Summary: reportForDuty to spew less log if master is initializing
             (was: reportForDuty should do (exponential) backoff rather than retry every 3 seconds (default).)
[jira] [Commented] (HBASE-21164) reportForDuty should do (exponential) backoff rather than retry every 3 seconds (default).
[ https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614383#comment-16614383 ]

Mingliang Liu commented on HBASE-21164:
---------------------------------------
V7 patch addresses Allan's concern. Refactoring Sleeper does not seem necessary. We can remove the unit test as well if it is not needed, since the change is not as major as in the previous version.
[jira] [Commented] (HBASE-21035) Meta Table should be able to online even if all procedures are lost
[ https://issues.apache.org/jira/browse/HBASE-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614382#comment-16614382 ]

stack commented on HBASE-21035:
-------------------------------
I've been trying to make basic progress on HBCK2. I pushed up an HBCK2 tool that can call our only Hbck method over in the hbase-operator-tools project: https://github.com/apache/hbase-operator-tools/commit/0cf0e0ecf2d4a33522e0e273f9310f11aa2eaee6. It is missing so much -- tests, how to package, how to pass in a pointer to the cluster to fix, doc., etc. -- but I'm working on it.

Next is adding assign and bulk assign to the Hbck Service. This Hbck assign will differ from Admin assign in that it should work even while the Master is 'initializing' (Admin assign fails because we check master state before we do anything, which means we can't schedule a meta assign if meta is offlined). The hbck assign bypasses stuff like calling CPs too. I also want bulk assign -- i.e. passing a thousand regions at a time to assign -- because when doing repairs, clusters will probably be big with lots of regions in odd states. I've been running a fixup job on a cluster where I have thousands of regions in OPENING state (I removed the Master WAL Procs after crashing it...). Doing assigns one at a time on the command-line doesn't cut it... It takes 10-40 seconds per assign.

> Meta Table should be able to online even if all procedures are lost
> -------------------------------------------------------------------
>
>                 Key: HBASE-21035
>                 URL: https://issues.apache.org/jira/browse/HBASE-21035
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21035.branch-2.0.001.patch, HBASE-21035.branch-2.1.001.patch
>
> After HBASE-20708, we changed the way we init after the master starts. It will
> only check WAL dirs and compare them to the Zookeeper RS nodes to decide which
> servers need to expire. For servers whose dir ends with 'SPLITTING', we ensure
> that there will be an SCP for it.
> But if the server with the meta region crashed before the master restarts, and
> if all the procedure WALs are lost (due to a bug, or deleted manually,
> whatever), the newly restarted master will be stuck when initializing, since
> no one will bring the meta region online.
> Although it is an anomalous case, I think that no matter what happens, we need
> to online the meta region. Otherwise we are sitting ducks; nothing can be done.
[jira] [Updated] (HBASE-21164) reportForDuty should do (exponential) backoff rather than retry every 3 seconds (default).
[ https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HBASE-21164:
----------------------------------
    Attachment: HBASE-21164.007.patch
[jira] [Commented] (HBASE-21102) ServerCrashProcedure should select target server where no other replicas exist for the current region
[ https://issues.apache.org/jira/browse/HBASE-21102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614380#comment-16614380 ]

ramkrishna.s.vasudevan commented on HBASE-21102:
------------------------------------------------
Oh, I see. Someone has raised an issue for the failures. I am just seeing it.

> ServerCrashProcedure should select target server where no other replicas exist for the current region
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21102
>                 URL: https://issues.apache.org/jira/browse/HBASE-21102
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Major
>         Attachments: HBASE-21102_1.patch, HBASE-21102_2.patch, HBASE-21102_3.patch,
>                      HBASE-21102_4.patch, HBASE-21102_initial.patch
>
> Currently, when a server hosting a region replica crashes, there is no
> guarantee that the target server chosen for the replica region's assignment
> has no other replica of that region. Currently we do the assignment randomly,
> and later the LB identifies these cases and does a MOVE for such regions. It
> would be better to pick target servers that at least minimally ensure
> replicas are not colocated.
[jira] [Commented] (HBASE-21178) Get and Scan operation with converter_class not working
[ https://issues.apache.org/jira/browse/HBASE-21178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614377#comment-16614377 ]

Subrat Mishra commented on HBASE-21178:
---------------------------------------
{quote}Which version break it? U know which jira ?{quote}
Jira id: HBASE-18067, and version 2.0.0.

> Get and Scan operation with converter_class not working
> -------------------------------------------------------
>
>                 Key: HBASE-21178
>                 URL: https://issues.apache.org/jira/browse/HBASE-21178
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Subrat Mishra
>            Assignee: Subrat Mishra
>            Priority: Major
>         Attachments: HBASE-21178.master.001.patch
>
> Consider a simple scenario:
> {code:java}
> create 'foo', {NAME => 'f1'}
> put 'foo','r1','f1:a',1000
> get 'foo','r1',{COLUMNS => ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']}
> scan 'foo',{COLUMNS => ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']}{code}
> Both get and scan fail with:
> {code:java}
> ERROR: wrong number of arguments (3 for 1) {code}
> It looks like, in the table.rb file, converter_method has expected 3 arguments
> [(bytes, offset, len)] since version 2.0.0; prior to 2.0.0 it took only 1
> argument [(bytes)].
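The arity mismatch above can be illustrated in isolation. The sketch below is not HBase code; the class and method names are assumptions purely for illustration. It shows a caller that reflectively invokes a converter with three arguments, which only works if the converter exposes a (bytes, offset, len) overload, matching the behavior described since 2.0.0:

```java
import java.lang.reflect.Method;

// Hypothetical illustration of the converter arity issue: a shell-like caller
// that looks the converter method up by reflection and invokes it with
// (bytes, offset, len). A converter offering only the old 1-arg form would
// fail the lookup/invocation with a wrong-number-of-arguments style error.
public class ConverterArity {
    // Old-style converter: one argument, the whole cell value.
    public static String len(byte[] bytes) {
        return String.valueOf(bytes.length);
    }

    // New-style converter: (bytes, offset, len), the shape expected since 2.0.0.
    public static String len(byte[] bytes, int offset, int len) {
        return String.valueOf(len);
    }

    public static void main(String[] args) throws Exception {
        byte[] cell = new byte[] {1, 2, 3, 4};
        // The caller resolves the 3-arg overload and invokes it reflectively.
        Method m = ConverterArity.class.getMethod(
            "len", byte[].class, int.class, int.class);
        System.out.println(m.invoke(null, cell, 0, cell.length)); // prints 4
    }
}
```

If only the 1-arg `len(byte[])` existed, the `getMethod` lookup for the 3-arg signature would throw `NoSuchMethodException`, which is why pre-2.0.0 converters break on newer shells.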
[jira] [Commented] (HBASE-20993) [Auth] IPC client fallback to simple auth allowed doesn't work
[ https://issues.apache.org/jira/browse/HBASE-20993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614374#comment-16614374 ] Hadoop QA commented on HBASE-20993: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. 
{color} | || || || || {color:brown} branch-1 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 42s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 10s{color} | {color:green} branch-1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 17s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 5m 3s{color} | {color:green} branch-1 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 4s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 18s{color} | {color:green} the patch passed with JDK 
v1.7.0_191 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 59s{color} | {color:green} root: The patch generated 0 new + 81 unchanged - 1 fixed = 81 total (was 82) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} xml {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 ill-formed XML file(s). {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 59s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 1m 46s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 10s{color} | {color:green} hbase-checkstyle in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 32s{color} | {color:green} hbase-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}107m 49s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} |
[jira] [Commented] (HBASE-21102) ServerCrashProcedure should select target server where no other replicas exist for the current region
[ https://issues.apache.org/jira/browse/HBASE-21102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614373#comment-16614373 ]

Duo Zhang commented on HBASE-21102:
-----------------------------------
Is HBASE-21197 for this problem? I think we can resolve this issue first and start working on fixing the UT there. After that we can open backport issues.
[jira] [Commented] (HBASE-21102) ServerCrashProcedure should select target server where no other replicas exist for the current region
[ https://issues.apache.org/jira/browse/HBASE-21102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614372#comment-16614372 ]

ramkrishna.s.vasudevan commented on HBASE-21102:
------------------------------------------------
[~Apache9] I already downloaded the artifacts from the build and verified the logs. The existing logs do not give much info. When I wanted to fix the flakiness before I committed the patch, I added intermediate logs, and only then found that randomAssignment() was having some issues. Since the test passes consistently on my local cluster, I would like to push in some logs and then rerun the tests, probably in precommits. If pre-commit does not fail, let me push those logs and then generate a build. One question I have: will the flaky test rerun after the commit with the log msgs added is done?
[jira] [Commented] (HBASE-21164) reportForDuty should do (exponential) backoff rather than retry every 3 seconds (default).
[ https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614360#comment-16614360 ] Mingliang Liu commented on HBASE-21164: --- I assumed the slow down of master starting up was acceptable for this case and I favored the backoff. Unlike cluster shutdown(), the RS sleeper can not be waken if master is ready early. The up to 1min is unfortunately unavoidable in the worst case. {quote}We'd just turn off the log spew? {quote} My initial idea was to dump log every dozens of retries (say, 100, so 3s*100 = 5min). We go with that? > reportForDuty should do (expotential) backoff rather than retry every 3 > seconds (default). > -- > > Key: HBASE-21164 > URL: https://issues.apache.org/jira/browse/HBASE-21164 > Project: HBase > Issue Type: Improvement > Components: regionserver >Reporter: stack >Assignee: Mingliang Liu >Priority: Minor > Attachments: HBASE-21164.005.patch, HBASE-21164.006.patch, > HBASE-21164.branch-2.1.001.patch, HBASE-21164.branch-2.1.002.patch, > HBASE-21164.branch-2.1.003.patch, HBASE-21164.branch-2.1.004.patch > > > RegionServers do reportForDuty on startup to tell Master they are available. > If Master is initializing, and especially on a big cluster when it can take a > while particularly if something is amiss, the log every three seconds is > annoying and doesn't do anything of use. Do backoff if fails up to a > reasonable maximum period. Here is example: > {code} > 2018-09-06 14:01:39,312 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to > master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001, > startcode=1536266763109 > 2018-09-06 14:01:39,312 WARN > org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; > sleeping and then retrying. > > {code} > For example, I am looking at a large cluster now that had a backlog of > procedure WALs. 
It is taking a couple of hours recreating the procedure-state > because there are millions of procedures outstanding. Meantime, the Master > log is just full of the above message -- every three seconds... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
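The "log only once every few dozen retries" idea above can be sketched roughly as follows. This is an illustrative sketch only; the class and constant names are hypothetical, not the actual HRegionServer.reportForDuty code.

```java
// Hypothetical sketch of rate-limited retry logging: log the first
// attempt, then only every LOG_INTERVAL attempts. With the default
// 3s retry interval and LOG_INTERVAL = 100, messages appear roughly
// every 5 minutes instead of every 3 seconds.
public class RetryLogSketch {
  static final int LOG_INTERVAL = 100;

  public static boolean shouldLog(int attempt) {
    // attempt is 0-based: log attempts 0, 100, 200, ...
    return attempt % LOG_INTERVAL == 0;
  }

  public static void main(String[] args) {
    int logged = 0;
    for (int attempt = 0; attempt < 250; attempt++) {
      if (shouldLog(attempt)) {
        logged++; // in real code this is where the WARN would be emitted
      }
    }
    System.out.println(logged); // attempts 0, 100, 200 -> prints 3
  }
}
```

The retry loop itself is unchanged; only the logging is throttled, which matches the concern that slowing down the actual reportForDuty retries could delay cluster startup.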
[jira] [Commented] (HBASE-20952) Re-visit the WAL API
[ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614317#comment-16614317 ] stack commented on HBASE-20952: --- Where is the 'design doc' that we're talking about? Is it the google doc attached in the middle of this JIRA? The overview? If so, I was wondering if this doc was going to get a revision? Seems like plenty of questions and back-and-forth above that might get consideration and that might have an impact on the API and on general subsystem thinking? (Let's add a link to this doc up at the top of this issue) I like the [~Apache9] questions. Is it that he's done more homework, so he has these questions, or that he just has a better understanding of how the system works? IMO, it tends to be easier working through concerns in a design than in comments in JIRA/RB or in code review; the latter tends to get distributed all over, and moving code without the high-level design figured out can bring on myopia. Is the 'design doc' we talk of above the place to work through his concerns, or is that somewhere else? Thanks > Re-visit the WAL API > > > Key: HBASE-20952 > URL: https://issues.apache.org/jira/browse/HBASE-20952 > Project: HBase > Issue Type: Sub-task > Components: wal >Reporter: Josh Elser >Priority: Major > Attachments: 20952.v1.txt > > > Take a step back from the current WAL implementations and think about what an > HBase WAL API should look like. What are the primitive calls that we require > to guarantee durability of writes with a high degree of performance? > The API needs to take the current implementations into consideration. We > should also have a mind for what is happening in the Ratis LogService (but > the LogService should not dictate what HBase's WAL API looks like RATIS-272). > Other "systems" inside of HBase that use WALs are replication and > backup. Replication has the use-case for "tail"'ing the WAL which we > should provide via our new API. Backup doesn't do anything fancy (IIRC). 
We > should make sure all consumers are generally going to be OK with the API we > create. > The API may be "OK" (or OK in part). We also need to consider other methods > which were "bolted" on, such as {{AbstractFSWAL}} and > {{WALFileLengthProvider}}. Other corners of "WAL use" (like the > {{WALSplitter}}) should also be looked at to use WAL APIs only. > We also need to make sure that adequate interface audience and stability > annotations are chosen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15631) Backport Regionserver Groups (HBASE-6721) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-15631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614314#comment-16614314 ] loushang commented on HBASE-15631: -- Hi, is there any configuration suggestion or example for using this patch on branch-1.4? It seems that the configuration from HBASE-6721 does not work here. Great appreciation! > Backport Regionserver Groups (HBASE-6721) to branch-1 > -- > > Key: HBASE-15631 > URL: https://issues.apache.org/jira/browse/HBASE-15631 > Project: HBase > Issue Type: New Feature >Affects Versions: 1.4.0 >Reporter: Francis Liu >Assignee: Andrew Purtell >Priority: Major > Fix For: 1.4.0 > > Attachments: HBASE-15631-branch-1-addendum.patch, > HBASE-15631-branch-1.patch, HBASE-15631-branch-1.patch, > HBASE-15631-branch-1.patch, HBASE-15631.branch-1.patch, HBASE-15631.patch > > > Based on dev list discussion, backporting region server groups should not be an > issue, as it does not: 1. destabilize the code; 2. cause backward > incompatibility. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21164) reportForDuty should do (exponential) backoff rather than retry every 3 seconds (default).
[ https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614306#comment-16614306 ] stack commented on HBASE-21164: --- This is a fair point [~allan163]. So, the RS should just continue to bang on the Master every three seconds? We'd just turn off the log spew? > reportForDuty should do (exponential) backoff rather than retry every 3 > seconds (default). > -- > > Key: HBASE-21164 > URL: https://issues.apache.org/jira/browse/HBASE-21164 > Project: HBase > Issue Type: Improvement > Components: regionserver >Reporter: stack >Assignee: Mingliang Liu >Priority: Minor > Attachments: HBASE-21164.005.patch, HBASE-21164.006.patch, > HBASE-21164.branch-2.1.001.patch, HBASE-21164.branch-2.1.002.patch, > HBASE-21164.branch-2.1.003.patch, HBASE-21164.branch-2.1.004.patch > > > RegionServers do reportForDuty on startup to tell the Master they are available. > If the Master is initializing, especially on a big cluster where it can take a > while, particularly if something is amiss, the log every three seconds is > annoying and doesn't do anything of use. Back off on failure, up to a > reasonable maximum period. Here is an example: > {code} > 2018-09-06 14:01:39,312 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to > master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001, > startcode=1536266763109 > 2018-09-06 14:01:39,312 WARN > org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; > sleeping and then retrying. > > {code} > For example, I am looking at a large cluster now that had a backlog of > procedure WALs. It is taking a couple of hours recreating the procedure-state > because there are millions of procedures outstanding. Meantime, the Master > log is just full of the above message -- every three seconds... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
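The capped exponential backoff named in the issue title could look roughly like this. The constants and names below are made up for illustration, not actual HBase configuration keys or methods.

```java
// Hypothetical sketch of capped exponential backoff for reportForDuty
// retries: double the sleep on each failed attempt, starting from the
// default 3s interval and capping at the "up to 1min" worst case
// mentioned in the discussion.
public class BackoffSketch {
  static final long BASE_SLEEP_MS = 3_000;   // default retry interval
  static final long MAX_SLEEP_MS  = 60_000;  // backoff cap

  public static long sleepForRetry(int retries) {
    // Clamp the shift so the left-shift cannot overflow for large counts.
    long sleep = BASE_SLEEP_MS << Math.min(retries, 10);
    return Math.min(sleep, MAX_SLEEP_MS);
  }

  public static void main(String[] args) {
    for (int i = 0; i < 8; i++) {
      System.out.println(sleepForRetry(i));
    }
    // 3000, 6000, 12000, 24000, 48000, 60000, 60000, 60000
  }
}
```

The trade-off debated above is visible here: once the sleep hits the cap, a waiting RS may sit idle for up to a minute after the master becomes ready, which is why the alternative of keeping the 3s retry and only throttling the logging was also considered.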
[jira] [Commented] (HBASE-20952) Re-visit the WAL API
[ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614292#comment-16614292 ] Duo Zhang commented on HBASE-20952: --- The API is not the first thing to decide. As I said above, the first thing is that we need to know the overall solution. You can see our design docs for serial replication and sync replication: https://docs.google.com/document/d/1LHC3IRUc5i2V4_roNw8BDAOKGM4bEapR_hefpZxDT00/edit https://docs.google.com/document/d/193D3aOxD-muPIZuQfI4Zo3_qg6-Nepeu_kraYJVQkiE/edit#heading=h.e8l9k556m3wi There is no API design in them, but we tried our best to describe how we plan to do it in HBase. {quote} This is good; I hadn't thought about abstracting out fencing. We should have API which pushes this fencing impl down into the Provider. For the Ratis LogService, we designed api to be able to close() a Log; make it read-only. In the context of HBase, we would close the Log before we start recovery/re-assignment, and have the net-effect of preventing any half-dead RS from continuing to try to add more edits to the Log. This effectively would work like recoverLease() does now for the HDFS case. {quote} Yes, this is what I really want to discuss, not something like whether we should use WALInfo or WALIdentity. The information you described is still not enough to solve all the problems. Previously we would roll the WAL writer, and it was done by the RS, so closing the WAL file is not enough, as the RS will try to open a new one and write to it. That's why we need to rename the WAL directory. From your words above, it seems to me that we will only have one stream open forever per RS; then how do we drop the old edits after flush? And how do we set up the WAL stream? Only once at RS startup? And if there are errors later, do we just abort, without trying to recover or open a new stream? Or will it be handled by Ratis? 
And for the FileSystem, we use multi-WAL to increase performance, and that logic is tangled up with WALProvider. Does Ratis still need multi-WAL to increase performance? And if not, what's the plan? Do we need to refactor the multi-WAL-related code to work not against the WALProvider but directly against the FileSystem-related pieces? For the sync replication thing, it is just a DualAsyncWriter, which writes to two HDFS clusters at once. I think it is possible to write to other log systems, such as Ratis, if you still share the AsyncWriter interface. The problem here is how to describe the place where we write the remote WALs. For FileSystem-based WALs, it is just a directory on a remote cluster, for example, "hdfs://cluster-name/path". We need to find a way to describe other log systems. > Re-visit the WAL API > > > Key: HBASE-20952 > URL: https://issues.apache.org/jira/browse/HBASE-20952 > Project: HBase > Issue Type: Sub-task > Components: wal >Reporter: Josh Elser >Priority: Major > Attachments: 20952.v1.txt > > > Take a step back from the current WAL implementations and think about what an > HBase WAL API should look like. What are the primitive calls that we require > to guarantee durability of writes with a high degree of performance? > The API needs to take the current implementations into consideration. We > should also have a mind for what is happening in the Ratis LogService (but > the LogService should not dictate what HBase's WAL API looks like RATIS-272). > Other "systems" inside of HBase that use WALs are replication and > backup. Replication has the use-case for "tail"'ing the WAL which we > should provide via our new API. Backup doesn't do anything fancy (IIRC). We > should make sure all consumers are generally going to be OK with the API we > create. > The API may be "OK" (or OK in part). We also need to consider other methods > which were "bolted" on, such as {{AbstractFSWAL}} and > {{WALFileLengthProvider}}. 
Other corners of "WAL use" (like the > {{WALSplitter}}) should also be looked at to use WAL APIs only. > We also need to make sure that adequate interface audience and stability > annotations are chosen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
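The fencing idea in the discussion above (close() the log to make it read-only so a half-dead RS cannot keep appending, analogous to recoverLease() on HDFS) can be sketched as a minimal, storage-agnostic interface. All of the names below are hypothetical; this is not the actual HBase WAL API or the Ratis LogService API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal WAL abstraction with explicit fencing.
interface WriteAheadLog {
  long append(byte[] edit) throws IOException; // returns a sequence id
  void fence();                                // make the log read-only
  List<byte[]> tail(long fromSeqId);           // replication-style tailing
}

// Toy in-memory implementation, just to show the fencing contract.
class InMemoryWal implements WriteAheadLog {
  private final List<byte[]> edits = new ArrayList<>();
  private volatile boolean fenced = false;

  @Override
  public long append(byte[] edit) throws IOException {
    if (fenced) {
      // A half-dead writer is rejected once recovery has fenced the log,
      // playing the role recoverLease() plays for HDFS-based WALs.
      throw new IOException("WAL is fenced (read-only)");
    }
    edits.add(edit);
    return edits.size() - 1;
  }

  @Override
  public void fence() { fenced = true; }

  @Override
  public List<byte[]> tail(long fromSeqId) {
    return edits.subList((int) fromSeqId, edits.size());
  }
}

public class WalFencingSketch {
  public static void main(String[] args) throws IOException {
    WriteAheadLog wal = new InMemoryWal();
    wal.append(new byte[] {1});
    wal.append(new byte[] {2});
    wal.fence(); // recovery begins: no further appends allowed
    try {
      wal.append(new byte[] {3});
    } catch (IOException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Pushing fence() into the provider, as suggested above, would let HDFS-backed and Ratis-backed implementations each enforce read-only-ness their own way behind one call; the open questions about rolling, multi-WAL, and dropping flushed edits are deliberately not addressed by this sketch.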
[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
[ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614288#comment-16614288 ] Reid Chan commented on HBASE-20734: --- I think it is ok to go. Please provide a patch for branch-1 and ping Andrew to take a look. > Colocate recovered edits directory with hbase.wal.dir > - > > Key: HBASE-20734 > URL: https://issues.apache.org/jira/browse/HBASE-20734 > Project: HBase > Issue Type: Improvement > Components: MTTR, Recovery, wal >Reporter: Ted Yu >Assignee: Zach York >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-20734.branch-1.001.patch, > HBASE-20734.branch-1.002.patch, HBASE-20734.branch-1.003.patch, > HBASE-20734.branch-1.004.patch, HBASE-20734.master.001.patch, > HBASE-20734.master.002.patch, HBASE-20734.master.003.patch, > HBASE-20734.master.004.patch, HBASE-20734.master.005.patch, > HBASE-20734.master.006.patch, HBASE-20734.master.007.patch, > HBASE-20734.master.008.patch, HBASE-20734.master.009.patch, > HBASE-20734.master.010.patch, HBASE-20734.master.011.patch, > HBASE-20734.master.012.patch > > > During investigation of HBASE-20723, I realized that we wouldn't get the best > performance when hbase.wal.dir is configured to be on different (fast) media > than hbase rootdir w.r.t. recovered edits since recovered edits directory is > currently under rootdir. > Such setup may not result in fast recovery when there is region server > failover. > This issue is to find proper (hopefully backward compatible) way in > colocating recovered edits directory with hbase.wal.dir . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21188) Print heap and gc information in our junit ResourceChecker
[ https://issues.apache.org/jira/browse/HBASE-21188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614280#comment-16614280 ] Duo Zhang commented on HBASE-21188: --- Just wanted to confirm whether GC is a problem for our slow tests. At least GC count and GC time should not be considered a 'Resource'. Will revert the patch later, as it seems GC is not the actual problem. > Print heap and gc information in our junit ResourceChecker > --- > > Key: HBASE-21188 > URL: https://issues.apache.org/jira/browse/HBASE-21188 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21188.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21160) Assertion in TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels is ignored
[ https://issues.apache.org/jira/browse/HBASE-21160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614279#comment-16614279 ] Ted Yu commented on HBASE-21160: As I said above, when there is no assertion at the end of the try block, you don't need to make a change. Please also keep the try-with-resources structure, which releases resources. > Assertion in > TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels > is ignored > --- > > Key: HBASE-21160 > URL: https://issues.apache.org/jira/browse/HBASE-21160 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: liubangchen >Priority: Trivial > > From > https://builds.apache.org/job/PreCommit-HBASE-Build/14327/artifact/patchprocess/diff-compile-javac-hbase-server.txt > (HBASE-21138 QA run): > {code} > [WARNING] > /testptch/hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithDeletes.java:[315,25] > [AssertionFailureIgnored] This assertion throws an AssertionError if it > fails, which will be caught by an enclosing try block. > {code} > Here is related code: > {code} > PrivilegedExceptionAction scanAction = new > PrivilegedExceptionAction() { > @Override > public Void run() throws Exception { > try (Connection connection = > ConnectionFactory.createConnection(conf); > ... > assertEquals(1, next.length); > } catch (Throwable t) { > throw new IOException(t); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
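The warning's meaning can be demonstrated with a simplified sketch (hypothetical code, not the actual TestVisibilityLabelsWithDeletes): an assertEquals failure throws AssertionError, which `catch (Throwable t)` swallows and rewraps, so the test framework sees a wrapped exception instead of the assertion failure. One fix is to catch `Exception` instead, since AssertionError is an Error, not an Exception, and therefore propagates.

```java
// Hypothetical demonstration of the AssertionFailureIgnored pattern.
public class AssertionCatchSketch {
  // The bug: catch (Throwable t) also catches the AssertionError
  // thrown by a failed assertion, hiding the assertion failure.
  public static void buggy(Runnable assertion) throws Exception {
    try {
      assertion.run();
    } catch (Throwable t) {          // swallows AssertionError too
      throw new Exception(t);
    }
  }

  // The fix: catch Exception, so AssertionError escapes unwrapped.
  public static void fixed(Runnable assertion) throws Exception {
    try {
      assertion.run();
    } catch (Exception e) {          // AssertionError is not an Exception
      throw new Exception(e);
    }
  }

  public static void main(String[] args) {
    Runnable failing = () -> { throw new AssertionError("expected 1"); };
    try {
      fixed(failing);
    } catch (AssertionError expected) {
      System.out.println("AssertionError propagated"); // desired behavior
    } catch (Exception wrapped) {
      System.out.println("wrapped"); // buggy() would land here instead
    }
  }
}
```

This also shows why Ted's advice holds: where the try block has no trailing assertion, the Throwable catch is harmless and the try-with-resources structure can stay as-is.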
[jira] [Updated] (HBASE-21112) [Auth] IPC client fallback to simple auth (forward-port to master)
[ https://issues.apache.org/jira/browse/HBASE-21112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reid Chan updated HBASE-21112: -- Fix Version/s: 2.2.0 3.0.0 > [Auth] IPC client fallback to simple auth (forward-port to master) > -- > > Key: HBASE-21112 > URL: https://issues.apache.org/jira/browse/HBASE-21112 > Project: HBase > Issue Type: Bug >Reporter: Jack Bearden >Assignee: Jack Bearden >Priority: Critical > Labels: master > Fix For: 3.0.0, 2.2.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-21112) [Auth] IPC client fallback to simple auth (forward-port to master)
[ https://issues.apache.org/jira/browse/HBASE-21112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reid Chan reopened HBASE-21112: --- > [Auth] IPC client fallback to simple auth (forward-port to master) > -- > > Key: HBASE-21112 > URL: https://issues.apache.org/jira/browse/HBASE-21112 > Project: HBase > Issue Type: Bug >Reporter: Jack Bearden >Assignee: Jack Bearden >Priority: Critical > Labels: master > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-21160) Assertion in TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels is ignored
[ https://issues.apache.org/jira/browse/HBASE-21160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614271#comment-16614271 ] liubangchen edited comment on HBASE-21160 at 9/14/18 2:44 AM: -- Hi [~yuzhih...@gmail.com] code review link is here [reviews-68699|https://reviews.apache.org/r/68699/] was (Author: liubangchen): Hi [~yuzhih...@gmail.com] code review link is here [#https://reviews.apache.org/r/68699/] > Assertion in > TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels > is ignored > --- > > Key: HBASE-21160 > URL: https://issues.apache.org/jira/browse/HBASE-21160 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: liubangchen >Priority: Trivial > > From > https://builds.apache.org/job/PreCommit-HBASE-Build/14327/artifact/patchprocess/diff-compile-javac-hbase-server.txt > (HBASE-21138 QA run): > {code} > [WARNING] > /testptch/hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithDeletes.java:[315,25] > [AssertionFailureIgnored] This assertion throws an AssertionError if it > fails, which will be caught by an enclosing try block. > {code} > Here is related code: > {code} > PrivilegedExceptionAction scanAction = new > PrivilegedExceptionAction() { > @Override > public Void run() throws Exception { > try (Connection connection = > ConnectionFactory.createConnection(conf); > ... > assertEquals(1, next.length); > } catch (Throwable t) { > throw new IOException(t); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20993) [Auth] IPC client fallback to simple auth allowed doesn't work
[ https://issues.apache.org/jira/browse/HBASE-20993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reid Chan updated HBASE-20993: -- Fix Version/s: (was: 2.2.0) (was: 3.0.0) > [Auth] IPC client fallback to simple auth allowed doesn't work > -- > > Key: HBASE-20993 > URL: https://issues.apache.org/jira/browse/HBASE-20993 > Project: HBase > Issue Type: Bug > Components: Client, security >Affects Versions: 1.2.6 >Reporter: Reid Chan >Assignee: Jack Bearden >Priority: Critical > Fix For: 1.5.0, 1.4.8 > > Attachments: HBASE-20993.001.patch, > HBASE-20993.003.branch-1.flowchart.png, HBASE-20993.branch-1.002.patch, > HBASE-20993.branch-1.003.patch, HBASE-20993.branch-1.004.patch, > HBASE-20993.branch-1.005.patch, HBASE-20993.branch-1.006.patch, > HBASE-20993.branch-1.007.patch, HBASE-20993.branch-1.008.patch, > HBASE-20993.branch-1.009.patch, HBASE-20993.branch-1.009.patch, > HBASE-20993.branch-1.2.001.patch, HBASE-20993.branch-1.wip.002.patch, > HBASE-20993.branch-1.wip.patch, yetus-local-testpatch-output-009.txt > > > It is easily reproducible. > client's hbase-site.xml: hadoop.security.authentication:kerberos, > hbase.security.authentication:kerberos, > hbase.ipc.client.fallback-to-simple-auth-allowed:true, keytab and principal > are right set > A simple auth hbase cluster, a kerberized hbase client application. 
> application trying to r/w/c/d table will have following exception: > {code} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1241) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336) > at > org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58383) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1592) > at > 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1530) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1552) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1581) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1738) > at > org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134) > at > org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4297) > at > org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4289) > at > org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsyncV2(HBaseAdmin.java:753) > at > org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:674) > at > org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:607) > at > org.playground.hbase.KerberizedClientFallback.main(KerberizedClientFallback.java:55) >
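For context, the client-side settings listed in the issue description would look roughly like this in hbase-site.xml. The values are taken from the description above; the fragment is illustrative only.

```xml
<!-- Client hbase-site.xml fragment for the scenario described above:
     a kerberized client talking to a simple-auth cluster, with fallback
     to simple auth allowed. -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hbase.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hbase.ipc.client.fallback-to-simple-auth-allowed</name>
  <value>true</value>
</property>
```

The bug is that, despite the fallback flag being set, the client still attempts the SASL/GSSAPI handshake against the simple-auth cluster and fails with the GSSException shown in the stack trace instead of falling back.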
[jira] [Commented] (HBASE-20993) [Auth] IPC client fallback to simple auth allowed doesn't work
[ https://issues.apache.org/jira/browse/HBASE-20993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614272#comment-16614272 ] Reid Chan commented on HBASE-20993: --- Thanks for the info, Jack; take your time, no rush. Triggered branch-1 v9 again; if QA gives +1, we should let it go in first. I'll reopen the jira to port it to the master branch. > [Auth] IPC client fallback to simple auth allowed doesn't work > -- > > Key: HBASE-20993 > URL: https://issues.apache.org/jira/browse/HBASE-20993 > Project: HBase > Issue Type: Bug > Components: Client, security >Affects Versions: 1.2.6 >Reporter: Reid Chan >Assignee: Jack Bearden >Priority: Critical > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8 > > Attachments: HBASE-20993.001.patch, > HBASE-20993.003.branch-1.flowchart.png, HBASE-20993.branch-1.002.patch, > HBASE-20993.branch-1.003.patch, HBASE-20993.branch-1.004.patch, > HBASE-20993.branch-1.005.patch, HBASE-20993.branch-1.006.patch, > HBASE-20993.branch-1.007.patch, HBASE-20993.branch-1.008.patch, > HBASE-20993.branch-1.009.patch, HBASE-20993.branch-1.009.patch, > HBASE-20993.branch-1.2.001.patch, HBASE-20993.branch-1.wip.002.patch, > HBASE-20993.branch-1.wip.patch, yetus-local-testpatch-output-009.txt > > > It is easily reproducible. > client's hbase-site.xml: hadoop.security.authentication:kerberos, > hbase.security.authentication:kerberos, > hbase.ipc.client.fallback-to-simple-auth-allowed:true, keytab and principal > are right set > A simple auth hbase cluster, a kerberized hbase client application. 
> application trying to r/w/c/d table will have following exception: > {code} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1241) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336) > at > org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58383) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1592) > at > 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1530) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1552) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1581) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1738) > at > org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134) > at > org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4297) > at > org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4289) > at > org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsyncV2(HBaseAdmin.java:753) > at > org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:674) > at >
[jira] [Commented] (HBASE-21160) Assertion in TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels is ignored
[ https://issues.apache.org/jira/browse/HBASE-21160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614271#comment-16614271 ] liubangchen commented on HBASE-21160: - Hi [~yuzhih...@gmail.com] code review link is here [#https://reviews.apache.org/r/68699/] > Assertion in > TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels > is ignored > --- > > Key: HBASE-21160 > URL: https://issues.apache.org/jira/browse/HBASE-21160 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: liubangchen >Priority: Trivial > > From > https://builds.apache.org/job/PreCommit-HBASE-Build/14327/artifact/patchprocess/diff-compile-javac-hbase-server.txt > (HBASE-21138 QA run): > {code} > [WARNING] > /testptch/hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithDeletes.java:[315,25] > [AssertionFailureIgnored] This assertion throws an AssertionError if it > fails, which will be caught by an enclosing try block. > {code} > Here is related code: > {code} > PrivilegedExceptionAction scanAction = new > PrivilegedExceptionAction() { > @Override > public Void run() throws Exception { > try (Connection connection = > ConnectionFactory.createConnection(conf); > ... > assertEquals(1, next.length); > } catch (Throwable t) { > throw new IOException(t); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20993) [Auth] IPC client fallback to simple auth allowed doesn't work
[ https://issues.apache.org/jira/browse/HBASE-20993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reid Chan updated HBASE-20993: -- Attachment: HBASE-20993.branch-1.009.patch > [Auth] IPC client fallback to simple auth allowed doesn't work > -- > > Key: HBASE-20993 > URL: https://issues.apache.org/jira/browse/HBASE-20993 > Project: HBase > Issue Type: Bug > Components: Client, security >Affects Versions: 1.2.6 >Reporter: Reid Chan >Assignee: Jack Bearden >Priority: Critical > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8 > > Attachments: HBASE-20993.001.patch, > HBASE-20993.003.branch-1.flowchart.png, HBASE-20993.branch-1.002.patch, > HBASE-20993.branch-1.003.patch, HBASE-20993.branch-1.004.patch, > HBASE-20993.branch-1.005.patch, HBASE-20993.branch-1.006.patch, > HBASE-20993.branch-1.007.patch, HBASE-20993.branch-1.008.patch, > HBASE-20993.branch-1.009.patch, HBASE-20993.branch-1.009.patch, > HBASE-20993.branch-1.2.001.patch, HBASE-20993.branch-1.wip.002.patch, > HBASE-20993.branch-1.wip.patch, yetus-local-testpatch-output-009.txt > > > It is easily reproducible. > client's hbase-site.xml: hadoop.security.authentication:kerberos, > hbase.security.authentication:kerberos, > hbase.ipc.client.fallback-to-simple-auth-allowed:true, keytab and principal > are right set > A simple auth hbase cluster, a kerberized hbase client application. 
> application trying to r/w/c/d table will have following exception: > {code} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1241) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336) > at > org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58383) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1592) > at > 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1530) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1552) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1581) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1738) > at > org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134) > at > org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4297) > at > org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4289) > at > org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsyncV2(HBaseAdmin.java:753) > at > org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:674) > at > org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:607) > at > org.playground.hbase.KerberizedClientFallback.main(KerberizedClientFallback.java:55) > Caused by:
[jira] [Resolved] (HBASE-9469) Synchronous replication
[ https://issues.apache.org/jira/browse/HBASE-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-9469. -- Resolution: Duplicate > Synchronous replication > --- > > Key: HBASE-9469 > URL: https://issues.apache.org/jira/browse/HBASE-9469 > Project: HBase > Issue Type: New Feature >Reporter: Honghua Feng >Priority: Major > > Scenario: > A/B clusters with master-master replication: the client writes to A cluster and A > pushes all writes to B cluster; when A cluster is down, the client switches to > writing to B cluster. > But the client's write switch is unsafe because the replication between A/B is > asynchronous: a delete to B cluster which aims to delete a put written > earlier can fail because that put was written to A cluster and wasn't > successfully pushed to B before A went down. It can be worse: if this delete is > collected (flush and then major compact occurs) before A cluster is up and > that put is eventually pushed to B, the put won't ever be deleted. > Can we provide per-table/per-peer synchronous replication which ships the > corresponding hlog entry of a write before responding write success to the client? By > this we can guarantee the client that all write requests for which he got a > success response when writing to A cluster are already in B > cluster as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
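The guarantee asked for above — don't ack a write until its hlog entry has been shipped to the peer — can be sketched as a toy model. All class and method names below are invented for illustration; real HBase replication ships WAL entries asynchronously via a ReplicationSource, which is exactly what this sketch changes:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a cluster: just a WAL as a list of entries. */
class ToyCluster {
    final List<String> wal = new ArrayList<>();

    void appendWal(String entry) {
        wal.add(entry);
    }
}

/** Synchronous replication sketch: a write on the active cluster is only
 *  acknowledged after the corresponding WAL entry is on the peer too. With
 *  asynchronous replication, the ack would happen before the ship, which is
 *  what makes the A-to-B client switch described above unsafe. */
class SyncReplicator {
    private final ToyCluster active;
    private final ToyCluster peer;

    SyncReplicator(ToyCluster active, ToyCluster peer) {
        this.active = active;
        this.peer = peer;
    }

    /** Returns (acks) only once the entry is durable on BOTH clusters. */
    boolean write(String entry) {
        active.appendWal(entry); // local durability first
        peer.appendWal(entry);   // ship to peer BEFORE acking the client
        return true;
    }
}
```

Under this model, any write the client saw succeed on A is guaranteed to be present on B, so a later delete issued against B can never miss its target put.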
[jira] [Reopened] (HBASE-9469) Synchronous replication
[ https://issues.apache.org/jira/browse/HBASE-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang reopened HBASE-9469: -- > Synchronous replication > --- > > Key: HBASE-9469 > URL: https://issues.apache.org/jira/browse/HBASE-9469 > Project: HBase > Issue Type: New Feature >Reporter: Honghua Feng >Priority: Major > > Scenario: > A/B clusters with master-master replication: the client writes to A cluster and A > pushes all writes to B cluster; when A cluster is down, the client switches to > writing to B cluster. > But the client's write switch is unsafe because the replication between A/B is > asynchronous: a delete to B cluster which aims to delete a put written > earlier can fail because that put was written to A cluster and wasn't > successfully pushed to B before A went down. It can be worse: if this delete is > collected (flush and then major compact occurs) before A cluster is up and > that put is eventually pushed to B, the put won't ever be deleted. > Can we provide per-table/per-peer synchronous replication which ships the > corresponding hlog entry of a write before responding write success to the client? By > this we can guarantee the client that all write requests for which he got a > success response when writing to A cluster are already in B > cluster as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20952) Re-visit the WAL API
[ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614257#comment-16614257 ] Josh Elser commented on HBASE-20952: Good questions! Thanks for taking the time to write them, Duo. {quote}How do we do fencing when RS crashes? Now we need to rename the wal directory for a RS, and then call recoverLease for all the files to confirm that they are all closed. And at RS side, when creating a wal write, we use createNonRecursive intentionally, so that if the wal directory has been renamed, we can not create wal writers any more. How do we want to abstract these operations in the new WAL API? How does other log systems, such as ratis, deal with this? {quote} This is good; I hadn't thought about abstracting out fencing. We should have API which pushes this fencing impl down into the Provider. For the Ratis LogService, we designed api to be able to {{close()}} a Log; make it read-only. In the context of HBase, we would close the Log before we start recovery/re-assignment, and have the net-effect of preventing any half-dead RS from continuing to try to add more edits to the Log. This effectively would work like recoverLease() does now for the HDFS case. {quote}For sync replication, we have a config called remote wal directory, which exposes the file system to user. As it is implemented by us at Xiaomi, we can help to find a work around on this. {quote} Ok. I'm definitely dense here :). Do you have a pointer to some code to look at? Or, based on my previous, is a solution obvious to you? {quote}looking at the code on the RB, we have already started to change the stuffs in replication? And for RecoveredReplicationSource, we make it abstract and introduce a new FSRecoveredReplicationSource? Then where is the FSReplicationSource? {quote} There is a second RB open which has a much-reduced version of that original patch. 
Looks like this might not have gotten attached to this Jira issue (oops, will make sure that's linked). [https://reviews.apache.org/r/68672] This should help give a much smaller view of API only. Trying to make some of the other "systems" using WALs work with a new API was a good exercise to make sure we didn't miss something obvious. Totally in agreement that we want a good API before we start throwing out implementation. > Re-visit the WAL API > > > Key: HBASE-20952 > URL: https://issues.apache.org/jira/browse/HBASE-20952 > Project: HBase > Issue Type: Sub-task > Components: wal >Reporter: Josh Elser >Priority: Major > Attachments: 20952.v1.txt > > > Take a step back from the current WAL implementations and think about what an > HBase WAL API should look like. What are the primitive calls that we require > to guarantee durability of writes with a high degree of performance? > The API needs to take the current implementations into consideration. We > should also have a mind for what is happening in the Ratis LogService (but > the LogService should not dictate what HBase's WAL API looks like RATIS-272). > Other "systems" inside of HBase that use WALs are replication and > backup Replication has the use-case for "tail"'ing the WAL which we > should provide via our new API. B doesn't do anything fancy (IIRC). We > should make sure all consumers are generally going to be OK with the API we > create. > The API may be "OK" (or OK in a part). We need to also consider other methods > which were "bolted" on such as {{AbstractFSWAL}} and > {{WALFileLengthProvider}}. Other corners of "WAL use" (like the > {{WALSplitter}} should also be looked at to use WAL-APIs only). > We also need to make sure that adequate interface audience and stability > annotations are chosen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
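The {{close()}}-to-make-read-only fencing described in the comment above — close the Log before recovery/re-assignment so a half-dead RS cannot keep appending, analogous to recoverLease() on HDFS — can be sketched with a minimal in-memory log. The names here are hypothetical, not the actual Ratis LogService API:

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of a fence-able WAL: once closed, any further append is
 *  rejected. In the HBase scenario, the master would close() the crashed
 *  RS's log before starting recovery, so a half-dead RS that is still
 *  running cannot add edits that recovery would never see. */
class FenceableLog {
    private final List<String> entries = new ArrayList<>();
    private volatile boolean readOnly = false;

    /** Append an edit; fails once the log has been fenced. */
    void append(String edit) {
        if (readOnly) {
            throw new IllegalStateException("log is fenced (read-only)");
        }
        entries.add(edit);
    }

    /** Fence the log: flip it to read-only and return its final, now-stable
     *  contents for replay during recovery. */
    List<String> close() {
        readOnly = true;
        return new ArrayList<>(entries);
    }
}
```

The key property is that close() both stops new writers and yields a stable view of the log, which is the pair of guarantees recovery needs.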
[jira] [Commented] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation
[ https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614238#comment-16614238 ] Hadoop QA commented on HBASE-21196: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 39s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 23s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 28s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 51s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 47s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 59s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 35s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 19s{color} | {color:green} hbase-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}243m 14s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}292m 18s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.replication.TestReplicationKillSlaveRSWithSeparateOldWALs | | | hadoop.hbase.replication.TestReplicationKillSlaveRS | | | hadoop.hbase.client.TestSnapshotTemporaryDirectoryWithRegionReplicas | | | hadoop.hbase.replication.TestReplicationSmallTests | | | hadoop.hbase.replication.TestReplicationSmallTestsSync | | | hadoop.hbase.client.TestRegionLocationCaching | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21196 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939613/HBASE-21196.master.001.patch | | Optional Tests | asflicense javac javadoc unit
[jira] [Commented] (HBASE-20952) Re-visit the WAL API
[ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614232#comment-16614232 ] Duo Zhang commented on HBASE-20952: --- The design doc does not help; it is just like pseudo-code. What I want to know is how we deal with several key problems if we want to remove the direct dependency on FileSystem. Here is a simple list that comes immediately to mind: 1. How do we do fencing when an RS crashes? Now we need to rename the wal directory for an RS, and then call recoverLease for all the files to confirm that they are all closed. And at the RS side, when creating a wal writer, we use createNonRecursive intentionally, so that if the wal directory has been renamed, we cannot create wal writers any more. How do we want to abstract these operations in the new WAL API? How do other log systems, such as Ratis, deal with this? 2. For sync replication, we have a config called remote wal directory, which exposes the file system to the user. As it is implemented by us at Xiaomi, we can help find a workaround for this. And sync replication also relies on the rename operation to do fencing. 3. The replication-related stuff. I have been asking about this for a long time, but no one has given an overall solution. And looking at the code on the RB, we have already started to change the replication code? And for RecoveredReplicationSource, we make it abstract and introduce a new FSRecoveredReplicationSource? Then where is the FSReplicationSource? I always say we should have an overall solution first, i.e., we should know what the system will look like when we finish; then we start to work things out. Thanks. 
> Re-visit the WAL API > > > Key: HBASE-20952 > URL: https://issues.apache.org/jira/browse/HBASE-20952 > Project: HBase > Issue Type: Sub-task > Components: wal >Reporter: Josh Elser >Priority: Major > Attachments: 20952.v1.txt > > > Take a step back from the current WAL implementations and think about what an > HBase WAL API should look like. What are the primitive calls that we require > to guarantee durability of writes with a high degree of performance? > The API needs to take the current implementations into consideration. We > should also have a mind for what is happening in the Ratis LogService (but > the LogService should not dictate what HBase's WAL API looks like RATIS-272). > Other "systems" inside of HBase that use WALs are replication and > backup Replication has the use-case for "tail"'ing the WAL which we > should provide via our new API. B doesn't do anything fancy (IIRC). We > should make sure all consumers are generally going to be OK with the API we > create. > The API may be "OK" (or OK in a part). We need to also consider other methods > which were "bolted" on such as {{AbstractFSWAL}} and > {{WALFileLengthProvider}}. Other corners of "WAL use" (like the > {{WALSplitter}} should also be looked at to use WAL-APIs only). > We also need to make sure that adequate interface audience and stability > annotations are chosen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
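The rename-plus-createNonRecursive fencing in point 1 of the comment above can be illustrated with plain {{java.nio.file}} operations. This is a sketch of the idea, not HBase code: HDFS's createNonRecursive (create a file only if its parent already exists) is mimicked here by an explicit parent-directory check, and the directory names are illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch of rename-based WAL fencing: the master renames a crashed RS's
 *  WAL directory, and because writers create WAL files non-recursively
 *  (the parent directory must already exist), a half-dead RS can no longer
 *  create new WAL files under the old path. */
class RenameFencing {

    /** Mimics createNonRecursive: refuses to (re)create missing parents,
     *  so it fails once the WAL directory has been renamed away. */
    static Path createWalFile(Path walDir, String name) throws IOException {
        if (!Files.isDirectory(walDir)) {
            throw new IOException("parent gone: WAL dir was fenced by rename");
        }
        return Files.createFile(walDir.resolve(name));
    }

    /** Master-side fencing: rename e.g. wals/rs1 to wals/rs1-splitting,
     *  then the (stable) renamed directory can be split/recovered safely. */
    static Path fence(Path walDir) throws IOException {
        Path fenced = walDir.resolveSibling(walDir.getFileName() + "-splitting");
        return Files.move(walDir, fenced);
    }
}
```

Abstracting this into a filesystem-free WAL API means finding an equivalent pair of operations — "stop new writers" and "get a stable view of existing logs" — that a non-FS provider can implement.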
[jira] [Commented] (HBASE-21035) Meta Table should be able to online even if all procedures are lost
[ https://issues.apache.org/jira/browse/HBASE-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614231#comment-16614231 ] Allan Yang commented on HBASE-21035: {quote} So let's start helping on HBCK2? {quote} Sure! > Meta Table should be able to online even if all procedures are lost > --- > > Key: HBASE-21035 > URL: https://issues.apache.org/jira/browse/HBASE-21035 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21035.branch-2.0.001.patch, > HBASE-21035.branch-2.1.001.patch > > > After HBASE-20708, we changed the way we init after master starts. It will > only check WAL dirs and compare to Zookeeper RS nodes to decide which server > need to expire. For servers which's dir is ending with 'SPLITTING', we assure > that there will be a SCP for it. > But, if the server with the meta region crashed before master restarts, and > if all the procedure wals are lost (due to bug, or deleted manually, > whatever), the new restarted master will be stuck when initing. Since no one > will bring meta region online. > Although it is an anomaly case, but I think no matter what happens, we need > to online meta region. Otherwise, we are sitting ducks, noting can be done. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
[ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614222#comment-16614222 ] Ted Yu commented on HBASE-20734: I wouldn't have big chunk of time to review - working on WAL refactoring. FYI > Colocate recovered edits directory with hbase.wal.dir > - > > Key: HBASE-20734 > URL: https://issues.apache.org/jira/browse/HBASE-20734 > Project: HBase > Issue Type: Improvement > Components: MTTR, Recovery, wal >Reporter: Ted Yu >Assignee: Zach York >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-20734.branch-1.001.patch, > HBASE-20734.branch-1.002.patch, HBASE-20734.branch-1.003.patch, > HBASE-20734.branch-1.004.patch, HBASE-20734.master.001.patch, > HBASE-20734.master.002.patch, HBASE-20734.master.003.patch, > HBASE-20734.master.004.patch, HBASE-20734.master.005.patch, > HBASE-20734.master.006.patch, HBASE-20734.master.007.patch, > HBASE-20734.master.008.patch, HBASE-20734.master.009.patch, > HBASE-20734.master.010.patch, HBASE-20734.master.011.patch, > HBASE-20734.master.012.patch > > > During investigation of HBASE-20723, I realized that we wouldn't get the best > performance when hbase.wal.dir is configured to be on different (fast) media > than hbase rootdir w.r.t. recovered edits since recovered edits directory is > currently under rootdir. > Such setup may not result in fast recovery when there is region server > failover. > This issue is to find proper (hopefully backward compatible) way in > colocating recovered edits directory with hbase.wal.dir . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21164) reportForDuty should do (expotential) backoff rather than retry every 3 seconds (default).
[ https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614228#comment-16614228 ] Allan Yang commented on HBASE-21164: Another concern: if the master is down for a long time, the regionserver will back off to reporting for duty at most once per minute, which will slow down the master start-up process, since we need to count enough RSes before continuing. > reportForDuty should do (expotential) backoff rather than retry every 3 > seconds (default). > -- > > Key: HBASE-21164 > URL: https://issues.apache.org/jira/browse/HBASE-21164 > Project: HBase > Issue Type: Improvement > Components: regionserver >Reporter: stack >Assignee: Mingliang Liu >Priority: Minor > Attachments: HBASE-21164.005.patch, HBASE-21164.006.patch, > HBASE-21164.branch-2.1.001.patch, HBASE-21164.branch-2.1.002.patch, > HBASE-21164.branch-2.1.003.patch, HBASE-21164.branch-2.1.004.patch > > > RegionServers do reportForDuty on startup to tell Master they are available. > If Master is initializing, and especially on a big cluster when it can take a > while particularly if something is amiss, the log every three seconds is > annoying and doesn't do anything of use. Do backoff if fails up to a > reasonable maximum period. Here is example: > {code} > 2018-09-06 14:01:39,312 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to > master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001, > startcode=1536266763109 > 2018-09-06 14:01:39,312 WARN > org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; > sleeping and then retrying. > > {code} > For example, I am looking at a large cluster now that had a backlog of > procedure WALs. It is taking a couple of hours recreating the procedure-state > because there are millions of procedures outstanding. Meantime, the Master > log is just full of the above message -- every three seconds... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
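The backoff the issue asks for — and the one-minute cap Allan's concern refers to — can be sketched as a capped doubling of the retry interval, starting from the 3-second default. The constants and names below are illustrative, not the actual patch's code:

```java
/** Sketch of capped exponential backoff for reportForDuty retries:
 *  3s, 6s, 12s, 24s, 48s, then 60s forever, instead of a flat 3s
 *  every attempt. The 3s start mirrors the default retry interval
 *  discussed in the issue; 60s is the cap Allan's comment refers to. */
class ReportForDutyBackoff {
    static final long INITIAL_MS = 3_000L;
    static final long MAX_MS = 60_000L;

    /** Sleep time before retry number {@code attempt} (0-based).
     *  The shift is clamped so large attempt counts cannot overflow. */
    static long backoffMs(int attempt) {
        long backoff = INITIAL_MS << Math.min(attempt, 30);
        return Math.min(backoff, MAX_MS);
    }
}
```

This is the trade-off in the thread: the cap keeps the master log quiet during a long initialization, but it also bounds how stale a retry can be — after a master restart, the last regionservers may take up to a full cap interval to check in.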
[jira] [Updated] (HBASE-21197) TestServerCrashProcedureWithReplicas fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HBASE-21197: -- Description: Example failure reports are: [https://builds.apache.org/job/PreCommit-HBASE-Build/14396/testReport/] and [https://builds.apache.org/job/PreCommit-HBASE-Build/14381/testReport/] Failing test methods are: - {{testRecoveryAndDoubleExecutionOnRsWithMeta}} - {{testRecoveryAndDoubleExecutionOnRsWithoutMeta}} - {{testCrashTargetRs}}. Specially, the exception trace is: {code:java} java.lang.AssertionError: Crashed replica regions should not be assigned to same region server at org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas.assertReplicaDistributed(TestServerCrashProcedureWithReplicas.java:68){code} was: Example failure reports are: [https://builds.apache.org/job/PreCommit-HBASE-Build/14396/testReport/] and [https://builds.apache.org/job/PreCommit-HBASE-Build/14381/testReport/] Failing test methods are: {{testRecoveryAndDoubleExecutionOnRsWithMeta}}, {{testRecoveryAndDoubleExecutionOnRsWithoutMeta}} and {{testCrashTargetRs}}. 
Specially, the exception trace is: {code:java} java.lang.AssertionError: Crashed replica regions should not be assigned to same region server at org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas.assertReplicaDistributed(TestServerCrashProcedureWithReplicas.java:68){code} > TestServerCrashProcedureWithReplicas fails intermittently > - > > Key: HBASE-21197 > URL: https://issues.apache.org/jira/browse/HBASE-21197 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Priority: Major > > Example failure reports are: > [https://builds.apache.org/job/PreCommit-HBASE-Build/14396/testReport/] and > [https://builds.apache.org/job/PreCommit-HBASE-Build/14381/testReport/] > Failing test methods are: > - {{testRecoveryAndDoubleExecutionOnRsWithMeta}} > - {{testRecoveryAndDoubleExecutionOnRsWithoutMeta}} > - {{testCrashTargetRs}}. > Specially, the exception trace is: > {code:java} > java.lang.AssertionError: Crashed replica regions should not be assigned to > same region server > at > org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas.assertReplicaDistributed(TestServerCrashProcedureWithReplicas.java:68){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21197) TestServerCrashProcedureWithReplicas fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HBASE-21197: -- Description: Example failure reports are: [https://builds.apache.org/job/PreCommit-HBASE-Build/14396/testReport/] and [https://builds.apache.org/job/PreCommit-HBASE-Build/14381/testReport/] Failing test methods are: {{testRecoveryAndDoubleExecutionOnRsWithMeta}}, {{testRecoveryAndDoubleExecutionOnRsWithoutMeta}} and {{testCrashTargetRs}}. Specially, the exception trace is: {code:java} java.lang.AssertionError: Crashed replica regions should not be assigned to same region server at org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas.assertReplicaDistributed(TestServerCrashProcedureWithReplicas.java:68){code} was: An example failure report is like: [https://builds.apache.org/job/PreCommit-HBASE-Build/14396/testReport/] Failing test methods are: {{testRecoveryAndDoubleExecutionOnRsWithMeta}}, {{testRecoveryAndDoubleExecutionOnRsWithoutMeta}} and {{testCrashTargetRs}}. Specially, the exception trace is: {code:java} java.lang.AssertionError: Crashed replica regions should not be assigned to same region server at org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas.assertReplicaDistributed(TestServerCrashProcedureWithReplicas.java:68){code} > TestServerCrashProcedureWithReplicas fails intermittently > - > > Key: HBASE-21197 > URL: https://issues.apache.org/jira/browse/HBASE-21197 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Priority: Major > > Example failure reports are: > [https://builds.apache.org/job/PreCommit-HBASE-Build/14396/testReport/] and > [https://builds.apache.org/job/PreCommit-HBASE-Build/14381/testReport/] > Failing test methods are: {{testRecoveryAndDoubleExecutionOnRsWithMeta}}, > {{testRecoveryAndDoubleExecutionOnRsWithoutMeta}} and {{testCrashTargetRs}}. 
> Specially, the exception trace is: > {code:java} > java.lang.AssertionError: Crashed replica regions should not be assigned to > same region server > at > org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas.assertReplicaDistributed(TestServerCrashProcedureWithReplicas.java:68){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
[ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614219#comment-16614219 ] Zach York commented on HBASE-20734: --- [~yuzhih...@gmail.com] any further thoughts? > Colocate recovered edits directory with hbase.wal.dir > - > > Key: HBASE-20734 > URL: https://issues.apache.org/jira/browse/HBASE-20734 > Project: HBase > Issue Type: Improvement > Components: MTTR, Recovery, wal >Reporter: Ted Yu >Assignee: Zach York >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-20734.branch-1.001.patch, > HBASE-20734.branch-1.002.patch, HBASE-20734.branch-1.003.patch, > HBASE-20734.branch-1.004.patch, HBASE-20734.master.001.patch, > HBASE-20734.master.002.patch, HBASE-20734.master.003.patch, > HBASE-20734.master.004.patch, HBASE-20734.master.005.patch, > HBASE-20734.master.006.patch, HBASE-20734.master.007.patch, > HBASE-20734.master.008.patch, HBASE-20734.master.009.patch, > HBASE-20734.master.010.patch, HBASE-20734.master.011.patch, > HBASE-20734.master.012.patch > > > During investigation of HBASE-20723, I realized that we wouldn't get the best > performance when hbase.wal.dir is configured to be on different (fast) media > than hbase rootdir w.r.t. recovered edits since recovered edits directory is > currently under rootdir. > Such setup may not result in fast recovery when there is region server > failover. > This issue is to find proper (hopefully backward compatible) way in > colocating recovered edits directory with hbase.wal.dir . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21164) reportForDuty should do (expotential) backoff rather than retry every 3 seconds (default).
[ https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614210#comment-16614210 ] Mingliang Liu commented on HBASE-21164: --- {quote} > There is a facility to wake this.sleeper. Could call from stop/abort? Can do that. As long as it's after {{this.stopped = true;}}, sleeper should respect that. {quote} Currently the sleeper is already waken in stop/abort. v6 changes the test method to remove the dependency of log capture. > reportForDuty should do (expotential) backoff rather than retry every 3 > seconds (default). > -- > > Key: HBASE-21164 > URL: https://issues.apache.org/jira/browse/HBASE-21164 > Project: HBase > Issue Type: Improvement > Components: regionserver >Reporter: stack >Assignee: Mingliang Liu >Priority: Minor > Attachments: HBASE-21164.005.patch, HBASE-21164.006.patch, > HBASE-21164.branch-2.1.001.patch, HBASE-21164.branch-2.1.002.patch, > HBASE-21164.branch-2.1.003.patch, HBASE-21164.branch-2.1.004.patch > > > RegionServers do reportForDuty on startup to tell Master they are available. > If Master is initializing, and especially on a big cluster when it can take a > while particularly if something is amiss, the log every three seconds is > annoying and doesn't do anything of use. Do backoff if fails up to a > reasonable maximum period. Here is example: > {code} > 2018-09-06 14:01:39,312 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to > master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001, > startcode=1536266763109 > 2018-09-06 14:01:39,312 WARN > org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; > sleeping and then retrying. > > {code} > For example, I am looking at a large cluster now that had a backlog of > procedure WALs. It is taking a couple of hours recreating the procedure-state > because there are millions of procedures outstanding. 
Meantime, the Master > log is just full of the above message -- every three seconds... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21164) reportForDuty should do (expotential) backoff rather than retry every 3 seconds (default).
[ https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HBASE-21164: -- Attachment: HBASE-21164.006.patch > reportForDuty should do (expotential) backoff rather than retry every 3 > seconds (default). > -- > > Key: HBASE-21164 > URL: https://issues.apache.org/jira/browse/HBASE-21164 > Project: HBase > Issue Type: Improvement > Components: regionserver >Reporter: stack >Assignee: Mingliang Liu >Priority: Minor > Attachments: HBASE-21164.005.patch, HBASE-21164.006.patch, > HBASE-21164.branch-2.1.001.patch, HBASE-21164.branch-2.1.002.patch, > HBASE-21164.branch-2.1.003.patch, HBASE-21164.branch-2.1.004.patch > > > RegionServers do reportForDuty on startup to tell Master they are available. > If Master is initializing, and especially on a big cluster when it can take a > while particularly if something is amiss, the log every three seconds is > annoying and doesn't do anything of use. Do backoff if fails up to a > reasonable maximum period. Here is example: > {code} > 2018-09-06 14:01:39,312 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to > master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001, > startcode=1536266763109 > 2018-09-06 14:01:39,312 WARN > org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; > sleeping and then retrying. > > {code} > For example, I am looking at a large cluster now that had a backlog of > procedure WALs. It is taking a couple of hours recreating the procedure-state > because there are millions of procedures outstanding. Meantime, the Master > log is just full of the above message -- every three seconds... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21197) TestServerCrashProcedureWithReplicas fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HBASE-21197: -- Description: An example failure report is like: [https://builds.apache.org/job/PreCommit-HBASE-Build/14396/testReport/] Failing test methods are: {{testRecoveryAndDoubleExecutionOnRsWithMeta}}, {{testRecoveryAndDoubleExecutionOnRsWithoutMeta}} and {{testCrashTargetRs}}. Specially, the exception trace is: {code:java} java.lang.AssertionError: Crashed replica regions should not be assigned to same region server at org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas.assertReplicaDistributed(TestServerCrashProcedureWithReplicas.java:68){code} was: An example failure report is like: [https://builds.apache.org/job/PreCommit-HBASE-Build/14396/testReport/] Specially, the exception trace is: {code:java} java.lang.AssertionError: Crashed replica regions should not be assigned to same region server at org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas.assertReplicaDistributed(TestServerCrashProcedureWithReplicas.java:68){code} > TestServerCrashProcedureWithReplicas fails intermittently > - > > Key: HBASE-21197 > URL: https://issues.apache.org/jira/browse/HBASE-21197 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Priority: Major > > An example failure report is like: > [https://builds.apache.org/job/PreCommit-HBASE-Build/14396/testReport/] > Failing test methods are: {{testRecoveryAndDoubleExecutionOnRsWithMeta}}, > {{testRecoveryAndDoubleExecutionOnRsWithoutMeta}} and {{testCrashTargetRs}}. 
> Specifically, the exception trace is: > {code:java} > java.lang.AssertionError: Crashed replica regions should not be assigned to > same region server > at > org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas.assertReplicaDistributed(TestServerCrashProcedureWithReplicas.java:68){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
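The invariant behind the failing assertion is that, after recovering from a crash, no region server should end up hosting two replicas of the same region. A toy model of that check (hypothetical names and data shape, not the actual `assertReplicaDistributed` helper):

```java
import java.util.*;

// For each server, collect the primary-region name of every replica it hosts;
// the invariant fails if any server hosts two replicas of the same region.
public class ReplicaPlacementCheck {
    // Map of server name -> primary-region names of the replicas it hosts.
    static boolean replicasDistributed(Map<String, List<String>> serverToRegions) {
        for (List<String> regions : serverToRegions.values()) {
            if (new HashSet<>(regions).size() < regions.size()) {
                return false; // two replicas of one region co-located on a server
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ok = new HashMap<>();
        ok.put("rs1", Arrays.asList("regionA", "regionB"));
        ok.put("rs2", Arrays.asList("regionA"));
        System.out.println(replicasDistributed(ok));  // replicas spread across servers

        Map<String, List<String>> bad = new HashMap<>();
        bad.put("rs1", Arrays.asList("regionA", "regionA")); // both replicas of regionA
        System.out.println(replicasDistributed(bad));
    }
}
```

The flakiness reported here is then a timing question: whether the assignment manager has finished redistributing replicas before the check runs.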
[jira] [Created] (HBASE-21197) TestServerCrashProcedureWithReplicas fails intermittently
Mingliang Liu created HBASE-21197: - Summary: TestServerCrashProcedureWithReplicas fails intermittently Key: HBASE-21197 URL: https://issues.apache.org/jira/browse/HBASE-21197 Project: HBase Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Mingliang Liu An example failure report is like: [https://builds.apache.org/job/PreCommit-HBASE-Build/14396/testReport/] Specifically, the exception trace is: {code:java} java.lang.AssertionError: Crashed replica regions should not be assigned to same region server at org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas.assertReplicaDistributed(TestServerCrashProcedureWithReplicas.java:68){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20993) [Auth] IPC client fallback to simple auth allowed doesn't work
[ https://issues.apache.org/jira/browse/HBASE-20993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614198#comment-16614198 ] Jack Bearden commented on HBASE-20993: -- Hi Reid, thanks for checking in. I got pulled away for a family vacation but plan to make progress on this over the weekend. > [Auth] IPC client fallback to simple auth allowed doesn't work > -- > > Key: HBASE-20993 > URL: https://issues.apache.org/jira/browse/HBASE-20993 > Project: HBase > Issue Type: Bug > Components: Client, security >Affects Versions: 1.2.6 >Reporter: Reid Chan >Assignee: Jack Bearden >Priority: Critical > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8 > > Attachments: HBASE-20993.001.patch, > HBASE-20993.003.branch-1.flowchart.png, HBASE-20993.branch-1.002.patch, > HBASE-20993.branch-1.003.patch, HBASE-20993.branch-1.004.patch, > HBASE-20993.branch-1.005.patch, HBASE-20993.branch-1.006.patch, > HBASE-20993.branch-1.007.patch, HBASE-20993.branch-1.008.patch, > HBASE-20993.branch-1.009.patch, HBASE-20993.branch-1.2.001.patch, > HBASE-20993.branch-1.wip.002.patch, HBASE-20993.branch-1.wip.patch, > yetus-local-testpatch-output-009.txt > > > It is easily reproducible. > client's hbase-site.xml: hadoop.security.authentication:kerberos, > hbase.security.authentication:kerberos, > hbase.ipc.client.fallback-to-simple-auth-allowed:true, keytab and principal > are set correctly > A simple auth hbase cluster, a kerberized hbase client application. 
> application trying to r/w/c/d table will have following exception: > {code} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1241) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336) > at > org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58383) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1592) > at > 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1530) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1552) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1581) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1738) > at > org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134) > at > org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4297) > at > org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4289) > at > org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsyncV2(HBaseAdmin.java:753) > at > org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:674) > at > org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:607) > at >
[jira] [Commented] (HBASE-20306) LoadTestTool does not print summary at end of run
[ https://issues.apache.org/jira/browse/HBASE-20306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614186#comment-16614186 ] Hadoop QA commented on HBASE-20306: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 30s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 14s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 13s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 11s{color} | {color:red} hbase-server: The patch generated 2 new + 8 unchanged - 0 fixed = 10 total (was 8) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 7s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 20s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}122m 33s{color} | {color:green} hbase-server in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}163m 53s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-20306 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939616/HBASE-20306.002.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux e6218759bd46 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 5d14c1af65 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14410/artifact/patchprocess/diff-checkstyle-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14410/testReport/ | | Max. process+thread count | 5232 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | | Console output |
[jira] [Commented] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3
[ https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614180#comment-16614180 ] Zach York commented on HBASE-21098: --- Pushed to branch-2 and master. Wrangling a test before pushing it to branch-1. > Improve Snapshot Performance with Temporary Snapshot Directory when rootDir > on S3 > - > > Key: HBASE-21098 > URL: https://issues.apache.org/jira/browse/HBASE-21098 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0, 1.4.8, 2.1.1 >Reporter: Tyler Mi >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21098.master.001.patch, > HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, > HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, > HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, > HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, > HBASE-21098.master.010.patch, HBASE-21098.master.011.patch, > HBASE-21098.master.012.patch, HBASE-21098.master.013.patch > > > When using Apache HBase, the snapshot feature can be used to make a point in > time recovery. To do this, HBase creates a manifest of all the files in all > of the Regions so that those files can be referenced again when a user > restores a snapshot. With HBase's S3 storage mode, developers can store their > data off-cluster on Amazon S3. However, utilizing S3 as a file system is > inefficient in some operations, namely renames. Most Hadoop ecosystem > applications use an atomic rename as a method of committing data. However, > with S3, a rename is a separate copy and then a delete of every file which is > no longer atomic and, in fact, quite costly. In addition, puts and deletes on > S3 have latency issues that traditional filesystems do not encounter when > manipulating the region snapshots to consolidate into a single manifest. 
When > HBase on S3 users have a significant amount of regions, puts, deletes, and > renames (the final commit stage of the snapshot) become the bottleneck > causing snapshots to take many minutes or even hours to complete. > The purpose of this patch is to increase the overall performance of snapshots > while utilizing HBase on S3 through the use of a temporary directory for the > snapshots that exists on a traditional filesystem like HDFS to circumvent the > bottlenecks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
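The rename cost the description talks about can be made concrete with a toy operation-count model: an HDFS-style directory rename is a single metadata operation, while an S3 "rename" amounts to a copy plus a delete for every object. The counts below are illustrative, not measurements of either system:

```java
// Toy cost model (not HBase code) contrasting an atomic filesystem rename
// with S3's emulated rename. Object counts are illustrative.
public class SnapshotCommitCost {
    // On HDFS, renaming the snapshot working directory into place is one
    // metadata operation regardless of how many files it contains.
    static int hdfsRenameOps(int objects) {
        return 1;
    }

    // On S3, "rename" is implemented as one copy plus one delete per object,
    // so the commit cost grows linearly with the number of manifest files.
    static int s3RenameOps(int objects) {
        return 2 * objects;
    }

    public static void main(String[] args) {
        int regions = 10_000; // a large table's worth of region manifests
        System.out.println("HDFS ops: " + hdfsRenameOps(regions));
        System.out.println("S3 ops:   " + s3RenameOps(regions));
    }
}
```

This is why the patch stages the snapshot in a temporary directory on HDFS: all the per-region manipulation happens where operations are cheap, and only the final result touches S3.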
[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3
[ https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach York updated HBASE-21098: -- Fix Version/s: 2.2.0 3.0.0 > Improve Snapshot Performance with Temporary Snapshot Directory when rootDir > on S3 > - > > Key: HBASE-21098 > URL: https://issues.apache.org/jira/browse/HBASE-21098 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0, 1.4.8, 2.1.1 >Reporter: Tyler Mi >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21098.master.001.patch, > HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, > HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, > HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, > HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, > HBASE-21098.master.010.patch, HBASE-21098.master.011.patch, > HBASE-21098.master.012.patch, HBASE-21098.master.013.patch > > > When using Apache HBase, the snapshot feature can be used to make a point in > time recovery. To do this, HBase creates a manifest of all the files in all > of the Regions so that those files can be referenced again when a user > restores a snapshot. With HBase's S3 storage mode, developers can store their > data off-cluster on Amazon S3. However, utilizing S3 as a file system is > inefficient in some operations, namely renames. Most Hadoop ecosystem > applications use an atomic rename as a method of committing data. However, > with S3, a rename is a separate copy and then a delete of every file which is > no longer atomic and, in fact, quite costly. In addition, puts and deletes on > S3 have latency issues that traditional filesystems do not encounter when > manipulating the region snapshots to consolidate into a single manifest. When > HBase on S3 users have a significant amount of regions, puts, deletes, and > renames (the final commit stage of the snapshot) become the bottleneck > causing snapshots to take many minutes or even hours to complete. 
> The purpose of this patch is to increase the overall performance of snapshots > while utilizing HBase on S3 through the use of a temporary directory for the > snapshots that exists on a traditional filesystem like HDFS to circumvent the > bottlenecks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
[ https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614121#comment-16614121 ] Zach York commented on HBASE-20734: --- Oh weird, I didn't notice the indenting... I'll remove this. > Colocate recovered edits directory with hbase.wal.dir > - > > Key: HBASE-20734 > URL: https://issues.apache.org/jira/browse/HBASE-20734 > Project: HBase > Issue Type: Improvement > Components: MTTR, Recovery, wal >Reporter: Ted Yu >Assignee: Zach York >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-20734.branch-1.001.patch, > HBASE-20734.branch-1.002.patch, HBASE-20734.branch-1.003.patch, > HBASE-20734.branch-1.004.patch, HBASE-20734.master.001.patch, > HBASE-20734.master.002.patch, HBASE-20734.master.003.patch, > HBASE-20734.master.004.patch, HBASE-20734.master.005.patch, > HBASE-20734.master.006.patch, HBASE-20734.master.007.patch, > HBASE-20734.master.008.patch, HBASE-20734.master.009.patch, > HBASE-20734.master.010.patch, HBASE-20734.master.011.patch, > HBASE-20734.master.012.patch > > > During investigation of HBASE-20723, I realized that we wouldn't get the best > performance when hbase.wal.dir is configured to be on different (fast) media > than hbase rootdir w.r.t. recovered edits since recovered edits directory is > currently under rootdir. > Such setup may not result in fast recovery when there is region server > failover. > This issue is to find proper (hopefully backward compatible) way in > colocating recovered edits directory with hbase.wal.dir . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
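The colocation proposed in this issue amounts to resolving the recovered-edits directory under hbase.wal.dir (typically on fast media) instead of under the root dir. A hypothetical sketch of that path resolution; the directory layout and helper names are assumptions for illustration, not the actual patch:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Toy illustration of the colocation idea: derive the recovered.edits
// directory from the WAL dir rather than the root dir, so replayed edits
// live on the same (fast) media as the WALs they came from.
public class RecoveredEditsDir {
    // Hypothetical layout: <walDir>/<regionName>/recovered.edits
    static Path recoveredEditsDir(Path walDir, String regionName) {
        return walDir.resolve(regionName).resolve("recovered.edits");
    }

    public static void main(String[] args) {
        Path walDir = Paths.get("/fast-ssd/hbase/wal"); // hbase.wal.dir on fast media
        System.out.println(recoveredEditsDir(walDir, "region-abc"));
    }
}
```

The backward-compatibility question the issue raises is then about recovery code also checking the old location under rootdir when the new one is empty.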
[jira] [Commented] (HBASE-20306) LoadTestTool does not print summary at end of run
[ https://issues.apache.org/jira/browse/HBASE-20306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614096#comment-16614096 ] Colin Garcia commented on HBASE-20306: -- Thanks for the comments Andrew. I've updated using the getHistogramReport functionality. The only concern I have is that the values won't exactly match the "Overall Status" due to some casting to long from doubles. Here is a snippet of the output. Let me know what you think :) hoping this is a step in the right direction {code:java} 2018-09-13 14:30:52,732 INFO [MultiThreadedAction-ProgressReporter-1536874207705] util.MultiThreadedAction: [W:20] Keys=82495, cols=927.8 K, time=00:00:45 Overall: [keys/s= 1832, latency=10.82 ms] Current: [keys/s=2058, latency=9.64 ms], wroteUpTo=-1 2018-09-13 14:30:57,733 INFO [MultiThreadedAction-ProgressReporter-1536874207705] util.MultiThreadedAction: [W:20] Keys=92516, cols=1.0 M, time=00:00:50 Overall: [keys/s= 1849, latency=10.72 ms] Current: [keys/s=2004, latency=9.89 ms], wroteUpTo=-1 2018-09-13 14:31:02,738 INFO [MultiThreadedAction-ProgressReporter-1536874207705] util.MultiThreadedAction: RUN SUMMARY KEYS PER SECOND: mean=1849.90, min=1108.00, max=2065.00, stdDev=291.71, 50th=1945.50, 75th=2017.50, 95th=2065.00, 99th=2065.00, 99.9th=2065.00, 99.99th=2065.00, 99.999th=2065.00 LATENCY: mean=10.60, min=9.00, max=17.00, stdDev=2.41, 50th=10.00, 75th=10.50, 95th=17.00, 99th=17.00, 99.9th=17.00, 99.99th=17.00, 99.999th=17.00 {code} > LoadTestTool does not print summary at end of run > - > > Key: HBASE-20306 > URL: https://issues.apache.org/jira/browse/HBASE-20306 > Project: HBase > Issue Type: Bug > Components: tooling >Reporter: Mike Drob >Assignee: Colin Garcia >Priority: Major > Labels: beginner > Attachments: HBASE-20306.000.patch, HBASE-20306.001.patch, > HBASE-20306.002.patch > > > ltt currently prints status as it goes, but doesn't give a nice summary of > what happened so users have to infer it from the last status line printed. 
> Would be nice to print a real summary with statistics about what was run. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
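The RUN SUMMARY in the comment above boils the per-interval samples down to mean/min/max plus percentiles. A minimal nearest-rank sketch of that kind of summary; this is plain Java, not the actual getHistogramReport code, the sample values are taken from the log snippet, and nearest-rank percentiles can differ slightly from the interpolated values shown there:

```java
import java.util.Arrays;

// Summarize a set of samples the way the RUN SUMMARY line does:
// mean, min, max, and selected percentiles (nearest-rank method).
public class RunSummary {
    // Nearest-rank percentile over an ascending-sorted array.
    static double percentile(double[] sorted, double p) {
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
    }

    public static void main(String[] args) {
        // Keys-per-second samples from the progress-reporter lines above.
        double[] keysPerSec = {1108, 1945, 2004, 2058, 2065};
        Arrays.sort(keysPerSec);
        double mean = Arrays.stream(keysPerSec).average().orElse(0);
        System.out.printf("KEYS PER SECOND: mean=%.2f, min=%.2f, max=%.2f, 50th=%.2f, 95th=%.2f%n",
            mean, keysPerSec[0], keysPerSec[keysPerSec.length - 1],
            percentile(keysPerSec, 50), percentile(keysPerSec, 95));
    }
}
```

The long-vs-double mismatch Colin mentions would show up here too if the samples were first truncated to longs before averaging.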
[jira] [Updated] (HBASE-20306) LoadTestTool does not print summary at end of run
[ https://issues.apache.org/jira/browse/HBASE-20306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Garcia updated HBASE-20306: - Attachment: HBASE-20306.002.patch > LoadTestTool does not print summary at end of run > - > > Key: HBASE-20306 > URL: https://issues.apache.org/jira/browse/HBASE-20306 > Project: HBase > Issue Type: Bug > Components: tooling >Reporter: Mike Drob >Assignee: Colin Garcia >Priority: Major > Labels: beginner > Attachments: HBASE-20306.000.patch, HBASE-20306.001.patch, > HBASE-20306.002.patch > > > ltt currently prints status as it goes, but doesn't give a nice summary of > what happened so users have to infer it from the last status line printed. > Would be nice to print a real summary with statistics about what was run. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21188) Print heap and gc informations in our junit ResourceChecker
[ https://issues.apache.org/jira/browse/HBASE-21188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614076#comment-16614076 ] Mike Drob commented on HBASE-21188: --- What do "GCCount LEAK?" and "UsedHeapMemoryMB LEAK?" mean in this context? Is that why you were suggesting not to track GC as a resource? Total heap usage detecting a leak also seems unlikely, since we probably are building up lots of structures during the tests that maybe we aren't cleaning up, but also maybe don't need to. > Print heap and gc informations in our junit ResourceChecker > --- > > Key: HBASE-21188 > URL: https://issues.apache.org/jira/browse/HBASE-21188 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21188.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
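The two figures questioned in the comment, GC count and used heap, can be sampled with the standard JMX management beans. This is a generic sketch of that sampling, not the HBase ResourceChecker implementation:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sample the same kind of numbers a "GCCount" / "UsedHeapMemoryMB" resource
// check would report, using the standard platform MXBeans.
public class GcSnapshot {
    // Sum collection counts across all collectors; a collector may report -1
    // when counts are unavailable, so those are skipped.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count >= 0) {
                total += count;
            }
        }
        return total;
    }

    static long usedHeapMB() {
        return ManagementFactory.getMemoryMXBean()
            .getHeapMemoryUsage().getUsed() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("GCCount=" + totalGcCount()
            + " UsedHeapMemoryMB=" + usedHeapMB());
    }
}
```

Because both numbers naturally grow over a test run, a before/after delta flags a "LEAK?" far more noisily than, say, a thread count does, which is the concern raised above.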
[jira] [Comment Edited] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation
[ https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614065#comment-16614065 ] Nihal Jain edited comment on HBASE-21196 at 9/13/18 9:12 PM: - Alternately we can modify [isMetaClearingException|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L54] to return false in case original exception is null. In fact we return false in case of [isConnectionException|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L142] while we return true in case of [isMetaClearingException|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L54]. Which shows we have an inconsistency here itself. was (Author: nihaljain.cs): Alternately we can modify [isMetaClearingException|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L54] to return false in case original exception is null. In fact we return false in case of [isConnectionException|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L142] while we return false in case of [isMetaClearingException|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L54]. Which shows we have an inconsistency here itself. 
> HTableMultiplexer clears the meta cache after every put operation > - > > Key: HBASE-21196 > URL: https://issues.apache.org/jira/browse/HBASE-21196 > Project: HBase > Issue Type: Bug > Components: Performance >Affects Versions: 3.0.0, 1.3.3, 2.2.0 >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Critical > Fix For: 3.0.0 > > Attachments: HBASE-21196.master.001.patch, > HTableMultiplexer1000Puts.UT.txt > > > *Problem:* Operations which use > {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, > MultiResponse, int)}} API with tablename set to null reset the meta cache of > the corresponding server after each call. One such operation is put operation > of HTableMultiplexer (Might not be the only one). This may impact the > performance of the system severely as all new ops directed to that server > will have to go to zk first to get the meta table address and then get the > location of the table region as it will become empty after every > htablemultiplexer put. > From the logs below, one can see after every other put the cached region > locations are cleared. As a side effect of this, before every put the server > needs to contact zk and get meta table location and read meta to get region > locations of the table. 
> {noformat} > 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): > Removed all cached region locations that map to > root1-thinkpad-t440p,35811,1536857446588 > 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] > client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for > root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5 > 2018-09-13 22:21:15,515 TRACE > [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] > ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" > request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes > 2018-09-13 22:21:15,515 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: > 137 connection: 127.0.0.1:42338 executing as root1 > 2018-09-13 22:21:15,515 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: > 137 connection: 127.0.0.1:42338 param: region= > testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., > row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { > associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 > totalTime: 0 > 2018-09-13 22:21:15,516 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, > count=0, allocations=1 > 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, > callTime: 2ms > 2018-09-13 22:21:15,516
[jira] [Commented] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation
[ https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614065#comment-16614065 ] Nihal Jain commented on HBASE-21196: Alternately we can modify [isMetaClearingException|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L54] to return false in case original exception is null. In fact we return false in case of [isConnectionException|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L142] while we return false in case of [isMetaClearingException|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L54]. Which shows we have an inconsistency here itself. > HTableMultiplexer clears the meta cache after every put operation > - > > Key: HBASE-21196 > URL: https://issues.apache.org/jira/browse/HBASE-21196 > Project: HBase > Issue Type: Bug > Components: Performance >Affects Versions: 3.0.0, 1.3.3, 2.2.0 >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Critical > Fix For: 3.0.0 > > Attachments: HBASE-21196.master.001.patch, > HTableMultiplexer1000Puts.UT.txt > > > *Problem:* Operations which use > {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, > MultiResponse, int)}} API with tablename set to null reset the meta cache of > the corresponding server after each call. One such operation is put operation > of HTableMultiplexer (Might not be the only one). This may impact the > performance of the system severely as all new ops directed to that server > will have to go to zk first to get the meta table address and then get the > location of the table region as it will become empty after every > htablemultiplexer put. 
> From the logs below, one can see after every other put the cached region > locations are cleared. As a side effect of this, before every put the server > needs to contact zk and get meta table location and read meta to get region > locations of the table. > {noformat} > 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): > Removed all cached region locations that map to > root1-thinkpad-t440p,35811,1536857446588 > 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] > client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for > root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5 > 2018-09-13 22:21:15,515 TRACE > [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] > ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" > request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes > 2018-09-13 22:21:15,515 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: > 137 connection: 127.0.0.1:42338 executing as root1 > 2018-09-13 22:21:15,515 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: > 137 connection: 127.0.0.1:42338 param: region= > testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., > row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { > associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 > totalTime: 0 > 2018-09-13 22:21:15,516 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, > count=0, allocations=1 > 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, > callTime: 2ms > 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan > table=hbase:meta, > startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99 > 
2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): > Advancing internal small scanner to startKey at > 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99' > 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up > meta region location in ZK, > connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f > {noformat} > From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see > that the string "Removed all cached region locations that map" and "Looking > up meta region location in ZK" are present for every put. > *Analysis:* > The problem occurs because the {{cleanServerCache}} method always clears > the server cache in case tablename is
[jira] [Updated] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation
[ https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nihal Jain updated HBASE-21196: --- Description: *Problem:* Operations which use {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, MultiResponse, int)}} API with tablename set to null reset the meta cache of the corresponding server after each call. One such operation is put operation of HTableMultiplexer (Might not be the only one). This may impact the performance of the system severely as all new ops directed to that server will have to go to zk first to get the meta table address and then get the location of the table region as it will become empty after every htablemultiplexer put. From the logs below, one can see after every other put the cached region locations are cleared. As a side effect of this, before every put the server needs to contact zk and get meta table location and read meta to get region locations of the table. {noformat} 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): Removed all cached region locations that map to root1-thinkpad-t440p,35811,1536857446588 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5 2018-09-13 22:21:15,515 TRACE [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes 2018-09-13 22:21:15,515 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: 137 connection: 127.0.0.1:42338 executing as root1 2018-09-13 22:21:15,515 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: 137 connection: 
127.0.0.1:42338 param: region= testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 totalTime: 0 2018-09-13 22:21:15,516 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, count=0, allocations=1 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, callTime: 2ms 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan table=hbase:meta, startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): Advancing internal small scanner to startKey at 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99' 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up meta region location in ZK, connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f {noformat} From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see that the strings "Removed all cached region locations that map" and "Looking up meta region location in ZK" are present for every put. *Analysis:* The problem occurs because the {{cleanServerCache}} method always clears the server cache when tablename is null and the exception is null. See [AsyncRequestFutureImpl.java#L918|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L918] {code:java} private void cleanServerCache(ServerName server, Throwable regionException) { if (tableName == null && ClientExceptionsUtil.isMetaClearingException(regionException)) { // For multi-actions, we don't have a table name, but we want to make sure to clear the // cache in case there were location-related exceptions. 
We don't want to clear the cache // for every possible exception that comes through, however. asyncProcess.connection.clearCaches(server); } } {code} The problem is that [ClientExceptionsUtil.isMetaClearingException(regionException)|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L51] assumes the caller has already checked that the exception is not null before calling the method, i.e. it returns true if the passed exception is null, which may not be a correct assumption. was: *Problem:* Operations which use {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, MultiResponse, int)}} API with tablename set to null reset the meta cache of the corresponding server after each call. One such operation is put operation of HTableMultiplexer (Might not be the only one). This may impact the
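The faulty interaction described above can be reduced to a few lines. The following is a standalone sketch, not the real HBase client code: the class name {{MetaCacheGuardSketch}}, the counter, and the simplified exception check are all invented for illustration; only the shapes of {{cleanServerCache}} and {{isMetaClearingException}} mirror the code quoted above. It shows how a helper that treats a null exception as meta-clearing makes a successful multi-action (null table name, null exception) wipe the server's cache, and how a caller-side null check, in the spirit of the attached patch, avoids that.

```java
// Hypothetical, simplified reconstruction of the HBASE-21196 bug.
public class MetaCacheGuardSketch {

    // Mimics ClientExceptionsUtil.isMetaClearingException as described in the
    // report: it assumes the caller has done a null check, so a null exception
    // falls through as "meta-clearing".
    static boolean isMetaClearingException(Throwable cur) {
        if (cur == null) {
            return true; // the assumption that bites HTableMultiplexer
        }
        // The real method inspects specific exception types; elided here.
        return cur instanceof java.io.IOException;
    }

    // Stand-in for asyncProcess.connection.clearCaches(server); we just count.
    static int cacheClears = 0;

    // Buggy caller: clears the cache even after a fully successful
    // multi-action (tableName == null, regionException == null).
    static void cleanServerCacheBuggy(String tableName, Throwable regionException) {
        if (tableName == null && isMetaClearingException(regionException)) {
            cacheClears++;
        }
    }

    // Fixed caller: validate non-null before consulting the helper.
    static void cleanServerCacheFixed(String tableName, Throwable regionException) {
        if (tableName == null && regionException != null
            && isMetaClearingException(regionException)) {
            cacheClears++;
        }
    }

    public static void main(String[] args) {
        cleanServerCacheBuggy(null, null);
        System.out.println("buggy clears after success: " + cacheClears);  // 1
        cacheClears = 0;
        cleanServerCacheFixed(null, null);
        System.out.println("fixed clears after success: " + cacheClears); // 0
    }
}
```

Either the helper or the caller could own the null check; the patch discussed in this issue puts it in the caller, which keeps the helper's existing contract unchanged for other call sites.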
[jira] [Comment Edited] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation
[ https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614060#comment-16614060 ] Nihal Jain edited comment on HBASE-21196 at 9/13/18 9:02 PM: - Attaching a patch [^HBASE-21196.master.001.patch] with a UT to expose the problem and one of the possible fixes, i.e. the caller validates that the exception is not null before calling the clear-cache method. We need not do this null check in case of {{AsyncRequestFutureImpl.receiveGlobalFailure}} as that method is called only after an exception is caught. was (Author: nihaljain.cs): Attaching a patch [^HBASE-21196.master.001.patch] with a UT to expose the problem and one of the possible fix i.e. called validates the exception is not null before calling clear cache method. > HTableMultiplexer clears the meta cache after every put operation > - > > Key: HBASE-21196 > URL: https://issues.apache.org/jira/browse/HBASE-21196 > Project: HBase > Issue Type: Bug > Components: Performance >Affects Versions: 3.0.0, 1.3.3, 2.2.0 >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Critical > Fix For: 3.0.0 > > Attachments: HBASE-21196.master.001.patch, > HTableMultiplexer1000Puts.UT.txt > > > *Problem:* Operations which use > {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, > MultiResponse, int)}} API with tablename set to null reset the meta cache of > the corresponding server after each call. One such operation is put operation > of HTableMultiplexer (Might not be the only one). This may impact the > performance of the system severely as all new ops directed to that server > will have to go to zk first to get the meta table address and then get the > location of the table region as it will become empty after every > htablemultiplexer put. > From the logs below, one can see after every other put the cached region > locations are cleared. 
As a side effect of this, before every put the server > needs to contact zk and get meta table location and read meta to get region > locations of the table. > {noformat} > 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): > Removed all cached region locations that map to > root1-thinkpad-t440p,35811,1536857446588 > 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] > client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for > root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5 > 2018-09-13 22:21:15,515 TRACE > [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] > ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" > request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes > 2018-09-13 22:21:15,515 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: > 137 connection: 127.0.0.1:42338 executing as root1 > 2018-09-13 22:21:15,515 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: > 137 connection: 127.0.0.1:42338 param: region= > testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., > row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { > associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 > totalTime: 0 > 2018-09-13 22:21:15,516 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, > count=0, allocations=1 > 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, > callTime: 2ms > 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan > table=hbase:meta, > startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99 > 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): > Advancing internal 
small scanner to startKey at > 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99' > 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up > meta region location in ZK, > connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f > {noformat} > From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see > that the strings "Removed all cached region locations that map" and "Looking > up meta region location in ZK" are present 800+ times for 1000 back-to-back > puts. > *Analysis:* > The problem occurs because the {{cleanServerCache}} method always clears > the server cache when tablename is null and the exception is null. See >
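To see why 800+ clears for 1000 puts hurt, here is a rough cost model. Everything in it is an assumption for illustration ({{RegionLocationCacheSketch}}, the counter, and the single-row workload are invented, not HBase code): each cleared cache turns the next operation's location lookup into a fresh backing round trip, standing in for the ZK meta-location lookup plus meta read visible in the trace above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy location cache: counts how often we must go to the backing store
// (the stand-in for ZooKeeper + hbase:meta) because the cache was empty.
public class RegionLocationCacheSketch {
    int backingLookups = 0;
    private final Map<String, String> locations = new HashMap<>();

    String locate(String rowKey) {
        return locations.computeIfAbsent(rowKey, k -> {
            backingLookups++; // simulated ZK lookup + meta scan
            return "server-for-" + k;
        });
    }

    void clearAll() {
        locations.clear(); // simulated clearCaches(server)
    }

    public static void main(String[] args) {
        RegionLocationCacheSketch cache = new RegionLocationCacheSketch();
        // Healthy client: 100 ops against a cached location, one lookup total.
        for (int i = 0; i < 100; i++) cache.locate("row");
        System.out.println("lookups with warm cache: " + cache.backingLookups);    // 1
        // HBASE-21196 behaviour: the cache is cleared after every put.
        cache.backingLookups = 0;
        for (int i = 0; i < 100; i++) { cache.locate("row"); cache.clearAll(); }
        System.out.println("lookups with per-put clear: " + cache.backingLookups); // 100
    }
}
```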
[jira] [Updated] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation
[ https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nihal Jain updated HBASE-21196: --- Fix Version/s: 3.0.0 Attachment: HBASE-21196.master.001.patch Status: Patch Available (was: Open) Attaching a patch [^HBASE-21196.master.001.patch] with a UT to expose the problem and one of the possible fixes, i.e. the caller validates that the exception is not null before calling the clear-cache method. > HTableMultiplexer clears the meta cache after every put operation > - > > Key: HBASE-21196 > URL: https://issues.apache.org/jira/browse/HBASE-21196 > Project: HBase > Issue Type: Bug > Components: Performance >Affects Versions: 3.0.0, 1.3.3, 2.2.0 >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Critical > Fix For: 3.0.0 > > Attachments: HBASE-21196.master.001.patch, > HTableMultiplexer1000Puts.UT.txt > > > *Problem:* Operations which use > {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, > MultiResponse, int)}} API with tablename set to null reset the meta cache of > the corresponding server after each call. One such operation is put operation > of HTableMultiplexer (Might not be the only one). This may impact the > performance of the system severely as all new ops directed to that server > will have to go to zk first to get the meta table address and then get the > location of the table region as it will become empty after every > htablemultiplexer put. > From the logs below, one can see after every other put the cached region > locations are cleared. As a side effect of this, before every put the server > needs to contact zk and get meta table location and read meta to get region > locations of the table. 
> {noformat} > 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): > Removed all cached region locations that map to > root1-thinkpad-t440p,35811,1536857446588 > 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] > client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for > root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5 > 2018-09-13 22:21:15,515 TRACE > [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] > ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" > request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes > 2018-09-13 22:21:15,515 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: > 137 connection: 127.0.0.1:42338 executing as root1 > 2018-09-13 22:21:15,515 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: > 137 connection: 127.0.0.1:42338 param: region= > testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., > row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { > associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 > totalTime: 0 > 2018-09-13 22:21:15,516 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, > count=0, allocations=1 > 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, > callTime: 2ms > 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan > table=hbase:meta, > startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99 > 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): > Advancing internal small scanner to startKey at > 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99' > 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up > 
meta region location in ZK, > connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f > {noformat} > From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see > that the strings "Removed all cached region locations that map" and "Looking > up meta region location in ZK" are present 800+ times for 1000 back-to-back > puts. > *Analysis:* > The problem occurs because the {{cleanServerCache}} method always clears > the server cache when tablename is null and the exception is null. See > [AsyncRequestFutureImpl.java#L918|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L918] > {code:java} > private void cleanServerCache(ServerName server, Throwable regionException) { > if (tableName == null && > ClientExceptionsUtil.isMetaClearingException(regionException)) { > // For multi-actions, we don't have a table
[jira] [Commented] (HBASE-21195) Support Log storage similar to FB LogDevice
[ https://issues.apache.org/jira/browse/HBASE-21195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614053#comment-16614053 ] Mike Drob commented on HBASE-21195: --- I was thinking about this earlier; it would be a great addition. > Support Log storage similar to FB LogDevice > --- > > Key: HBASE-21195 > URL: https://issues.apache.org/jira/browse/HBASE-21195 > Project: HBase > Issue Type: New Feature >Reporter: jagan >Priority: Major > > Log storage, which is write-once, sequential data, can be optimized in the > following ways: > 1. Keys generated should be incremental. > 2. The HFile key index can be a range index and need not use a BloomFilter. > 3. Instead of compaction, periodic deletion of old files based on TTL can be > supported -- This message was sent by Atlassian JIRA (v7.6.3#76005)
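Point 3 of the proposal, dropping whole immutable log files once they age past a TTL instead of compacting them, can be sketched as follows. Everything here is an assumption for illustration ({{TtlLogPruner}}, the 7-day retention, and the file-name/timestamp scheme are invented, not from HBase or LogDevice); the point is only that write-once, sequential files can be deleted wholesale with no rewrite step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.HashMap;

// Toy TTL-based pruner for immutable, write-once log files.
public class TtlLogPruner {
    // Assumed 7-day retention window, in milliseconds.
    static final long TTL_MS = 7L * 24 * 60 * 60 * 1000;

    // files maps a log file name to its creation time (epoch millis).
    // Returns the files old enough to delete outright; no compaction,
    // no partial rewrite, just whole-file removal.
    static List<String> expiredFiles(Map<String, Long> files, long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : files.entrySet()) {
            if (nowMs - e.getValue() > TTL_MS) {
                expired.add(e.getKey());
            }
        }
        Collections.sort(expired); // deterministic order for logging/deletion
        return expired;
    }

    public static void main(String[] args) {
        Map<String, Long> files = new HashMap<>();
        long now = 10L * 24 * 60 * 60 * 1000;      // "day 10"
        files.put("log-000001", 0L);               // 10 days old -> expired
        files.put("log-000002", now - TTL_MS / 2); // 3.5 days old -> kept
        System.out.println(expiredFiles(files, now)); // [log-000001]
    }
}
```

This also hints at why point 1 (incremental keys) matters: if keys are monotonically increasing, each file covers a contiguous key range, so a range index per file suffices and expiry by file boundary never splits a key range.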
[jira] [Updated] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation
[ https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nihal Jain updated HBASE-21196: --- Attachment: HTableMultiplexer1000Puts.UT.txt > HTableMultiplexer clears the meta cache after every put operation > - > > Key: HBASE-21196 > URL: https://issues.apache.org/jira/browse/HBASE-21196 > Project: HBase > Issue Type: Bug > Components: Performance >Affects Versions: 3.0.0, 1.3.3, 2.2.0 >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Critical > Attachments: HTableMultiplexer1000Puts.UT.txt > > > Operations which use {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, > ServerName, MultiResponse, int)}} API with tablename set to null reset the > meta cache of the corresponding server after each call. One such operation is > put operation of HTableMultiplexer (Might not be the only one). This may > impact the performance of the system severely as all new ops directed to that > server will have to go to zk first to get the meta table address and then get > the location of the table region as it will become empty after every > htablemultiplexer put. > From the logs below, one can see after every other put the cached region > locations are cleared. As a side effect of this, before every put the server > needs to contact zk and get meta table location and read meta to get region > locations of the table. 
> {noformat} > 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): > Removed all cached region locations that map to > root1-thinkpad-t440p,35811,1536857446588 > 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] > client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for > root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5 > 2018-09-13 22:21:15,515 TRACE > [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] > ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" > request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes > 2018-09-13 22:21:15,515 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: > 137 connection: 127.0.0.1:42338 executing as root1 > 2018-09-13 22:21:15,515 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: > 137 connection: 127.0.0.1:42338 param: region= > testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., > row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { > associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 > totalTime: 0 > 2018-09-13 22:21:15,516 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, > count=0, allocations=1 > 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, > callTime: 2ms > 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan > table=hbase:meta, > startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99 > 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): > Advancing internal small scanner to startKey at > 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99' > 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up > 
meta region location in ZK, > connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation
Nihal Jain created HBASE-21196: -- Summary: HTableMultiplexer clears the meta cache after every put operation Key: HBASE-21196 URL: https://issues.apache.org/jira/browse/HBASE-21196 Project: HBase Issue Type: Bug Components: Performance Affects Versions: 3.0.0, 1.3.3, 2.2.0 Reporter: Nihal Jain Assignee: Nihal Jain Attachments: HTableMultiplexer1000Puts.UT.txt Operations which use {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, MultiResponse, int)}} API with tablename set to null reset the meta cache of the corresponding server after each call. One such operation is put operation of HTableMultiplexer (Might not be the only one). This may impact the performance of the system severely as all new ops directed to that server will have to go to zk first to get the meta table address and then get the location of the table region as it will become empty after every htablemultiplexer put. >From the logs below, one can see after every other put the cached region >locations are cleared. As a side effect of this, before every put the server >needs to contact zk and get meta table location and read meta to get region >locations of the table. 
{noformat} 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): Removed all cached region locations that map to root1-thinkpad-t440p,35811,1536857446588 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5 2018-09-13 22:21:15,515 TRACE [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes 2018-09-13 22:21:15,515 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: 137 connection: 127.0.0.1:42338 executing as root1 2018-09-13 22:21:15,515 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: 137 connection: 127.0.0.1:42338 param: region= testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 totalTime: 0 2018-09-13 22:21:15,516 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, count=0, allocations=1 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, callTime: 2ms 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan table=hbase:meta, startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): Advancing internal small scanner to startKey at 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99' 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up meta region location in ZK, 
connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
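The eviction pattern described in the report can be illustrated with a minimal sketch; the class and method names below are simplified stand-ins for illustration, not the actual MetaCache or AsyncRequestFutureImpl internals: when no table name is available, the cleanup path cannot target a single region, so it drops every cached location for the server.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the client's per-server region location cache.
class SimpleMetaCache {
    // serverName -> (regionName -> cached location)
    private final Map<String, Map<String, String>> byServer = new ConcurrentHashMap<>();

    void add(String server, String region, String location) {
        byServer.computeIfAbsent(server, s -> new ConcurrentHashMap<>())
                .put(region, location);
    }

    int cachedCount(String server) {
        Map<String, String> regions = byServer.get(server);
        return regions == null ? 0 : regions.size();
    }

    // Mirrors the problematic pattern: with a null table name the cleanup code
    // cannot identify a single region, so it drops every entry for the server.
    void evictOnResponse(String tableName, String server, String region) {
        if (tableName == null) {
            byServer.remove(server); // "Removed all cached region locations that map to ..."
        } else {
            Map<String, String> regions = byServer.get(server);
            if (regions != null) {
                regions.remove(region); // targeted eviction of one region only
            }
        }
    }

    public static void main(String[] args) {
        SimpleMetaCache cache = new SimpleMetaCache();
        cache.add("rs1,35811", "regionA", "locA");
        cache.add("rs1,35811", "regionB", "locB");
        // Null table name, as in the HTableMultiplexer put path:
        cache.evictOnResponse(null, "rs1,35811", "regionA");
        System.out.println(cache.cachedCount("rs1,35811")); // prints 0: whole server cache gone
    }
}
```

A targeted eviction would leave the other cached regions intact; the null-table path is what forces the subsequent ZooKeeper round trip before every put.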
[jira] [Comment Edited] (HBASE-21177) Add per-table metrics on getTime,putTime and scanTime
[ https://issues.apache.org/jira/browse/HBASE-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613964#comment-16613964 ] Andrew Purtell edited comment on HBASE-21177 at 9/13/18 7:37 PM: - This doesn't seem quite right: {code} this.hashCode = this.tableName.hashCode(); + getHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + GET_REQUEST_TIME, + GET_REQUEST_TIME_DESC); + putHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + PUT_REQUEST_TIME, + PUT_REQUEST_TIME_DESC); + scanHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + SCAN_REQUEST_TIME, + SCAN_REQUEST_TIME_DESC); {code} Shouldn't this code register the metrics using {{tableNamePrefix}} instead of {{"Namespace_default_table_"}}? Also, use EnvironmentEdge#getCurrentTime instead of System#currentTimeMillis. was (Author: apurtell): This doesn't seem quite right: {code} this.hashCode = this.tableName.hashCode(); + getHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + GET_REQUEST_TIME, + GET_REQUEST_TIME_DESC); + putHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + PUT_REQUEST_TIME, + PUT_REQUEST_TIME_DESC); + scanHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + SCAN_REQUEST_TIME, + SCAN_REQUEST_TIME_DESC); {code} Shouldn't this code register the metrics using {{tableNamePrefix}} instead of {{"Namespace_default_table_"}}? > Add per-table metrics on getTime,putTime and scanTime > - > > Key: HBASE-21177 > URL: https://issues.apache.org/jira/browse/HBASE-21177 > Project: HBase > Issue Type: Task > Components: metrics >Affects Versions: 2.0.2 >Reporter: xijiawen >Priority: Major > Fix For: HBASE-14850 > > Attachments: HBASE-21177.patch > > > Adds getTime,putTime,scanTime to the per-table metrics. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-21177) Add per-table metrics on getTime,putTime and scanTime
[ https://issues.apache.org/jira/browse/HBASE-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613964#comment-16613964 ] Andrew Purtell edited comment on HBASE-21177 at 9/13/18 7:36 PM: - This doesn't seem quite right: {code} this.hashCode = this.tableName.hashCode(); + getHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + GET_REQUEST_TIME, + GET_REQUEST_TIME_DESC); + putHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + PUT_REQUEST_TIME, + PUT_REQUEST_TIME_DESC); + scanHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + SCAN_REQUEST_TIME, + SCAN_REQUEST_TIME_DESC); {code} Shouldn't this code register the metrics using {{tableNamePrefix}} instead of {{"Namespace_default_table_"}}? was (Author: apurtell): This doesn't seem quite right: {code} this.hashCode = this.tableName.hashCode(); + getHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + GET_REQUEST_TIME, + GET_REQUEST_TIME_DESC); + putHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + PUT_REQUEST_TIME, + PUT_REQUEST_TIME_DESC); + scanHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + SCAN_REQUEST_TIME, + SCAN_REQUEST_TIME_DESC); {code} Shouldn't this code register the metrics using {{tableNamePrefix}} instead of {{"Namespace_default_table_" }}? > Add per-table metrics on getTime,putTime and scanTime > - > > Key: HBASE-21177 > URL: https://issues.apache.org/jira/browse/HBASE-21177 > Project: HBase > Issue Type: Task > Components: metrics >Affects Versions: 2.0.2 >Reporter: xijiawen >Priority: Major > Fix For: HBASE-14850 > > Attachments: HBASE-21177.patch > > > Adds getTime,putTime,scanTime to the per-table metrics. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21177) Add per-table metrics on getTime,putTime and scanTime
[ https://issues.apache.org/jira/browse/HBASE-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613964#comment-16613964 ] Andrew Purtell commented on HBASE-21177: This doesn't seem quite right: {code} this.hashCode = this.tableName.hashCode(); + getHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + GET_REQUEST_TIME, + GET_REQUEST_TIME_DESC); + putHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + PUT_REQUEST_TIME, + PUT_REQUEST_TIME_DESC); + scanHis = registry.newHistogram("Namespace_default_table_" + tblName + "_metric_" + SCAN_REQUEST_TIME, + SCAN_REQUEST_TIME_DESC); {code} Shouldn't this code register the metrics using {{tableNamePrefix}} instead of {{"Namespace_default_table_" }}? > Add per-table metrics on getTime,putTime and scanTime > - > > Key: HBASE-21177 > URL: https://issues.apache.org/jira/browse/HBASE-21177 > Project: HBase > Issue Type: Task > Components: metrics >Affects Versions: 2.0.2 >Reporter: xijiawen >Priority: Major > Fix For: HBASE-14850 > > Attachments: HBASE-21177.patch > > > Adds getTime,putTime,scanTime to the per-table metrics. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
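The review comment above boils down to deriving the metric name from the table's actual namespace instead of the hardcoded "Namespace_default_table_" literal, which mislabels tables outside the default namespace. A minimal sketch of that intent follows; the registry here is a simplified stand-in, not the real HBase metrics registry API:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a metrics registry that keys histograms by name.
class TableMetricsSketch {
    static final String GET_REQUEST_TIME = "getRequestTime";
    static final Map<String, String> registry = new HashMap<>();

    static void newHistogram(String name, String desc) {
        registry.put(name, desc);
    }

    // Build the prefix from the table's real namespace rather than
    // hardcoding "Namespace_default_table_".
    static String register(String namespace, String tableName) {
        String tableNamePrefix = "Namespace_" + namespace + "_table_" + tableName + "_metric_";
        String metricName = tableNamePrefix + GET_REQUEST_TIME;
        newHistogram(metricName, "Time taken per Get request, per table");
        return metricName;
    }

    public static void main(String[] args) {
        // Namespace_ns1_table_t1_metric_getRequestTime
        System.out.println(register("ns1", "t1"));
    }
}
```

With this shape, a table in namespace `ns1` is no longer reported under the `default` namespace, which is the mislabeling the review flags.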
[jira] [Commented] (HBASE-20306) LoadTestTool does not print summary at end of run
[ https://issues.apache.org/jira/browse/HBASE-20306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613962#comment-16613962 ] Andrew Purtell commented on HBASE-20306: I don't think the changes are quite what was asked for. The request is for post run summary statistics, like the summary reports produced by PerformanceEvaluation via getHistogramReport, for example. The patch here changes the running status logging as well via use of the refactored getOverallRunInformation(), which is not what we want, I think. Just add a detailed statistics dump after the run is complete. > LoadTestTool does not print summary at end of run > - > > Key: HBASE-20306 > URL: https://issues.apache.org/jira/browse/HBASE-20306 > Project: HBase > Issue Type: Bug > Components: tooling >Reporter: Mike Drob >Assignee: Colin Garcia >Priority: Major > Labels: beginner > Attachments: HBASE-20306.000.patch, HBASE-20306.001.patch > > > ltt currently prints status as it goes, but doesn't give a nice summary of > what happened so users have to infer it from the last status line printed. > Would be nice to print a real summary with statistics about what was run. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
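The distinction requested above, leave the periodic status lines alone and only append a statistics dump once the run completes, can be sketched as follows. The class and method names are illustrative, not the actual LoadTestTool or PerformanceEvaluation APIs:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Records per-operation latencies during the run and produces a one-shot
// summary only after the run completes, leaving running status output alone.
class RunSummary {
    private final List<Long> latenciesMs = new ArrayList<>();

    void record(long latencyMs) {
        latenciesMs.add(latencyMs);
    }

    // One-time post-run report, in the spirit of a histogram report:
    // min, median, and max over all recorded operations.
    String report() {
        List<Long> sorted = new ArrayList<>(latenciesMs);
        Collections.sort(sorted);
        long min = sorted.get(0);
        long median = sorted.get(sorted.size() / 2);
        long max = sorted.get(sorted.size() - 1);
        return "ops=" + sorted.size()
            + " min=" + min + "ms median=" + median + "ms max=" + max + "ms";
    }

    public static void main(String[] args) {
        RunSummary summary = new RunSummary();
        for (long latency : new long[] {5, 3, 9, 7, 4}) {
            summary.record(latency);
        }
        // Printed once, after the run is complete:
        System.out.println(summary.report()); // ops=5 min=3ms median=5ms max=9ms
    }
}
```

The key design point is that `report()` is called exactly once after the run, so the in-flight status logging path is untouched.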
[jira] [Commented] (HBASE-16458) Shorten backup / restore test execution time
[ https://issues.apache.org/jira/browse/HBASE-16458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613949#comment-16613949 ] Vladimir Rodionov commented on HBASE-16458: --- Yes, with tearDown it took 63 min; without it, 44 min. What tearDown does is: (1) cleaning snapshots, (2) stopping the HBase and YARN mini clusters. I did not spend time analyzing this. Just an observation. > Shorten backup / restore test execution time > > > Key: HBASE-16458 > URL: https://issues.apache.org/jira/browse/HBASE-16458 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Vladimir Rodionov >Priority: Major > Labels: backup > Attachments: 16458-v1.patch, 16458.HBASE-7912.v3.txt, > 16458.HBASE-7912.v4.txt, 16458.v1.txt, 16458.v2.txt, 16458.v2.txt, > 16458.v3.txt, 16458.v4.txt, 16458.v5.txt, HBASE-16458-v1.patch, > HBASE-16458-v2.patch > > > Below was timing information for all the backup / restore tests (today's > result): > {code} > Running org.apache.hadoop.hbase.backup.TestIncrementalBackup > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 576.273 sec - > in org.apache.hadoop.hbase.backup.TestIncrementalBackup > Running org.apache.hadoop.hbase.backup.TestBackupBoundaryTests > Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 124.67 sec - > in org.apache.hadoop.hbase.backup.TestBackupBoundaryTests > Running org.apache.hadoop.hbase.backup.TestBackupStatusProgress > Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 102.34 sec - > in org.apache.hadoop.hbase.backup.TestBackupStatusProgress > Running org.apache.hadoop.hbase.backup.TestBackupAdmin > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 490.251 sec - > in org.apache.hadoop.hbase.backup.TestBackupAdmin > Running org.apache.hadoop.hbase.backup.TestHFileArchiving > Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 84.323 sec - > in org.apache.hadoop.hbase.backup.TestHFileArchiving > Running 
org.apache.hadoop.hbase.backup.TestSystemTableSnapshot > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 65.492 sec - > in org.apache.hadoop.hbase.backup.TestSystemTableSnapshot > Running org.apache.hadoop.hbase.backup.TestBackupDescribe > Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 93.758 sec - > in org.apache.hadoop.hbase.backup.TestBackupDescribe > Running org.apache.hadoop.hbase.backup.TestBackupLogCleaner > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 109.187 sec - > in org.apache.hadoop.hbase.backup.TestBackupLogCleaner > Running org.apache.hadoop.hbase.backup.TestIncrementalBackupNoDataLoss > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 330.539 sec - > in org.apache.hadoop.hbase.backup.TestIncrementalBackupNoDataLoss > Running org.apache.hadoop.hbase.backup.TestRemoteBackup > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 84.371 sec - > in org.apache.hadoop.hbase.backup.TestRemoteBackup > Running org.apache.hadoop.hbase.backup.TestBackupSystemTable > Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.893 sec - > in org.apache.hadoop.hbase.backup.TestBackupSystemTable > Running org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests > Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 120.779 sec - > in org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests > Running org.apache.hadoop.hbase.backup.TestFullBackupSetRestoreSet > Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 117.815 sec - > in org.apache.hadoop.hbase.backup.TestFullBackupSetRestoreSet > Running org.apache.hadoop.hbase.backup.TestBackupShowHistory > Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 136.517 sec - > in org.apache.hadoop.hbase.backup.TestBackupShowHistory > Running org.apache.hadoop.hbase.backup.TestRemoteRestore > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 91.799 sec - > in 
org.apache.hadoop.hbase.backup.TestRemoteRestore > Running org.apache.hadoop.hbase.backup.TestFullRestore > Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 317.711 sec > - in org.apache.hadoop.hbase.backup.TestFullRestore > Running org.apache.hadoop.hbase.backup.TestFullBackupSet > Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 87.045 sec - > in org.apache.hadoop.hbase.backup.TestFullBackupSet > Running org.apache.hadoop.hbase.backup.TestBackupDelete > Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 86.214 sec - > in org.apache.hadoop.hbase.backup.TestBackupDelete > Running org.apache.hadoop.hbase.backup.TestBackupDeleteRestore > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 77.631 sec - > in org.apache.hadoop.hbase.backup.TestBackupDeleteRestore > Running
[jira] [Commented] (HBASE-16458) Shorten backup / restore test execution time
[ https://issues.apache.org/jira/browse/HBASE-16458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613941#comment-16613941 ] Josh Elser commented on HBASE-16458: {quote}each test is executed in a separate JVM instance {quote} Interesting. Didn't realize we had reuseForks disabled in HBase :). Seems like as long as the surefire-plugin is configured this way, your change is fine. {quote}This actually saved almost 30% of overall execution time. {quote} That's... crazy. Did you dig in to see why this was the case? Curious to know how much of it is in "our" code compared to Hadoop's. > Shorten backup / restore test execution time > > > Key: HBASE-16458 > URL: https://issues.apache.org/jira/browse/HBASE-16458 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Vladimir Rodionov >Priority: Major > Labels: backup > Attachments: 16458-v1.patch, 16458.HBASE-7912.v3.txt, > 16458.HBASE-7912.v4.txt, 16458.v1.txt, 16458.v2.txt, 16458.v2.txt, > 16458.v3.txt, 16458.v4.txt, 16458.v5.txt, HBASE-16458-v1.patch, > HBASE-16458-v2.patch > > > Below was timing information for all the backup / restore tests (today's > result): > {code} > Running org.apache.hadoop.hbase.backup.TestIncrementalBackup > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 576.273 sec - > in org.apache.hadoop.hbase.backup.TestIncrementalBackup > Running org.apache.hadoop.hbase.backup.TestBackupBoundaryTests > Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 124.67 sec - > in org.apache.hadoop.hbase.backup.TestBackupBoundaryTests > Running org.apache.hadoop.hbase.backup.TestBackupStatusProgress > Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 102.34 sec - > in org.apache.hadoop.hbase.backup.TestBackupStatusProgress > Running org.apache.hadoop.hbase.backup.TestBackupAdmin > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 490.251 sec - > in org.apache.hadoop.hbase.backup.TestBackupAdmin > Running 
org.apache.hadoop.hbase.backup.TestHFileArchiving > Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 84.323 sec - > in org.apache.hadoop.hbase.backup.TestHFileArchiving > Running org.apache.hadoop.hbase.backup.TestSystemTableSnapshot > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 65.492 sec - > in org.apache.hadoop.hbase.backup.TestSystemTableSnapshot > Running org.apache.hadoop.hbase.backup.TestBackupDescribe > Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 93.758 sec - > in org.apache.hadoop.hbase.backup.TestBackupDescribe > Running org.apache.hadoop.hbase.backup.TestBackupLogCleaner > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 109.187 sec - > in org.apache.hadoop.hbase.backup.TestBackupLogCleaner > Running org.apache.hadoop.hbase.backup.TestIncrementalBackupNoDataLoss > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 330.539 sec - > in org.apache.hadoop.hbase.backup.TestIncrementalBackupNoDataLoss > Running org.apache.hadoop.hbase.backup.TestRemoteBackup > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 84.371 sec - > in org.apache.hadoop.hbase.backup.TestRemoteBackup > Running org.apache.hadoop.hbase.backup.TestBackupSystemTable > Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.893 sec - > in org.apache.hadoop.hbase.backup.TestBackupSystemTable > Running org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests > Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 120.779 sec - > in org.apache.hadoop.hbase.backup.TestRestoreBoundaryTests > Running org.apache.hadoop.hbase.backup.TestFullBackupSetRestoreSet > Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 117.815 sec - > in org.apache.hadoop.hbase.backup.TestFullBackupSetRestoreSet > Running org.apache.hadoop.hbase.backup.TestBackupShowHistory > Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 136.517 sec - > in 
org.apache.hadoop.hbase.backup.TestBackupShowHistory > Running org.apache.hadoop.hbase.backup.TestRemoteRestore > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 91.799 sec - > in org.apache.hadoop.hbase.backup.TestRemoteRestore > Running org.apache.hadoop.hbase.backup.TestFullRestore > Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 317.711 sec > - in org.apache.hadoop.hbase.backup.TestFullRestore > Running org.apache.hadoop.hbase.backup.TestFullBackupSet > Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 87.045 sec - > in org.apache.hadoop.hbase.backup.TestFullBackupSet > Running org.apache.hadoop.hbase.backup.TestBackupDelete > Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 86.214 sec - > in org.apache.hadoop.hbase.backup.TestBackupDelete > Running
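The thread above hinges on surefire running each test class in its own forked JVM (reuseForks disabled), which makes explicit mini-cluster shutdown in tearDown largely redundant: process exit reclaims everything. A hedged illustration of such a guard follows; the system property name and cleanup hooks are hypothetical, not HBase's actual test wiring:

```java
// Illustrates skipping expensive mini-cluster shutdown when each test class
// runs in its own forked JVM and process exit will reclaim everything anyway.
// The "test.jvm.shared" property is a hypothetical flag for this sketch.
class ForkAwareTeardown {
    // With surefire reuseForks=false there is no next test class in this JVM,
    // so heavyweight cleanup can be skipped safely.
    static final boolean JVM_IS_SHARED =
        Boolean.parseBoolean(System.getProperty("test.jvm.shared", "false"));

    // Returns true only when cleanup actually ran.
    static boolean tearDownAfterClass() {
        if (!JVM_IS_SHARED) {
            return false; // skipped: JVM exit cleans up snapshots and clusters
        }
        // shutdownMiniClusters(); deleteSnapshots(); // only when the JVM is reused
        return true;
    }

    public static void main(String[] args) {
        System.out.println(tearDownAfterClass()); // false by default: cleanup skipped
    }
}
```

The caveat from the thread applies: this is only safe while the surefire configuration keeps forks unshared; re-enabling reuseForks would require the full tearDown again.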
[jira] [Commented] (HBASE-21182) Failed to execute start-hbase.sh
[ https://issues.apache.org/jira/browse/HBASE-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613871#comment-16613871 ] Sean Busbey commented on HBASE-21182: - +1 confirmed things work locally here too. > Failed to execute start-hbase.sh > > > Key: HBASE-21182 > URL: https://issues.apache.org/jira/browse/HBASE-21182 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0, 2.1.1 >Reporter: Subrat Mishra >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-21182.master.001.patch, > HBASE-21182.master.002.patch > > > Built master branch like below: > {code:java} > mvn clean install -DskipTests{code} > Then tried to execute start-hbase.sh failed with NoClassDefFoundError > {code:java} > ./bin/start-hbase.sh > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/hbase/shaded/org/eclipse/jetty/server/Connector > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > at java.lang.Class.getMethod0(Class.java:3018) > at java.lang.Class.getMethod(Class.java:1784) > at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.shaded.org.eclipse.jetty.server.Connector{code} > Note: It worked after reverting HBASE-21153 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21190) Log files and count of entries in each as we load from the MasterProcWAL store
[ https://issues.apache.org/jira/browse/HBASE-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613747#comment-16613747 ] stack commented on HBASE-21190: --- Thanks for reviews [~allan163] and [~balazs.meszaros] > Log files and count of entries in each as we load from the MasterProcWAL store > -- > > Key: HBASE-21190 > URL: https://issues.apache.org/jira/browse/HBASE-21190 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21190.branch-2.1.001.patch > > > Sometimes this can take a while especially if loads of files. Emit counts of > entries so operator gets sense of scale of procedures being processed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21182) Failed to execute start-hbase.sh
[ https://issues.apache.org/jira/browse/HBASE-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613746#comment-16613746 ] Hadoop QA commented on HBASE-21182: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 24s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 9s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 11s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 19s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s{color} | {color:green} hbase-assembly in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 31m 5s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21182 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12939577/HBASE-21182.master.002.patch | | Optional Tests | asflicense javac javadoc unit shadedjars hadoopcheck xml compile | | uname | Linux ec9dcbd1fc99 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 5d14c1af65 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14408/testReport/ | | Max. process+thread count | 87 (vs. ulimit of 1) | | modules | C: hbase-assembly U: hbase-assembly | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14408/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > Failed to execute start-hbase.sh > > > Key: HBASE-21182 > URL: https://issues.apache.org/jira/browse/HBASE-21182 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0, 2.1.1 >Reporter: Subrat Mishra >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments:
[jira] [Commented] (HBASE-21182) Failed to execute start-hbase.sh
[ https://issues.apache.org/jira/browse/HBASE-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613703#comment-16613703 ] Toshihiro Suzuki commented on HBASE-21182: -- Sure [~busbey]. > Failed to execute start-hbase.sh > > > Key: HBASE-21182 > URL: https://issues.apache.org/jira/browse/HBASE-21182 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0, 2.1.1 >Reporter: Subrat Mishra >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-21182.master.001.patch, > HBASE-21182.master.002.patch > > > Built master branch like below: > {code:java} > mvn clean install -DskipTests{code} > Then tried to execute start-hbase.sh failed with NoClassDefFoundError > {code:java} > ./bin/start-hbase.sh > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/hbase/shaded/org/eclipse/jetty/server/Connector > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > at java.lang.Class.getMethod0(Class.java:3018) > at java.lang.Class.getMethod(Class.java:1784) > at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.shaded.org.eclipse.jetty.server.Connector{code} > Note: It worked after reverting HBASE-21153 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-21182) Failed to execute start-hbase.sh
[ https://issues.apache.org/jira/browse/HBASE-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613701#comment-16613701 ] Sean Busbey edited comment on HBASE-21182 at 9/13/18 4:07 PM: -- bq. I just attached the v2 patch. I will commit it tomorrow if no objections. please wait for a sign-off from some committer before pushing. I'm waiting to see what QABot says. (Edit because I quoted the wrong bit.) was (Author: busbey): bq. I will open other Jiras for the nightly test and the documentation. Thanks. please wait for a sign-off from some committer before pushing. I'm waiting to see what QABot says. > Failed to execute start-hbase.sh > > > Key: HBASE-21182 > URL: https://issues.apache.org/jira/browse/HBASE-21182 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0, 2.1.1 >Reporter: Subrat Mishra >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-21182.master.001.patch, > HBASE-21182.master.002.patch > > > Built master branch like below: > {code:java} > mvn clean install -DskipTests{code} > Then tried to execute start-hbase.sh failed with NoClassDefFoundError > {code:java} > ./bin/start-hbase.sh > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/hbase/shaded/org/eclipse/jetty/server/Connector > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > at java.lang.Class.getMethod0(Class.java:3018) > at java.lang.Class.getMethod(Class.java:1784) > at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.shaded.org.eclipse.jetty.server.Connector{code} > Note: It worked after reverting 
HBASE-21153 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21182) Failed to execute start-hbase.sh
[ https://issues.apache.org/jira/browse/HBASE-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613701#comment-16613701 ] Sean Busbey commented on HBASE-21182: - bq. I will open other Jiras for the nightly test and the documentation. Thanks. please wait for a sign-off from some committer before pushing. I'm waiting to see what QABot says. > Failed to execute start-hbase.sh > > > Key: HBASE-21182 > URL: https://issues.apache.org/jira/browse/HBASE-21182 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0, 2.1.1 >Reporter: Subrat Mishra >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-21182.master.001.patch, > HBASE-21182.master.002.patch > > > Built master branch like below: > {code:java} > mvn clean install -DskipTests{code} > Then tried to execute start-hbase.sh failed with NoClassDefFoundError > {code:java} > ./bin/start-hbase.sh > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/hbase/shaded/org/eclipse/jetty/server/Connector > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > at java.lang.Class.getMethod0(Class.java:3018) > at java.lang.Class.getMethod(Class.java:1784) > at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.shaded.org.eclipse.jetty.server.Connector{code} > Note: It worked after reverting HBASE-21153 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21182) Failed to execute start-hbase.sh
[ https://issues.apache.org/jira/browse/HBASE-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613696#comment-16613696 ] Toshihiro Suzuki commented on HBASE-21182: -- [~busbey] Thank you for reviewing. I just attached the v2 patch. I will commit it tomorrow if no objections. {code} I strongly suggest someone add a test for it to nightly and probably add a paragraph to the "Building Apache HBase" section of the ref guide after the advice on how to quickly build a tarball. {code} I will open other Jiras for the nightly test and the documentation. Thanks. > Failed to execute start-hbase.sh > > > Key: HBASE-21182 > URL: https://issues.apache.org/jira/browse/HBASE-21182 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0, 2.1.1 >Reporter: Subrat Mishra >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-21182.master.001.patch, > HBASE-21182.master.002.patch > > > Built master branch like below: > {code:java} > mvn clean install -DskipTests{code} > Then tried to execute start-hbase.sh failed with NoClassDefFoundError > {code:java} > ./bin/start-hbase.sh > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/hbase/shaded/org/eclipse/jetty/server/Connector > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > at java.lang.Class.getMethod0(Class.java:3018) > at java.lang.Class.getMethod(Class.java:1784) > at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.shaded.org.eclipse.jetty.server.Connector{code} > Note: It worked after reverting HBASE-21153 -- This message was sent by 
Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-21182) Failed to execute start-hbase.sh
[ https://issues.apache.org/jira/browse/HBASE-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613696#comment-16613696 ] Toshihiro Suzuki edited comment on HBASE-21182 at 9/13/18 4:04 PM: --- [~busbey] Thank you for reviewing. I just attached the v2 patch. I will commit it tomorrow if no objections. {quote} I strongly suggest someone add a test for it to nightly and probably add a paragraph to the "Building Apache HBase" section of the ref guide after the advice on how to quickly build a tarball. {quote} I will open other Jiras for the nightly test and the documentation. Thanks. was (Author: brfrn169): [~busbey] Thank you for reviewing. I just attached the v2 patch. I will commit it tomorrow if no objections. {code} I strongly suggest someone add a test for it to nightly and probably add a paragraph to the "Building Apache HBase" section of the ref guide after the advice on how to quickly build a tarball. {code} I will open other Jiras for the nightly test and the documentation. Thanks. 
[jira] [Updated] (HBASE-21182) Failed to execute start-hbase.sh
[ https://issues.apache.org/jira/browse/HBASE-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-21182: Fix Version/s: 2.1.1 2.2.0 3.0.0 Affects Version/s: 2.1.1 2.2.0 Status: Patch Available (was: Open) > Failed to execute start-hbase.sh > > > Key: HBASE-21182 > URL: https://issues.apache.org/jira/browse/HBASE-21182 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0, 2.1.1 >Reporter: Subrat Mishra >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-21182.master.001.patch, > HBASE-21182.master.002.patch > > > Built master branch like below: > {code:java} > mvn clean install -DskipTests{code} > Then tried to execute start-hbase.sh failed with NoClassDefFoundError > {code:java} > ./bin/start-hbase.sh > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/hbase/shaded/org/eclipse/jetty/server/Connector > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > at java.lang.Class.getMethod0(Class.java:3018) > at java.lang.Class.getMethod(Class.java:1784) > at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.shaded.org.eclipse.jetty.server.Connector{code} > Note: It worked after reverting HBASE-21153 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21182) Failed to execute start-hbase.sh
[ https://issues.apache.org/jira/browse/HBASE-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Toshihiro Suzuki updated HBASE-21182: - Attachment: HBASE-21182.master.002.patch > Failed to execute start-hbase.sh > > > Key: HBASE-21182 > URL: https://issues.apache.org/jira/browse/HBASE-21182 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Subrat Mishra >Assignee: Toshihiro Suzuki >Priority: Major > Attachments: HBASE-21182.master.001.patch, > HBASE-21182.master.002.patch > > > Built master branch like below: > {code:java} > mvn clean install -DskipTests{code} > Then tried to execute start-hbase.sh failed with NoClassDefFoundError > {code:java} > ./bin/start-hbase.sh > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/hbase/shaded/org/eclipse/jetty/server/Connector > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > at java.lang.Class.getMethod0(Class.java:3018) > at java.lang.Class.getMethod(Class.java:1784) > at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.shaded.org.eclipse.jetty.server.Connector{code} > Note: It worked after reverting HBASE-21153 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20615) emphasize use of shaded client jars when they're present in an install
[ https://issues.apache.org/jira/browse/HBASE-20615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613626#comment-16613626 ] Sean Busbey commented on HBASE-20615: - Please open new issues instead of commenting on old ones. > emphasize use of shaded client jars when they're present in an install > -- > > Key: HBASE-20615 > URL: https://issues.apache.org/jira/browse/HBASE-20615 > Project: HBase > Issue Type: Sub-task > Components: build, Client, Usability >Affects Versions: 2.0.0 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Major > Fix For: 3.0.0, 2.1.0 > > Attachments: HBASE-20615.0.patch, HBASE-20615.1.patch, > HBASE-20615.2.patch > > > Working through setting up an IT for our shaded artifacts in HBASE-20334 > makes our lack of packaging seem like an oversight. While I could work around > by pulling the shaded clients out of whatever build process built the > convenience binary that we're trying to test, it seems v awkward. > After reflecting on it more, it makes more sense to me for there to be a > common place in the install that folks running jobs against the cluster can > rely on. If they need to run without a full hbase install, that should still > work fine via e.g. grabbing from the maven repo. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21182) Failed to execute start-hbase.sh
[ https://issues.apache.org/jira/browse/HBASE-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613586#comment-16613586 ] Sean Busbey commented on HBASE-21182: - {code} - jline,jruby-complete + jline,jruby-complete,hbase-shaded-mapreduce {code} You should exclude all of the shaded client artifacts. {quote} Yes. I ran start-hbase.sh from the source checkout directory after running mvn clean install -DskipTests. I usually do this to test my patch. At least, before HBASE-21153 I was able to do this. You mean it's unexpected? ... I think running bin/start-hbase.sh in the source checkout directory is expected, because bin/hbase obviously expects it as the following: ... {quote} I believe stack uses this same thing, so it's definitely expected. If folks want it to keep working reliably, I strongly suggest someone add a test for it to nightly and probably add a paragraph to the "Building Apache HBase" section of the ref guide after the advice on how to quickly build a tarball. The current implementation is brittle and not covered by any checks for what would break in an actual deployment. Related, maybe it's time we talk about better ways to do "quick" testing of things instead of maintaining this shadow of a normal deployment. Something for dev@; no need to block this fix. 
> Failed to execute start-hbase.sh > > > Key: HBASE-21182 > URL: https://issues.apache.org/jira/browse/HBASE-21182 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Subrat Mishra >Assignee: Toshihiro Suzuki >Priority: Major > Attachments: HBASE-21182.master.001.patch > > > Built master branch like below: > {code:java} > mvn clean install -DskipTests{code} > Then tried to execute start-hbase.sh failed with NoClassDefFoundError > {code:java} > ./bin/start-hbase.sh > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/hbase/shaded/org/eclipse/jetty/server/Connector > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > at java.lang.Class.getMethod0(Class.java:3018) > at java.lang.Class.getMethod(Class.java:1784) > at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.shaded.org.eclipse.jetty.server.Connector{code} > Note: It worked after reverting HBASE-21153 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HBASE-21182) Failed to execute start-hbase.sh
[ https://issues.apache.org/jira/browse/HBASE-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey reassigned HBASE-21182: --- Assignee: Toshihiro Suzuki > Failed to execute start-hbase.sh > > > Key: HBASE-21182 > URL: https://issues.apache.org/jira/browse/HBASE-21182 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Subrat Mishra >Assignee: Toshihiro Suzuki >Priority: Major > Attachments: HBASE-21182.master.001.patch > > > Built master branch like below: > {code:java} > mvn clean install -DskipTests{code} > Then tried to execute start-hbase.sh failed with NoClassDefFoundError > {code:java} > ./bin/start-hbase.sh > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/hbase/shaded/org/eclipse/jetty/server/Connector > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > at java.lang.Class.getMethod0(Class.java:3018) > at java.lang.Class.getMethod(Class.java:1784) > at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.shaded.org.eclipse.jetty.server.Connector{code} > Note: It worked after reverting HBASE-21153 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21160) Assertion in TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels is ignored
[ https://issues.apache.org/jira/browse/HBASE-21160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613541#comment-16613541 ] Ted Yu commented on HBASE-21160: If there is no assertion in the try block where Throwable is caught, you don't need to change. > Assertion in > TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels > is ignored > --- > > Key: HBASE-21160 > URL: https://issues.apache.org/jira/browse/HBASE-21160 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: liubangchen >Priority: Trivial > > From > https://builds.apache.org/job/PreCommit-HBASE-Build/14327/artifact/patchprocess/diff-compile-javac-hbase-server.txt > (HBASE-21138 QA run): > {code} > [WARNING] > /testptch/hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithDeletes.java:[315,25] > [AssertionFailureIgnored] This assertion throws an AssertionError if it > fails, which will be caught by an enclosing try block. > {code} > Here is related code: > {code} > PrivilegedExceptionAction scanAction = new > PrivilegedExceptionAction() { > @Override > public Void run() throws Exception { > try (Connection connection = > ConnectionFactory.createConnection(conf); > ... > assertEquals(1, next.length); > } catch (Throwable t) { > throw new IOException(t); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
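The [AssertionFailureIgnored] pattern above is easy to reproduce outside HBase. Below is a minimal, self-contained sketch (not HBase code; the class name and assertion message are made up for illustration) of why an assertEquals failure inside a try block whose catch clause catches Throwable surfaces as a wrapped IOException rather than as an assertion failure. Moving the assertion outside the try, or rethrowing AssertionError unwrapped, avoids the warning.

```java
import java.io.IOException;

public class AssertionSwallowDemo {

  // Stand-in for the flagged test pattern: an assertion fails inside a
  // try block whose catch clause catches Throwable and rewraps it.
  static void flaggedPattern() throws Exception {
    try {
      // stand-in for assertEquals(1, next.length) failing inside the try
      throw new AssertionError("expected:<1> but was:<0>");
    } catch (Throwable t) {
      // the AssertionError is caught here and rewrapped, so the caller
      // (e.g. JUnit) sees an IOException instead of an assertion failure
      throw new IOException(t);
    }
  }

  public static void main(String[] args) {
    try {
      flaggedPattern();
    } catch (Exception e) {
      // the original assertion failure survives only as the cause
      System.out.println(e.getClass().getSimpleName());            // IOException
      System.out.println(e.getCause().getClass().getSimpleName()); // AssertionError
    }
  }
}
```

This matches Ted Yu's point: only try blocks that both contain an assertion and catch Throwable need changing; a catch that never sees an AssertionError is harmless.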
[jira] [Commented] (HBASE-21187) The HBase UTs are extremely slow on some jenkins node
[ https://issues.apache.org/jira/browse/HBASE-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613502#comment-16613502 ] Duo Zhang commented on HBASE-21187: --- We always succeed on H4, but on other machines we are likely to fail... And there is no big difference between the machines... Strange... > The HBase UTs are extremely slow on some jenkins node > - > > Key: HBASE-21187 > URL: https://issues.apache.org/jira/browse/HBASE-21187 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Duo Zhang >Priority: Major > > Looking at the flaky dashboard for master branch, the top several UTs are > likely to fail at the same time. One common trait of the failed > flaky-tests jobs is that the execution time is more than one hour, while the > successful executions usually take only about half an hour. > I have also compared the output for > TestRestoreSnapshotFromClientWithRegionReplicas: for a successful run, the > DisableTableProcedure can finish within one second, while for a failed run > it can take more than half a minute. > Not sure what the real problem is, but it seems that for the failed runs > there are time holes in the output, i.e., there is no log output for > several seconds. Like this: > {noformat} > 2018-09-11 21:08:08,152 INFO [PEWorker-4] > procedure2.ProcedureExecutor(1500): Finished pid=490, state=SUCCESS, > hasLock=false; CreateTableProcedure table=testRestoreSnapshotAfterTruncate in > 12.9380sec > 2018-09-11 21:08:15,590 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=1,queue=0,port=33663] > master.MasterRpcServices(1174): Checking to see if procedure is done pid=490 > {noformat} > No log output for about 7 seconds. 
> And for a successful run, the same place looks like this: > {noformat} > 2018-09-12 07:47:32,488 INFO [PEWorker-7] > procedure2.ProcedureExecutor(1500): Finished pid=490, state=SUCCESS, > hasLock=false; CreateTableProcedure table=testRestoreSnapshotAfterTruncate in > 1.2220sec > 2018-09-12 07:47:32,881 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=59079] > master.MasterRpcServices(1174): Checking to see if procedure is done pid=490 > {noformat} > There is no such hole. > Maybe there is a big GC pause? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21168) BloomFilterUtil uses hardcoded randomness
[ https://issues.apache.org/jira/browse/HBASE-21168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613487#comment-16613487 ] Hudson commented on HBASE-21168: Results for branch master [build #489 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/489/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/489//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/489//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/489//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > BloomFilterUtil uses hardcoded randomness > - > > Key: HBASE-21168 > URL: https://issues.apache.org/jira/browse/HBASE-21168 > Project: HBase > Issue Type: Task >Affects Versions: 2.0.0 >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Minor > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.8, 2.1.1 > > Attachments: HBASE-21168.branch-1.002.patch, > HBASE-21168.master.001.patch, HBASE-21168.master.002.patch > > > This was flagged by a Fortify scan and while it doesn't appear to be a real > issue, it's pretty easy to take care of anyway. > The hard coded rand can be moved to the test class that actually needs it to > make the static analysis happy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21189) flaky job should gather machine stats
[ https://issues.apache.org/jira/browse/HBASE-21189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613490#comment-16613490 ] Hudson commented on HBASE-21189: Results for branch master [build #489 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/489/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/489//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/489//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/489//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > flaky job should gather machine stats > - > > Key: HBASE-21189 > URL: https://issues.apache.org/jira/browse/HBASE-21189 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Minor > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21189.0.patch > > > flaky test should gather all the same environment information as our normal > nightly tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21188) Print heap and gc informations in our junit ResourceChecker
[ https://issues.apache.org/jira/browse/HBASE-21188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613489#comment-16613489 ] Hudson commented on HBASE-21188: Results for branch master [build #489 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/489/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/489//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/489//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/489//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Print heap and gc informations in our junit ResourceChecker > --- > > Key: HBASE-21188 > URL: https://issues.apache.org/jira/browse/HBASE-21188 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21188.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21190) Log files and count of entries in each as we load from the MasterProcWAL store
[ https://issues.apache.org/jira/browse/HBASE-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613488#comment-16613488 ] Hudson commented on HBASE-21190: Results for branch master [build #489 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/489/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/489//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/489//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/489//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Log files and count of entries in each as we load from the MasterProcWAL store > -- > > Key: HBASE-21190 > URL: https://issues.apache.org/jira/browse/HBASE-21190 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21190.branch-2.1.001.patch > > > Sometimes this can take a while especially if loads of files. Emit counts of > entries so operator gets sense of scale of procedures being processed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21160) Assertion in TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels is ignored
[ https://issues.apache.org/jira/browse/HBASE-21160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613394#comment-16613394 ] liubangchen commented on HBASE-21160: - Hi [~yuzhih...@gmail.com] I found many such re-throw blocks in TestVisibilityLabelsWithDeletes.java. Should we resolve them all? > Assertion in > TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels > is ignored > --- > > Key: HBASE-21160 > URL: https://issues.apache.org/jira/browse/HBASE-21160 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: liubangchen >Priority: Trivial > > From > https://builds.apache.org/job/PreCommit-HBASE-Build/14327/artifact/patchprocess/diff-compile-javac-hbase-server.txt > (HBASE-21138 QA run): > {code} > [WARNING] > /testptch/hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithDeletes.java:[315,25] > [AssertionFailureIgnored] This assertion throws an AssertionError if it > fails, which will be caught by an enclosing try block. > {code} > Here is related code: > {code} > PrivilegedExceptionAction scanAction = new > PrivilegedExceptionAction() { > @Override > public Void run() throws Exception { > try (Connection connection = > ConnectionFactory.createConnection(conf); > ... > assertEquals(1, next.length); > } catch (Throwable t) { > throw new IOException(t); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21174) [REST] Failed to parse empty qualifier in TableResource#getScanResource
[ https://issues.apache.org/jira/browse/HBASE-21174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613355#comment-16613355 ] Hudson commented on HBASE-21174: Results for branch branch-1 [build #459 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 source release artifact{color} -- See build output for details. > [REST] Failed to parse empty qualifier in TableResource#getScanResource > --- > > Key: HBASE-21174 > URL: https://issues.apache.org/jira/browse/HBASE-21174 > Project: HBase > Issue Type: Bug > Components: REST >Affects Versions: 3.0.0, 2.2.0 >Reporter: Guangxu Cheng >Assignee: Guangxu Cheng >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21174.branch-1.001.patch, > HBASE-21174.master.001.patch, HBASE-21174.master.002.patch > > > {code:xml} > GET /t1/*?column=f:c1=f: > {code} > If I want to get the values of 'f:'(empty qualifier) for all rows in the > table by rest server, I will send the above request. However, this request > will return all column values. 
> {code:java|title=TableResource#getScanResource|borderStyle=solid} > for (String csplit : column) { > String[] familysplit = csplit.trim().split(":"); > if (familysplit.length == 2) { > if (familysplit[1].length() > 0) { > if (LOG.isTraceEnabled()) { > LOG.trace("Scan family and column : " + familysplit[0] + " " + > familysplit[1]); > } > tableScan.addColumn(Bytes.toBytes(familysplit[0]), > Bytes.toBytes(familysplit[1])); > } else { > tableScan.addFamily(Bytes.toBytes(familysplit[0])); > if (LOG.isTraceEnabled()) { > LOG.trace("Scan family : " + familysplit[0] + " and empty > qualifier."); > } > tableScan.addColumn(Bytes.toBytes(familysplit[0]), null); > } > } else if (StringUtils.isNotEmpty(familysplit[0])) { > if (LOG.isTraceEnabled()) { > LOG.trace("Scan family : " + familysplit[0]); > } > tableScan.addFamily(Bytes.toBytes(familysplit[0])); > } > } > {code} > Through the above code, when the column has an empty qualifier, the empty > qualifier cannot be parsed correctly.In other words, 'f:'(empty qualifier) > and 'f' (column family) are considered to have the same meaning, which is > wrong. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
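The root cause the description points at can be demonstrated with String#split alone: with the default limit of 0, trailing empty strings are dropped, so for "f:" the familysplit.length == 2 branch above is unreachable and the request degrades to a bare column family. A plain-Java sketch (no HBase types; using split with an explicit limit is shown only as one possible way to preserve the empty qualifier, not necessarily the committed fix):

```java
public class SplitDemo {
  public static void main(String[] args) {
    // the default limit of 0 removes trailing empty strings...
    String[] a = "f:c1".split(":");
    String[] b = "f:".split(":");
    // ...while an explicit limit of 2 keeps the empty qualifier
    String[] c = "f:".split(":", 2);

    System.out.println(a.length); // 2 -> family "f", qualifier "c1"
    System.out.println(b.length); // 1 -> "f:" is parsed like a bare family "f"
    System.out.println(c.length); // 2 -> family "f", empty qualifier ""
  }
}
```

This is why 'f:' (empty qualifier) and 'f' (column family) end up on the same addFamily code path even though they should mean different scans.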
[jira] [Commented] (HBASE-21168) BloomFilterUtil uses hardcoded randomness
[ https://issues.apache.org/jira/browse/HBASE-21168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613356#comment-16613356 ] Hudson commented on HBASE-21168: Results for branch branch-1 [build #459 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 source release artifact{color} -- See build output for details. > BloomFilterUtil uses hardcoded randomness > - > > Key: HBASE-21168 > URL: https://issues.apache.org/jira/browse/HBASE-21168 > Project: HBase > Issue Type: Task >Affects Versions: 2.0.0 >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Minor > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.8, 2.1.1 > > Attachments: HBASE-21168.branch-1.002.patch, > HBASE-21168.master.001.patch, HBASE-21168.master.002.patch > > > This was flagged by a Fortify scan and while it doesn't appear to be a real > issue, it's pretty easy to take care of anyway. > The hard coded rand can be moved to the test class that actually needs it to > make the static analysis happy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21179) Fix the number of actions in responseTooSlow log
[ https://issues.apache.org/jira/browse/HBASE-21179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613359#comment-16613359 ] Hudson commented on HBASE-21179: Results for branch branch-1 [build #459 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 source release artifact{color} -- See build output for details. > Fix the number of actions in responseTooSlow log > > > Key: HBASE-21179 > URL: https://issues.apache.org/jira/browse/HBASE-21179 > Project: HBase > Issue Type: Bug > Components: rpc >Reporter: Guangxu Cheng >Assignee: Guangxu Cheng >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21179.branch-1.001.patch, > HBASE-21179.master.001.patch, HBASE-21179.master.002.patch > > > {panel:title=responseTooSlow|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} > 2018-09-10 16:13:53,022 WARN > [B.DefaultRpcServer.handler=209,queue=29,port=60020] ipc.RpcServer: > (responseTooSlow): > {"processingtimems":321262,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":"127.0.0.1:56149","param":"region= > > tsdb,\\x00\\x00.[\\x89\\x1F\\xB0\\x00\\x00\\x01\\x00\\x01Y\\x00\\x00\\x02\\x00\\x00\\x04,1536133210446.7c752de470bd5558a001117b123a5db5., > {color:red}for 1 actions and 1st row{color} > 
key=\\x00\\x00.[\\x96\\x16p","starttimems":1536566911759,"queuetimems":0,"class":"HRegionServer","responsesize":2,"method":"Multi"} > {panel} > The responseTooSlow log is printed when the processing time of a request > exceeds the specified threshold. The number of actions and the contents of > the first rowkey in the request will be included in the log. > However, the number of actions is inaccurate, and it is actually the number > of regions that the request needs to visit. > Just like the logs above, users may be mistaken for using 321262ms to process > an action, which is incredible, so we need to fix it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
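The regions-versus-actions distinction the description makes can be sketched with plain collections (the protobuf MultiRequest types are elided and the class and method names here are hypothetical): a Multi call groups actions by region, and the log printed the size of that grouping rather than the total action count.

```java
import java.util.List;
import java.util.Map;

public class MultiActionCount {
  // what the old responseTooSlow log effectively reported: one entry per
  // region touched by the Multi request
  static int regionCount(Map<String, List<String>> actionsByRegion) {
    return actionsByRegion.size();
  }

  // what users reading "N actions" expect: the total number of actions
  // across all regions in the request
  static int actionCount(Map<String, List<String>> actionsByRegion) {
    return actionsByRegion.values().stream().mapToInt(List::size).sum();
  }

  public static void main(String[] args) {
    // a Multi request hitting one region with three puts
    Map<String, List<String>> multi =
        Map.of("tsdb,region-1", List.of("put-1", "put-2", "put-3"));
    System.out.println(regionCount(multi)); // 1 -> the misleading "1 actions"
    System.out.println(actionCount(multi)); // 3 -> the figure users expect
  }
}
```

With the old counting, a 321262ms request touching one region logged "1 actions" no matter how many mutations it carried, which is the confusion the patch addresses.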
[jira] [Commented] (HBASE-21189) flaky job should gather machine stats
[ https://issues.apache.org/jira/browse/HBASE-21189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613358#comment-16613358 ] Hudson commented on HBASE-21189: Results for branch branch-1 [build #459 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 source release artifact{color} -- See build output for details. > flaky job should gather machine stats > - > > Key: HBASE-21189 > URL: https://issues.apache.org/jira/browse/HBASE-21189 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Minor > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21189.0.patch > > > flaky test should gather all the same environment information as our normal > nightly tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21190) Log files and count of entries in each as we load from the MasterProcWAL store
[ https://issues.apache.org/jira/browse/HBASE-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613357#comment-16613357 ] Hudson commented on HBASE-21190: Results for branch branch-1 [build #459 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/459//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 source release artifact{color} -- See build output for details. > Log files and count of entries in each as we load from the MasterProcWAL store > -- > > Key: HBASE-21190 > URL: https://issues.apache.org/jira/browse/HBASE-21190 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21190.branch-2.1.001.patch > > > Sometimes this can take a while, especially if there are loads of files. Emit counts of > entries so the operator gets a sense of the scale of the procedures being processed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21189) flaky job should gather machine stats
[ https://issues.apache.org/jira/browse/HBASE-21189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613346#comment-16613346 ] Hudson commented on HBASE-21189: Results for branch branch-2 [build #1242 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1242/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1242//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1242//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1242//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > flaky job should gather machine stats > - > > Key: HBASE-21189 > URL: https://issues.apache.org/jira/browse/HBASE-21189 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Minor > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21189.0.patch > > > flaky test should gather all the same environment information as our normal > nightly tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21189) flaky job should gather machine stats
[ https://issues.apache.org/jira/browse/HBASE-21189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613338#comment-16613338 ] Hudson commented on HBASE-21189: Results for branch branch-2.1 [build #318 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/318/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/318//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/318//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/318//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > flaky job should gather machine stats > - > > Key: HBASE-21189 > URL: https://issues.apache.org/jira/browse/HBASE-21189 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Minor > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21189.0.patch > > > flaky test should gather all the same environment information as our normal > nightly tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20941) Create and implement HbckService in master
[ https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613339#comment-16613339 ] Hudson commented on HBASE-20941: Results for branch branch-2.1 [build #318 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/318/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/318//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/318//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/318//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Create and implement HbckService in master > -- > > Key: HBASE-20941 > URL: https://issues.apache.org/jira/browse/HBASE-20941 > Project: HBase > Issue Type: Sub-task >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.2 > > Attachments: hbase-20941.master.001.patch, > hbase-20941.master.002.patch, hbase-20941.master.003.patch, > hbase-20941.master.004.patch, hbase-20941.master.004.patch, > hbase-20941.master.004.patch > > > Create HbckService in master and implement the following methods: > # setTableState(): If table states are inconsistent with the actions/procedures > working on them, manipulating their states in meta sometimes fixes things. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21189) flaky job should gather machine stats
[ https://issues.apache.org/jira/browse/HBASE-21189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613330#comment-16613330 ] Hudson commented on HBASE-21189: Results for branch branch-2.0 [build #808 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/808/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/808//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/808//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/808//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > flaky job should gather machine stats > - > > Key: HBASE-21189 > URL: https://issues.apache.org/jira/browse/HBASE-21189 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Minor > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21189.0.patch > > > flaky test should gather all the same environment information as our normal > nightly tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21185) WALPrettyPrinter: Additional useful info to be printed by wal printer tool, for debugability purposes
[ https://issues.apache.org/jira/browse/HBASE-21185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613324#comment-16613324 ] Wellington Chevreuil commented on HBASE-21185: -- Thanks [~allan163]. Decided to go with *estimatedSizeOfCell*, as it will already use the *heapSize* implementation of the cell internally. > WALPrettyPrinter: Additional useful info to be printed by wal printer tool, > for debugability purposes > - > > Key: HBASE-21185 > URL: https://issues.apache.org/jira/browse/HBASE-21185 > Project: HBase > Issue Type: Improvement >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Trivial > Attachments: HBASE-21185.master.001.patch, > HBASE-21185.master.002.patch > > > *WALPrettyPrinter* is very useful for troubleshooting wal issues, such as > faulty replication sinks. A useful piece of information one might want to track is > the size of a single WAL entry edit, as well as the size of each edit cell. I am > proposing a patch that adds calculations for these two, as well as an option to > seek straight to a given position in the WAL file being analysed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
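The proposed calculation can be sketched as summing per-cell sizes to get the size of a whole WAL entry edit. This is a minimal illustrative sketch: the Cell type below is a stand-in, not HBase's actual Cell interface, and estimatedSizeOfCell here simply delegates to a heap-size field, mirroring the comment above about reusing the heapSize implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: an edit's size is the sum of its cells' sizes,
// where each per-cell size is derived from the cell's heap size.
public class EditSize {
    static class Cell {
        final long heapSize;
        Cell(long heapSize) { this.heapSize = heapSize; }
        // Delegates to the heap-size figure, as the comment above describes
        long estimatedSizeOfCell() { return heapSize; }
    }

    static long editSize(List<Cell> cells) {
        long total = 0;
        for (Cell c : cells) {
            total += c.estimatedSizeOfCell();
        }
        return total;
    }

    public static void main(String[] args) {
        // Three cells of an edit, with assumed heap sizes in bytes
        List<Cell> edit = Arrays.asList(new Cell(64), new Cell(128), new Cell(32));
        System.out.println(editSize(edit)); // prints 224
    }
}
```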
[jira] [Updated] (HBASE-21185) WALPrettyPrinter: Additional useful info to be printed by wal printer tool, for debugability purposes
[ https://issues.apache.org/jira/browse/HBASE-21185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil updated HBASE-21185: - Attachment: HBASE-21185.master.002.patch > WALPrettyPrinter: Additional useful info to be printed by wal printer tool, > for debugability purposes > - > > Key: HBASE-21185 > URL: https://issues.apache.org/jira/browse/HBASE-21185 > Project: HBase > Issue Type: Improvement >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Trivial > Attachments: HBASE-21185.master.001.patch, > HBASE-21185.master.002.patch > > > *WALPrettyPrinter* is very useful for troubleshooting wal issues, such as > faulty replication sinks. A useful piece of information one might want to track is > the size of a single WAL entry edit, as well as the size of each edit cell. I am > proposing a patch that adds calculations for these two, as well as an option to > seek straight to a given position in the WAL file being analysed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21179) Fix the number of actions in responseTooSlow log
[ https://issues.apache.org/jira/browse/HBASE-21179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613285#comment-16613285 ] Hudson commented on HBASE-21179: Results for branch branch-1.3 [build #466 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/466/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/466//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/466//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/466//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. 
> Fix the number of actions in responseTooSlow log > > > Key: HBASE-21179 > URL: https://issues.apache.org/jira/browse/HBASE-21179 > Project: HBase > Issue Type: Bug > Components: rpc >Reporter: Guangxu Cheng >Assignee: Guangxu Cheng >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21179.branch-1.001.patch, > HBASE-21179.master.001.patch, HBASE-21179.master.002.patch > > > {panel:title=responseTooSlow|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} > 2018-09-10 16:13:53,022 WARN > [B.DefaultRpcServer.handler=209,queue=29,port=60020] ipc.RpcServer: > (responseTooSlow): > {"processingtimems":321262,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":"127.0.0.1:56149","param":"region= > > tsdb,\\x00\\x00.[\\x89\\x1F\\xB0\\x00\\x00\\x01\\x00\\x01Y\\x00\\x00\\x02\\x00\\x00\\x04,1536133210446.7c752de470bd5558a001117b123a5db5., > {color:red}for 1 actions and 1st row{color} > key=\\x00\\x00.[\\x96\\x16p","starttimems":1536566911759,"queuetimems":0,"class":"HRegionServer","responsesize":2,"method":"Multi"} > {panel} > The responseTooSlow log is printed when the processing time of a request > exceeds the specified threshold. The number of actions and the contents of > the first rowkey in the request are included in the log. > However, the number of actions is inaccurate: it is actually the number > of regions that the request needs to visit. > As in the log above, users may mistakenly conclude that 321262 ms was spent > processing a single action, which is implausible, so we need to fix it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21190) Log files and count of entries in each as we load from the MasterProcWAL store
[ https://issues.apache.org/jira/browse/HBASE-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613283#comment-16613283 ] Hudson commented on HBASE-21190: Results for branch branch-1.3 [build #466 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/466/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/466//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/466//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/466//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Log files and count of entries in each as we load from the MasterProcWAL store > -- > > Key: HBASE-21190 > URL: https://issues.apache.org/jira/browse/HBASE-21190 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-21190.branch-2.1.001.patch > > > Sometimes this can take a while, especially if there are loads of files. Emit counts of > entries so the operator gets a sense of the scale of the procedures being processed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)