[jira] [Commented] (HBASE-20917) MetaTableMetrics#stop references uninitialized requestsMap for non-meta region

2018-08-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585405#comment-16585405
 ] 

Hudson commented on HBASE-20917:


Results for branch branch-2
[build #1135 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1135/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1135//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1135//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1135//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> MetaTableMetrics#stop references uninitialized requestsMap for non-meta region
> --
>
> Key: HBASE-20917
> URL: https://issues.apache.org/jira/browse/HBASE-20917
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 1.5.0, 1.4.6, 2.2.0
>
> Attachments: 20917.addendum, 20917.v1.txt, 20917.v2.txt
>
>
> I noticed the following in test output:
> {code}
> 2018-07-21 15:54:43,181 ERROR [RS_CLOSE_REGION-regionserver/172.17.5.4:0-1] executor.EventHandler(186): Caught throwable while processing event M_RS_CLOSE_REGION
> java.lang.NullPointerException
>   at org.apache.hadoop.hbase.coprocessor.MetaTableMetrics.stop(MetaTableMetrics.java:329)
>   at org.apache.hadoop.hbase.coprocessor.BaseEnvironment.shutdown(BaseEnvironment.java:91)
>   at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionEnvironment.shutdown(RegionCoprocessorHost.java:165)
>   at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.shutdown(CoprocessorHost.java:290)
>   at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$4.postEnvCall(RegionCoprocessorHost.java:559)
>   at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:622)
>   at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postClose(RegionCoprocessorHost.java:551)
>   at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1678)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1484)
>   at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104)
>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> {code}
> {{requestsMap}} is only initialized for the meta region.
> However, the stop method does not check for the meta region:
> {code}
>   public void stop(CoprocessorEnvironment e) throws IOException {
>     // since meta region can move around, clear stale metrics when stop.
>     for (String meterName : requestsMap.keySet()) {
> {code}
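A minimal, self-contained sketch of the missing guard follows. It assumes a simplified stand-in for the coprocessor: the class `MetaMetricsSketch` and its `start(boolean isMetaRegion)`, `recordRequest` and `stop()` methods are hypothetical and do not reproduce the HBase `CoprocessorEnvironment` API. The map is created only for the meta region, so `stop()` has to tolerate it being `null`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for MetaTableMetrics; not the HBase class.
class MetaMetricsSketch {
  // Only initialized when the coprocessor is started on the meta region.
  private Map<String, Long> requestsMap;

  void start(boolean isMetaRegion) {
    if (isMetaRegion) {
      requestsMap = new HashMap<>();
    }
  }

  // Counts a request; a no-op on non-meta regions.
  void recordRequest(String meterName) {
    if (requestsMap != null) {
      requestsMap.merge(meterName, 1L, Long::sum);
    }
  }

  // Clears stale meters and returns their names; empty for non-meta regions.
  List<String> stop() {
    List<String> cleared = new ArrayList<>();
    // Guard: on a non-meta region requestsMap was never initialized, so
    // iterating it unconditionally would throw a NullPointerException.
    if (requestsMap == null) {
      return cleared;
    }
    // Since the meta region can move around, clear stale metrics on stop.
    for (String meterName : requestsMap.keySet()) {
      // A real coprocessor would also unregister the meter here.
      cleared.add(meterName);
    }
    requestsMap.clear();
    return cleared;
  }
}
```

Calling `stop()` on an instance started with `start(false)` returns an empty list instead of throwing, which is the behavior the stack trace above is missing.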



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21074) JDK7 branches need to pass "-Dhttps.protocols=TLSv1.2" to maven when building

2018-08-19 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585377#comment-16585377
 ] 

Sean Busbey commented on HBASE-21074:
-

make_rc.sh was already fixed for this in HBASE-20931.

> JDK7 branches need to pass "-Dhttps.protocols=TLSv1.2" to maven when building
> -
>
> Key: HBASE-21074
> URL: https://issues.apache.org/jira/browse/HBASE-21074
> Project: HBase
>  Issue Type: Bug
>  Components: build, community, test
>Affects Versions: 1.5.0, 1.2.7, 1.3.3, 1.4.7
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>
> Maven Central now requires TLSv1.2, which JDK7 does not use by default. So 
> anyone building from a clean repo will fail, as our nightly check that builds 
> the convenience binary from the source tarball does (e.g. for 1.4):
> {code}
> [INFO] Scanning for projects...
> [INFO] Downloading from apache release: https://repository.apache.org/content/repositories/releases/org/apache/apache/18/apache-18.pom
> [INFO] Downloaded from apache release: https://repository.apache.org/content/repositories/releases/org/apache/apache/18/apache-18.pom (16 kB at 14 kB/s)
> [INFO] Downloading from Nexus: http://repository.apache.org/snapshots/org/apache/felix/maven-bundle-plugin/2.5.3/maven-bundle-plugin-2.5.3.pom
> [INFO] Downloading from central: https://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.5.3/maven-bundle-plugin-2.5.3.pom
> [ERROR] [ERROR] Some problems were encountered while processing the POMs:
> [ERROR] Unresolveable build extension: Plugin org.apache.felix:maven-bundle-plugin:2.5.3 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.felix:maven-bundle-plugin:jar:2.5.3 @ 
>  @ 
> [ERROR] The build could not read 1 project -> [Help 1]
> [ERROR]   
> [ERROR]   The project org.apache.hbase:hbase:1.4.7-SNAPSHOT (/home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-1.4-EDDBHIHAYHZVAGB2FQL37O5LZNSEJJEXGP55DEGOA4FQKBLNWBAQ/unpacked_src_tarball/pom.xml) has 1 error
> [ERROR] Unresolveable build extension: Plugin org.apache.felix:maven-bundle-plugin:2.5.3 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.felix:maven-bundle-plugin:jar:2.5.3: Could not transfer artifact org.apache.felix:maven-bundle-plugin:pom:2.5.3 from/to central (https://repo.maven.apache.org/maven2): Received fatal alert: protocol_version -> [Help 2]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please read the following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
> [ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/PluginManagerException
> {code}
> If we pass "-Dhttps.protocols=TLSv1.2" to Maven, then the build should work 
> with any JDK7 version.
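As a sketch of the workaround, assume a build wrapper written in Java. The class `MavenArgs` and its method `withTlsWorkaround` are hypothetical and are not part of make_rc.sh or any other HBase script. The wrapper appends `-Dhttps.protocols=TLSv1.2` only when the JVM specification version is `1.7`, so builds on newer JDKs stay unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a Java-based build wrapper; not part of HBase's scripts.
class MavenArgs {
  static final String TLS_FLAG = "-Dhttps.protocols=TLSv1.2";

  /**
   * Builds the argument list for an mvn invocation. JDK7 does not use
   * TLSv1.2 by default, so the flag is added only for spec version "1.7".
   */
  static List<String> withTlsWorkaround(String javaSpecVersion, List<String> goals) {
    List<String> args = new ArrayList<>();
    args.add("mvn");
    if ("1.7".equals(javaSpecVersion)) {
      args.add(TLS_FLAG);
    }
    args.addAll(goals);
    return args;
  }
}
```

For example, on JDK7, `MavenArgs.withTlsWorkaround(System.getProperty("java.specification.version"), Arrays.asList("clean", "install"))` yields `[mvn, -Dhttps.protocols=TLSv1.2, clean, install]`.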





[jira] [Commented] (HBASE-20690) Moving table to target rsgroup needs to handle TableStateNotFoundException

2018-08-19 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585376#comment-16585376
 ] 

Xu Cang commented on HBASE-20690:
-

Correct. TableStateNotFound can be caused by TableNotFound (no entry for such a 
table in the meta table), so this possibility is covered by the procedure itself.

Based on a comment in #migrateZooKeeper:

{{"// This can happen; table exists but no TableState." }}

A missing table state seems to be somewhat acceptable (?).

From this comment in #fixTableStates:

{{LOG.warn(tableName + " has no table state in hbase:meta, assuming ENABLED");}}

We assume the table is enabled when no table state is found. This aligns with 
the decision we make in #moveTables: we only skip moving a table when it is 
disabled; when it is enabled or its state is unknown, we keep moving it.
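The decision described above can be sketched as follows. The rule is to skip only when the table is known to be disabled, and to keep moving when it is enabled or its state is unknown. This is a simplified, hypothetical model: `TableState`, `StateLookup`, and the local `TableStateNotFoundException` stand in for the HBase master classes and do not reproduce their real signatures.

```java
import java.util.Map;

// Hypothetical stand-ins for the HBase master types.
enum TableState { ENABLED, DISABLED }

class TableStateNotFoundException extends Exception {
  TableStateNotFoundException(String table) {
    super(table);
  }
}

class StateLookup {
  private final Map<String, TableState> states;

  StateLookup(Map<String, TableState> states) {
    this.states = states;
  }

  TableState getTableState(String table) throws TableStateNotFoundException {
    TableState state = states.get(table);
    if (state == null) {
      throw new TableStateNotFoundException(table);
    }
    return state;
  }

  /**
   * Returns true if the table's regions should be moved to the target group.
   * A missing state is treated like ENABLED, mirroring the
   * "assuming ENABLED" warning in #fixTableStates.
   */
  boolean shouldMoveRegions(String table) {
    try {
      return getTableState(table) != TableState.DISABLED;
    } catch (TableStateNotFoundException e) {
      // State unknown: keep moving rather than failing the whole call
      // (a real implementation would likely log here).
      return true;
    }
  }
}
```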

 

 

> Moving table to target rsgroup needs to handle TableStateNotFoundException
> --
>
> Key: HBASE-20690
> URL: https://issues.apache.org/jira/browse/HBASE-20690
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
>
> This is the related code:
> {code}
>   if (targetGroup != null) {
>     for (TableName table: tables) {
>       if (master.getAssignmentManager().isTableDisabled(table)) {
>         LOG.debug("Skipping move regions because the table" + table + " is disabled.");
>         continue;
>       }
> {code}
> In a stack trace [~rmani] showed me:
> {code}
> 2018-06-06 07:10:44,893 ERROR [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=2] master.TableStateManager: Unable to get table demo:tbl1 state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: demo:tbl1
> at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:193)
> at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:143)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:346)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:407)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:447)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:470)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:334)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:331)
> at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
> at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:331)
> at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1768)
> at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
> at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1750)
> at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:593)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> The logic should take potential TableStateNotFoundException into account.





[jira] [Work started] (HBASE-21074) JDK7 branches need to pass "-Dhttps.protocols=TLSv1.2" to maven when building

2018-08-19 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-21074 started by Sean Busbey.
---
> JDK7 branches need to pass "-Dhttps.protocols=TLSv1.2" to maven when building
> -
>
> Key: HBASE-21074
> URL: https://issues.apache.org/jira/browse/HBASE-21074
> Project: HBase
>  Issue Type: Bug
>  Components: build, community, test
>Affects Versions: 1.5.0, 1.2.7, 1.3.3, 1.4.7
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>


[jira] [Commented] (HBASE-21074) JDK7 branches need to pass "-Dhttps.protocols=TLSv1.2" to maven when building

2018-08-19 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585369#comment-16585369
 ] 

Sean Busbey commented on HBASE-21074:
-

Places we should update:

* ref guide section on building, with a note about older JDK7 versions
* make_rc.sh
* hbase_nightly_source-artifact.sh
* hbase-personality.sh

The current Yetus-based JDK7 builds all seem to be fine. I think that's because 
they rely on Docker and thus get a more recent JDK 1.7 than the one on my 
laptop or on the ASF build hosts (the source artifact test uses the Java and 
Maven provided by ASF Jenkins rather than Docker).

> JDK7 branches need to pass "-Dhttps.protocols=TLSv1.2" to maven when building
> -
>
> Key: HBASE-21074
> URL: https://issues.apache.org/jira/browse/HBASE-21074
> Project: HBase
>  Issue Type: Bug
>  Components: build, community, test
>Affects Versions: 1.5.0, 1.2.7, 1.3.3, 1.4.7
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
>


[jira] [Updated] (HBASE-21074) JDK7 branches need to pass "-Dhttps.protocols=TLSv1.2" to maven when building

2018-08-19 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21074:

Priority: Major  (was: Critical)

> JDK7 branches need to pass "-Dhttps.protocols=TLSv1.2" to maven when building
> -
>
> Key: HBASE-21074
> URL: https://issues.apache.org/jira/browse/HBASE-21074
> Project: HBase
>  Issue Type: Bug
>  Components: build, community, test
>Affects Versions: 1.5.0, 1.2.7, 1.3.3, 1.4.7
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
>


[jira] [Commented] (HBASE-20387) flaky infrastructure should work for all branches

2018-08-19 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585367#comment-16585367
 ] 

Duo Zhang commented on HBASE-20387:
---

OK, great. Thanks.

> flaky infrastructure should work for all branches
> -
>
> Key: HBASE-20387
> URL: https://issues.apache.org/jira/browse/HBASE-20387
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
> Fix For: 3.0.0, 1.5.0, 1.2.7, 1.3.3, 2.0.2, 2.2.0, 2.1.1, 1.4.7
>
> Attachments: HBASE-20387.0.patch, HBASE-20387.1.patch
>
>
> We need a per-branch flaky list, since what does or does not work reliably on 
> master isn't really relevant to our older maintenance release lines.
> We should make the invocation a step in the current per-branch nightly 
> jobs, before the stages that run unit tests need the list. We can also 
> publish it from the nightly job so that precommit can still get it 
> (and fetch it per-branch if needed).





[jira] [Created] (HBASE-21074) JDK7 branches need to pass "-Dhttps.protocols=TLSv1.2" to maven when building

2018-08-19 Thread Sean Busbey (JIRA)
Sean Busbey created HBASE-21074:
---

 Summary: JDK7 branches need to pass "-Dhttps.protocols=TLSv1.2" to 
maven when building
 Key: HBASE-21074
 URL: https://issues.apache.org/jira/browse/HBASE-21074
 Project: HBase
  Issue Type: Bug
  Components: build, community, test
Affects Versions: 1.5.0, 1.2.7, 1.3.3, 1.4.7
Reporter: Sean Busbey
Assignee: Sean Busbey


Maven Central now requires TLSv1.2, which JDK7 does not use by default. So 
anyone building from a clean repo will fail, as our nightly check that builds 
the convenience binary from the source tarball does (e.g. for 1.4):

{code}
[INFO] Scanning for projects...
[INFO] Downloading from apache release: https://repository.apache.org/content/repositories/releases/org/apache/apache/18/apache-18.pom
[INFO] Downloaded from apache release: https://repository.apache.org/content/repositories/releases/org/apache/apache/18/apache-18.pom (16 kB at 14 kB/s)
[INFO] Downloading from Nexus: http://repository.apache.org/snapshots/org/apache/felix/maven-bundle-plugin/2.5.3/maven-bundle-plugin-2.5.3.pom
[INFO] Downloading from central: https://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.5.3/maven-bundle-plugin-2.5.3.pom
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[ERROR] Unresolveable build extension: Plugin org.apache.felix:maven-bundle-plugin:2.5.3 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.felix:maven-bundle-plugin:jar:2.5.3 @ 
 @ 
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]   
[ERROR]   The project org.apache.hbase:hbase:1.4.7-SNAPSHOT (/home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-1.4-EDDBHIHAYHZVAGB2FQL37O5LZNSEJJEXGP55DEGOA4FQKBLNWBAQ/unpacked_src_tarball/pom.xml) has 1 error
[ERROR] Unresolveable build extension: Plugin org.apache.felix:maven-bundle-plugin:2.5.3 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.felix:maven-bundle-plugin:jar:2.5.3: Could not transfer artifact org.apache.felix:maven-bundle-plugin:pom:2.5.3 from/to central (https://repo.maven.apache.org/maven2): Received fatal alert: protocol_version -> [Help 2]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/PluginManagerException
{code}

If we pass "-Dhttps.protocols=TLSv1.2" to Maven, then the build should work 
with any JDK7 version.





[jira] [Commented] (HBASE-20690) Moving table to target rsgroup needs to handle TableStateNotFoundException

2018-08-19 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585361#comment-16585361
 ] 

Ted Yu commented on HBASE-20690:


In TableStateManager, I found only one place where TableStateNotFoundException 
is caught (note: not TableNotFoundException).
That is in {{migrateZooKeeper}}, which is not related to the methods shown in 
the stack trace in the description.


> Moving table to target rsgroup needs to handle TableStateNotFoundException
> --
>
> Key: HBASE-20690
> URL: https://issues.apache.org/jira/browse/HBASE-20690
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
>


[jira] [Commented] (HBASE-18512) Region Server will abort with IllegalStateException if HDFS umask has limited scope

2018-08-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585357#comment-16585357
 ] 

Hudson commented on HBASE-18512:


FAILURE: Integrated in Jenkins build HBase-1.3-IT #455 (See 
[https://builds.apache.org/job/HBase-1.3-IT/455/])
HBASE-18512, Region Server will abort with IllegalStateException if HDFS 
(busbey: rev d1b0c322373e27f10ac104ae2c2fd2645444929f)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.java


> Region Server will abort with IllegalStateException if HDFS umask has limited 
> scope
> ---
>
> Key: HBASE-18512
> URL: https://issues.apache.org/jira/browse/HBASE-18512
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, security
>Affects Versions: 1.4.0
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Fix For: 1.4.0, 1.2.7, 1.3.3
>
> Attachments: HBASE-18512-branch-1.patch
>
>
> If the HDFS umask (fs.permissions.umask-mode) has limited scope, say 077, then 
> file/dir permissions will be no wider than 700, and the HDFS client has to set 
> permissions explicitly if it needs anything broader.
> During SecureBulkLoadEndpoint CP start, the RegionServer creates the staging 
> directory (if it does not exist) with the specified permission, and later 
> throws an IllegalStateException if the staging directory permission is not 711.
> After HBASE-17861, we set the staging dir permission explicitly only when the 
> directory already exists. On a fresh cluster startup, the staging dir permission 
> will not be 711 when the umask is 077, which causes the RS to abort.
> {noformat}
> 2017-07-30 14:26:33,350 | ERROR | 
> B.defaultRpcServer.handler=12,queue=2,port=21300 | Region server 
> HOSTNAME,PORT,X reported a fatal error:
> ABORTING region server HOSTNAME,PORT,X: The coprocessor 
> org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint threw 
> java.lang.IllegalStateException: Staging directory of 
> /user/HBase/hbase-staging already exists but permissions aren't set to 
> '-rwx--x--x' 
> Cause:
> java.lang.IllegalStateException: Staging directory of 
> /user/root/hbase-staging already exists but permissions aren't set to 
> '-rwx--x--x' 
> {noformat}
> We should set permission explicitly to 711 after staging directory creation.
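The sketch below shows the arithmetic behind the failure and the explicit-permission fix in plain Java. It assumes the local `java.nio.file` POSIX API as a stand-in for HDFS; the real endpoint works through Hadoop's FileSystem API, and the class and method names here are hypothetical. With umask 077, a directory requested as 0711 is created as 0711 & ~0077 = 0700. That is why the permission has to be set again after creation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical illustration; not the SecureBulkLoadEndpoint code.
class StagingDirPerms {
  /** Mode a newly created directory ends up with: requested & ~umask. */
  static int applyUmask(int requested, int umask) {
    return requested & ~umask & 0777;
  }

  /** Renders an octal mode such as 0711 as "rwx--x--x". */
  static String toSymbolic(int mode) {
    String chars = "rwxrwxrwx";
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 9; i++) {
      // Bit 8 is owner-read, bit 0 is other-execute.
      boolean set = (mode & (1 << (8 - i))) != 0;
      sb.append(set ? chars.charAt(i) : '-');
    }
    return sb.toString();
  }

  /**
   * Creates the directory if needed, then sets 711 explicitly so the result
   * does not depend on the process umask (POSIX filesystems only).
   */
  static void ensureStagingDir(Path dir) throws IOException {
    Files.createDirectories(dir);
    Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString(toSymbolic(0711)));
  }
}
```

Here `toSymbolic(applyUmask(0711, 0077))` is `rwx------`, which does not match the `rwx--x--x` the endpoint checks for. The explicit `setPosixFilePermissions` call in `ensureStagingDir` plays the role of the post-creation permission fix the issue proposes.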





[jira] [Commented] (HBASE-17861) Regionserver down when checking the permission of staging dir if hbase.rootdir is on S3

2018-08-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585356#comment-16585356
 ] 

Hudson commented on HBASE-17861:


FAILURE: Integrated in Jenkins build HBase-1.3-IT #455 (See 
[https://builds.apache.org/job/HBase-1.3-IT/455/])
HBASE-17861: Regionserver down when checking the permission of staging (busbey: 
rev 7d7279d19d534f660eaa8dd9dec63fe8b547fb89)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.java


> Regionserver down when checking the permission of staging dir if 
> hbase.rootdir is on S3
> ---
>
> Key: HBASE-17861
> URL: https://issues.apache.org/jira/browse/HBASE-17861
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Yi Liang
>Assignee: Yi Liang
>Priority: Major
>  Labels: filesystem, s3, wal
> Fix For: 1.4.0, 1.2.7, 1.3.3
>
> Attachments: HBASE-17861-V1.patch, HBASE-17861.branch-1.V1.patch, 
> HBASE-17861.branch-1.V2.patch, HBASE-17861.branch-1.V3.patch, 
> HBASE-17861.branch-1.V4.patch
>
>
> Found an issue while setting up HBASE-17437: Support specifying a WAL directory 
> outside of the root directory.
> The region servers shut down when I add the following config to 
> hbase-site.xml: 
> hbase.rootdir =  s3a://xx//xx
> hbase.wal.dir = hdfs://xx/xx
> hbase.coprocessor.region.classes = 
> org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint
> Error is below
> {noformat}
> org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint threw 
> java.lang.IllegalStateException: Directory already exists but permissions 
> aren't set to '-rwx--x--x'
> {noformat}
> The reason is that when HBase enables secure bulk load, it creates a 
> folder in S3 but cannot set the above permission, because in S3 all files are 
> listed as having full read/write permissions and all directories appear to 
> have full rwx permissions. See "Object stores have different authorization 
> models" in 
> https://hadoop.apache.org/docs/current3/hadoop-aws/tools/hadoop-aws/index.html
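A minimal pure-JDK sketch of why the check fails on S3 and one possible way to guard it, assuming the check compares the reported mode string of an existing directory against '-rwx--x--x'. The class name, the OBJECT_STORE_SCHEMES set and the skip-on-object-store policy are hypothetical illustrations, not the code committed for this issue.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: not SecureBulkLoadEndpoint's actual code.
class StagingDirCheck {
    // Illustrative set of schemes treated as object stores whose reported
    // permissions are synthetic (S3A lists every directory as full rwx).
    static final Set<String> OBJECT_STORE_SCHEMES =
        new HashSet<>(Arrays.asList("s3a", "s3n", "s3"));

    static boolean isObjectStore(URI dir) {
        String scheme = dir.getScheme();
        return scheme != null
            && OBJECT_STORE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Throws like the reported error when the mode differs, except on object
    // stores, where the reported mode cannot be changed and carries no meaning.
    static void checkStagingPermission(URI dir, String reportedMode, String expectedMode) {
        if (isObjectStore(dir)) {
            return;
        }
        if (!reportedMode.equals(expectedMode)) {
            throw new IllegalStateException(
                "Directory already exists but permissions aren't set to '" + expectedMode + "'");
        }
    }
}
```

With this shape, an s3a:// root skips the comparison entirely, while an hdfs:// staging dir with the wrong mode still fails fast.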





[jira] [Commented] (HBASE-18512) Region Server will abort with IllegalStateException if HDFS umask has limited scope

2018-08-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585346#comment-16585346
 ] 

Hudson commented on HBASE-18512:


SUCCESS: Integrated in Jenkins build HBase-1.2-IT #1148 (See 
[https://builds.apache.org/job/HBase-1.2-IT/1148/])
HBASE-18512, Region Server will abort with IllegalStateException if HDFS 
(busbey: rev 22d2c72c0bc2c8566550627924aa6a68e890de1d)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.java


> Region Server will abort with IllegalStateException if HDFS umask has limited 
> scope
> ---
>
> Key: HBASE-18512
> URL: https://issues.apache.org/jira/browse/HBASE-18512
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, security
>Affects Versions: 1.4.0
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Fix For: 1.4.0, 1.2.7, 1.3.3
>
> Attachments: HBASE-18512-branch-1.patch
>
>
> If the HDFS umask (fs.permissions.umask-mode) has a limited scope, say 077, then 
> file/dir permissions will not be wider than 700. The HDFS client has to set 
> permissions explicitly if required.
> During SecureBulkLoadEndpoint CP start, the RegionServer creates (if it does not exist) 
> the staging directory with the specified permission and later throws 
> IllegalStateException if the staging directory permission is not set to 711.
> After HBASE-17861, we set the staging dir permission explicitly only when 
> it exists. In case of a fresh cluster startup, the staging dir permission won't be 711 
> when the umask is defined as 077, which causes the RS to abort.
> {noformat}
> 2017-07-30 14:26:33,350 | ERROR | 
> B.defaultRpcServer.handler=12,queue=2,port=21300 | Region server 
> HOSTNAME,PORT,X reported a fatal error:
> ABORTING region server HOSTNAME,PORT,X: The coprocessor 
> org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint threw 
> java.lang.IllegalStateException: Staging directory of 
> /user/HBase/hbase-staging already exists but permissions aren't set to 
> '-rwx--x--x' 
> Cause:
> java.lang.IllegalStateException: Staging directory of 
> /user/root/hbase-staging already exists but permissions aren't set to 
> '-rwx--x--x' 
> {noformat}
> We should set permission explicitly to 711 after staging directory creation.
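The umask arithmetic behind this report can be checked with a short pure-JDK sketch: a requested mode is filtered by clearing the umask bits, so 0711 under umask 077 becomes 0700, which is why an explicit permission set is needed after creation. The class and method names are hypothetical, and Hadoop's own permission classes are deliberately not used here.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper illustrating umask filtering; not HDFS client code.
class UmaskMath {
    // Mode a newly created directory ends up with: requested bits minus umask bits.
    static int applyUmask(int requested, int umask) {
        return requested & ~umask & 0777;
    }

    // Renders an octal mode as rwx text, e.g. 0711 -> "rwx--x--x".
    // PosixFilePermission.values() is declared owner/group/others, each as
    // read/write/execute, which matches the bit order from 0400 down to 0001.
    static String render(int mode) {
        Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
        PosixFilePermission[] all = PosixFilePermission.values();
        for (int i = 0; i < all.length; i++) {
            if ((mode & (1 << (8 - i))) != 0) {
                perms.add(all[i]);
            }
        }
        return PosixFilePermissions.toString(perms);
    }
}
```

Here render(applyUmask(0711, 0077)) yields "rwx------", while the check expects "rwx--x--x"; the '-rwx--x--x' in the log is the same 0711 mode with a leading file-type character.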





[jira] [Commented] (HBASE-17861) Regionserver down when checking the permission of staging dir if hbase.rootdir is on S3

2018-08-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585345#comment-16585345
 ] 

Hudson commented on HBASE-17861:


SUCCESS: Integrated in Jenkins build HBase-1.2-IT #1148 (See 
[https://builds.apache.org/job/HBase-1.2-IT/1148/])
HBASE-17861: Regionserver down when checking the permission of staging (busbey: 
rev 0466474d845413b4145e4602028ca49bf20130a4)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.java


> Regionserver down when checking the permission of staging dir if 
> hbase.rootdir is on S3
> ---
>
> Key: HBASE-17861
> URL: https://issues.apache.org/jira/browse/HBASE-17861
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Yi Liang
>Assignee: Yi Liang
>Priority: Major
>  Labels: filesystem, s3, wal
> Fix For: 1.4.0, 1.2.7, 1.3.3
>
> Attachments: HBASE-17861-V1.patch, HBASE-17861.branch-1.V1.patch, 
> HBASE-17861.branch-1.V2.patch, HBASE-17861.branch-1.V3.patch, 
> HBASE-17861.branch-1.V4.patch
>
>
> Found an issue while setting up HBASE-17437: Support specifying a WAL directory 
> outside of the root directory.
> The region servers shut down when I add the following config to 
> hbase-site.xml: 
> hbase.rootdir =  s3a://xx//xx
> hbase.wal.dir = hdfs://xx/xx
> hbase.coprocessor.region.classes = 
> org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint
> Error is below
> {noformat}
> org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint threw 
> java.lang.IllegalStateException: Directory already exists but permissions 
> aren't set to '-rwx--x--x'
> {noformat}
> The reason is that when HBase enables secure bulk load, it creates a 
> folder in S3 but cannot set the above permission, because in S3 all files are 
> listed as having full read/write permissions and all directories appear to 
> have full rwx permissions. See "Object stores have different authorization 
> models" in 
> https://hadoop.apache.org/docs/current3/hadoop-aws/tools/hadoop-aws/index.html





[jira] [Commented] (HBASE-20387) flaky infrastructure should work for all branches

2018-08-19 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585342#comment-16585342
 ] 

Sean Busbey commented on HBASE-20387:
-

The dashboard reports all of the runs that it sees for a given test. As I 
mentioned before, there being 5 runs from the flaky job was just a matter of 
timing; the new flaky job had only run 5 times when the report you linked to 
was generated.

Looking at the current run:

https://builds.apache.org/job/HBASE-Find-Flaky-Tests/job/master/4/artifact/dashboard.html

You can see the flaky section has up to 40 runs for some of the tests, 
including 3 that only have a single failure in those runs.

> flaky infrastructure should work for all branches
> -
>
> Key: HBASE-20387
> URL: https://issues.apache.org/jira/browse/HBASE-20387
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
> Fix For: 3.0.0, 1.5.0, 1.2.7, 1.3.3, 2.0.2, 2.2.0, 2.1.1, 1.4.7
>
> Attachments: HBASE-20387.0.patch, HBASE-20387.1.patch
>
>
> We need a flaky list per-branch, since what does/does not work reliably on 
> master isn't really relevant to our older maintenance release lines.
> We should just make the invocation a step in the current per-branch nightly 
> jobs, prior to when we need the list in the stages that run unit tests. We 
> can publish it in the nightly job as well so that precommit can still get it. 
> (and can fetch it per-branch if needed)





[jira] [Updated] (HBASE-21060) fix dead store in SecureBulkLoadEndpoint

2018-08-19 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21060:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Precommit runs look fine for this change AFAICT. Pushed backports to branch-1.3 
and branch-1.2.

> fix dead store in SecureBulkLoadEndpoint
> 
>
> Key: HBASE-21060
> URL: https://issues.apache.org/jira/browse/HBASE-21060
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors
>Affects Versions: 1.2.7, 1.3.3
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Minor
> Fix For: 1.2.7, 1.3.3
>
> Attachments: HBASE-21060-branch-1.2.v0.patch, 
> HBASE-21060-branch-1.3.v0.patch
>
>
> Dead store to fsSet in 
> org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint.start(CoprocessorEnvironment)
>  At 
> SecureBulkLoadEndpoint.java:org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint.start(CoprocessorEnvironment)
>  At SecureBulkLoadEndpoint.java:[line 145]
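For readers unfamiliar with the finding: a "dead store" is a write to a local variable whose value is never read afterwards. The sketch below is a hypothetical illustration of the pattern only; the fsSet name is borrowed from the report, but the surrounding logic and the "file:///" default are invented, and the actual change to SecureBulkLoadEndpoint is in the attached patches.

```java
// Hypothetical illustration of a dead local store and its cleanup.
class DeadStoreExample {
    // Before: 'fsSet' is assigned but never read, so these stores are dead and
    // a static analyzer such as FindBugs/SpotBugs flags them.
    static String resolveWithDeadStore(String configured) {
        boolean fsSet = false;
        String fs = configured;
        if (fs == null) {
            fs = "file:///";
            fsSet = true; // dead store: nothing reads fsSet after this
        }
        return fs;
    }

    // After: the unused variable is removed; behavior is unchanged.
    static String resolve(String configured) {
        return configured != null ? configured : "file:///";
    }
}
```

The fix for such a warning is either to delete the variable, as here, or to actually use it if the value was meant to drive later logic.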





[jira] [Updated] (HBASE-17861) Regionserver down when checking the permission of staging dir if hbase.rootdir is on S3

2018-08-19 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-17861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-17861:

Fix Version/s: 1.3.3
   1.2.7

> Regionserver down when checking the permission of staging dir if 
> hbase.rootdir is on S3
> ---
>
> Key: HBASE-17861
> URL: https://issues.apache.org/jira/browse/HBASE-17861
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Yi Liang
>Assignee: Yi Liang
>Priority: Major
>  Labels: filesystem, s3, wal
> Fix For: 1.4.0, 1.2.7, 1.3.3
>
> Attachments: HBASE-17861-V1.patch, HBASE-17861.branch-1.V1.patch, 
> HBASE-17861.branch-1.V2.patch, HBASE-17861.branch-1.V3.patch, 
> HBASE-17861.branch-1.V4.patch
>
>
> Found an issue while setting up HBASE-17437: Support specifying a WAL directory 
> outside of the root directory.
> The region servers shut down when I add the following config to 
> hbase-site.xml: 
> hbase.rootdir =  s3a://xx//xx
> hbase.wal.dir = hdfs://xx/xx
> hbase.coprocessor.region.classes = 
> org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint
> Error is below
> {noformat}
> org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint threw 
> java.lang.IllegalStateException: Directory already exists but permissions 
> aren't set to '-rwx--x--x'
> {noformat}
> The reason is that when HBase enables secure bulk load, it creates a 
> folder in S3 but cannot set the above permission, because in S3 all files are 
> listed as having full read/write permissions and all directories appear to 
> have full rwx permissions. See "Object stores have different authorization 
> models" in 
> https://hadoop.apache.org/docs/current3/hadoop-aws/tools/hadoop-aws/index.html





[jira] [Updated] (HBASE-18512) Region Server will abort with IllegalStateException if HDFS umask has limited scope

2018-08-19 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-18512:

Fix Version/s: 1.3.3
   1.2.7

> Region Server will abort with IllegalStateException if HDFS umask has limited 
> scope
> ---
>
> Key: HBASE-18512
> URL: https://issues.apache.org/jira/browse/HBASE-18512
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, security
>Affects Versions: 1.4.0
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Fix For: 1.4.0, 1.2.7, 1.3.3
>
> Attachments: HBASE-18512-branch-1.patch
>
>
> If the HDFS umask (fs.permissions.umask-mode) has a limited scope, say 077, then 
> file/dir permissions will not be wider than 700. The HDFS client has to set 
> permissions explicitly if required.
> During SecureBulkLoadEndpoint CP start, the RegionServer creates (if it does not exist) 
> the staging directory with the specified permission and later throws 
> IllegalStateException if the staging directory permission is not set to 711.
> After HBASE-17861, we set the staging dir permission explicitly only when 
> it exists. In case of a fresh cluster startup, the staging dir permission won't be 711 
> when the umask is defined as 077, which causes the RS to abort.
> {noformat}
> 2017-07-30 14:26:33,350 | ERROR | 
> B.defaultRpcServer.handler=12,queue=2,port=21300 | Region server 
> HOSTNAME,PORT,X reported a fatal error:
> ABORTING region server HOSTNAME,PORT,X: The coprocessor 
> org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint threw 
> java.lang.IllegalStateException: Staging directory of 
> /user/HBase/hbase-staging already exists but permissions aren't set to 
> '-rwx--x--x' 
> Cause:
> java.lang.IllegalStateException: Staging directory of 
> /user/root/hbase-staging already exists but permissions aren't set to 
> '-rwx--x--x' 
> {noformat}
> We should set permission explicitly to 711 after staging directory creation.





[jira] [Commented] (HBASE-21025) Add cache for TableStateManager

2018-08-19 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585333#comment-16585333
 ] 

Duo Zhang commented on HBASE-21025:
---

I think the code you mentioned is for initialization? If there are exceptions, 
then the master will crash, so we do not need to restore any state. We can 
also be sure that there is no entry in the cache for the given table yet, so even 
if we do not crash, we do not need to clear the cache.

> Add cache for TableStateManager
> ---
>
> Key: HBASE-21025
> URL: https://issues.apache.org/jira/browse/HBASE-21025
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: HBASE-21025-addendum.patch, HBASE-21025-v1.patch, 
> HBASE-21025-v2.patch, HBASE-21025.patch
>
>
> After HBASE-20881, we will check whether a table is disabled in SCP, so we 
> need to add a cache for it to improve MTTR and reduce requests to meta.





[jira] [Comment Edited] (HBASE-20690) Moving table to target rsgroup needs to handle TableStateNotFoundException

2018-08-19 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583592#comment-16583592
 ] 

Xu Cang edited comment on HBASE-20690 at 8/20/18 1:10 AM:
--

created Jira HBASE-21066 to fix #isTableState

-I don't think it's good for the caller to guess after receiving "false" as a 
result.-


was (Author: xucang):
created Jira HBASE-21066 to fix #isTableState

I don't think it's good for the caller to guess after receiving "false" as a result.

> Moving table to target rsgroup needs to handle TableStateNotFoundException
> --
>
> Key: HBASE-20690
> URL: https://issues.apache.org/jira/browse/HBASE-20690
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
>
> This is related code:
> {code}
>   if (targetGroup != null) {
>     for (TableName table: tables) {
>       if (master.getAssignmentManager().isTableDisabled(table)) {
>         LOG.debug("Skipping move regions because the table " + table + " is disabled.");
>         continue;
>       }
> {code}
> In a stack trace [~rmani] showed me:
> {code}
> 2018-06-06 07:10:44,893 ERROR [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=2] master.TableStateManager: Unable to get table demo:tbl1 state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: demo:tbl1
> at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:193)
> at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:143)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:346)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:407)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:447)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:470)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:334)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:331)
> at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
> at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:331)
> at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1768)
> at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
> at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1750)
> at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:593)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> The logic should take potential TableStateNotFoundException into account.
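The issue asks for the loop to account for TableStateNotFoundException. A pure-JDK sketch of one way to do that follows; the exception class, the StateLookup interface and the keep-on-missing-state policy are assumptions for illustration (a later comment on this issue argues the existing behavior is already acceptable), not RSGroupAdminServer's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the exception, interface and policy are illustrative.
class MoveTablesSketch {
    // Stand-in for TableStateManager$TableStateNotFoundException.
    static class TableStateNotFoundException extends Exception {
        TableStateNotFoundException(String table) {
            super(table);
        }
    }

    // Stand-in for the "is this table disabled?" lookup.
    interface StateLookup {
        boolean isTableDisabled(String table) throws TableStateNotFoundException;
    }

    // Returns the tables whose regions should be moved. Disabled tables are
    // skipped; a table with no recorded state is kept, so later steps get
    // another chance to read its state.
    static List<String> tablesToMove(List<String> tables, StateLookup lookup) {
        List<String> toMove = new ArrayList<>();
        for (String table : tables) {
            try {
                if (lookup.isTableDisabled(table)) {
                    continue;
                }
            } catch (TableStateNotFoundException e) {
                // State not in meta yet (e.g. right after create): do not treat as disabled.
            }
            toMove.add(table);
        }
        return toMove;
    }
}
```

Making the missing-state case an explicit catch, rather than a boolean that silently means "not disabled", keeps the caller from having to guess what false meant.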





[jira] [Commented] (HBASE-20690) Moving table to target rsgroup needs to handle TableStateNotFoundException

2018-08-19 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585320#comment-16585320
 ] 

Xu Cang commented on HBASE-20690:
-

#moveTables calls #move, which creates a _MoveRegionProcedure_, which calls 
#preflightChecks to check whether the table is enabled. If not, it throws 
TableNotFoundException too.

So the current solution is fine. No need to fix anything.

This approach also gives another chance to retrieve the table state from 
meta later, which might succeed.

[~yuzhih...@gmail.com]

So I suggest marking this one "won't fix" since the logic is fine here. 

 

 

> Moving table to target rsgroup needs to handle TableStateNotFoundException
> --
>
> Key: HBASE-20690
> URL: https://issues.apache.org/jira/browse/HBASE-20690
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
>
> This is related code:
> {code}
>   if (targetGroup != null) {
>     for (TableName table: tables) {
>       if (master.getAssignmentManager().isTableDisabled(table)) {
>         LOG.debug("Skipping move regions because the table " + table + " is disabled.");
>         continue;
>       }
> {code}
> In a stack trace [~rmani] showed me:
> {code}
> 2018-06-06 07:10:44,893 ERROR [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=2] master.TableStateManager: Unable to get table demo:tbl1 state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: demo:tbl1
> at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:193)
> at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:143)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:346)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:407)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:447)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:470)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:334)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:331)
> at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
> at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:331)
> at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1768)
> at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
> at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1750)
> at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:593)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> The logic should take potential TableStateNotFoundException into account.





[jira] [Commented] (HBASE-21025) Add cache for TableStateManager

2018-08-19 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585305#comment-16585305
 ] 

Xu Cang commented on HBASE-21025:
-

{quote}You can see the code in setDeletedTable: we will clear the cache in a 
finally block, no matter whether the meta deletion succeeded. I also did the 
same thing when updating meta: if we fail, then clear the cache. This is 
important, as we do not know whether we have successfully updated meta when 
there is an exception, so the safe way is to clear the cache; otherwise there 
may be an inconsistency. The next time, we will read it directly from meta.
{quote}
Doesn't this contradict this code logic?

{{if (tableState == null) {}}
{{  LOG.warn(tableName + " has no table state in hbase:meta, assuming ENABLED");}}
{{  MetaTableAccessor.updateTableState(connection, tableName, TableState.State.ENABLED);}}
{{  fixTableState(new TableState(tableName, TableState.State.ENABLED));}}
{{  tableName2State.put(tableName, TableState.State.ENABLED);}}
{{}}}

[~Apache9]
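The invalidation rule in the quote (clear the entry in a finally block on delete, drop it when a meta update fails) can be sketched with a read-through cache in plain Java. Everything here is hypothetical: the Backing interface stands in for hbase:meta, and this is not TableStateManager's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a read-through cache in front of an authoritative
// store; not TableStateManager's actual implementation.
class CachedTableStates {
    enum State { ENABLED, DISABLED }

    // Stand-in for hbase:meta; failures surface as unchecked exceptions.
    interface Backing {
        State read(String table);
        void write(String table, State state);
        void delete(String table);
    }

    private final Map<String, State> cache = new ConcurrentHashMap<>();
    private final Backing meta;

    CachedTableStates(Backing meta) {
        this.meta = meta;
    }

    // A cache hit returns directly; a miss reads meta (a null result is not cached).
    State get(String table) {
        return cache.computeIfAbsent(table, meta::read);
    }

    // On failure we cannot tell whether meta was updated, so drop the entry
    // and let the next get() reload it from meta.
    void update(String table, State state) {
        boolean done = false;
        try {
            meta.write(table, state);
            cache.put(table, state);
            done = true;
        } finally {
            if (!done) {
                cache.remove(table);
            }
        }
    }

    // Clear the entry whether or not the delete succeeded.
    void delete(String table) {
        try {
            meta.delete(table);
        } finally {
            cache.remove(table);
        }
    }
}
```

By contrast, the snippet above writes meta and then populates the cache without a try/finally, which is safe only if a failure there crashes the master anyway.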

> Add cache for TableStateManager
> ---
>
> Key: HBASE-21025
> URL: https://issues.apache.org/jira/browse/HBASE-21025
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: HBASE-21025-addendum.patch, HBASE-21025-v1.patch, 
> HBASE-21025-v2.patch, HBASE-21025.patch
>
>
> After HBASE-20881, we will check whether a table is disabled in SCP, so we 
> need to add a cache for it to improve MTTR and reduce requests to meta.





[jira] [Commented] (HBASE-21069) NPE in StoreScanner.updateReaders causes RS to crash

2018-08-19 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585267#comment-16585267
 ] 

Andrew Purtell commented on HBASE-21069:


Sure, we can do that. There is another place where I test for a null 
reference to the collection before an isEmpty check, but if that API handles a 
null passed in, as your comment implies (I haven't looked but will), then the 
preceding null check can be removed there. Will make these changes. 

> NPE in StoreScanner.updateReaders causes RS to crash 
> -
>
> Key: HBASE-21069
> URL: https://issues.apache.org/jira/browse/HBASE-21069
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thomas D'Silva
>Assignee: Andrew Purtell
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.7
>
> Attachments: HBASE-21069-branch-1.patch, HBASE-21069.patch
>
>
> I see the following NPE in the region server log for a table that is taking 
> heavy writes. 
> I am not sure how the {{memStoreScanners}} variable gets set to null.
> {code}
> 2018-08-17 19:59:23,682 DEBUG [MemStoreFlusher.1] regionserver.HRegionFileSystem - Committing store file ...
> 2018-08-17 19:59:23,684 INFO  [MemStoreFlusher.1] regionserver.HStore - Added hdfs://, entries=919170, sequenceid=275114, filesize=22.6 M
> 2018-08-17 19:59:23,689 FATAL [MemStoreFlusher.1] regionserver.HRegionServer - ABORTING region server iotperf1dchbase1a-dnds22-2-prd.eng.sfdc.net,60020,1533915690501: Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: ..
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2581)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2258)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2220)
> at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2106)
> at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2031)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:508)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:478)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at java.util.ArrayList.<init>(ArrayList.java:177)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:827)
> at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1160)
> at org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1133)
> at org.apache.hadoop.hbase.regionserver.HStore.access$900(HStore.java:120)
> at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2487)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2536)
> ... 9 more
> 2018-08-17 19:59:23,692 FATAL [MemStoreFlusher.1] regionserver.HRegionServer - RegionServer abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.security.access.AccessController, 
> org.apache.phoenix.coprocessor.ScanRegionObserver, 
> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, 
> org.apache.phoenix.hbase.index.Indexer, 
> org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, 
> org.apache.hadoop.hbase.security.token.TokenProvider, 
> org.apache.phoenix.coprocessor.ServerCachingEndpointImpl]
> {code}
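The top frame of the "Caused by" section is the ArrayList copy constructor, which suggests StoreScanner.updateReaders copied a collection reference that was null. A minimal pure-JDK reproduction of that failure mode (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: how a null list reaches ArrayList.<init> and throws.
class NullCopyDemo {
    // Returns true if copying 'source' with the ArrayList copy constructor
    // throws NullPointerException, which it does when 'source' is null.
    static boolean copyThrowsNpe(List<String> source) {
        try {
            new ArrayList<>(source);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

Copying an empty list is fine; only a null reference fails, so the fix is a guard on the reference before the copy.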





[jira] [Commented] (HBASE-18477) Umbrella JIRA for HBase Read Replica clusters

2018-08-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585264#comment-16585264
 ] 

Hudson commented on HBASE-18477:


Results for branch HBASE-18477
[build #300 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/300/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/300//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/300//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/300//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
--Failed when running client tests on top of Hadoop 2. [see log for 
details|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18477/300//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)


> Umbrella JIRA for HBase Read Replica clusters
> -
>
> Key: HBASE-18477
> URL: https://issues.apache.org/jira/browse/HBASE-18477
> Project: HBase
>  Issue Type: New Feature
>Reporter: Zach York
>Assignee: Zach York
>Priority: Major
> Attachments: HBase Read-Replica Clusters Scope doc.docx, HBase 
> Read-Replica Clusters Scope doc.pdf, HBase Read-Replica Clusters Scope 
> doc_v2.docx, HBase Read-Replica Clusters Scope doc_v2.pdf
>
>
> Recently, changes (such as HBASE-17437) have unblocked HBase to run with a 
> root directory external to the cluster (such as in Amazon S3). This means 
> that the data is stored outside of the cluster and can be accessible after 
> the cluster has been terminated. One use case that is often asked about is 
> pointing multiple clusters to one root directory (sharing the data) to have 
> read resiliency in the case of a cluster failure.
>  
> This JIRA is an umbrella JIRA to contain all the tasks necessary to create a 
> read-replica HBase cluster that is pointed at the same root directory.
>  
> This requires:
> * Making the Read-Replica cluster Read-Only (no metadata operations or data 
> operations).
> * Separating the hbase:meta table for each cluster (otherwise HBase gets 
> confused with multiple clusters trying to update the meta table with their IP 
> addresses).
> * Adding refresh functionality for the meta table to ensure new metadata is 
> picked up on the read replica cluster.
> * Adding refresh functionality for HFiles for a given table to ensure new data 
> is picked up on the read replica cluster.
>  
> This can be used with any existing cluster that is backed by an external 
> filesystem.
>  
> Please note that this feature is still quite manual (with the potential for 
> automation later).
>  
> More information on this particular feature can be found here: 
> https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/





[jira] [Commented] (HBASE-21042) processor.getRowsToLock() always assumes there is some row being locked in HRegion#processRowsWithLocks

2018-08-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585263#comment-16585263
 ] 

Hudson commented on HBASE-21042:


Results for branch branch-1.4
[build #426 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/426/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/426//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/426//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/426//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> processor.getRowsToLock() always assumes there is some row being locked in 
> HRegion#processRowsWithLocks
> ---
>
> Key: HBASE-21042
> URL: https://issues.apache.org/jira/browse/HBASE-21042
> Project: HBase
>  Issue Type: Bug
>Reporter: Thomas D'Silva
>Assignee: Ted Yu
>Priority: Major
> Fix For: 1.4.7
>
> Attachments: 21042.branch-1.txt
>
>
> [~tdsilva] reported at the tail of HBASE-18998 that the fix for HBASE-18998 
> missed the finally block of HRegion#processRowsWithLocks.
> This issue fixes that remaining call.
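The patch is not reproduced here. As a hedged sketch of the general hazard the summary names (a finally block assuming at least one row was locked), the guard below only reads a row when the row set is non-empty. `RowsWithLocksSketch`, `finallyLog`, and the `iterator().next()` call are assumptions for illustration, not the actual HRegion#processRowsWithLocks code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical model of a "process rows with locks" flow; not HRegion's real code.
class RowsWithLocksSketch {
    // Records which row (if any) the finally block touched, so callers can observe it.
    final List<String> finallyLog = new ArrayList<>();

    void processRowsWithLocks(Collection<String> rowsToLock) {
        try {
            for (String row : rowsToLock) {
                // ... acquire the row lock and apply mutations (elided) ...
            }
        } finally {
            // Unsafe variant: rowsToLock.iterator().next() throws
            // NoSuchElementException when no rows were locked.
            // Safe variant: only read a row when there is one.
            if (!rowsToLock.isEmpty()) {
                finallyLog.add(rowsToLock.iterator().next());
            } else {
                finallyLog.add("<none>");
            }
        }
    }
}
```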





[jira] [Commented] (HBASE-21069) NPE in StoreScanner.updateReaders causes RS to crash

2018-08-19 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585201#comment-16585201
 ] 

Toshihiro Suzuki commented on HBASE-21069:
--

Also, I think instead of the following "memStoreScanners != null", we can use 
"!CollectionUtils.isEmpty(memStoreScanners)" as we don't need to call 
clearAndClose() when memStoreScanners is empty.
{code}
if (memStoreScanners != null) {
  clearAndClose(new ArrayList<>(memStoreScanners));
}
{code}
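Here is a self-contained sketch of the suggested guard. `ScannerStub`, `clearAndClose()` and `closeMemStoreScanners()` are hypothetical stand-ins for the StoreScanner internals. `isEmpty()` copies the null-safe contract of `CollectionUtils.isEmpty()` rather than calling the library. The sketch illustrates the check only; it is not the committed patch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-ins; the real StoreScanner code works with HBase's KeyValueScanner.
class MemStoreScannerCleanup {
    // Minimal scanner abstraction with an unchecked close().
    interface ScannerStub {
        void close();
    }

    // Same contract as CollectionUtils.isEmpty(): true for null or empty.
    static boolean isEmpty(Collection<?> coll) {
        return coll == null || coll.isEmpty();
    }

    // Closes every scanner in the given list, clears it, and returns how many were closed.
    static int clearAndClose(List<ScannerStub> scanners) {
        int closed = 0;
        for (ScannerStub s : scanners) {
            s.close();
            closed++;
        }
        scanners.clear();
        return closed;
    }

    // The suggested guard: one check covers both null and empty, so neither the
    // defensive copy nor the close call happens when there is nothing to close.
    static int closeMemStoreScanners(List<ScannerStub> memStoreScanners) {
        if (!isEmpty(memStoreScanners)) {
            return clearAndClose(new ArrayList<>(memStoreScanners));
        }
        return 0;
    }
}
```

`clearAndClose()` clears the copy it is given, not the caller's list. This mirrors the `new ArrayList<>(memStoreScanners)` copy in the original snippet.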



> NPE in StoreScanner.updateReaders causes RS to crash 
> -
>
> Key: HBASE-21069
> URL: https://issues.apache.org/jira/browse/HBASE-21069
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thomas D'Silva
>Assignee: Andrew Purtell
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.7
>
> Attachments: HBASE-21069-branch-1.patch, HBASE-21069.patch
>
>
> I see the following NPE in the region server log for a table that is taking 
> heavy writes. 
> I am not sure how the {{memStoreScanners}} variable gets set to null.
> {code}
> 2018-08-17 19:59:23,682 DEBUG [MemStoreFlusher.1] 
> regionserver.HRegionFileSystem - Committing store file ...
> 2018-08-17 19:59:23,684 INFO  [MemStoreFlusher.1] regionserver.HStore - Added 
> hdfs://, entries=919170, sequenceid=275114, filesize=22.6 M
> 2018-08-17 19:59:23,689 FATAL [MemStoreFlusher.1] regionserver.HRegionServer 
> - ABORTING region server 
> iotperf1dchbase1a-dnds22-2-prd.eng.sfdc.net,60020,1533915690501: Replay of 
> WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: ..
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2581)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2258)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2220)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2106)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2031)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:508)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:478)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at java.util.ArrayList.<init>(ArrayList.java:177)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:827)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1160)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1133)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.access$900(HStore.java:120)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2487)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2536)
> ... 9 more
> 2018-08-17 19:59:23,692 FATAL [MemStoreFlusher.1] regionserver.HRegionServer 
> - RegionServer abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.security.access.AccessController, 
> org.apache.phoenix.coprocessor.ScanRegionObserver, 
> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, 
> org.apache.phoenix.hbase.index.Indexer, 
> org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, 
> org.apache.hadoop.hbase.security.token.TokenProvider, 
> org.apache.phoenix.coprocessor.ServerCachingEndpointImpl]
> {code}





[jira] [Commented] (HBASE-21069) NPE in StoreScanner.updateReaders causes RS to crash

2018-08-19 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585196#comment-16585196
 ] 

Toshihiro Suzuki commented on HBASE-21069:
--

The following is the implementation of CollectionUtils.isEmpty(), which 
includes a null check.
{code}
public static boolean isEmpty(Collection coll) {
  return coll == null || coll.isEmpty();
}
{code}

So I don't think the patches need separate null checks in addition to 
CollectionUtils.isEmpty().
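`isEmpty()` already returns true for null, so an explicit null check in front of it adds nothing. The small check below confirms this. `NullSafeGuard` and its methods are hypothetical names. `isEmpty` reimplements the contract quoted above instead of calling the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class NullSafeGuard {
    // Same contract as the CollectionUtils.isEmpty() quoted above.
    static boolean isEmpty(Collection<?> coll) {
        return coll == null || coll.isEmpty();
    }

    // Guard with an extra null check in front of isEmpty().
    static boolean withExtraNullCheck(Collection<?> coll) {
        return coll != null && !isEmpty(coll);
    }

    // Guard relying on isEmpty() alone.
    static boolean isEmptyOnly(Collection<?> coll) {
        return !isEmpty(coll);
    }

    // Returns true if both guards agree on null, empty, and non-empty inputs.
    static boolean guardsAgree() {
        List<Collection<?>> inputs = new ArrayList<>();
        inputs.add(null);
        inputs.add(Collections.emptyList());
        inputs.add(Arrays.asList(1, 2));
        for (Collection<?> c : inputs) {
            if (withExtraNullCheck(c) != isEmptyOnly(c)) {
                return false;
            }
        }
        return true;
    }
}
```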






[jira] [Commented] (HBASE-21071) HBaseTestingUtility::startMiniCluster() to use builder pattern

2018-08-19 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585146#comment-16585146
 ] 

Hadoop QA commented on HBASE-21071:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 72 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 12s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 19s{color} | {color:red} hbase-server: The patch generated 1 new + 407 unchanged - 49 fixed = 408 total (was 456) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 15s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 7m 31s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}122m 40s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 18s{color} | {color:green} hbase-mapreduce in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 13s{color} | {color:green} hbase-rsgroup in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 1s{color} | {color:green} hbase-shell in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 4s{color} | {color:green} hbase-rest in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 42s{color} | {color:green} hbase-examples in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 52s{color} | {color:green} hbase-spark in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 2m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}214m 33s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes 

[jira] [Commented] (HBASE-20941) Create and implement HbckService in master

2018-08-19 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585136#comment-16585136
 ] 

stack commented on HBASE-20941:
---

Left notes on rb. Almost there.

> Create and implement HbckService in master
> --
>
> Key: HBASE-20941
> URL: https://issues.apache.org/jira/browse/HBASE-20941
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-20941.master.001.patch, 
> hbase-20941.master.002.patch
>
>
> Create HbckService in master and implement the following methods:
>  # setTableState(): If table states are inconsistent with the actions/procedures 
> working on them, manipulating their states in meta sometimes fixes things.
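This is a rough sketch of the shape such a call might take. It assumes table state lives in a simple map that stands in for hbase:meta. `HbckServiceSketch` and `TableStateValue` are hypothetical, as is the choice to return the previous state. None of this is the HbckService API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: an in-memory map stands in for table states stored in hbase:meta.
class HbckServiceSketch {
    enum TableStateValue { ENABLED, DISABLED, ENABLING, DISABLING }

    private final Map<String, TableStateValue> metaTableStates = new HashMap<>();

    // Overwrites the recorded state of a table and returns the previous value
    // (null if none was recorded), so an operator can see what was changed.
    synchronized TableStateValue setTableState(String table, TableStateValue newState) {
        return metaTableStates.put(table, newState);
    }

    synchronized TableStateValue getTableState(String table) {
        return metaTableStates.get(table);
    }
}
```

For example, an operator could use a call like this to move a table stuck in DISABLING to DISABLED once no procedure is still working on it.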





[jira] [Commented] (HBASE-21023) Add purgeProcedure/s() API to HbckService

2018-08-19 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585117#comment-16585117
 ] 

stack commented on HBASE-21023:
---

See the parent issue toward the end. Purging is going to create new problems. We need to 
figure out a way to force-complete procedures and have them let go of any resources, 
such as locks. Not sure how (smile).

> Add purgeProcedure/s() API to HbckService
> -
>
> Key: HBASE-21023
> URL: https://issues.apache.org/jira/browse/HBASE-21023
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck2
>Affects Versions: 2.0.1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.2.0
>
>
> purgeProcedure/s(): Some procedures do not support abort at every step. When 
> such procedures get stuck, they can neither be aborted nor make further 
> progress. The corrective action is to purge them from the ProcWAL. 
> Provide an option to purge sub-procedures as well.
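The purge behaviour described above, with optional purging of children, can be sketched as a recursive removal over a parent/child map. `ProcedureStoreSketch` and its methods are hypothetical. The real ProcWAL stores serialized procedures, and this sketch models only the parent/child bookkeeping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a procedure store; not the ProcWAL API.
class ProcedureStoreSketch {
    // procId -> parent procId (null for a root procedure).
    private final Map<Long, Long> parentOf = new HashMap<>();

    void add(long procId, Long parentId) {
        parentOf.put(procId, parentId);
    }

    boolean contains(long procId) {
        return parentOf.containsKey(procId);
    }

    // Removes procId; if purgeSubProcedures is set, also removes every descendant.
    // Returns the ids that were purged, children before parents.
    List<Long> purge(long procId, boolean purgeSubProcedures) {
        List<Long> purged = new ArrayList<>();
        if (!parentOf.containsKey(procId)) {
            return purged;
        }
        if (purgeSubProcedures) {
            // Collect direct children first so the map is not modified while iterating.
            List<Long> children = new ArrayList<>();
            for (Map.Entry<Long, Long> e : parentOf.entrySet()) {
                if (e.getValue() != null && e.getValue() == procId) {
                    children.add(e.getKey());
                }
            }
            for (long child : children) {
                purged.addAll(purge(child, true));
            }
        }
        parentOf.remove(procId);
        purged.add(procId);
        return purged;
    }
}
```

If you purge a parent without its sub-procedures, the children are left pointing at a parent that no longer exists. This is one concrete form of the "new problems" that purging can create.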





[jira] [Created] (HBASE-21073) "Maintenance mode" master

2018-08-19 Thread stack (JIRA)
stack created HBASE-21073:
-

 Summary: "Maintenance mode" master
 Key: HBASE-21073
 URL: https://issues.apache.org/jira/browse/HBASE-21073
 Project: HBase
  Issue Type: Sub-task
  Components: amv2, hbck2, master
Reporter: stack
Assignee: stack


Make it so we can bring up a Master in "maintenance mode". This means parsing the 
master WAL procs but not taking on regionservers. It would be in a state where 
"repair" Procedures could run; e.g. a Procedure that could recover meta by 
looking for meta WALs, splitting them, dropping recovered.edits into place, and even 
making meta readable. See the parent issue for why this is needed (disaster 
recovery).





[jira] [Updated] (HBASE-21072) Block out HBCK1 in hbase2

2018-08-19 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21072:
--
Fix Version/s: 2.0.2

> Block out HBCK1 in hbase2
> -
>
> Key: HBASE-21072
> URL: https://issues.apache.org/jira/browse/HBASE-21072
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck
>Affects Versions: 2.0.1
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.2
>
>
> [~busbey] left a note in the parent issue, which I only just read, with a 
> prescription for how we might block hbck1 from running against an hbase-2.x 
> cluster (hbck1 could damage it. It's disabled in hbase-2, but an errant hbck1 
> from an hbase-1.x install might still run).
> Here is the quote from the parent issue:
> {code}
> I was idly thinking about how to stop HBase v1 HBCK. Thanks to HBASE-11405, 
> we know that all HBase 1.y.z hbck instances should refuse to run if there's a 
> lock file at '/hbase/hbase-hbck.lock' (given defaults). How about HBase v2 
> places that file permanently in place and replace the contents (usually just 
> an IP address) with a note about how you must not run HBase v1 HBCK against 
> the cluster?
> {code}
> There is also the below:
> {code}
> We could pick another location for locking on HBase version 2 and start 
> building in a version check of some kind?
> {code}
> ... to which I'd answer: let's see. hbck2 is a different beast. It asks the 
> master to do things rather than doing them itself, as hbck1 did. So there is 
> no need for a lock/version check.





[jira] [Assigned] (HBASE-21072) Block out HBCK1 in hbase2

2018-08-19 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reassigned HBASE-21072:
-

Assignee: stack






[jira] [Created] (HBASE-21072) Block out HBCK1 in hbase2

2018-08-19 Thread stack (JIRA)
stack created HBASE-21072:
-

 Summary: Block out HBCK1 in hbase2
 Key: HBASE-21072
 URL: https://issues.apache.org/jira/browse/HBASE-21072
 Project: HBase
  Issue Type: Sub-task
  Components: hbck
Affects Versions: 2.0.1
Reporter: stack


[~busbey] left a note in the parent issue, which I only just read, with a 
prescription for how we might block hbck1 from running against an hbase-2.x 
cluster (hbck1 could damage it. It's disabled in hbase-2, but an errant hbck1 
from an hbase-1.x install might still run).

Here is the quote from the parent issue:

{code}
I was idly thinking about how to stop HBase v1 HBCK. Thanks to HBASE-11405, we 
know that all HBase 1.y.z hbck instances should refuse to run if there's a lock 
file at '/hbase/hbase-hbck.lock' (given defaults). How about HBase v2 places 
that file permanently in place and replace the contents (usually just an IP 
address) with a note about how you must not run HBase v1 HBCK against the 
cluster?
{code}

There is also the below:
{code}
We could pick another location for locking on HBase version 2 and start 
building in a version check of some kind?
{code}

... to which I'd answer: let's see. hbck2 is a different beast. It asks the 
master to do things rather than doing them itself, as hbck1 did. So there is 
no need for a lock/version check.
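Busbey's suggestion depends on hbase-1.x hbck refusing to run while the lock file exists. The local-filesystem model below illustrates the idea, using java.nio in place of HDFS. The class and method names are hypothetical. The hbase-1.x hbck lock check is approximated by a plain existence test, which may be simpler than what hbck actually does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local-filesystem model; a real cluster keeps this file on HDFS under the HBase root dir.
class Hbck1Blocker {
    static final String LOCK_FILE_NAME = "hbase-hbck.lock";

    // What hbase-2 would do: place the lock file permanently, holding a
    // human-readable note instead of the usual IP address.
    static Path placePermanentLock(Path hbaseRootDir) throws IOException {
        Path lock = hbaseRootDir.resolve(LOCK_FILE_NAME);
        String note = "Do not run HBase 1.x hbck against this HBase 2.x cluster.\n";
        Files.write(lock, note.getBytes(StandardCharsets.UTF_8));
        return lock;
    }

    // Approximates the hbase-1.x hbck check: it refuses to run while the lock file exists.
    static boolean hbck1WouldRun(Path hbaseRootDir) {
        return !Files.exists(hbaseRootDir.resolve(LOCK_FILE_NAME));
    }
}
```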





[jira] [Commented] (HBASE-21042) processor.getRowsToLock() always assumes there is some row being locked in HRegion#processRowsWithLocks

2018-08-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585115#comment-16585115
 ] 

Hudson commented on HBASE-21042:


Results for branch branch-1
[build #421 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/421/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/421//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/421//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/421//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.







[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-08-19 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585114#comment-16585114
 ] 

stack commented on HBASE-19121:
---

Chatting with [~allan163] and [~Apache9], the major concern is loss of the master proc 
WALs. If they are gone, mis-deleted, or damaged, the cluster is hosed. We can't have 
this. Redundancy? How would we have a redundant master proc WAL? Or can we leave 
breadcrumbs, as we used to try in the hbck1 days, that allow us to rebuild if 
everything is trashed? How? We have some file-based droppings. We will use those for 
now, though we would like to move away from depending on the particulars of our fs 
persistence. For hbase2, minimally:

* A rebuild procedure that can put the cluster back together after a catastrophe. 
The rebuild procedure might be composed of multiple fix-it procedures that an 
operator would run via hbck2. hbck2 would require at least a minimal Master 
running ("maintenance mode"), ideally with no dependency on RSs.
* But only ever one master at a time, even a minimal one!
* One procedure would repair meta. It would work through the minimal master. It 
would look for meta WAL logs for recovery. It'd run splitting inline rather 
than trying to farm it out to the cluster, to minimize dependency on RSs being up. 
It'd dump the recovered.edits into place. It might then open the meta region 
for hbck2 to read.
* hbck2 would report the troublesome RITs, or any unfinished splits or merges.
* A procedure to look for -SPLITTING RS dirs and queue new SCPs for them.

Other hbck2 features:

* Move aside the master proc WALs.
* Force-complete a procedure. We can't kill Procedures, and rollback doesn't always 
work. Procedures may be subprocedures; we need to have them complete so the parent 
can complete. Then the operator does the fixup. When force-completing, we need to 
release locks too; otherwise the operator, or new procedures meant to fix things, 
cannot make progress.
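The last bullet says that force-completing a procedure must also release its locks. That can be sketched with a small lock table keyed by resource name. `ProcedureLockTable`, `tryLock()` and `forceComplete()` are hypothetical names. The real procedure framework's scheduler and locking are considerably richer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical lock table: resource name -> id of the procedure holding it.
class ProcedureLockTable {
    private final Map<String, Long> owners = new HashMap<>();
    private final Set<Long> completed = new HashSet<>();

    // Grants the lock if it is free or already held by the same procedure.
    synchronized boolean tryLock(String resource, long procId) {
        Long owner = owners.get(resource);
        if (owner == null || owner == procId) {
            owners.put(resource, procId);
            return true;
        }
        return false;
    }

    // Marks a stuck procedure as done and releases every lock it holds, so
    // fix-up procedures waiting on the same resources can make progress.
    synchronized void forceComplete(long procId) {
        completed.add(procId);
        Iterator<Map.Entry<String, Long>> it = owners.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() == procId) {
                it.remove();
            }
        }
    }

    synchronized boolean isCompleted(long procId) {
        return completed.contains(procId);
    }
}
```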



> HBCK for AMv2 (A.K.A HBCK2)
> ---
>
> Key: HBASE-19121
> URL: https://issues.apache.org/jira/browse/HBASE-19121
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Reporter: stack
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going 
> against AMv2.
> Fix.





[jira] [Commented] (HBASE-21071) HBaseTestingUtility::startMiniCluster() to use builder pattern

2018-08-19 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585110#comment-16585110
 ] 

Duo Zhang commented on HBASE-21071:
---

And I like the approach here; the Builder pattern is good. My concern is: maybe we 
could start with a new class instead of patching a corrupt one? I do not think 
HBTU should be marked as IA.Public directly. We should give users an interface, or 
just some actual utility classes without any internal state.

Thanks.

> HBaseTestingUtility::startMiniCluster() to use builder pattern
> --
>
> Key: HBASE-21071
> URL: https://issues.apache.org/jira/browse/HBASE-21071
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Major
> Attachments: HBASE-21071.000.patch, HBASE-21071.001.patch, 
> HBASE-21071.002.patch
>
>
> Currently there are 13 {{startMiniCluster()}} methods to set up a mini 
> cluster, and I wouldn't be surprised if we add a few more in the future. It's good to 
> support different combinations of optional parameters, but we have to pick one 
> of them carefully while still wondering about the default values of the other 
> parameters, and every new option may bring more new methods.
> One solution is to use the builder pattern: create a class {{MiniClusterOptions}} 
> along with a static class {{MiniClusterOptionsBuilder}}, and create a new method 
> {{startMiniCluster(MiniClusterOptions)}}. In {{master}} we delete the old 13 
> methods, while in branch-2 we deprecate them.
> Thoughts?





[jira] [Commented] (HBASE-21071) HBaseTestingUtility::startMiniCluster() to use builder pattern

2018-08-19 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585109#comment-16585109
 ] 

Duo Zhang commented on HBASE-21071:
---

Our testing framework is a bit messy. IIRC, HBTU is marked as IA.Public, so 
I'm afraid you cannot remove the public methods directly...






[jira] [Updated] (HBASE-20881) Introduce a region transition procedure to handle all the state transition for a region

2018-08-19 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20881:
--
Release Note: 
Introduced a new TransitRegionStateProcedure (TRSP) to replace the old 
AssignProcedure/UnassignProcedure/MoveRegionProcedure. In the old code, MRP 
was not attached to the RegionStateNode, so it could not be interrupted by 
ServerCrashProcedure. This required lots of tricky code to deal with races, 
and also made it difficult to prevent scheduling redundant or even conflicting 
procedures for a region.

Now TRSP is the only procedure that can bring a region online or offline. 
When you want to schedule one, you need to check, under the lock of the 
RegionStateNode, whether one is already attached to that RegionStateNode. If 
not, just go ahead; if there is one, you should do something else, for example 
give up and fail directly, or tell the existing TRSP to give up (this is what 
SCP does). Since the check and the attach both happen under the lock of the RSN, 
this greatly reduces the possible races and makes the code much simpler.
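The check-and-attach rule can be sketched with one lock per node. `RegionStateNodeSketch`, `TransitProc`, `tryAttach()` and `detach()` are hypothetical simplifications, not the real RegionStateNode or TransitRegionStateProcedure classes. The only point is that the check and the attach happen under the same lock, so two callers cannot both see an empty slot.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical simplification of the check-and-attach rule; not the real HBase classes.
class RegionStateNodeSketch {
    // Minimal procedure stand-in that can be told to give up.
    static class TransitProc {
        final String name;
        volatile boolean toldToGiveUp = false;

        TransitProc(String name) {
            this.name = name;
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private TransitProc attached; // guarded by lock

    // Attaches newProc only if no procedure is attached. Otherwise either fail
    // fast or, if interruptExisting is set, ask the attached one to give up
    // (as the note says SCP does). Returns true only if newProc was attached.
    boolean tryAttach(TransitProc newProc, boolean interruptExisting) {
        lock.lock();
        try {
            if (attached == null) {
                attached = newProc;
                return true;
            }
            if (interruptExisting) {
                attached.toldToGiveUp = true;
            }
            return false;
        } finally {
            lock.unlock();
        }
    }

    // Called when a procedure finishes, so the next one can be attached.
    void detach(TransitProc proc) {
        lock.lock();
        try {
            if (attached == proc) {
                attached = null;
            }
        } finally {
            lock.unlock();
        }
    }
}
```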

> Introduce a region transition procedure to handle all the state transition 
> for a region
> ---
>
> Key: HBASE-20881
> URL: https://issues.apache.org/jira/browse/HBASE-20881
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20881-v1.patch, HBASE-20881-v10.patch, 
> HBASE-20881-v11.patch, HBASE-20881-v12.patch, HBASE-20881-v13.patch, 
> HBASE-20881-v13.patch, HBASE-20881-v14.patch, HBASE-20881-v14.patch, 
> HBASE-20881-v2.patch, HBASE-20881-v3.patch, HBASE-20881-v4.patch, 
> HBASE-20881-v4.patch, HBASE-20881-v5.patch, HBASE-20881-v6.patch, 
> HBASE-20881-v7.patch, HBASE-20881-v7.patch, HBASE-20881-v8.patch, 
> HBASE-20881-v9.patch, HBASE-20881.patch
>
>
> Now we have an AssignProcedure, an UnassignProcedure, and also a 
> MoveRegionProcedure which schedules an AssignProcedure and an 
> UnassignProcedure to move a region. This makes the logic a bit complicated: 
> MRP is not a RIT, so SCP cannot interrupt it directly...





[jira] [Updated] (HBASE-21071) HBaseTestingUtility::startMiniCluster() to use builder pattern

2018-08-19 Thread Mingliang Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HBASE-21071:
--
Attachment: HBASE-21071.002.patch






[jira] [Commented] (HBASE-21071) HBaseTestingUtility::startMiniCluster() to use builder pattern

2018-08-19 Thread Mingliang Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585101#comment-16585101
 ] 

Mingliang Liu commented on HBASE-21071:
---

Thanks [~stack] for the prompt review and helpful comments!

Yes, it's necessary to make the javadoc clear. In the v1 patch I put comments on 
each {{StartMiniClusterOption}} field, so I did not put anything on the 
Builder. I agree that the field names should be meaningful. I was also confused 
by the {{create}} and {{withWALDir}} options, so I read through the code and 
renamed them to {{createRootDir}} and {{createWALDir}} respectively. The v2 patch 
also refines the comments for those two options in the {{StartMiniClusterOption}} 
javadoc.

I kept one extra shortcut method to start a mini cluster: 
{{startMiniCluster(int numSlaves)}}. It has more than 300 usages across the 
project, so this method needs special attention. The other start method, 
{{startMiniCluster(int numMasters, int numSlaves)}}, was deleted because it has 
only ~20 effective use cases, which I migrated manually from the v1 patch to v2.

If the v2 patch looks good, I'll prepare a branch-2 version.






[jira] [Commented] (HBASE-20690) Moving table to target rsgroup needs to handle TableStateNotFoundException

2018-08-19 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585096#comment-16585096
 ] 

Xu Cang commented on HBASE-20690:
-

introduced by: HBASE-19088 

> Moving table to target rsgroup needs to handle TableStateNotFoundException
> --
>
> Key: HBASE-20690
> URL: https://issues.apache.org/jira/browse/HBASE-20690
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
>
> This is the related code:
> {code}
>   if (targetGroup != null) {
>     for (TableName table: tables) {
>       if (master.getAssignmentManager().isTableDisabled(table)) {
>         LOG.debug("Skipping move regions because the table " + table + " is disabled.");
>         continue;
>       }
>       // ... rest of the loop body not shown in this excerpt
>     }
>   }
> {code}
> In a stack trace [~rmani] showed me:
> {code}
> 2018-06-06 07:10:44,893 ERROR [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=2] master.TableStateManager: Unable to get table demo:tbl1 state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: demo:tbl1
> at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:193)
> at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:143)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:346)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:407)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:447)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:470)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:334)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:331)
> at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
> at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:331)
> at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1768)
> at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
> at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1750)
> at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:593)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> The logic should take potential TableStateNotFoundException into account.
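One possible way to take the exception into account, sketched without HBase dependencies. The state lookup is a hypothetical interface, the exception is a local stand-in for {{TableStateManager$TableStateNotFoundException}}, and tables are plain strings. The sketch assumes that a table with no recorded state yet (as on the {{postCreateTable}} path in the stack trace above) should have its region moves skipped with a log line rather than failing the whole request; whether skipping is the right policy is a decision for the actual fix:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Local stand-in for TableStateManager$TableStateNotFoundException (hypothetical). */
class TableStateNotFoundException extends IOException {
  TableStateNotFoundException(String table) {
    super(table);
  }
}

/** Hypothetical, simplified view of the state check used when moving tables. */
interface TableStateLookup {
  boolean isTableDisabled(String table) throws IOException;
}

final class MoveTablesSketch {
  /** Returns the tables whose regions should be moved to the target group. */
  static List<String> tablesToMove(List<String> tables, TableStateLookup states)
      throws IOException {
    List<String> toMove = new ArrayList<>();
    for (String table : tables) {
      try {
        if (states.isTableDisabled(table)) {
          System.out.println("Skipping move regions because the table " + table + " is disabled.");
          continue;
        }
      } catch (TableStateNotFoundException e) {
        // No state recorded yet, e.g. when called from postCreateTable:
        // skip this table instead of failing the whole request.
        System.out.println("Skipping move regions because no state was found for table " + table + ".");
        continue;
      }
      // Any other IOException still propagates to the caller.
      toMove.add(table);
    }
    return toMove;
  }
}
```

Catching only the specific exception keeps other state-lookup failures visible instead of silently swallowing them.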





[jira] [Commented] (HBASE-20917) MetaTableMetrics#stop references uninitialized requestsMap for non-meta region

2018-08-19 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585054#comment-16585054
 ] 

Xu Cang commented on HBASE-20917:
-

[~yuzhih...@gmail.com] Looks good! +1

Right, TableName instances can be compared directly. Thanks. 

> MetaTableMetrics#stop references uninitialized requestsMap for non-meta region
> --
>
> Key: HBASE-20917
> URL: https://issues.apache.org/jira/browse/HBASE-20917
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 1.5.0, 1.4.6, 2.2.0
>
> Attachments: 20917.addendum, 20917.v1.txt, 20917.v2.txt
>
>
> I noticed the following in test output:
> {code}
> 2018-07-21 15:54:43,181 ERROR [RS_CLOSE_REGION-regionserver/172.17.5.4:0-1] executor.EventHandler(186): Caught throwable while processing event M_RS_CLOSE_REGION
> java.lang.NullPointerException
>   at org.apache.hadoop.hbase.coprocessor.MetaTableMetrics.stop(MetaTableMetrics.java:329)
>   at org.apache.hadoop.hbase.coprocessor.BaseEnvironment.shutdown(BaseEnvironment.java:91)
>   at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionEnvironment.shutdown(RegionCoprocessorHost.java:165)
>   at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.shutdown(CoprocessorHost.java:290)
>   at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$4.postEnvCall(RegionCoprocessorHost.java:559)
>   at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:622)
>   at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postClose(RegionCoprocessorHost.java:551)
>   at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1678)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1484)
>   at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104)
>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> {code}
> {{requestsMap}} is only initialized for the meta region.
> However, the stop method does not check whether it is running on the meta region:
> {code}
>   public void stop(CoprocessorEnvironment e) throws IOException {
>     // since meta region can move around, clear stale metrics when stop.
>     for (String meterName : requestsMap.keySet()) {
>       // ... rest of the method not shown in this excerpt
>     }
>   }
> {code}
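A simplified model of the missing guard, using hypothetical classes in place of the real coprocessor API: {{TableName}} is a small value class with {{equals()}}, a plain map stands in for the metric registry, {{requestsMap}} is created only when started on the meta table, and {{stop()}} returns early when it was never created. The meter name in the check below is made up:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for HBase's TableName: a small value class with equals(). */
final class TableName {
  static final TableName META_TABLE_NAME = new TableName("hbase:meta");
  private final String name;

  TableName(String name) {
    this.name = name;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof TableName && ((TableName) o).name.equals(name);
  }

  @Override
  public int hashCode() {
    return name.hashCode();
  }
}

/** Hypothetical, simplified model of the start/stop lifecycle described in the issue. */
final class MetaMetricsSketch {
  // Stand-in for the shared metric registry, keyed by meter name.
  final Map<String, Long> registry = new HashMap<>();
  // Only created when the coprocessor runs on the meta region; null otherwise.
  Map<String, Long> requestsMap;

  void start(TableName table) {
    // TableName values are compared directly with equals().
    if (TableName.META_TABLE_NAME.equals(table)) {
      requestsMap = new ConcurrentHashMap<>();
    }
  }

  void markRequest(String meterName) {
    if (requestsMap == null) {
      return; // not the meta region: nothing is tracked
    }
    requestsMap.merge(meterName, 1L, Long::sum);
    registry.merge(meterName, 1L, Long::sum);
  }

  void stop() {
    // The guard missing from the original stop(): non-meta regions never created requestsMap.
    if (requestsMap == null) {
      return;
    }
    // since meta region can move around, clear stale metrics when stop.
    for (String meterName : requestsMap.keySet()) {
      registry.remove(meterName);
    }
    requestsMap.clear();
  }
}
```

A null check is the smallest change; an alternative with the same effect is to remember in {{start()}} whether the table is {{hbase:meta}} and test that flag in {{stop()}}.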





[jira] [Comment Edited] (HBASE-21071) HBaseTestingUtility::startMiniCluster() to use builder pattern

2018-08-19 Thread Mingliang Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584741#comment-16584741
 ] 

Mingliang Liu edited comment on HBASE-21071 at 8/19/18 6:55 AM:


{quote}
It seems the Builder can be a class within MiniClusterOptions.
{quote}
Thanks [~yuzhih...@gmail.com]. Yes that's a good suggestion.

{quote}
Should it be StartMiniClusterOptions to tie the new Builder tighter to the 
startMiniCluster method? (Will other methods in MiniCluster want to take 
options?)
{quote}
Yes, this should be {{StartMiniClusterOptions}}. I didn't find other methods that 
would use this option class.

{quote}
Does this mean the options should be MiniHBaseClusterOptions and we should 
rename this start method to be startMiniHBaseCluster. Would a 
MiniHBaseClusterBuilder make sense returning a MiniHBaseCluster instance on 
which you called start.
{quote}
I looked into this and tried, in code, an approach that focuses only on the 
{{MiniHBaseCluster}}. I also followed the idea of creating a 
{{MiniHBaseClusterBuilder}} class to build a {{MiniHBaseCluster}} directly. 
However, I hit two obstacles after a short code walk:
# Option combinations supported by {{startMiniCluster}} also include multiple 
HDFS options. To simplify those polymorphic helper methods, it seems easier to 
consolidate the hbase/hdfs/zk options in one place, {{StartMiniClusterOptions}}. 
In particular, {{startMiniHBaseCluster}} also accepts this option.
# If we replace {{startMiniCluster()}}, the current build-and-start combo 
methods, with creating a {{MiniHBaseCluster}} from a builder first and then 
calling start(), all call sites (hundreds of them) will have to be updated. 
Meanwhile, the {{MiniHBaseCluster}} constructor currently initializes the 
cluster, so we would have to split it into a builder phase and a start() phase. 
Some of those methods use non-static {{HBaseTestingUtility}} methods to prepare 
directories. As [~stack] expected, it would be a very large refactoring patch.

*TL;DR* I chose a trade-off between a perfect design and an affordable change. To 
start a {{MiniCluster}} or {{MiniHBaseCluster}} cluster, we first build an 
option and then pass it to the {{startMiniCluster(option)}} or 
{{startMiniHBaseCluster(option)}} method. If the default option values are fine, 
we can skip building an option and use the other two methods, {{startMiniCluster()}} 
or {{startMiniHBaseCluster()}}. These four methods serve all the use cases we are 
targeting in a (hopefully) clear, simple and flexible way.
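A sketch of the four entry points described above, plus a retained {{startMiniCluster(int numSlaves)}} shortcut, with every overload delegating to the option-based method. The option class, its fields and defaults, and the descriptive strings returned in place of actually starting processes are assumptions made to keep the example self-contained; treating {{numSlaves}} as both the region server and data node count is also an assumption about the old overload:

```java
/** Hypothetical immutable option; names and defaults are assumptions. */
final class StartOption {
  final int numMasters;
  final int numRegionServers;
  final int numDataNodes;

  private StartOption(Builder b) {
    numMasters = b.numMasters;
    numRegionServers = b.numRegionServers;
    numDataNodes = b.numDataNodes;
  }

  static Builder builder() { return new Builder(); }

  static final class Builder {
    private int numMasters = 1;
    private int numRegionServers = 1;
    private int numDataNodes = 1;

    Builder numMasters(int n) { numMasters = n; return this; }
    Builder numRegionServers(int n) { numRegionServers = n; return this; }
    Builder numDataNodes(int n) { numDataNodes = n; return this; }
    StartOption build() { return new StartOption(this); }
  }
}

/** Hypothetical facade; each method returns a description instead of starting processes. */
final class TestingUtilitySketch {
  /** HBase-only cluster with caller-built options. */
  String startMiniHBaseCluster(StartOption option) {
    return "hbase(masters=" + option.numMasters + ", regionServers=" + option.numRegionServers + ")";
  }

  /** HBase-only cluster with default options. */
  String startMiniHBaseCluster() {
    return startMiniHBaseCluster(StartOption.builder().build());
  }

  /** Full cluster: DFS and ZooKeeper first, then HBase, all driven by one option object. */
  String startMiniCluster(StartOption option) {
    return "dfs(dataNodes=" + option.numDataNodes + ")+zk+" + startMiniHBaseCluster(option);
  }

  /** Full cluster with default options. */
  String startMiniCluster() {
    return startMiniCluster(StartOption.builder().build());
  }

  /** Retained shortcut; assumed to size both region servers and data nodes. */
  String startMiniCluster(int numSlaves) {
    return startMiniCluster(
        StartOption.builder().numRegionServers(numSlaves).numDataNodes(numSlaves).build());
  }
}
```

Because the no-argument and shortcut methods only build an option and delegate, default values live in the builder alone rather than being repeated across overloads.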

The v1 patch almost implements this idea, with one exception: 
{{startMiniCluster(int numSlaves)}} and {{startMiniCluster(int numMasters, int 
numSlaves)}} are not yet removed and replaced with simple builder calls. The 
reason is that there are hundreds of calls to those two methods, and changing 
them manually can be error-prone. I'd like to post the v1 patch first for a 
high-level review and Jenkins. If it looks good overall, I can update all other 
places in the next patch. By the way, I'm fine with keeping one of them as 
another shortcut.

Thanks,



[jira] [Comment Edited] (HBASE-21071) HBaseTestingUtility::startMiniCluster() to use builder pattern

2018-08-19 Thread Mingliang Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584741#comment-16584741
 ] 

Mingliang Liu edited comment on HBASE-21071 at 8/19/18 6:53 AM:


{quote}
It seems the Builder can be a class within MiniClusterOptions.
{quote}
Thanks [~yuzhih...@gmail.com]. Yes that's a good suggestion.

{quote}
Should it be StartMiniClusterOptions to tie the new Builder tighter to the 
startMiniCluster method? (Will other methods in MiniCluster want to take 
options?)
{quote}
Yes, this should be {{StartMiniClusterOptions}}. I didn't find other methods that 
would use this option class.

{quote}
Does this mean the options should be MiniHBaseClusterOptions and we should 
rename this start method to be startMiniHBaseCluster. Would a 
MiniHBaseClusterBuilder make sense returning a MiniHBaseCluster instance on 
which you called start.
{quote}
I looked into this and tried, in code, an approach that focuses only on the 
{{MiniHBaseCluster}}. I also followed the idea of creating a 
{{MiniHBaseClusterBuilder}} class to build a {{MiniHBaseCluster}} directly. 
However, I hit two obstacles after a short code walk:
# Option combinations supported by {{startMiniCluster}} also include multiple 
HDFS options. To simplify those polymorphic helper methods, it seems easier to 
consolidate the hbase/hdfs/zk options in one place, {{StartMiniClusterOptions}}. 
In particular, {{startMiniHBaseCluster}} also accepts this option.
# If we replace {{startMiniCluster()}}, the current build-and-start combo 
methods, with creating a {{MiniHBaseCluster}} from a builder first and then 
calling start(), all call sites (hundreds of them) will have to be updated. 
Meanwhile, the {{MiniHBaseCluster}} constructor currently initializes the 
cluster, so we would have to split it into a builder phase and a start() phase. 
Some of those methods use non-static {{HBaseTestingUtility}} methods to prepare 
directories. As [~stack] expected, it would be a very large refactoring patch.

*TL;DR* I chose a trade-off between a perfect design and an affordable change. To 
start a {{MiniCluster}} or {{MiniHBaseCluster}} cluster, we first build an 
immutable option and then pass it to the {{startMiniCluster(option)}} or 
{{startMiniHBaseCluster(option)}} method. If the default option values are fine, 
we can skip building an option and use the other two methods, {{startMiniCluster()}} 
or {{startMiniHBaseCluster()}}. These four methods serve all the use cases we are 
targeting in a (hopefully) clear, simple and flexible way.

The v1 patch almost implements this idea, with one exception: 
{{startMiniCluster(int numSlaves)}} and {{startMiniCluster(int numMasters, int 
numSlaves)}} are not yet removed and replaced with simple builder calls. The 
reason is that there are hundreds of calls to those two methods, and changing 
them manually can be error-prone. I'd like to post the v1 patch first for a 
high-level review and Jenkins. If it looks good overall, I can update all other 
places in the next patch. By the way, I'm fine with keeping one of them as 
another shortcut.

Thanks,

