Re: Review Request 70031: HIVE-21167

2019-02-22 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70031/#review213097
---




ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1850 (patched)


Add a NULL check for the parent. If a plan doesn't have a reduce sink operator 
and you hit a table scan, its parent will be NULL.
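A minimal standalone sketch of the suggested guard (the names here are hypothetical stand-ins; the actual patched code in TezCompiler.java is not quoted in this review). The point is that walking up from a table scan in a plan with no reduce sink leaves you with an empty parent list, so "the parent" is null and must be checked before use.

```java
import java.util.Collections;
import java.util.List;

public class NullParentGuard {
  static class Op {
    final List<Op> parents;
    Op(List<Op> parents) { this.parents = parents; }
  }

  // Returns the single parent, or null when the operator has none
  // (e.g. a table scan at the top of the plan).
  static Op parentOf(Op op) {
    return op.parents.isEmpty() ? null : op.parents.get(0);
  }

  public static void main(String[] args) {
    Op tableScan = new Op(Collections.emptyList());
    Op child = new Op(Collections.singletonList(tableScan));
    System.out.println(parentOf(child) == tableScan); // true
    System.out.println(parentOf(tableScan) == null);  // true: the case to guard
  }
}
```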


- Vineet Garg


On Feb. 22, 2019, 7:19 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70031/
> ---
> 
> (Updated Feb. 22, 2019, 7:19 a.m.)
> 
> 
> Review request for hive, Jason Dere and Vaibhav Gumashta.
> 
> 
> Bugs: HIVE-21167
> https://issues.apache.org/jira/browse/HIVE-21167
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Bucketing: Bucketing version 1 is incorrectly partitioning data
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 4b10e8974e 
>   ql/src/test/queries/clientpositive/murmur_hash_migration.q 2b8da9f683 
>   
> ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out 
> 5a2cd47381 
>   ql/src/test/results/clientpositive/llap/murmur_hash_migration.q.out 
> 5343628252 
> 
> 
> Diff: https://reviews.apache.org/r/70031/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



[jira] [Created] (HIVE-21310) Hashcode of a varchar column is incorrect if it's folded

2019-02-22 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-21310:
---

 Summary: Hashcode of a varchar column is incorrect if it's folded
 Key: HIVE-21310
 URL: https://issues.apache.org/jira/browse/HIVE-21310
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich



{code:sql}
create table t (a varchar(10));

insert into t values('bee'),('xxx');

-- select t0.v, t1.v from
select assert_true(t0.v = t1.v) from
(select hash(a) as v from t where a='bee') as t0
join (select hash(a) as v from t where a='bee' or a='xbee') as t1 on (true);
{code}

the assertion fails because: {{97410 != 127201}}
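For reference, 97410 is exactly the standard 31-based polynomial hash of the string "bee", the same value Java's String.hashCode() yields, which is what the unfolded branch produces. After constant folding, the literal is apparently retyped (e.g. to varchar(10)) and hashed through a different path, giving 127201. The sketch below only verifies the 97410 side; the Hive-internal varchar hashing path is not reproduced here.

```java
public class BeeHash {
  public static void main(String[] args) {
    // 31-based polynomial hash: 'b'*31^2 + 'e'*31 + 'e' = 97410
    System.out.println("bee".hashCode()); // 97410
  }
}
```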




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259393882
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/AlterDatabaseDesc.java
 ##
 @@ -16,112 +16,100 @@
  * limitations under the License.
  */
 
-package org.apache.hadoop.hive.ql.plan;
+package org.apache.hadoop.hive.ql.exec.ddl.database;
 
 import java.io.Serializable;
 import java.util.Map;
 
+import org.apache.hadoop.hive.ql.exec.ddl.DDLDesc;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLTask2;
 import org.apache.hadoop.hive.ql.parse.ReplicationSpec;
+import org.apache.hadoop.hive.ql.plan.Explain;
 import org.apache.hadoop.hive.ql.plan.Explain.Level;
+import org.apache.hadoop.hive.ql.plan.PrincipalDesc;
 
 /**
- * AlterDatabaseDesc.
- *
+ * DDL task description for ALTER DATABASE commands.
  */
 @Explain(displayName = "Alter Database", explainLevels = { Level.USER, 
Level.DEFAULT, Level.EXTENDED })
 public class AlterDatabaseDesc extends DDLDesc implements Serializable {
-
   private static final long serialVersionUID = 1L;
 
-  // Only altering the database property and owner is currently supported
-  public static enum ALTER_DB_TYPES {
-ALTER_PROPERTY, ALTER_OWNER, ALTER_LOCATION
-  };
-
-  ALTER_DB_TYPES alterType;
-  String databaseName;
-  Map<String, String> dbProperties;
-  PrincipalDesc ownerPrincipal;
-  ReplicationSpec replicationSpec;
-  String location;
+  static {
+DDLTask2.registerOperator(AlterDatabaseDesc.class, 
AlterDatabaseOperation.class);
+  }
 
   /**
-   * For serialization only.
+   * Supported type of alter db commands.
+   * Only altering the database property and owner is currently supported
*/
-  public AlterDatabaseDesc() {
-  }
+  public enum AlterDbType {
+ALTER_PROPERTY, ALTER_OWNER, ALTER_LOCATION
+  };
 
-  public AlterDatabaseDesc(String databaseName, Map<String, String> dbProps,
 
 Review comment:
  I see; so you've already considered and rejected this approach :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259392471
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/AlterDatabaseDesc.java
 ##
 @@ -16,112 +16,100 @@
-  public AlterDatabaseDesc(String databaseName, Map<String, String> dbProps,
 
 Review comment:
  Basically AlterDatabaseOperation (or, more generally, every XXXOperation) 
could be a task as well, and every XXXDesc could have its own XXXWork class. 
But that would mean a lot of repetitive code, so having a common Task and Work 
class for them just reuses the same boilerplate. Having a separate SA 
is a good idea; I think once DDLTask is cut to pieces, the next step should be 
to cut DDLSemanticAnalyzer too!




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259363654
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/AlterDatabaseDesc.java
 ##
 @@ -16,112 +16,100 @@
-  public AlterDatabaseDesc(String databaseName, Map<String, String> dbProps,
 
 Review comment:
  I don't feel a few hundred lines is much :) I thought that keeping closely 
related stuff together might be beneficial during future feature additions, etc.;
   but it was just an idea; it will be fine without it as well.
   
   I'm right now wondering what would happen if, instead of adding an 
extra layer of (`AlterDatabaseOperation`, `AlterDatabaseDesc`), we went further: 
how insane would it be to do `AlterDatabaseTask`, `Work` and `SA`?
   
   




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259361340
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/SwitchDatabaseDesc.java
 ##
 @@ -16,37 +16,34 @@
  * limitations under the License.
  */
 
-package org.apache.hadoop.hive.ql.plan;
+package org.apache.hadoop.hive.ql.exec.ddl.database;
 
 import java.io.Serializable;
-import org.apache.hadoop.hive.ql.plan.Explain.Level;
 
+import org.apache.hadoop.hive.ql.exec.ddl.DDLDesc;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLTask2;
+import org.apache.hadoop.hive.ql.plan.Explain;
+import org.apache.hadoop.hive.ql.plan.Explain.Level;
 
 /**
- * SwitchDatabaseDesc.
- *
+ * DDL task description for SWITCH DATABASE commands.
 
 Review comment:
   of course; it makes sense to do it separately




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259360497
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/ShowDatabasesOperation.java
 ##
 @@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl.database;
+
+import java.io.DataOutputStream;
+import java.util.List;
+
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLOperation;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.io.IOUtils;
+
+/**
+ * Operation process of showing the databases.
+ */
+public class ShowDatabasesOperation extends DDLOperation {
+  @Override
+  public int execute() throws HiveException {
+// get the databases for the desired pattern - populate the output stream
+    List<String> databases = null;
+if (ddlDesc.getPattern() != null) {
+  LOG.debug("pattern: {}", ddlDesc.getPattern());
+  databases = db.getDatabasesByPattern(ddlDesc.getPattern());
+} else {
+  databases = db.getAllDatabases();
+}
+
+LOG.info("Found {} database(s) matching the SHOW DATABASES statement.", 
databases.size());
+
+// write the results in the file
+DataOutputStream outStream = getOutputStream(ddlDesc.getResFile());
+try {
+  formatter.showDatabases(outStream, databases);
 
 Review comment:
   that could be done later; and there are two brands of formatters... so it's not 
that simple




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259359243
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLTask2.java
 ##
 @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.QueryPlan;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.parse.ExplainConfiguration.AnalyzeState;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+
+/**
+ * DDLTask implementation.
+**/
+public class DDLTask2 extends Task<DDLWork2> implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private static final Map<Class<? extends DDLDesc>, Class<? extends DDLOperation>> DESC_TO_OPARATION =
+      new HashMap<>();
+  public static void registerOperator(Class<? extends DDLDesc> descClass,
+      Class<? extends DDLOperation> operationClass) {
+DESC_TO_OPARATION.put(descClass, operationClass);
+  }
+
+  @Override
+  public boolean requireLock() {
+return this.work != null && this.work.getNeedLock();
+  }
+
+  @Override
+  public void initialize(QueryState queryState, QueryPlan queryPlan, 
DriverContext ctx,
+  CompilationOpContext opContext) {
+super.initialize(queryState, queryPlan, ctx, opContext);
+  }
+
+  @Override
+  public int execute(DriverContext driverContext) {
+if (driverContext.getCtx().getExplainAnalyze() == AnalyzeState.RUNNING) {
+  return 0;
+}
+
+try {
+  Hive db = Hive.get(conf);
+  DDLDesc ddlDesc = work.getDDLDesc();
+
+  if (DESC_TO_OPARATION.containsKey(ddlDesc.getClass())) {
+DDLOperation ddlOperation = 
DESC_TO_OPARATION.get(ddlDesc.getClass()).newInstance();
 
 Review comment:
   Switched to use constructor. Looks better indeed.
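The desc-to-operation registry plus constructor-based instantiation being discussed can be sketched as below. DDLDesc, DDLOperation, and the ShowDatabases* classes here are simplified stand-ins, not the real Hive types; the point is the Class-keyed map and the explicit `getDeclaredConstructor(...)` call replacing the deprecated `Class.newInstance()`.

```java
import java.util.HashMap;
import java.util.Map;

public class DescRegistry {
  interface DDLDesc {}

  abstract static class DDLOperation {
    protected final DDLDesc desc;
    DDLOperation(DDLDesc desc) { this.desc = desc; }
    abstract int execute();
  }

  // Each XXXDesc registers its XXXOperation class; the task looks it up by
  // the desc's runtime class and instantiates it via the (DDLDesc) constructor.
  private static final Map<Class<? extends DDLDesc>, Class<? extends DDLOperation>> REGISTRY =
      new HashMap<>();

  static void register(Class<? extends DDLDesc> desc, Class<? extends DDLOperation> op) {
    REGISTRY.put(desc, op);
  }

  static class ShowDatabasesDesc implements DDLDesc {}
  static class ShowDatabasesOperation extends DDLOperation {
    ShowDatabasesOperation(DDLDesc desc) { super(desc); }
    @Override int execute() { return 0; }
  }

  public static void main(String[] args) throws Exception {
    register(ShowDatabasesDesc.class, ShowDatabasesOperation.class);
    DDLDesc desc = new ShowDatabasesDesc();
    // Constructor-based instantiation instead of Class.newInstance():
    DDLOperation op = REGISTRY.get(desc.getClass())
        .getDeclaredConstructor(DDLDesc.class).newInstance(desc);
    System.out.println(op.execute()); // 0
  }
}
```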




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259358917
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLOperation.java
 ##
 @@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.DataOutputStream;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatUtils;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Abstract ancestor class of all DDL Operation classes.
+ */
+public abstract class DDLOperation {
+  protected static final Logger LOG = 
LoggerFactory.getLogger("hive.ql.exec.DDLTask");
+
+  protected Hive db;
 
 Review comment:
   Made them final, added via constructor. Looks better indeed.




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259358745
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLTask2.java
 ##
 @@ -0,0 +1,107 @@
+DDLOperation ddlOperation = 
DESC_TO_OPARATION.get(ddlDesc.getClass()).newInstance();
+ddlOperation.init(db, conf, driverContext, ddlDesc);
 
 Review comment:
   Context created, will be in the next version.




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259336043
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLTask2.java
 ##
 @@ -0,0 +1,107 @@
+DDLOperation ddlOperation = 
DESC_TO_OPARATION.get(ddlDesc.getClass()).newInstance();
 
 Review comment:
   if it's not the concrete operation's responsibility then the `init` method 
should be final.




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259328924
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
 ##
 @@ -2571,11 +2567,10 @@ private void analyzeDescDatabase(ASTNode ast) throws 
SemanticException {
   throw new SemanticException("Unexpected Tokens at DESCRIBE DATABASE");
 }
 
-DescDatabaseDesc descDbDesc = new DescDatabaseDesc(ctx.getResFile(),
-dbName, isExtended);
+DescDatabaseDesc descDbDesc = new DescDatabaseDesc(ctx.getResFile(), 
dbName, isExtended);
 inputs.add(new ReadEntity(getDatabase(dbName)));
-rootTasks.add(TaskFactory.get(new DDLWork(getInputs(), getOutputs(), 
descDbDesc)));
-setFetchTask(createFetchTask(descDbDesc.getSchema()));
+rootTasks.add(TaskFactory.get(new DDLWork2(getInputs(), getOutputs(), 
descDbDesc)));
+setFetchTask(createFetchTask(DESC_DATABASE_SCHEMA));
 
 Review comment:
   will do




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259328210
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/ShowDatabasesOperation.java
 ##
 @@ -0,0 +1,58 @@
+  formatter.showDatabases(outStream, databases);
 
 Review comment:
   As I see it, only one formatter is used for all purposes in the old DDLTask 
class. What do you mean by finding a new home?




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259327538
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/DescDatabaseOperation.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl.database;
+
+import java.io.DataOutputStream;
+import java.util.Map;
+import java.util.TreeMap;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLOperation;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.io.IOUtils;
+
+/**
+ * Operation process of describing a database.
+ */
+public class DescDatabaseOperation extends DDLOperation<DescDatabaseDesc> {
+  @Override
+  public int execute() throws HiveException {
+    DataOutputStream outStream = getOutputStream(ddlDesc.getResFile());
 
 Review comment:
   will fix




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259327195
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/DescDatabaseOperation.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl.database;
+
+import java.io.DataOutputStream;
+import java.util.Map;
+import java.util.TreeMap;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLOperation;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.io.IOUtils;
+
+/**
+ * Operation process of describing a database.
+ */
+public class DescDatabaseOperation extends DDLOperation<DescDatabaseDesc> {
+  @Override
+  public int execute() throws HiveException {
+    DataOutputStream outStream = getOutputStream(ddlDesc.getResFile());
+    try {
+      Database database = db.getDatabase(ddlDesc.getDatabaseName());
+      if (database == null) {
+        throw new HiveException(ErrorMsg.DATABASE_NOT_EXISTS, ddlDesc.getDatabaseName());
+      }
+
+      Map<String, String> params = null;
+      if (ddlDesc.isExt()) {
+        params = database.getParameters();
+      }
+
+      // If this is a q-test, let's order the params map (lexicographically) by
+      // key. This is to get consistent param ordering between Java7 and Java8.
+      if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST) && params != null) {
+        params = new TreeMap<>(params);
+      }
+
+      String location = database.getLocationUri();
+      if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+        location = "location/in/test";
+      }
+
+      PrincipalType ownerType = database.getOwnerType();
+      formatter.showDatabaseDescription(outStream, database.getName(), database.getDescription(), location,
+          database.getOwnerName(), (null == ownerType) ? null : ownerType.name(), params);
 
 Review comment:
  May be valid, but not in the scope of this modification. First let's cut it into pieces; later we can create a JIRA for this.




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259326980
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/DescDatabaseOperation.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl.database;
+
+import java.io.DataOutputStream;
+import java.util.Map;
+import java.util.TreeMap;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLOperation;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.io.IOUtils;
+
+/**
+ * Operation process of describing a database.
+ */
+public class DescDatabaseOperation extends DDLOperation<DescDatabaseDesc> {
+  @Override
+  public int execute() throws HiveException {
+    DataOutputStream outStream = getOutputStream(ddlDesc.getResFile());
+    try {
+      Database database = db.getDatabase(ddlDesc.getDatabaseName());
+      if (database == null) {
+        throw new HiveException(ErrorMsg.DATABASE_NOT_EXISTS, ddlDesc.getDatabaseName());
+      }
+
+      Map<String, String> params = null;
+      if (ddlDesc.isExt()) {
+        params = database.getParameters();
+      }
+
+      // If this is a q-test, let's order the params map (lexicographically) by
+      // key. This is to get consistent param ordering between Java7 and Java8.
+      if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST) && params != null) {
+        params = new TreeMap<>(params);
+      }
+
+      String location = database.getLocationUri();
+      if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+        location = "location/in/test";
 
 Review comment:
  May be valid, but not in the scope of this modification. First let's cut it into pieces; later we can create a JIRA for this.




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259327077
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/DescDatabaseOperation.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl.database;
+
+import java.io.DataOutputStream;
+import java.util.Map;
+import java.util.TreeMap;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLOperation;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.io.IOUtils;
+
+/**
+ * Operation process of describing a database.
+ */
+public class DescDatabaseOperation extends DDLOperation<DescDatabaseDesc> {
+  @Override
+  public int execute() throws HiveException {
+    DataOutputStream outStream = getOutputStream(ddlDesc.getResFile());
+    try {
+      Database database = db.getDatabase(ddlDesc.getDatabaseName());
+      if (database == null) {
+        throw new HiveException(ErrorMsg.DATABASE_NOT_EXISTS, ddlDesc.getDatabaseName());
+      }
+
+      Map<String, String> params = null;
+      if (ddlDesc.isExt()) {
+        params = database.getParameters();
+      }
+
+      // If this is a q-test, let's order the params map (lexicographically) by
+      // key. This is to get consistent param ordering between Java7 and Java8.
+      if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST) && params != null) {
+        params = new TreeMap<>(params);
 
 Review comment:
  May be valid, but not in the scope of this modification. First let's cut it into pieces; later we can create a JIRA for this.
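
The TreeMap wrapping discussed in this hunk exists purely to make q-test output deterministic. A minimal standalone sketch of the effect (not Hive code; the parameter keys and values here are made up):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Shows why the patch wraps the params map in a TreeMap under HIVE_IN_TEST:
// a TreeMap iterates keys in sorted order regardless of insertion order or
// HashMap implementation details, so q-test output stays deterministic.
public class ParamOrder {
  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();  // iteration order unspecified
    params.put("owner", "hive");
    params.put("comment", "demo");
    params.put("created", "2019");

    Map<String, String> ordered = new TreeMap<>(params);  // sorted by key
    System.out.println(ordered.keySet());  // prints [comment, created, owner]
  }
}
```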




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259326675
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLTask2.java
 ##
 @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.QueryPlan;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.parse.ExplainConfiguration.AnalyzeState;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+
+/**
+ * DDLTask implementation.
+**/
+public class DDLTask2 extends Task implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private static final Map<Class<? extends DDLDesc>, Class<? extends DDLOperation<?>>> DESC_TO_OPARATION =
+      new HashMap<>();
+  public static void registerOperator(Class<? extends DDLDesc> descClass,
+      Class<? extends DDLOperation<?>> operationClass) {
+    DESC_TO_OPARATION.put(descClass, operationClass);
+  }
+
+  @Override
+  public boolean requireLock() {
+    return this.work != null && this.work.getNeedLock();
+  }
+
+  @Override
+  public void initialize(QueryState queryState, QueryPlan queryPlan, DriverContext ctx,
+      CompilationOpContext opContext) {
+    super.initialize(queryState, queryPlan, ctx, opContext);
+  }
+
+  @Override
+  public int execute(DriverContext driverContext) {
+    if (driverContext.getCtx().getExplainAnalyze() == AnalyzeState.RUNNING) {
+      return 0;
+    }
+
+    try {
+      Hive db = Hive.get(conf);
+      DDLDesc ddlDesc = work.getDDLDesc();
+
+      if (DESC_TO_OPARATION.containsKey(ddlDesc.getClass())) {
+        DDLOperation<?> ddlOperation = DESC_TO_OPARATION.get(ddlDesc.getClass()).newInstance();
+        ddlOperation.init(db, conf, driverContext, ddlDesc);
 
 Review comment:
  conf and db are definitely not redundant; they are used by many operations. We may consider creating a context object for them, though.
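
As a rough sketch of the registry-plus-context idea being discussed here (all names are illustrative, NOT the actual Hive classes from the patch): a single context object can be handed to each reflectively created operation instead of separate db/conf/driverContext arguments.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a desc -> operation registry with one shared context.
public class RegistrySketch {
  interface Desc {}
  interface Operation {
    int execute(Context ctx, Desc desc) throws Exception;
  }
  // One context instead of separate db/conf/driverContext parameters.
  static class Context { /* would hold Hive db, HiveConf conf, ... */ }

  private static final Map<Class<? extends Desc>, Class<? extends Operation>> REGISTRY =
      new HashMap<>();

  static void register(Class<? extends Desc> descClass, Class<? extends Operation> opClass) {
    REGISTRY.put(descClass, opClass);
  }

  // Reflectively instantiate the operation registered for the desc's class.
  static int dispatch(Context ctx, Desc desc) throws Exception {
    Class<? extends Operation> opClass = REGISTRY.get(desc.getClass());
    if (opClass == null) {
      throw new IllegalStateException("no operation registered for " + desc.getClass());
    }
    Operation op = opClass.getDeclaredConstructor().newInstance();
    return op.execute(ctx, desc);
  }

  // Example pair, registered the same way the patch does it: from a
  // static initializer on the desc side.
  static class ShowDesc implements Desc {}
  static class ShowOperation implements Operation {
    public int execute(Context ctx, Desc desc) { return 0; }
  }
  static {
    register(ShowDesc.class, ShowOperation.class);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(dispatch(new Context(), new ShowDesc()));  // prints 0
  }
}
```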




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259326436
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLTask2.java
 ##
 @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.QueryPlan;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.parse.ExplainConfiguration.AnalyzeState;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+
+/**
+ * DDLTask implementation.
+**/
+public class DDLTask2 extends Task implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private static final Map<Class<? extends DDLDesc>, Class<? extends DDLOperation<?>>> DESC_TO_OPARATION =
+      new HashMap<>();
+  public static void registerOperator(Class<? extends DDLDesc> descClass,
+      Class<? extends DDLOperation<?>> operationClass) {
+    DESC_TO_OPARATION.put(descClass, operationClass);
+  }
+
+  @Override
+  public boolean requireLock() {
+    return this.work != null && this.work.getNeedLock();
+  }
+
+  @Override
+  public void initialize(QueryState queryState, QueryPlan queryPlan, DriverContext ctx,
+      CompilationOpContext opContext) {
+    super.initialize(queryState, queryPlan, ctx, opContext);
+  }
+
+  @Override
+  public int execute(DriverContext driverContext) {
+    if (driverContext.getCtx().getExplainAnalyze() == AnalyzeState.RUNNING) {
+      return 0;
+    }
+
+    try {
+      Hive db = Hive.get(conf);
+      DDLDesc ddlDesc = work.getDDLDesc();
+
+      if (DESC_TO_OPARATION.containsKey(ddlDesc.getClass())) {
+        DDLOperation<?> ddlOperation = DESC_TO_OPARATION.get(ddlDesc.getClass()).newInstance();
 
 Review comment:
  Using a constructor would mean that all the concrete Operation classes would have to declare a constructor just to pass these arguments to the superclass, which would lead to a lot of extra code doing nothing specific.
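
For illustration, a minimal sketch (hypothetical names, not Hive's classes) of the constructor-based alternative: the dispatcher looks up a one-argument constructor reflectively, which lets the fields be final but forces every subclass to declare the pass-through constructor objected to here.

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch of constructor-based injection instead of newInstance() + init().
public class CtorDispatch {
  static class OpContext {
    final String conf;
    OpContext(String conf) { this.conf = conf; }
  }

  abstract static class DDLOp {
    protected final OpContext ctx;  // can be final because it is set in the constructor
    protected DDLOp(OpContext ctx) { this.ctx = ctx; }
    abstract int execute();
  }

  // Every subclass must declare a pass-through constructor -- the extra
  // boilerplate the review comment objects to.
  static class ShowOp extends DDLOp {
    ShowOp(OpContext ctx) { super(ctx); }
    int execute() { return ctx.conf.length(); }
  }

  // Look up the one-argument constructor instead of calling newInstance().
  static int run(Class<? extends DDLOp> opClass, OpContext ctx) throws Exception {
    Constructor<? extends DDLOp> c = opClass.getDeclaredConstructor(OpContext.class);
    return c.newInstance(ctx).execute();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(run(ShowOp.class, new OpContext("hive")));  // prints 4
  }
}
```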




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259325961
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/AlterDatabaseDesc.java
 ##
 @@ -16,112 +16,100 @@
  * limitations under the License.
  */
 
-package org.apache.hadoop.hive.ql.plan;
+package org.apache.hadoop.hive.ql.exec.ddl.database;
 
 import java.io.Serializable;
 import java.util.Map;
 
+import org.apache.hadoop.hive.ql.exec.ddl.DDLDesc;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLTask2;
 import org.apache.hadoop.hive.ql.parse.ReplicationSpec;
+import org.apache.hadoop.hive.ql.plan.Explain;
 import org.apache.hadoop.hive.ql.plan.Explain.Level;
+import org.apache.hadoop.hive.ql.plan.PrincipalDesc;
 
 /**
- * AlterDatabaseDesc.
- *
+ * DDL task description for ALTER DATABASE commands.
  */
 @Explain(displayName = "Alter Database", explainLevels = { Level.USER, Level.DEFAULT, Level.EXTENDED })
 public class AlterDatabaseDesc extends DDLDesc implements Serializable {
-
   private static final long serialVersionUID = 1L;
 
-  // Only altering the database property and owner is currently supported
-  public static enum ALTER_DB_TYPES {
-ALTER_PROPERTY, ALTER_OWNER, ALTER_LOCATION
-  };
-
-  ALTER_DB_TYPES alterType;
-  String databaseName;
-  Map<String, String> dbProperties;
-  PrincipalDesc ownerPrincipal;
-  ReplicationSpec replicationSpec;
-  String location;
+  static {
+    DDLTask2.registerOperator(AlterDatabaseDesc.class, AlterDatabaseOperation.class);
+  }
 
   /**
-   * For serialization only.
+   * Supported type of alter db commands.
+   * Only altering the database property and owner is currently supported
*/
-  public AlterDatabaseDesc() {
-  }
+  public enum AlterDbType {
+ALTER_PROPERTY, ALTER_OWNER, ALTER_LOCATION
+  };
 
-  public AlterDatabaseDesc(String databaseName, Map<String, String> dbProps,
 
 Review comment:
  This would result in longer classes and would push everything one indentation level deeper. Also, later I'd consider moving the related parts of DDLSemanticAnalyzer next to these classes as a third class, which, if put into the same class, would result in even longer container classes. I believe having separate classes for Desc, Operation, and Analyzer is cleaner.




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259324982
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/SwitchDatabaseDesc.java
 ##
 @@ -16,37 +16,34 @@
  * limitations under the License.
  */
 
-package org.apache.hadoop.hive.ql.plan;
+package org.apache.hadoop.hive.ql.exec.ddl.database;
 
 import java.io.Serializable;
-import org.apache.hadoop.hive.ql.plan.Explain.Level;
 
+import org.apache.hadoop.hive.ql.exec.ddl.DDLDesc;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLTask2;
+import org.apache.hadoop.hive.ql.plan.Explain;
+import org.apache.hadoop.hive.ql.plan.Explain.Level;
 
 /**
- * SwitchDatabaseDesc.
- *
+ * DDL task description for SWITCH DATABASE commands.
 
 Review comment:
  You are right! Let's modify the comment for now, then in a separate JIRA change it everywhere to USE database, OK?




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259324164
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLOperation.java
 ##
 @@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.DataOutputStream;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatUtils;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Abstract ancestor class of all DDL Operation classes.
+ */
+public abstract class DDLOperation<T extends DDLDesc> {
+  protected static final Logger LOG = LoggerFactory.getLogger("hive.ql.exec.DDLTask");
+
+  protected Hive db;
 
 Review comment:
  I have considered it, but as they are populated via the init method, they cannot be final. Passing them via a constructor would help, but then all the operation classes would have to declare a constructor with the same arguments just to pass these variables along...




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259318829
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/CreateDatabaseOperation.java
 ##
 @@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl.database;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.AlreadyExistsException;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLOperation;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.session.SessionState;
+
+/**
+ * Operation process of creating a database.
+ */
+public class CreateDatabaseOperation extends DDLOperation<CreateDatabaseDesc> {
+  private static final String DATABASE_PATH_SUFFIX = ".db";
+
+  @Override
+  public int execute() throws HiveException {
+    Database database = new Database();
+    database.setName(ddlDesc.getName());
+    database.setDescription(ddlDesc.getComment());
+    database.setLocationUri(ddlDesc.getLocationUri());
+    database.setParameters(ddlDesc.getDatabaseProperties());
+    database.setOwnerName(SessionState.getUserFromAuthenticator());
+    database.setOwnerType(PrincipalType.USER);
+
+    try {
+      makeLocationQualified(database);
+      db.createDatabase(database, ddlDesc.getIfNotExists());
+    } catch (AlreadyExistsException ex) {
+      //it would be better if AlreadyExistsException had an errorCode field
+      throw new HiveException(ex, ErrorMsg.DATABASE_ALREADY_EXISTS, ddlDesc.getName());
+    }
+
+    return 0;
+  }
+
+  private void makeLocationQualified(Database database) throws HiveException {
 
 Review comment:
  Valid, but not in the scope of this change. Let's cut it into pieces first; later we can modify it in a different JIRA.




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259317832
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLDesc.java
 ##
 @@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.Serializable;
+
+/**
+ * Abstract ancestor of all DDL operation descriptors.
+ */
+public abstract class DDLDesc implements Serializable {
 
 Review comment:
  True, will modify.




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259317746
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLOperation.java
 ##
 @@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.DataOutputStream;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatUtils;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Abstract ancestor class of all DDL Operation classes.
+ */
+public abstract class DDLOperation<T extends DDLDesc> {
+  protected static final Logger LOG = LoggerFactory.getLogger("hive.ql.exec.DDLTask");
+
+  protected Hive db;
+  protected HiveConf conf;
+  protected DriverContext driverContext;
+  protected T ddlDesc;
+  protected MetaDataFormatter formatter;
+
+  @SuppressWarnings("unchecked")
+  public void init(Hive db, HiveConf conf, DriverContext driverContext, DDLDesc ddlDesc) {
+    this.db = db;
+    this.conf = conf;
+    this.driverContext = driverContext;
+    this.ddlDesc = (T) ddlDesc;
 
 Review comment:
  Will fix; where can I obtain this Hive formatter?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259317522
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLTask2.java
 ##
 @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.QueryPlan;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.parse.ExplainConfiguration.AnalyzeState;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+
+/**
+ * DDLTask implementation.
+**/
+public class DDLTask2 extends Task<DDLWork2> implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private static final Map<Class<? extends DDLDesc>, Class<? extends DDLOperation<?>>> DESC_TO_OPARATION =
+      new HashMap<>();
+
+  public static void registerOperator(Class<? extends DDLDesc> descClass,
+      Class<? extends DDLOperation<?>> operationClass) {
+    DESC_TO_OPARATION.put(descClass, operationClass);
+  }
+
+  @Override
+  public boolean requireLock() {
+    return this.work != null && this.work.getNeedLock();
+  }
+
+  @Override
+  public void initialize(QueryState queryState, QueryPlan queryPlan, DriverContext ctx,
+      CompilationOpContext opContext) {
+    super.initialize(queryState, queryPlan, ctx, opContext);
+  }
+
+  @Override
+  public int execute(DriverContext driverContext) {
+    if (driverContext.getCtx().getExplainAnalyze() == AnalyzeState.RUNNING) {
+      return 0;
+    }
+
+    try {
+      Hive db = Hive.get(conf);
+      DDLDesc ddlDesc = work.getDDLDesc();
+
+      if (DESC_TO_OPARATION.containsKey(ddlDesc.getClass())) {
+        DDLOperation<?> ddlOperation = DESC_TO_OPARATION.get(ddlDesc.getClass()).newInstance();
+        ddlOperation.init(db, conf, driverContext, ddlDesc);
+        return ddlOperation.execute();
+      } else {
+        throw new IllegalArgumentException("Unknown DDL request: " + ddlDesc.getClass());
+      }
+    } catch (Throwable e) {
+      failed(e);
+      return 1;
 
 Review comment:
  This is a valid suggestion, but not in the scope of this refactoring. First let's cut it to pieces; later we can create a JIRA for more informative error codes.




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259317265
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/AlterDatabaseDesc.java
 ##
 @@ -16,112 +16,100 @@
  * limitations under the License.
  */
 
-package org.apache.hadoop.hive.ql.plan;
+package org.apache.hadoop.hive.ql.exec.ddl.database;
 
 import java.io.Serializable;
 import java.util.Map;
 
+import org.apache.hadoop.hive.ql.exec.ddl.DDLDesc;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLTask2;
 import org.apache.hadoop.hive.ql.parse.ReplicationSpec;
+import org.apache.hadoop.hive.ql.plan.Explain;
 import org.apache.hadoop.hive.ql.plan.Explain.Level;
+import org.apache.hadoop.hive.ql.plan.PrincipalDesc;
 
 /**
- * AlterDatabaseDesc.
- *
+ * DDL task description for ALTER DATABASE commands.
  */
 @Explain(displayName = "Alter Database", explainLevels = { Level.USER, Level.DEFAULT, Level.EXTENDED })
 public class AlterDatabaseDesc extends DDLDesc implements Serializable {
-
   private static final long serialVersionUID = 1L;
 
-  // Only altering the database property and owner is currently supported
-  public static enum ALTER_DB_TYPES {
-ALTER_PROPERTY, ALTER_OWNER, ALTER_LOCATION
-  };
-
-  ALTER_DB_TYPES alterType;
-  String databaseName;
-  Map dbProperties;
-  PrincipalDesc ownerPrincipal;
-  ReplicationSpec replicationSpec;
-  String location;
+  static {
+    DDLTask2.registerOperator(AlterDatabaseDesc.class, AlterDatabaseOperation.class);
 
 Review comment:
  Yes, but it is not a problem: we cannot possibly need the registration before the corresponding XXXDesc object has been created.
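The point above is about when the static registration block runs. A small sketch (hypothetical names) of why this is safe: a class's static initializer is guaranteed to run before its first instance is created, so the registry entry exists by the time any desc object can reach the dispatcher:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of static-initializer-based registration, mirroring how
// AlterDatabaseDesc registers its operation class in DDLTask2's registry.
public class RegistryDemo {
    static final Map<Class<?>, String> REGISTRY = new HashMap<>();

    static class SomeDesc {
        static {
            // Runs when SomeDesc is first loaded, i.e. no later than the
            // first `new SomeDesc()`, so lookup below cannot miss.
            RegistryDemo.REGISTRY.put(SomeDesc.class, "SomeOperation");
        }
    }

    public static String lookup(Object desc) {
        return REGISTRY.get(desc.getClass());
    }

    public static void main(String[] args) {
        Object desc = new SomeDesc();  // triggers class load and the static block
        System.out.println(lookup(desc));
    }
}
```

The caveat, as noted in the thread, is that nothing registers a mapping until the desc class is actually loaded; that is acceptable here precisely because lookups only ever happen for already-instantiated desc objects.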




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259316748
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLWork2.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import org.apache.hadoop.hive.ql.hooks.ReadEntity;
+import org.apache.hadoop.hive.ql.hooks.WriteEntity;
+
+import java.io.Serializable;
+
+import java.util.Set;
+
+/**
+ * A DDL operation.
+ */
+public class DDLWork2 implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private DDLDesc ddlDesc;
 
 Review comment:
   sure!




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259316684
 
 

 ##
 File path: 
hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/HCatSemanticAnalyzerBase.java
 ##
 @@ -122,6 +128,13 @@ protected void authorizeDDLWork(HiveSemanticAnalyzerHookContext context,
   Hive hive, DDLWork work) throws HiveException {
   }
 
+  /**
+   * Authorized the given DDLWork2. It is only for the interim time while 
DDLTask and DDLWork are being refactored.
+   */
+  protected void authorizeDDLWork2(HiveSemanticAnalyzerHookContext context,
 
 Review comment:
  In the long term, yes; for now it must stay, as some work is done by the original DDLWork and some by DDLWork2. When all the refactorings are done, there is going to be only one authorizeDDLWork.




[GitHub] miklosgergely commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
miklosgergely commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259316348
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLOperation.java
 ##
 @@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.DataOutputStream;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatUtils;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Abstract ancestor class of all DDL Operation classes.
+ */
+public abstract class DDLOperation<T extends DDLDesc> {
+  protected static final Logger LOG = LoggerFactory.getLogger("hive.ql.exec.DDLTask");
+
+  protected Hive db;
+  protected HiveConf conf;
+  protected DriverContext driverContext;
+  protected T ddlDesc;
+  protected MetaDataFormatter formatter;
+
+  @SuppressWarnings("unchecked")
+  public void init(Hive db, HiveConf conf, DriverContext driverContext, DDLDesc ddlDesc) {
+    this.db = db;
+    this.conf = conf;
+    this.driverContext = driverContext;
+    this.ddlDesc = (T) ddlDesc;
+    this.formatter = MetaDataFormatUtils.getFormatter(conf);
+  }
+
+  public abstract int execute() throws HiveException;
+
+  protected DataOutputStream getOutputStream(String resFile) throws HiveException {
 
 Review comment:
   ok, I'll remove it




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259310998
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/plan/ReplSetFirstIncLoadFlagDesc.java
 ##
 @@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.plan;
+import org.apache.hadoop.hive.ql.plan.Explain.Level;
+
+import java.io.Serializable;
+
+/**
+ * ReplSetFirstIncLoadFlagDesc.
+ *
+ */
+@Explain(displayName = "Set First Incr Load Flag", explainLevels = { Level.USER, Level.DEFAULT, Level.EXTENDED })
+public class ReplSetFirstIncLoadFlagDesc extends DDLDesc implements Serializable {
+
+  private static final long serialVersionUID = 1L;
+  String databaseName;
+  String tableName;
+
+  /**
+   * For serialization only.
+   */
+  public ReplSetFirstIncLoadFlagDesc() {
+  }
+
+  public ReplSetFirstIncLoadFlagDesc(String databaseName, String tableName) {
 
 Review comment:
   done




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259314021
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -133,8 +161,11 @@ protected int execute(DriverContext driverContext) {
 return 6;
   }
   long writeId = Long.parseLong(writeIdString);
-  toPath = new Path(toPath, AcidUtils.baseOrDeltaSubdir(work.getDeleteDestIfExist(), writeId, writeId,
-      driverContext.getCtx().getHiveTxnManager().getStmtIdAndIncrement()));
+  // Set stmt id 0 for bootstrap load as the directory needs to be searched during incremental load to avoid any
+  // duplicate copy from the source. Check HIVE-21197 for more detail.
+  int stmtId = (writeId == ReplUtils.REPL_BOOTSTRAP_MIGRATION_BASE_WRITE_ID) ? 0 :
 
 Review comment:
   done




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259312676
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
 ##
 @@ -158,6 +159,15 @@ private boolean isDbEmpty(String dbName) throws HiveException {
 // Add the checkpoint key to the Database binding it to current dump directory.
 // So, if retry using same dump, we shall skip Database object update.
 parameters.put(ReplUtils.REPL_CHECKPOINT_KEY, dumpDirectory);
+
+if (needSetIncFlag) {
 
 Review comment:
  In the alter case the flag need not be set.




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259312460
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java
 ##
 @@ -71,6 +74,20 @@ public void init(AtomicBoolean stop, AtomicBoolean looped) 
throws Exception {
 }
   }
 
+  @Override boolean replIsCompactionDisabledForDatabase(String dbName) throws TException {
+    try {
+      Database database = rs.getDatabase(getDefaultCatalog(conf), dbName);
+      if (database != null) {
+        return !ReplUtils.isFirstIncDone(database.getParameters());
+      }
+      LOG.info("Unable to find database " + dbName);
 
 Review comment:
  Same as the other method in the compactor thread.




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259310998
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/plan/ReplSetFirstIncLoadFlagDesc.java
 ##
 @@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.plan;
+import org.apache.hadoop.hive.ql.plan.Explain.Level;
+
+import java.io.Serializable;
+
+/**
+ * ReplSetFirstIncLoadFlagDesc.
+ *
+ */
+@Explain(displayName = "Set First Incr Load Flag", explainLevels = { Level.USER, Level.DEFAULT, Level.EXTENDED })
+public class ReplSetFirstIncLoadFlagDesc extends DDLDesc implements Serializable {
+
+  private static final long serialVersionUID = 1L;
+  String databaseName;
+  String tableName;
+
+  /**
+   * For serialization only.
+   */
+  public ReplSetFirstIncLoadFlagDesc() {
+  }
+
+  public ReplSetFirstIncLoadFlagDesc(String databaseName, String tableName) {
 
 Review comment:
  As it always just sets it to false, I changed the name of the method.




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259310718
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1147,6 +1145,12 @@ private static void createReplImportTasks(
   if (!waitOnPrecursor){
 throw new SemanticException(ErrorMsg.DATABASE_NOT_EXISTS.getMsg(tblDesc.getDatabaseName()));
   }
+  // For warehouse level replication, if the database itself is getting created in this load, then no need to
+  // check for duplicate copy. Check HIVE-21197 for more detail.
+  firstIncDone = true;
 
 Review comment:
  The if check was already there, and a ternary will not give any perf advantage.




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259310336
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1136,6 +1133,7 @@ private static void createReplImportTasks(
 
 Task dropTblTask = null;
 WriteEntity.WriteType lockType = WriteEntity.WriteType.DDL_NO_LOCK;
+    Boolean firstIncDone;
 
 Review comment:
   done




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259310189
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -187,4 +188,12 @@ public static PathFilter getEventsDirectoryFilter(final 
FileSystem fs) {
   }
 };
   }
+
+  public static boolean isFirstIncDone(Map<String, String> parameter) {
+    if (parameter == null) {
+      return true;
+    }
+    String compFlag = parameter.get(ReplUtils.REPL_FIRST_INC_PENDING_FLAG);
 
 Review comment:
  It's from the map, not from the conf. Not done.
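`isFirstIncDone` above defaults to "done" when the parameter map is absent. A standalone sketch of that kind of flag check (the key name and the final comparison are illustrative, since the quoted diff truncates before the return statement):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an isFirstIncDone-style flag check: a missing map or missing key
// means the first incremental load is considered done; only an explicit
// "true" pending-flag marks it as not done.
public class IncFlag {
    // Illustrative key name; the real constant lives in ReplUtils.
    static final String FIRST_INC_PENDING = "hive.repl.first.inc.pending";

    public static boolean isFirstIncDone(Map<String, String> params) {
        if (params == null) {
            return true;  // no parameters at all: treat as done
        }
        String pending = params.get(FIRST_INC_PENDING);
        // Absent key or any value other than "true" means the flag is clear.
        return pending == null || !Boolean.parseBoolean(pending);
    }

    public static void main(String[] args) {
        Map<String, String> p = new HashMap<>();
        p.put(FIRST_INC_PENDING, "true");
        System.out.println(isFirstIncDone(null));
        System.out.println(isFirstIncDone(p));
    }
}
```

Defaulting to true for a missing map matches the review discussion: databases that never went through a migration bootstrap carry no flag and must not be treated as pending.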




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259309835
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -271,12 +302,13 @@ public String getName() {
 LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath);
 if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){
   ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false);
-  if (replicationSpec.isReplace() && conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) {
+  if (replicationSpec.isReplace() && (conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) {
 rcwork.setDeleteDestIfExist(true);
 rcwork.setAutoPurge(isAutoPurge);
 rcwork.setNeedRecycle(needRecycle);
   }
   rcwork.setCopyToMigratedTxnTable(copyToMigratedTxnTable);
+  rcwork.setCheckDuplicateCopy(replicationSpec.needDupCopyCheck());
 
 Review comment:
   not done




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259309699
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -61,6 +62,25 @@ public ReplCopyTask(){
 super();
   }
 
+  // If file is already present in base directory, then remove it from the list.
+  // Check HIVE-21197 for more detail.
+  private void updateSrcFileListForDupCopy(FileSystem dstFs, Path toPath, List<ReplChangeManager.FileInfo> srcFiles,
+      long writeId, int stmtId) throws IOException {
+    ListIterator<ReplChangeManager.FileInfo> iter = srcFiles.listIterator();
+    Path basePath = new Path(toPath, AcidUtils.baseOrDeltaSubdir(true, writeId, writeId, stmtId));
+    while (iter.hasNext()) {
+      Path filePath = new Path(basePath, iter.next().getSourcePath().getName());
+      try {
+        if (dstFs.getFileStatus(filePath) != null) {
 
 Review comment:
   done
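The loop above drops source files that already exist under the target base directory; note that Hadoop's FileSystem.getFileStatus throws FileNotFoundException for a missing path rather than returning null, which is presumably what the flagged line was about. A local-filesystem sketch of the same filtering idea, using java.nio's Files.exists in place of Hadoop's FileSystem API (directory and file names are made up for the example):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of updateSrcFileListForDupCopy: drop from the copy list
// any file that already exists under the target base directory, so a retried
// load does not duplicate data.
public class DupCopyFilter {
    public static List<String> filterAlreadyCopied(Path baseDir, List<String> srcFiles) {
        List<String> remaining = new ArrayList<>();
        for (String name : srcFiles) {
            // exists() avoids the getFileStatus-throws-on-missing pitfall.
            if (!Files.exists(baseDir.resolve(name))) {
                remaining.add(name);
            }
        }
        return remaining;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("base_1_1_0");
        Files.createFile(base.resolve("000000_0"));  // pretend this was copied earlier
        List<String> src = Arrays.asList("000000_0", "000001_0");
        System.out.println(filterAlreadyCopied(base, src));
    }
}
```

The real task additionally pins the statement id to 0 for bootstrap loads so that this base directory can be located again during the incremental load, as the diff's comment explains.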




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259309300
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationMisc.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore;
+import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
+import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.shims.Utils;
+import org.junit.*;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.util.*;
+
+import static 
org.apache.hadoop.hive.metastore.ReplChangeManager.SOURCE_OF_REPLICATION;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isFullAcidTable;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isTransactionalTable;
+import static org.junit.Assert.*;
+
+/**
+ * TestReplicationWithTableMigration - test replication for Hive2 to Hive3 
(Strict managed tables)
+ */
+public class TestReplicationWithTableMigrationMisc {
+  @Rule
+  public final TestName testName = new TestName();
+
+  protected static final Logger LOG = LoggerFactory.getLogger(TestReplicationWithTableMigrationMisc.class);
+  private static WarehouseInstance primary, replica;
+  private String primaryDbName, replicatedDbName;
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+    HashMap<String, String> overrideProperties = new HashMap<>();
+    internalBeforeClassSetup(overrideProperties);
+  }
+
+  static void internalBeforeClassSetup(Map<String, String> overrideConfigs) throws Exception {
+    HiveConf conf = new HiveConf(TestReplicationWithTableMigrationMisc.class);
+    conf.set("dfs.client.use.datanode.hostname", "true");
+    conf.set("hadoop.proxyuser." + Utils.getUGI().getShortUserName() + ".hosts", "*");
+    MiniDFSCluster miniDFSCluster =
+        new MiniDFSCluster.Builder(conf).numDataNodes(1).format(true).build();
+    final DistributedFileSystem fs = miniDFSCluster.getFileSystem();
+    HashMap<String, String> hiveConfigs = new HashMap<String, String>() {{
+      put("fs.defaultFS", fs.getUri().toString());
+      put("hive.support.concurrency", "true");
+      put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
+      put("hive.metastore.client.capability.check", "false");
+      put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+      put("hive.exec.dynamic.partition.mode", "nonstrict");
+      put("hive.strict.checks.bucketing", "false");
+      put("hive.mapred.mode", "nonstrict");
+      put("mapred.input.dir.recursive", "true");
+      put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+      put("hive.strict.managed.tables", "true");
+      put("hive.metastore.transactional.event.listeners", "");
+    }};
+    replica = new WarehouseInstance(LOG, miniDFSCluster, hiveConfigs);
+
+HashMap<String, String> configsForPrimary = new HashMap<String, String>() {{
+  put("fs.defaultFS", fs.getUri().toString());
+  put("hive.metastore.client.capability.check", "false");
+  put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+  put("hive.exec.dynamic.partition.mode", "nonstrict");
+  put("hive.strict.checks.bucketing", "false");
+  put("hive.mapred.mode", "nonstrict");
+  put("mapred.input.dir.recursive", "true");
+  put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+  put("hive.support.concurrency", "false");
+  put("hive.txn.manager", 
"org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager");
+  put("hive.strict.managed.tables", "false");
+}};
+configsForPrimary.putAll(overrideConfigs);
+primary = new 

[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259308757
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationMisc.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore;
+import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
+import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.shims.Utils;
+import org.junit.*;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.util.*;
+
+import static 
org.apache.hadoop.hive.metastore.ReplChangeManager.SOURCE_OF_REPLICATION;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isFullAcidTable;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isTransactionalTable;
+import static org.junit.Assert.*;
+
+/**
+ * TestReplicationWithTableMigration - test replication for Hive2 to Hive3 
(Strict managed tables)
+ */
+public class TestReplicationWithTableMigrationMisc {
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259308804
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationMisc.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore;
+import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
+import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.shims.Utils;
+import org.junit.*;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.util.*;
 
 Review comment:
   not done




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259308377
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
 ##
 @@ -610,6 +610,11 @@ public void testIncrementalDumpOfWarehouse() throws 
Throwable {
 .run("show tables")
 .verifyResults(new String[] { "t1" });
 
+//default db is already created at replica before load. So the incr flag 
should not be set.
+
assertTrue(ReplUtils.isFirstIncDone(replica.getDatabase("default").getParameters()));
+
assertTrue(!ReplUtils.isFirstIncDone(replica.getDatabase(primaryDbName).getParameters()));
 
 Review comment:
   done




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259277982
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLOperation.java
 ##
 @@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.DataOutputStream;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatUtils;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Abstract ancestor class of all DDL Operation classes.
+ */
+public abstract class DDLOperation<T extends DDLDesc> {
+  protected static final Logger LOG = 
LoggerFactory.getLogger("hive.ql.exec.DDLTask");
+
+  protected Hive db;
+  protected HiveConf conf;
+  protected DriverContext driverContext;
+  protected T ddlDesc;
+  protected MetaDataFormatter formatter;
+
+  @SuppressWarnings("unchecked")
+  public void init(Hive db, HiveConf conf, DriverContext driverContext, 
DDLDesc ddlDesc) {
+this.db = db;
+this.conf = conf;
+this.driverContext = driverContext;
+this.ddlDesc = (T)ddlDesc;
+this.formatter = MetaDataFormatUtils.getFormatter(conf);
+  }
+
+  public abstract int execute() throws HiveException;
+
+  protected DataOutputStream getOutputStream(String resFile) throws 
HiveException {
 
 Review comment:
   deprecate or remove this method; promote using `Path` instead of `String`




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259285841
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/AlterDatabaseDesc.java
 ##
 @@ -16,112 +16,100 @@
  * limitations under the License.
  */
 
-package org.apache.hadoop.hive.ql.plan;
+package org.apache.hadoop.hive.ql.exec.ddl.database;
 
 import java.io.Serializable;
 import java.util.Map;
 
+import org.apache.hadoop.hive.ql.exec.ddl.DDLDesc;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLTask2;
 import org.apache.hadoop.hive.ql.parse.ReplicationSpec;
+import org.apache.hadoop.hive.ql.plan.Explain;
 import org.apache.hadoop.hive.ql.plan.Explain.Level;
+import org.apache.hadoop.hive.ql.plan.PrincipalDesc;
 
 /**
- * AlterDatabaseDesc.
- *
+ * DDL task description for ALTER DATABASE commands.
  */
 @Explain(displayName = "Alter Database", explainLevels = { Level.USER, 
Level.DEFAULT, Level.EXTENDED })
 public class AlterDatabaseDesc extends DDLDesc implements Serializable {
-
   private static final long serialVersionUID = 1L;
 
-  // Only altering the database property and owner is currently supported
-  public static enum ALTER_DB_TYPES {
-ALTER_PROPERTY, ALTER_OWNER, ALTER_LOCATION
-  };
-
-  ALTER_DB_TYPES alterType;
-  String databaseName;
-  Map<String, String> dbProperties;
-  PrincipalDesc ownerPrincipal;
-  ReplicationSpec replicationSpec;
-  String location;
+  static {
+DDLTask2.registerOperator(AlterDatabaseDesc.class, 
AlterDatabaseOperation.class);
 
 Review comment:
   this is executed at class load time; if no other class ever triggers loading 
of this class, the desc/operation will not be registered
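The pitfall above can be demonstrated with a minimal sketch (the registry and class names here are hypothetical stand-ins, not the real Hive ones):

```java
import java.util.HashMap;
import java.util.Map;

public class StaticRegistrationSketch {
  // Hypothetical registry mirroring the DDLTask2.registerOperator idea.
  static final Map<String, String> REGISTRY = new HashMap<>();

  static class AlterDbDesc {
    // A static initializer runs only when the class is initialized; if no
    // code ever actively uses AlterDbDesc, this line never executes and
    // the desc/operation pair is silently missing from the registry.
    static { REGISTRY.put("AlterDbDesc", "AlterDbOperation"); }
  }

  public static void main(String[] args) {
    System.out.println(REGISTRY.containsKey("AlterDbDesc")); // false: class not loaded yet
    new AlterDbDesc();                                       // forces initialization
    System.out.println(REGISTRY.containsKey("AlterDbDesc")); // true
  }
}
```

Merely declaring or importing the class is not an "active use" in JLS terms, so the registration can be skipped entirely.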




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259278401
 
 

 ##
 File path: 
hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/HCatSemanticAnalyzerBase.java
 ##
 @@ -122,6 +128,13 @@ protected void 
authorizeDDLWork(HiveSemanticAnalyzerHookContext context,
   Hive hive, DDLWork work) throws HiveException {
   }
 
+  /**
+   * Authorized the given DDLWork2. It is only for the interim time while 
DDLTask and DDLWork are being refactored.
+   */
+  protected void authorizeDDLWork2(HiveSemanticAnalyzerHookContext context,
 
 Review comment:
   can we avoid adding the "2" to the function name?




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259293344
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/DescDatabaseOperation.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl.database;
+
+import java.io.DataOutputStream;
+import java.util.Map;
+import java.util.TreeMap;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLOperation;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.io.IOUtils;
+
+/**
+ * Operation process of describing a database.
+ */
+public class DescDatabaseOperation extends DDLOperation<DescDatabaseDesc> {
+  @Override
+  public int execute() throws HiveException {
+DataOutputStream outStream = getOutputStream(ddlDesc.getResFile());
 
 Review comment:
   `IOUtils.closeStream` swallows exceptions (an Exception black hole).
   Use try-with-resources instead; and let's take our chances :)
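The suggested pattern can be sketched as follows; `open` and the written record are stand-ins for the real `getOutputStream(resFile)` and formatter output, writing to memory instead of a FileSystem path:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TryWithResourcesSketch {

  // Hypothetical stand-in for DDLOperation.getOutputStream(resFile).
  static DataOutputStream open(ByteArrayOutputStream sink) {
    return new DataOutputStream(sink);
  }

  static String describe(String dbName) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // try-with-resources closes the stream even if the body throws, and a
    // failure in close() surfaces instead of being silently discarded the
    // way an exception-swallowing closeStream helper would discard it.
    try (DataOutputStream out = open(sink)) {
      out.writeBytes(dbName + "\tcomment\tlocation\n");
    } catch (IOException e) { // cannot happen for an in-memory sink
      throw new UncheckedIOException(e);
    }
    return sink.toString();
  }

  public static void main(String[] args) {
    System.out.print(describe("default"));
  }
}
```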




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259298189
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
 ##
 @@ -2571,11 +2567,10 @@ private void analyzeDescDatabase(ASTNode ast) throws 
SemanticException {
   throw new SemanticException("Unexpected Tokens at DESCRIBE DATABASE");
 }
 
-DescDatabaseDesc descDbDesc = new DescDatabaseDesc(ctx.getResFile(),
-dbName, isExtended);
+DescDatabaseDesc descDbDesc = new DescDatabaseDesc(ctx.getResFile(), 
dbName, isExtended);
 inputs.add(new ReadEntity(getDatabase(dbName)));
-rootTasks.add(TaskFactory.get(new DDLWork(getInputs(), getOutputs(), 
descDbDesc)));
-setFetchTask(createFetchTask(descDbDesc.getSchema()));
+rootTasks.add(TaskFactory.get(new DDLWork2(getInputs(), getOutputs(), 
descDbDesc)));
+setFetchTask(createFetchTask(DESC_DATABASE_SCHEMA));
 
 Review comment:
   I think the `schema` should work similarly to how it did earlier; how about 
asking the DDLWork for the schema? It could look it up based on the desc.




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259288933
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLTask2.java
 ##
 @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.QueryPlan;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.parse.ExplainConfiguration.AnalyzeState;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+
+/**
+ * DDLTask implementation.
+**/
+public class DDLTask2 extends Task<DDLWork2> implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private static final Map<Class<? extends DDLDesc>, Class<? extends 
DDLOperation<?>>> DESC_TO_OPARATION =
+  new HashMap<>();
+  public static void registerOperator(Class<? extends DDLDesc> descClass,
+  Class<? extends DDLOperation<?>> operationClass) {
+DESC_TO_OPARATION.put(descClass, operationClass);
+  }
+
+  @Override
+  public boolean requireLock() {
+return this.work != null && this.work.getNeedLock();
+  }
+
+  @Override
+  public void initialize(QueryState queryState, QueryPlan queryPlan, 
DriverContext ctx,
+  CompilationOpContext opContext) {
+super.initialize(queryState, queryPlan, ctx, opContext);
+  }
+
+  @Override
+  public int execute(DriverContext driverContext) {
+if (driverContext.getCtx().getExplainAnalyze() == AnalyzeState.RUNNING) {
+  return 0;
+}
+
+try {
+  Hive db = Hive.get(conf);
+  DDLDesc ddlDesc = work.getDDLDesc();
+
+  if (DESC_TO_OPARATION.containsKey(ddlDesc.getClass())) {
+DDLOperation ddlOperation = 
DESC_TO_OPARATION.get(ddlDesc.getClass()).newInstance();
+ddlOperation.init(db, conf, driverContext, ddlDesc);
 
 Review comment:
   before getting into trouble with this, I think it would be better to 
introduce some context object to encapsulate `db`, `conf` and `driverContext`; 
of these 3, I think `conf` is redundant, and probably `db` as well
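A minimal sketch of the suggested context object (all types here are hypothetical stand-ins, not the real Hive classes):

```java
public class ContextSketch {
  // Stand-ins for the real Hive and DriverContext types (assumptions).
  static class Hive { }
  static class DriverContext { }

  // One context object replaces the (db, conf, driverContext) triple
  // passed to every operation; conf is dropped as redundant.
  static final class DDLOperationContext {
    final Hive db;
    final DriverContext driverContext;
    DDLOperationContext(Hive db, DriverContext driverContext) {
      this.db = db;
      this.driverContext = driverContext;
    }
  }

  abstract static class DDLOperation {
    protected DDLOperationContext context;
    final void init(DDLOperationContext context) { this.context = context; }
    abstract int execute();
  }

  static int run() {
    DDLOperation op = new DDLOperation() {
      @Override int execute() { return context.db != null ? 0 : 1; }
    };
    // A single argument to init keeps operation signatures stable as the
    // set of shared dependencies grows.
    op.init(new DDLOperationContext(new Hive(), new DriverContext()));
    return op.execute();
  }

  public static void main(String[] args) {
    System.out.println(run()); // 0
  }
}
```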




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259293192
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/DescDatabaseOperation.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl.database;
+
+import java.io.DataOutputStream;
+import java.util.Map;
+import java.util.TreeMap;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLOperation;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.io.IOUtils;
+
+/**
+ * Operation process of describing a database.
+ */
+public class DescDatabaseOperation extends DDLOperation<DescDatabaseDesc> {
+  @Override
+  public int execute() throws HiveException {
+DataOutputStream outStream = getOutputStream(ddlDesc.getResFile());
+try {
+  Database database = db.getDatabase(ddlDesc.getDatabaseName());
+  if (database == null) {
+throw new HiveException(ErrorMsg.DATABASE_NOT_EXISTS, 
ddlDesc.getDatabaseName());
+  }
+
+  Map<String, String> params = null;
+  if (ddlDesc.isExt()) {
+params = database.getParameters();
+  }
+
+  // If this is a q-test, let's order the params map (lexicographically) by
+  // key. This is to get consistent param ordering between Java7 and Java8.
+  if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST) && params 
!= null) {
+params = new TreeMap<String, String>(params);
+  }
+
+  String location = database.getLocationUri();
+  if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+location = "location/in/test";
+  }
+
+  PrincipalType ownerType = database.getOwnerType();
+  formatter.showDatabaseDescription(outStream, database.getName(), 
database.getDescription(), location,
+  database.getOwnerName(), (null == ownerType) ? null : 
ownerType.name(), params);
 
 Review comment:
   change `showDatabaseDescription` to be able to handle `PrincipalType` 
argument 




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259292273
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/DescDatabaseOperation.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl.database;
+
+import java.io.DataOutputStream;
+import java.util.Map;
+import java.util.TreeMap;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLOperation;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.io.IOUtils;
+
+/**
+ * Operation process of describing a database.
+ */
+public class DescDatabaseOperation extends DDLOperation<DescDatabaseDesc> {
+  @Override
+  public int execute() throws HiveException {
+DataOutputStream outStream = getOutputStream(ddlDesc.getResFile());
+try {
+  Database database = db.getDatabase(ddlDesc.getDatabaseName());
+  if (database == null) {
+throw new HiveException(ErrorMsg.DATABASE_NOT_EXISTS, 
ddlDesc.getDatabaseName());
+  }
+
+  Map<String, String> params = null;
+  if (ddlDesc.isExt()) {
+params = database.getParameters();
+  }
+
+  // If this is a q-test, let's order the params map (lexicographically) by
+  // key. This is to get consistent param ordering between Java7 and Java8.
+  if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST) && params 
!= null) {
+params = new TreeMap<String, String>(params);
+  }
+
+  String location = database.getLocationUri();
+  if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST)) {
+location = "location/in/test";
 
 Review comment:
   I don't think the "desc database" output should be customized just because 
we have tests; we should have some other way to address this




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259292605
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/DescDatabaseOperation.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl.database;
+
+import java.io.DataOutputStream;
+import java.util.Map;
+import java.util.TreeMap;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLOperation;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.io.IOUtils;
+
+/**
+ * Operation process of describing a database.
+ */
+public class DescDatabaseOperation extends DDLOperation<DescDatabaseDesc> {
+  @Override
+  public int execute() throws HiveException {
+DataOutputStream outStream = getOutputStream(ddlDesc.getResFile());
+try {
+  Database database = db.getDatabase(ddlDesc.getDatabaseName());
+  if (database == null) {
+throw new HiveException(ErrorMsg.DATABASE_NOT_EXISTS, 
ddlDesc.getDatabaseName());
+  }
+
+  Map<String, String> params = null;
+  if (ddlDesc.isExt()) {
+params = database.getParameters();
+  }
+
+  // If this is a q-test, let's order the params map (lexicographically) by
+  // key. This is to get consistent param ordering between Java7 and Java8.
+  if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST) && params 
!= null) {
+params = new TreeMap<String, String>(params);
 
 Review comment:
   let's do this always, regardless of whether we are in test; as it stands, 
the order in which the `desc` command outputs props is undefined unless we are 
running tests...
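A sketch of why copying into a `TreeMap` gives deterministic output (names and values here are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class ParamOrderSketch {
  // Copying into a TreeMap yields lexicographic key order, whatever
  // iteration order the source map happens to have.
  static String render(Map<String, String> params) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
      sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("repl.last.id", "42"); // inserted first...
    params.put("comment", "demo");    // ...but "comment" sorts first
    System.out.println(render(params)); // comment=demo;repl.last.id=42;
  }
}
```

Doing this unconditionally makes the `desc` output stable for users, not just for q-tests.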




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259287483
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLOperation.java
 ##
 @@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.DataOutputStream;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatUtils;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Abstract ancestor class of all DDL Operation classes.
+ */
+public abstract class DDLOperation<T extends DDLDesc> {
+  protected static final Logger LOG = 
LoggerFactory.getLogger("hive.ql.exec.DDLTask");
+
+  protected Hive db;
+  protected HiveConf conf;
+  protected DriverContext driverContext;
+  protected T ddlDesc;
+  protected MetaDataFormatter formatter;
+
+  @SuppressWarnings("unchecked")
+  public void init(Hive db, HiveConf conf, DriverContext driverContext, 
DDLDesc ddlDesc) {
+this.db = db;
+this.conf = conf;
+this.driverContext = driverContext;
+this.ddlDesc = (T)ddlDesc;
 
 Review comment:
   I think hive formatter would write `(T) ddlDesc` (there is a space after 
`(T)`)
   do you have it enabled?




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259280708
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLWork2.java
 ##
 @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import org.apache.hadoop.hive.ql.hooks.ReadEntity;
+import org.apache.hadoop.hive.ql.hooks.WriteEntity;
+
+import java.io.Serializable;
+
+import java.util.Set;
+
+/**
+ * A DDL operation.
+ */
+public class DDLWork2 implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private DDLDesc ddlDesc;
 
 Review comment:
   final?




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259287150
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLTask2.java
 ##
 @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.QueryPlan;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.parse.ExplainConfiguration.AnalyzeState;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+
+/**
+ * DDLTask implementation.
+**/
+public class DDLTask2 extends Task<DDLWork2> implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private static final Map<Class<? extends DDLDesc>, Class<? extends 
DDLOperation<?>>> DESC_TO_OPARATION = new HashMap<>();
+  public static void registerOperator(Class<? extends DDLDesc> descClass,
+      Class<? extends DDLOperation<?>> operationClass) {
+DESC_TO_OPARATION.put(descClass, operationClass);
+  }
+
+  @Override
+  public boolean requireLock() {
+return this.work != null && this.work.getNeedLock();
+  }
+
+  @Override
+  public void initialize(QueryState queryState, QueryPlan queryPlan, 
DriverContext ctx,
+  CompilationOpContext opContext) {
+super.initialize(queryState, queryPlan, ctx, opContext);
+  }
+
+  @Override
+  public int execute(DriverContext driverContext) {
+if (driverContext.getCtx().getExplainAnalyze() == AnalyzeState.RUNNING) {
+  return 0;
+}
+
+try {
+  Hive db = Hive.get(conf);
+  DDLDesc ddlDesc = work.getDDLDesc();
+
+  if (DESC_TO_OPARATION.containsKey(ddlDesc.getClass())) {
+DDLOperation ddlOperation = 
DESC_TO_OPARATION.get(ddlDesc.getClass()).newInstance();
+ddlOperation.init(db, conf, driverContext, ddlDesc);
+return ddlOperation.execute();
+  } else {
+throw new IllegalArgumentException("Unknown DDL request: " + 
ddlDesc.getClass());
+  }
+} catch (Throwable e) {
+  failed(e);
+  return 1;
 
 Review comment:
   all ddl errors are translated to error code 1; I'm not sure that we want 
that...
   
   * we could declare some exception which carries the error code, and then 
unpack/log it here
   * not sure about the contracts this class has - but catching Throwable 
includes all kinds of crap (like OOM) - do we really want to do that?
   
   
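One possible shape for the first bullet above, as a self-contained sketch (the exception type, the `10072` code, and the functional interface are hypothetical illustrations, not existing Hive classes): the exception hauls the error code, and the task-level catch unpacks it instead of collapsing every failure to 1.

```java
public class ErrorCodeSketch {
    // Hypothetical checked exception that carries a specific DDL error code.
    static class DDLOperationException extends Exception {
        private final int errorCode;
        DDLOperationException(String message, int errorCode) {
            super(message);
            this.errorCode = errorCode;
        }
        int getErrorCode() { return errorCode; }
    }

    // Stand-in for a DDL operation body that may fail with a coded error.
    interface DDLOp { void run() throws DDLOperationException; }

    // Unpack the code at the catch site instead of returning a blanket 1.
    static int execute(DDLOp op) {
        try {
            op.run();
            return 0;
        } catch (DDLOperationException e) {
            return e.getErrorCode();
        }
    }

    public static void main(String[] args) {
        int rc = execute(() -> { throw new DDLOperationException("db missing", 10072); });
        System.out.println(rc); // prints 10072
    }
}
```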




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259288277
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLTask2.java
 ##
 @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.QueryPlan;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.parse.ExplainConfiguration.AnalyzeState;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+
+/**
+ * DDLTask implementation.
+**/
+public class DDLTask2 extends Task<DDLWork2> implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private static final Map<Class<? extends DDLDesc>, Class<? extends 
DDLOperation<?>>> DESC_TO_OPARATION = new HashMap<>();
+  public static void registerOperator(Class<? extends DDLDesc> descClass,
+      Class<? extends DDLOperation<?>> operationClass) {
+DESC_TO_OPARATION.put(descClass, operationClass);
+  }
+
+  @Override
+  public boolean requireLock() {
+return this.work != null && this.work.getNeedLock();
+  }
+
+  @Override
+  public void initialize(QueryState queryState, QueryPlan queryPlan, 
DriverContext ctx,
+  CompilationOpContext opContext) {
+super.initialize(queryState, queryPlan, ctx, opContext);
+  }
+
+  @Override
+  public int execute(DriverContext driverContext) {
+if (driverContext.getCtx().getExplainAnalyze() == AnalyzeState.RUNNING) {
+  return 0;
+}
+
+try {
+  Hive db = Hive.get(conf);
+  DDLDesc ddlDesc = work.getDDLDesc();
+
+  if (DESC_TO_OPARATION.containsKey(ddlDesc.getClass())) {
+DDLOperation ddlOperation = 
DESC_TO_OPARATION.get(ddlDesc.getClass()).newInstance();
 
 Review comment:
   instead of `init(x)` could we use a constructor with arguments?
   

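A minimal, self-contained sketch of the constructor-with-arguments idea (stand-in types, not the actual Hive classes): the class-based registry stays, but `getDeclaredConstructor` looks up a one-argument constructor keyed by the desc class, so the field can be `final` and there is no half-initialized state between `newInstance()` and `init(...)`.

```java
import java.util.HashMap;
import java.util.Map;

public class RegistrySketch {
    interface Desc { }

    abstract static class Operation<T extends Desc> {
        protected final T desc;                 // final: set once in the constructor
        Operation(T desc) { this.desc = desc; }
        abstract int execute();
    }

    static class ShowDesc implements Desc { }
    static class ShowOperation extends Operation<ShowDesc> {
        ShowOperation(ShowDesc desc) { super(desc); }
        @Override int execute() { return 0; }
    }

    private static final Map<Class<? extends Desc>, Class<? extends Operation<?>>> REGISTRY =
        new HashMap<>();

    static <T extends Desc> void register(Class<T> descClass, Class<? extends Operation<T>> opClass) {
        REGISTRY.put(descClass, opClass);
    }

    // Instantiate through the desc-typed constructor instead of a no-arg ctor plus init().
    static int run(Desc desc) throws ReflectiveOperationException {
        Class<? extends Operation<?>> opClass = REGISTRY.get(desc.getClass());
        if (opClass == null) {
            throw new IllegalArgumentException("Unknown DDL request: " + desc.getClass());
        }
        Operation<?> op = opClass.getDeclaredConstructor(desc.getClass()).newInstance(desc);
        return op.execute();
    }

    public static void main(String[] args) throws Exception {
        register(ShowDesc.class, ShowOperation.class);
        System.out.println(run(new ShowDesc())); // prints 0
    }
}
```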



[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259285374
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/AlterDatabaseDesc.java
 ##
 @@ -16,112 +16,100 @@
  * limitations under the License.
  */
 
-package org.apache.hadoop.hive.ql.plan;
+package org.apache.hadoop.hive.ql.exec.ddl.database;
 
 import java.io.Serializable;
 import java.util.Map;
 
+import org.apache.hadoop.hive.ql.exec.ddl.DDLDesc;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLTask2;
 import org.apache.hadoop.hive.ql.parse.ReplicationSpec;
+import org.apache.hadoop.hive.ql.plan.Explain;
 import org.apache.hadoop.hive.ql.plan.Explain.Level;
+import org.apache.hadoop.hive.ql.plan.PrincipalDesc;
 
 /**
- * AlterDatabaseDesc.
- *
+ * DDL task description for ALTER DATABASE commands.
  */
 @Explain(displayName = "Alter Database", explainLevels = { Level.USER, 
Level.DEFAULT, Level.EXTENDED })
 public class AlterDatabaseDesc extends DDLDesc implements Serializable {
-
   private static final long serialVersionUID = 1L;
 
-  // Only altering the database property and owner is currently supported
-  public static enum ALTER_DB_TYPES {
-ALTER_PROPERTY, ALTER_OWNER, ALTER_LOCATION
-  };
-
-  ALTER_DB_TYPES alterType;
-  String databaseName;
-  Map<String, String> dbProperties;
-  PrincipalDesc ownerPrincipal;
-  ReplicationSpec replicationSpec;
-  String location;
+  static {
+DDLTask2.registerOperator(AlterDatabaseDesc.class, 
AlterDatabaseOperation.class);
+  }
 
   /**
-   * For serialization only.
+   * Supported type of alter db commands.
+   * Only altering the database property and owner is currently supported
*/
-  public AlterDatabaseDesc() {
-  }
+  public enum AlterDbType {
+ALTER_PROPERTY, ALTER_OWNER, ALTER_LOCATION
+  };
 
-  public AlterDatabaseDesc(String databaseName, Map<String, String> dbProps,
 
 Review comment:
   how about putting the operation/desc into the same "container" class?
   
   ```
   public class AlterDatabase {
 public class Desc extends DDLDesc implements Serializable {
   [...]
 }
 public class Operation extends DDLOperation {
   [...]
 }
   }
   ```




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259291262
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/CreateDatabaseOperation.java
 ##
 @@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl.database;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.AlreadyExistsException;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.PrincipalType;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLOperation;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.session.SessionState;
+
+/**
+ * Operation process of creating a database.
+ */
+public class CreateDatabaseOperation extends DDLOperation<CreateDatabaseDesc> {
+  private static final String DATABASE_PATH_SUFFIX = ".db";
+
+  @Override
+  public int execute() throws HiveException {
+Database database = new Database();
+database.setName(ddlDesc.getName());
+database.setDescription(ddlDesc.getComment());
+database.setLocationUri(ddlDesc.getLocationUri());
+database.setParameters(ddlDesc.getDatabaseProperties());
+database.setOwnerName(SessionState.getUserFromAuthenticator());
+database.setOwnerType(PrincipalType.USER);
+
+try {
+  makeLocationQualified(database);
+  db.createDatabase(database, ddlDesc.getIfNotExists());
+} catch (AlreadyExistsException ex) {
+  //it would be better if AlreadyExistsException had an errorCode field
+  throw new HiveException(ex, ErrorMsg.DATABASE_ALREADY_EXISTS, 
ddlDesc.getName());
+}
+
+return 0;
+  }
+
+  private void makeLocationQualified(Database database) throws HiveException {
 
 Review comment:
   please make this method work based on the `locationUri` instead of 
operating on the whole `Database`
   
   




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259287685
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLDesc.java
 ##
 @@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.Serializable;
+
+/**
+ * Abstract ancestor of all DDL operation descriptors.
+ */
+public abstract class DDLDesc implements Serializable {
 
 Review comment:
   why this isn't an interface?




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259295178
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/ShowDatabasesOperation.java
 ##
 @@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl.database;
+
+import java.io.DataOutputStream;
+import java.util.List;
+
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLOperation;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.io.IOUtils;
+
+/**
+ * Operation process of showing databases.
+ */
+public class ShowDatabasesOperation extends DDLOperation<ShowDatabasesDesc> {
+  @Override
+  public int execute() throws HiveException {
+// get the databases for the desired pattern - populate the output stream
+List<String> databases = null;
+if (ddlDesc.getPattern() != null) {
+  LOG.debug("pattern: {}", ddlDesc.getPattern());
+  databases = db.getDatabasesByPattern(ddlDesc.getPattern());
+} else {
+  databases = db.getAllDatabases();
+}
+
+LOG.info("Found {} database(s) matching the SHOW DATABASES statement.", 
databases.size());
+
+// write the results in the file
+DataOutputStream outStream = getOutputStream(ddlDesc.getResFile());
+try {
+  formatter.showDatabases(outStream, databases);
 
 Review comment:
   there seems to be a `formatter` for each of these operations; I guess there 
are fewer formatters than operations; but I think the formatters should also 
"find" a new home




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259296712
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/database/SwitchDatabaseDesc.java
 ##
 @@ -16,37 +16,34 @@
  * limitations under the License.
  */
 
-package org.apache.hadoop.hive.ql.plan;
+package org.apache.hadoop.hive.ql.exec.ddl.database;
 
 import java.io.Serializable;
-import org.apache.hadoop.hive.ql.plan.Explain.Level;
 
+import org.apache.hadoop.hive.ql.exec.ddl.DDLDesc;
+import org.apache.hadoop.hive.ql.exec.ddl.DDLTask2;
+import org.apache.hadoop.hive.ql.plan.Explain;
+import org.apache.hadoop.hive.ql.plan.Explain.Level;
 
 /**
- * SwitchDatabaseDesc.
- *
+ * DDL task description for SWITCH DATABASE commands.
 
 Review comment:
   `SWITCH DATABASE` :)
   I'm not sure that's a valid statement - is it? :)
   I usually use `USE db1`; can we rename things around here to something like 
`UseDatabaseDesc`?
   
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/Alter/UseDatabase
   
   according to:
   ```
   $ git grep -i switch|grep -i database|grep -v HOOK:|grep -v java:|grep -v 
q.out|grep -v js:|less
   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g:TOK_SWITCHDATABASE;
   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g:| 
switchDatabaseStatement
   
ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g:switchDatabaseStatement
   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g:@init { 
pushMsg("switch database statement", state); }
   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g:-> 
^(TOK_SWITCHDATABASE identifier)
   ql/src/test/queries/clientnegative/database_switch_does_not_exist.q:-- Try 
to switch to a database that does not exist
   ```
   "switchdatabase" is also the keyword in the `HiveParser`




[GitHub] kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up DDLTask 1 - extract Database related operations

2019-02-22 Thread GitBox
kgyrtkirk commented on a change in pull request #543: HIVE-21292: Break up 
DDLTask 1 - extract Database related operations
URL: https://github.com/apache/hive/pull/543#discussion_r259295681
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ddl/DDLOperation.java
 ##
 @@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.ddl;
+
+import java.io.DataOutputStream;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.DriverContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatUtils;
+import org.apache.hadoop.hive.ql.metadata.formatting.MetaDataFormatter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Abstract ancestor class of all DDL Operation classes.
+ */
+public abstract class DDLOperation {
+  protected static final Logger LOG = 
LoggerFactory.getLogger("hive.ql.exec.DDLTask");
+
+  protected Hive db;
 
 Review comment:
   make fields final?




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259283514
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -60,6 +60,7 @@
 
   public static final String LAST_REPL_ID_KEY = "hive.repl.last.repl.id";
   public static final String REPL_CHECKPOINT_KEY = "hive.repl.ckpt.key";
+  public static final String REPL_FIRST_INC_PENDING_FLAG = 
"hive.repl.first.inc.pending";
 
 Review comment:
   add a TODO for external bootstrap during inc load




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259283120
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -60,6 +60,7 @@
 
   public static final String LAST_REPL_ID_KEY = "hive.repl.last.repl.id";
   public static final String REPL_CHECKPOINT_KEY = "hive.repl.ckpt.key";
+  public static final String REPL_FIRST_INC_PENDING_FLAG = 
"hive.repl.first.inc.pending";
 
 Review comment:
   add a comment to explain




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259281934
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java
 ##
 @@ -53,6 +56,20 @@ public void init(AtomicBoolean stop, AtomicBoolean looped) 
throws Exception {
 }
   }
 
+  @Override boolean replIsCompactionDisabledForDatabase(String dbName) throws 
TException {
 
 Review comment:
   all methods are duplicated, so I kept this one the same




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259280119
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -133,8 +161,11 @@ protected int execute(DriverContext driverContext) {
 return 6;
   }
   long writeId = Long.parseLong(writeIdString);
-  toPath = new Path(toPath, 
AcidUtils.baseOrDeltaSubdir(work.getDeleteDestIfExist(), writeId, writeId,
-  
driverContext.getCtx().getHiveTxnManager().getStmtIdAndIncrement()));
+  // Set stmt id 0 for bootstrap load as the directory needs to be 
searched during incremental load to avoid any
+  // duplicate copy from the source. Check HIVE-21197 for more detail.
+  int stmtId = (writeId == 
ReplUtils.REPL_BOOTSTRAP_MIGRATION_BASE_WRITE_ID) ? 0 :
 
 Review comment:
   dont use magic number 0




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259278076
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -661,6 +663,10 @@ public int execute(DriverContext driverContext) {
   if (work.getAlterMaterializedViewDesc() != null) {
 return alterMaterializedView(db, work.getAlterMaterializedViewDesc());
   }
+
+  if (work.getReplSetFirstIncLoadFlagDesc() != null) {
 
 Review comment:
   In the repl incremental load task, get the database list and alter the property; remove the table-level property.




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259274648
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationMisc.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore;
+import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
+import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.shims.Utils;
+import org.junit.*;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.util.*;
+
+import static 
org.apache.hadoop.hive.metastore.ReplChangeManager.SOURCE_OF_REPLICATION;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isFullAcidTable;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isTransactionalTable;
+import static org.junit.Assert.*;
+
+/**
+ * TestReplicationWithTableMigration - test replication for Hive2 to Hive3 
(Strict managed tables)
+ */
+public class TestReplicationWithTableMigrationMisc {
 
 Review comment:
   change it to Ex




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259260745
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
 ##
 @@ -158,6 +159,15 @@ private boolean isDbEmpty(String dbName) throws 
HiveException {
 // Add the checkpoint key to the Database binding it to current dump 
directory.
 // So, if retry using same dump, we shall skip Database object update.
 parameters.put(ReplUtils.REPL_CHECKPOINT_KEY, dumpDirectory);
+
+if (needSetIncFlag) {
 
 Review comment:
   No need to check it. Instead just set needSetIncFlag directly using 
setBoolVar.




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259253737
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/plan/ReplSetFirstIncLoadFlagDesc.java
 ##
 @@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.plan;
+import org.apache.hadoop.hive.ql.plan.Explain.Level;
+
+import java.io.Serializable;
+
+/**
+ * ReplSetFirstIncLoadFlagDesc.
+ *
+ */
+@Explain(displayName = "Set First Incr Load Flag", explainLevels = { 
Level.USER, Level.DEFAULT, Level.EXTENDED })
+public class ReplSetFirstIncLoadFlagDesc extends DDLDesc implements 
Serializable {
+
+  private static final long serialVersionUID = 1L;
+  String databaseName;
+  String tableName;
+
+  /**
+   * For serialization only.
+   */
+  public ReplSetFirstIncLoadFlagDesc() {
+  }
+
+  public ReplSetFirstIncLoadFlagDesc(String databaseName, String tableName) {
 
 Review comment:
   Why is it not taking an extra boolean argument for the flag?




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259252494
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1136,6 +1133,7 @@ private static void createReplImportTasks(
 
 Task dropTblTask = null;
 WriteEntity.WriteType lockType = WriteEntity.WriteType.DDL_NO_LOCK;
+Boolean firstIncDone;
 
 Review comment:
   Shall use primitive boolean.
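
   The concern behind this comment — a boxed Boolean can be null and throws on auto-unboxing, while a primitive boolean cannot — can be sketched as a standalone illustration (not code from the patch):

```java
public class BoxedBooleanPitfall {
    // returns true if auto-unboxing the argument throws
    static boolean unboxingThrows(Boolean boxed) {
        try {
            if (boxed) { /* forces unboxing of the wrapper */ }
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unboxingThrows(null));          // true: a Boolean may be null
        System.out.println(unboxingThrows(Boolean.FALSE)); // false: unboxing succeeds
        boolean primitive = false;                         // a primitive can never be null
        System.out.println(primitive);
    }
}
```

   Using the primitive removes the null state entirely, which is why reviewers prefer it for local flags.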




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259261832
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -60,6 +60,7 @@
 
   public static final String LAST_REPL_ID_KEY = "hive.repl.last.repl.id";
   public static final String REPL_CHECKPOINT_KEY = "hive.repl.ckpt.key";
+  public static final String REPL_FIRST_INC_PENDING_FLAG = 
"hive.repl.first.inc.pending";
 
 Review comment:
   Can we use a generic name like "hive.repl.is.data.consistent"? This would avoid confusion in the incremental bootstrap case of table-level replication, where a few tables' data will be consistent only after the next incremental load.




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259252954
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1147,6 +1145,12 @@ private static void createReplImportTasks(
   if (!waitOnPrecursor){
 throw new 
SemanticException(ErrorMsg.DATABASE_NOT_EXISTS.getMsg(tblDesc.getDatabaseName()));
   }
+  // For warehouse level replication, if the database itself is getting 
created in this load, then no need to
+  // check for duplicate copy. Check HIVE-21197 for more detail.
+  firstIncDone = true;
 
 Review comment:
   Shall use ternary operator.
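
   The reviewer's suggestion — replace the assign-then-conditionally-overwrite pattern with a single ternary — might look like the sketch below. The isFirstIncDone helper and the property name are stand-ins for illustration, not the patch's actual code:

```java
import java.util.Map;

public class TernaryInit {
    // hypothetical stand-in for ReplUtils.isFirstIncDone
    static boolean isFirstIncDone(Map<String, String> dbParams) {
        return dbParams != null && "false".equals(dbParams.get("hive.repl.first.inc.pending"));
    }

    public static void main(String[] args) {
        Map<String, String> dbParams = null; // database is created by this load itself

        // assign-then-overwrite style, as in the patch:
        boolean firstIncDone = true;
        if (dbParams != null) {
            firstIncDone = isFirstIncDone(dbParams);
        }

        // single ternary, as the review suggests:
        boolean viaTernary = (dbParams == null) ? true : isFirstIncDone(dbParams);

        System.out.println(firstIncDone == viaTernary); // both styles agree
    }
}
```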




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259232906
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -661,6 +663,10 @@ public int execute(DriverContext driverContext) {
   if (work.getAlterMaterializedViewDesc() != null) {
 return alterMaterializedView(db, work.getAlterMaterializedViewDesc());
   }
+
+  if (work.getReplSetFirstIncLoadFlagDesc() != null) {
 
 Review comment:
   Why don't we use alterTableDesc or alterDatabaseDesc instead of a new type? We are just editing a db/table/partition property.




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259233648
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -61,6 +62,25 @@ public ReplCopyTask(){
 super();
   }
 
+  // If file is already present in base directory, then remove it from the list.
+  // Check HIVE-21197 for more detail.
+  private void updateSrcFileListForDupCopy(FileSystem dstFs, Path toPath, List<ReplChangeManager.FileInfo> srcFiles,
+                                           long writeId, int stmtId) throws IOException {
+    ListIterator<ReplChangeManager.FileInfo> iter = srcFiles.listIterator();
+    Path basePath = new Path(toPath, AcidUtils.baseOrDeltaSubdir(true, writeId, writeId, stmtId));
+    while (iter.hasNext()) {
+      Path filePath = new Path(basePath, iter.next().getSourcePath().getName());
+      try {
+        if (dstFs.getFileStatus(filePath) != null) {
 
 Review comment:
   Shall use the fs.exists(path) method instead of getFileStatus. Then there is no need to handle the exception either.
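
   The suggested simplification can be illustrated with java.nio.file as a stand-in for Hadoop's FileSystem, whose getFileStatus throws FileNotFoundException for a missing path while exists simply returns false:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

public class ExistsVsStatus {
    // getFileStatus-style probe: the attribute lookup throws for a missing
    // path, forcing the caller into a try/catch
    static boolean viaAttributes(Path p) {
        try {
            Files.readAttributes(p, BasicFileAttributes.class);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // exists-style probe: no exception handling needed
    static boolean viaExists(Path p) {
        return Files.exists(p);
    }

    public static void main(String[] args) {
        Path missing = Paths.get("no-such-file-for-this-demo");
        System.out.println(viaAttributes(missing)); // false
        System.out.println(viaExists(missing));     // false
    }
}
```

   Both probes agree on the result; the exists-style version just encapsulates the error handling, which is the reviewer's point.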




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259254156
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java
 ##
 @@ -71,6 +74,20 @@ public void init(AtomicBoolean stop, AtomicBoolean looped) 
throws Exception {
 }
   }
 
+  @Override boolean replIsCompactionDisabledForDatabase(String dbName) throws 
TException {
+try {
+  Database database = rs.getDatabase(getDefaultCatalog(conf), dbName);
+  if (database != null) {
+return !ReplUtils.isFirstIncDone(database.getParameters());
+  }
+  LOG.info("Unable to find database " + dbName);
 
 Review comment:
   Why 2 different flows for No database case?




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259261039
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java
 ##
 @@ -164,6 +165,12 @@ public IncrementalLoadTasksBuilder(String dbName, String 
tableName, String loadP
   lastEventid);
 }
   }
+
+  ReplSetFirstIncLoadFlagDesc desc = new 
ReplSetFirstIncLoadFlagDesc(dbName, tableName);
 
 Review comment:
   Can we combine this with update last replID alterDb task?




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259228579
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationMisc.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore;
+import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
+import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.shims.Utils;
+import org.junit.*;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.util.*;
+
+import static 
org.apache.hadoop.hive.metastore.ReplChangeManager.SOURCE_OF_REPLICATION;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isFullAcidTable;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isTransactionalTable;
+import static org.junit.Assert.*;
+
+/**
+ * TestReplicationWithTableMigration - test replication for Hive2 to Hive3 
(Strict managed tables)
+ */
+public class TestReplicationWithTableMigrationMisc {
 
 Review comment:
   The name can be more meaningful. 




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259234204
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -133,8 +161,11 @@ protected int execute(DriverContext driverContext) {
 return 6;
   }
   long writeId = Long.parseLong(writeIdString);
-  toPath = new Path(toPath, AcidUtils.baseOrDeltaSubdir(work.getDeleteDestIfExist(), writeId, writeId,
-      driverContext.getCtx().getHiveTxnManager().getStmtIdAndIncrement()));
+  // Set stmt id 0 for bootstrap load as the directory needs to be searched during incremental load to avoid any
+  // duplicate copy from the source. Check HIVE-21197 for more detail.
+  int stmtId = (writeId == ReplUtils.REPL_BOOTSTRAP_MIGRATION_BASE_WRITE_ID) ? 0 :
 
 Review comment:
   No need to have this check as it is the migration flow. Set it to 0 always.
   Also, it is better to add a final variable in ReplUtils for this instead of hardcoding it.
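
   The two suggestions — always use stmt id 0 on this path, and give the value a name — could be combined roughly as below. The constant name is illustrative only; the real fix would live in ReplUtils and may be named differently:

```java
public class StmtIdNaming {
    // illustrative named constant replacing the hardcoded 0
    public static final int REPL_MIGRATION_BOOTSTRAP_STMT_ID = 0;

    static int stmtIdForMigration() {
        // migration flow: always the bootstrap statement id, no writeId check
        return REPL_MIGRATION_BOOTSTRAP_STMT_ID;
    }

    public static void main(String[] args) {
        System.out.println(stmtIdForMigration()); // 0
    }
}
```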




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259237893
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
 ##
 @@ -112,6 +118,12 @@ public void run() {
 continue;
   }
 
+  if (replIsCompactionDisabledForTable(t)) {
 
 Review comment:
   I think we can disable compaction at the DB level only. Even table-level replication replicates at the DB level with a list of tables. That will be simpler and easier to maintain.




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259230730
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationMisc.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore;
+import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
+import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.shims.Utils;
+import org.junit.*;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.util.*;
+
+import static 
org.apache.hadoop.hive.metastore.ReplChangeManager.SOURCE_OF_REPLICATION;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isFullAcidTable;
+import static org.apache.hadoop.hive.ql.io.AcidUtils.isTransactionalTable;
+import static org.junit.Assert.*;
+
+/**
+ * TestReplicationWithTableMigration - test replication for Hive2 to Hive3 
(Strict managed tables)
+ */
+public class TestReplicationWithTableMigrationMisc {
+  @Rule
+  public final TestName testName = new TestName();
+
+  protected static final Logger LOG = 
LoggerFactory.getLogger(TestReplicationWithTableMigrationMisc.class);
+  private static WarehouseInstance primary, replica;
+  private String primaryDbName, replicatedDbName;
+
+  @BeforeClass
+  public static void classLevelSetup() throws Exception {
+    HashMap<String, String> overrideProperties = new HashMap<>();
+internalBeforeClassSetup(overrideProperties);
+  }
+
+  static void internalBeforeClassSetup(Map<String, String> overrideConfigs) throws Exception {
+HiveConf conf = new HiveConf(TestReplicationWithTableMigrationMisc.class);
+conf.set("dfs.client.use.datanode.hostname", "true");
+conf.set("hadoop.proxyuser." + Utils.getUGI().getShortUserName() + 
".hosts", "*");
+MiniDFSCluster miniDFSCluster =
+new MiniDFSCluster.Builder(conf).numDataNodes(1).format(true).build();
+final DistributedFileSystem fs = miniDFSCluster.getFileSystem();
+    HashMap<String, String> hiveConfigs = new HashMap<String, String>() {{
+  put("fs.defaultFS", fs.getUri().toString());
+  put("hive.support.concurrency", "true");
+  put("hive.txn.manager", 
"org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
+  put("hive.metastore.client.capability.check", "false");
+  put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+  put("hive.exec.dynamic.partition.mode", "nonstrict");
+  put("hive.strict.checks.bucketing", "false");
+  put("hive.mapred.mode", "nonstrict");
+  put("mapred.input.dir.recursive", "true");
+  put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+  put("hive.strict.managed.tables", "true");
+  put("hive.metastore.transactional.event.listeners", "");
+}};
+replica = new WarehouseInstance(LOG, miniDFSCluster, hiveConfigs);
+
+    HashMap<String, String> configsForPrimary = new HashMap<String, String>() {{
+  put("fs.defaultFS", fs.getUri().toString());
+  put("hive.metastore.client.capability.check", "false");
+  put("hive.repl.bootstrap.dump.open.txn.timeout", "1s");
+  put("hive.exec.dynamic.partition.mode", "nonstrict");
+  put("hive.strict.checks.bucketing", "false");
+  put("hive.mapred.mode", "nonstrict");
+  put("mapred.input.dir.recursive", "true");
+  put("hive.metastore.disallow.incompatible.col.type.changes", "false");
+  put("hive.support.concurrency", "false");
+  put("hive.txn.manager", 
"org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager");
+  put("hive.strict.managed.tables", "false");
+}};
+configsForPrimary.putAll(overrideConfigs);
+primary = new 

[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259235570
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -271,12 +302,13 @@ public String getName() {
 LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath);
 if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){
   ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false);
-  if (replicationSpec.isReplace() && 
conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) {
+  if (replicationSpec.isReplace() && 
(conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) {
 rcwork.setDeleteDestIfExist(true);
 rcwork.setAutoPurge(isAutoPurge);
 rcwork.setNeedRecycle(needRecycle);
   }
   rcwork.setCopyToMigratedTxnTable(copyToMigratedTxnTable);
+  rcwork.setCheckDuplicateCopy(replicationSpec.needDupCopyCheck());
 
 Review comment:
   needDupCopyCheck seems to be a generic flag, but we handle it only for the migration case. Instead, shall change it to isFirstIncAfterBootstrap, which makes it clear that some flows need to handle this case differently.




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259230584
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationWithTableMigrationMisc.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.parse;
+
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore;
+import 
org.apache.hadoop.hive.metastore.InjectableBehaviourObjectStore.BehaviourInjection;
+import org.apache.hadoop.hive.metastore.api.CurrentNotificationEventId;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.shims.Utils;
+import org.junit.*;
+import org.junit.rules.TestName;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.util.*;
 
 Review comment:
   Shall avoid importing *.




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259254333
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java
 ##
 @@ -53,6 +56,20 @@ public void init(AtomicBoolean stop, AtomicBoolean looped) throws Exception {
 }
   }
 
+  @Override boolean replIsCompactionDisabledForDatabase(String dbName) throws TException {
 
 Review comment:
   Duplicate method. Can we have a static method in MetastoreUtils that takes 
MSC as input?
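A minimal sketch of the consolidation being suggested — one static helper both compactor-thread variants could call instead of duplicating the method. The `Client` interface below is a hypothetical stand-in for `IMetaStoreClient`, and the parameter key and "compaction disabled while first incremental is pending" semantics are assumptions for illustration, not the actual Hive contract:

```java
import java.util.Map;

public class CompactionCheckSketch {
    // Hypothetical stand-in for IMetaStoreClient; only the one call we need.
    interface Client {
        Map<String, String> getDatabaseParameters(String dbName);
    }

    // Single shared helper, rather than each CompactorThread subclass
    // keeping its own copy of replIsCompactionDisabledForDatabase.
    static boolean isCompactionDisabledForDatabase(Client msc, String dbName) {
        Map<String, String> params = msc.getDatabaseParameters(dbName);
        // Assumed rule: compaction is held back while the first
        // incremental replication load is still pending on this db.
        return params != null
                && Boolean.parseBoolean(params.get("hive.repl.first.inc.pending"));
    }

    public static void main(String[] args) {
        Client msc = db -> Map.of("hive.repl.first.inc.pending", "true");
        System.out.println(isCompactionDisabledForDatabase(msc, "repl_db")); // prints true
    }
}
```

Callers would then pass whatever client they already hold, so the check lives in exactly one place.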




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259227325
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
 ##
 @@ -610,6 +610,11 @@ public void testIncrementalDumpOfWarehouse() throws 
Throwable {
 .run("show tables")
 .verifyResults(new String[] { "t1" });
 
+//default db is already created at replica before load. So the incr flag should not be set.
+assertTrue(ReplUtils.isFirstIncDone(replica.getDatabase("default").getParameters()));
+assertTrue(!ReplUtils.isFirstIncDone(replica.getDatabase(primaryDbName).getParameters()));
 
 Review comment:
   Shall use assertFalse method to be clear. Check for other occurrences too.
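The readability point can be seen in a tiny self-contained sketch; the two helpers below are minimal stand-ins for `org.junit.Assert.assertTrue` and `assertFalse`, not the JUnit classes themselves:

```java
public class AssertStyleSketch {
    // Minimal stand-ins for org.junit.Assert.assertTrue / assertFalse.
    static void assertTrue(boolean cond) {
        if (!cond) throw new AssertionError("expected true");
    }
    static void assertFalse(boolean cond) {
        if (cond) throw new AssertionError("expected false");
    }

    public static void main(String[] args) {
        boolean firstIncDone = false; // e.g. a flag read from db parameters
        assertTrue(!firstIncDone);    // passes, but the reader must unpick the negation
        assertFalse(firstIncDone);    // states the expectation directly
        System.out.println("ok");
    }
}
```

Both assertions are equivalent at runtime; `assertFalse` also gives a clearer failure message, since it names the expectation instead of reporting a failed `assertTrue`.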




[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-22 Thread GitBox
sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259236827
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -187,4 +188,12 @@ public static PathFilter getEventsDirectoryFilter(final 
FileSystem fs) {
   }
 };
   }
+
+  public static boolean isFirstIncDone(Map<String, String> parameter) {
+    if (parameter == null) {
+      return true;
+    }
+    String compFlag = parameter.get(ReplUtils.REPL_FIRST_INC_PENDING_FLAG);
 
 Review comment:
   Shall use getBoolVar to avoid string comparison.
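A minimal sketch of what the comment is after — parsing the flag as a boolean once instead of string-comparing against "true"/"false". The reviewer points at `getBoolVar`; the stand-in here uses `Boolean.parseBoolean`, and the constant's value and the "done means not pending" semantics are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class ReplFlagSketch {
    // Hypothetical value for ReplUtils.REPL_FIRST_INC_PENDING_FLAG.
    static final String REPL_FIRST_INC_PENDING_FLAG = "hive.repl.first.inc.pending";

    static boolean isFirstIncDone(Map<String, String> parameters) {
        if (parameters == null) {
            return true;
        }
        // Parse once; parseBoolean is case-insensitive and maps any
        // non-"true" value (including null) to false.
        String pending = parameters.get(REPL_FIRST_INC_PENDING_FLAG);
        return !Boolean.parseBoolean(pending);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(REPL_FIRST_INC_PENDING_FLAG, "TRUE"); // case-insensitive parse
        System.out.println(isFirstIncDone(null));   // prints true
        System.out.println(isFirstIncDone(params)); // prints false
    }
}
```

This avoids subtle mismatches such as a capitalized "TRUE" stored in the db parameters failing a plain `equals("true")` check.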

