[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217840=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217840
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 05:44
Start Date: 25/Mar/19 05:44
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 217840)
Time Spent: 7h  (was: 6h 50m)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Fix For: 4.0.0
>
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch, 
> HIVE-21471.03.patch
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> While replicating the ALTER event that converts a managed table to an external 
> table, the table's data location is changed to a path under the input base 
> directory for external table replication. However, the old location is left 
> behind and is leaked forever.
> ALTER TABLE T1 SET TBLPROPERTIES('EXTERNAL'='true');



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217823=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217823
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 04:11
Start Date: 25/Mar/19 04:11
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268483462
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -673,6 +674,59 @@ public void 
retryIncBootstrapExternalTablesFromDifferentDumpWithoutCleanTablesCo
 ErrorMsg.REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID.getErrorCode());
   }
 
+  @Test
+  public void dynamicallyConvertManagedToExternalTable() throws Throwable {
+List dumpWithClause = Collections.singletonList(
+"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + 
"'='true'"
+);
+List loadWithClause = externalTableBasePathWithClause();
+
+WarehouseInstance.Tuple tupleBootstrapManagedTable = primary.run("use " + 
primaryDbName)
+.run("create table t1 (id int)")
+.run("insert into table t1 values (1)")
+.run("create table t2 (id int) partitioned by (key int)")
 
 Review comment:
   Did you mean that a non-ACID table is created as an ACID table at the target 
(due to migration) and is later converted to external at the source? I think 
this case needs a lot of changes, such as avoiding distcp for the external 
table. I will create another ticket for it, as there might be other cases too.
 



Issue Time Tracking
---

Worklog Id: (was: 217823)
Time Spent: 6h 50m  (was: 6h 40m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217821=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217821
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 04:06
Start Date: 25/Mar/19 04:06
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268483462
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -673,6 +674,59 @@ public void 
retryIncBootstrapExternalTablesFromDifferentDumpWithoutCleanTablesCo
 ErrorMsg.REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID.getErrorCode());
   }
 
+  @Test
+  public void dynamicallyConvertManagedToExternalTable() throws Throwable {
+List dumpWithClause = Collections.singletonList(
+"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + 
"'='true'"
+);
+List loadWithClause = externalTableBasePathWithClause();
+
+WarehouseInstance.Tuple tupleBootstrapManagedTable = primary.run("use " + 
primaryDbName)
+.run("create table t1 (id int)")
+.run("insert into table t1 values (1)")
+.run("create table t2 (id int) partitioned by (key int)")
 
 Review comment:
   Did you mean that a non-ACID table is created as an ACID table at the target 
(due to migration) and is later converted to external at the source?
 



Issue Time Tracking
---

Worklog Id: (was: 217821)
Time Spent: 6h 40m  (was: 6.5h)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217815=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217815
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 03:40
Start Date: 25/Mar/19 03:40
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268480500
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -400,7 +405,26 @@ public void alterTable(RawStore msdb, Warehouse wh, 
String catName, String dbnam
   "Unable to change partition or table. Object " +  e.getMessage() + " 
does not exist."
   + " Check metastore logs for detailed stack.");
 } finally {
-  if (!success) {
+  if (success) {
+// Txn was committed successfully.
+// If data location is changed in replication flow, then need to 
delete the old path.
+if (replDataLocationChanged) {
+  assert(olddb != null);
+  assert(oldt != null);
+  Path deleteOldDataLoc = new Path(oldt.getSd().getLocation());
+  boolean isAutoPurge = 
"true".equalsIgnoreCase(oldt.getParameters().get("auto.purge"));
+  try {
+wh.deleteDir(deleteOldDataLoc, true, isAutoPurge, olddb);
+LOG.info("Deleted the old data location: {} for the table: {}",
+deleteOldDataLoc, dbname + "." + name);
+  } catch (MetaException ex) {
+// Eat the exception as it doesn't affect the state of existing 
tables.
+// Expect, user to manually drop this path when exception and so 
logging a warning.
+LOG.warn("Unable to delete the old data location: {} for the 
table: {}",
 
 Review comment:
   I think that if the old directory was deleted, then the metadata update had 
already succeeded. In that case, replaying the event in the next cycle would 
not set the flag, as the table/partition locations would already point to the 
new location under the base directory.
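
The replay argument above can be sketched as a small check. This is only an illustration under stated assumptions: the class, method names and the example paths are hypothetical and are not taken from the Hive code base. It shows why a second replay would leave the flag unset once the metadata already points to the new location.

```java
// Hypothetical sketch: none of these names come from Hive.
final class ReplaySketch {

    // The flag is raised only while the table still points somewhere other
    // than the expected location under the external-table base directory.
    static boolean needsLocationChange(String currentLocation, String expectedLocation) {
        return !normalize(expectedLocation).equals(normalize(currentLocation));
    }

    // Treat "/a/b" and "/a/b/" as the same location.
    static String normalize(String path) {
        if (path.length() > 1 && path.endsWith("/")) {
            return path.substring(0, path.length() - 1);
        }
        return path;
    }
}
```

On the first replay the current location differs from the expected one, so the flag is set; once the metadata update has been committed, the same comparison returns false and the old path is not touched again.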
 



Issue Time Tracking
---

Worklog Id: (was: 217815)
Time Spent: 6.5h  (was: 6h 20m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217814=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217814
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 03:37
Start Date: 25/Mar/19 03:37
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268480500
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -400,7 +405,26 @@ public void alterTable(RawStore msdb, Warehouse wh, 
String catName, String dbnam
   "Unable to change partition or table. Object " +  e.getMessage() + " 
does not exist."
   + " Check metastore logs for detailed stack.");
 } finally {
-  if (!success) {
+  if (success) {
+// Txn was committed successfully.
+// If data location is changed in replication flow, then need to 
delete the old path.
+if (replDataLocationChanged) {
+  assert(olddb != null);
+  assert(oldt != null);
+  Path deleteOldDataLoc = new Path(oldt.getSd().getLocation());
+  boolean isAutoPurge = 
"true".equalsIgnoreCase(oldt.getParameters().get("auto.purge"));
+  try {
+wh.deleteDir(deleteOldDataLoc, true, isAutoPurge, olddb);
+LOG.info("Deleted the old data location: {} for the table: {}",
+deleteOldDataLoc, dbname + "." + name);
+  } catch (MetaException ex) {
+// Eat the exception as it doesn't affect the state of existing 
tables.
+// Expect, user to manually drop this path when exception and so 
logging a warning.
+LOG.warn("Unable to delete the old data location: {} for the 
table: {}",
 
 Review comment:
   It doesn't matter; we ignore that exception here, so REPL LOAD will succeed. 
Also, the previous run has already archived the data in the CM directory.
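
The "ignore the exception and log a warning" pattern in the hunk above can be sketched in isolation. This is a minimal illustration, not the metastore code: it assumes the local file system and java.nio.file instead of the Warehouse.deleteDir call shown in the diff, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: uses java.nio.file in place of the Warehouse API.
final class BestEffortDeleteSketch {

    // Tries to delete the old location. A failure is reported but never
    // propagated, so the surrounding operation can still succeed.
    static boolean deleteQuietly(Path oldLocation) {
        try {
            // Throws an IOException subtype if the path is missing or is a
            // non-empty directory.
            Files.delete(oldLocation);
            return true;
        } catch (IOException ex) {
            System.err.println("Unable to delete the old data location: "
                + oldLocation + " (" + ex + ")");
            return false;
        }
    }
}
```

A caller that retries the whole operation sees a false return value and a warning for a path that is already gone, rather than a failure.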
 



Issue Time Tracking
---

Worklog Id: (was: 217814)
Time Spent: 6h 20m  (was: 6h 10m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217813
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 03:36
Start Date: 25/Mar/19 03:36
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268480382
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -209,7 +214,13 @@ public void alterTable(RawStore msdb, Warehouse wh, 
String catName, String dbnam
 boolean tableInSpecifiedLoc = !oldtRelativePath.equalsIgnoreCase(name)
 && !oldtRelativePath.equalsIgnoreCase(name + Path.SEPARATOR);
 
-if (!tableInSpecifiedLoc) {
+if (replDataLocationChanged) {
+  // If data location is changed in replication flow, then new path 
was already set in
+  // the newt. Also, it is as good as the data is moved and set 
dataWasMoved=true so that
+  // location in partitions are also updated accordingly.
+  destPath = new Path(newt.getSd().getLocation());
 
 Review comment:
   Yes, it is already like that.
 



Issue Time Tracking
---

Worklog Id: (was: 217813)
Time Spent: 6h 10m  (was: 6h)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217811=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217811
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 03:35
Start Date: 25/Mar/19 03:35
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268480242
 
 

 ##
 File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/common/ReplConst.java
 ##
 @@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.common;
+
+/**
+ * A class that defines the constant strings used by the replication 
implementation.
+ */
+
+public class ReplConst {
 
 Review comment:
   I will keep it as ReplConst for now. We have ReplUtils for common methods, 
and we can add another class if needed. 
 



Issue Time Tracking
---

Worklog Id: (was: 217811)
Time Spent: 5h 50m  (was: 5h 40m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217808=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217808
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 03:33
Start Date: 25/Mar/19 03:33
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268476396
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -400,7 +405,26 @@ public void alterTable(RawStore msdb, Warehouse wh, 
String catName, String dbnam
   "Unable to change partition or table. Object " +  e.getMessage() + " 
does not exist."
   + " Check metastore logs for detailed stack.");
 } finally {
-  if (!success) {
+  if (success) {
+// Txn was committed successfully.
+// If data location is changed in replication flow, then need to 
delete the old path.
+if (replDataLocationChanged) {
+  assert(olddb != null);
+  assert(oldt != null);
+  Path deleteOldDataLoc = new Path(oldt.getSd().getLocation());
+  boolean isAutoPurge = 
"true".equalsIgnoreCase(oldt.getParameters().get("auto.purge"));
+  try {
+wh.deleteDir(deleteOldDataLoc, true, isAutoPurge, olddb);
+LOG.info("Deleted the old data location: {} for the table: {}",
+deleteOldDataLoc, dbname + "." + name);
+  } catch (MetaException ex) {
+// Eat the exception as it doesn't affect the state of existing 
tables.
+// Expect, user to manually drop this path when exception and so 
logging a warning.
+LOG.warn("Unable to delete the old data location: {} for the 
table: {}",
 
 Review comment:
   If the directory delete succeeds and the event replay fails for some other 
reason, then the event will be replayed again. In the next replay, if it does 
not find the directory, the CM copy might fail.
 



Issue Time Tracking
---

Worklog Id: (was: 217808)
Time Spent: 5.5h  (was: 5h 20m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217806=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217806
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 03:32
Start Date: 25/Mar/19 03:32
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268479992
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -99,9 +100,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String 
catName, String dbnam
 dbname = dbname.toLowerCase();
 
 final boolean cascade = environmentContext != null
-&& environmentContext.isSetProperties()
-&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(
-StatsSetupConst.CASCADE));
+&& environmentContext.isSetProperties()
+&& 
StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(StatsSetupConst.CASCADE));
+final boolean replDataLocationChanged = environmentContext != null
 
 Review comment:
   I think the flag is needed, as we shouldn't update partition locations when 
a user sets the location for a table. That is the current behaviour and 
shouldn't be changed. This is needed only for the repl flow.
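
The role of the flag can be sketched as follows. This is an illustration under stated assumptions: the property key is a made-up placeholder (the real constant is defined in ReplConst and is not shown in this thread), a plain Map stands in for the environment context properties, and the class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical sketch: names and the property key are placeholders.
final class ReplFlagSketch {

    // Placeholder key, used only for this illustration.
    static final String DATA_LOCATION_CHANGED = "example.repl.data.location.changed";

    // True only when the caller explicitly set the flag in the context properties.
    static boolean isReplDataLocationChanged(Map<String, String> contextProps) {
        return contextProps != null
            && "true".equalsIgnoreCase(contextProps.get(DATA_LOCATION_CHANGED));
    }

    // Partition locations follow the table only when a rename moved the data
    // or when the replication flow changed the data location.
    static boolean shouldUpdatePartitionLocations(boolean dataMovedByRename,
                                                  boolean replDataLocationChanged) {
        return dataMovedByRename || replDataLocationChanged;
    }
}
```

With this shape, an ordinary user-issued location change leaves the flag unset, so the existing behaviour for partitions stays as it is.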
 



Issue Time Tracking
---

Worklog Id: (was: 217806)
Time Spent: 5h 20m  (was: 5h 10m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217812=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217812
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 03:35
Start Date: 25/Mar/19 03:35
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268480328
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -192,12 +197,12 @@ public void alterTable(RawStore msdb, Warehouse wh, 
String catName, String dbnam
   // 2) the table is not an external table, and
   // 3) the user didn't change the default location (or new location is 
empty), and
   // 4) the table was not initially created with a specified location
-  if (rename
-  && !oldt.getTableType().equals(TableType.VIRTUAL_VIEW.toString())
-  && (oldt.getSd().getLocation().compareTo(newt.getSd().getLocation()) 
== 0
-|| StringUtils.isEmpty(newt.getSd().getLocation()))
-  && !MetaStoreUtils.isExternalTable(oldt)) {
-Database olddb = msdb.getDatabase(catName, dbname);
+  if (replDataLocationChanged
+  || (rename
 
 Review comment:
   It cannot be the same because the behaviour is different. If the location is 
set in the normal flow, we shouldn't update it for partitions. Also, it is 
risky to modify any of the current behaviour, and that is not in the scope of 
this ticket.
 



Issue Time Tracking
---

Worklog Id: (was: 217812)
Time Spent: 6h  (was: 5h 50m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217809=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217809
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 03:33
Start Date: 25/Mar/19 03:33
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268480113
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -673,6 +674,59 @@ public void 
retryIncBootstrapExternalTablesFromDifferentDumpWithoutCleanTablesCo
 ErrorMsg.REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID.getErrorCode());
   }
 
+  @Test
+  public void dynamicallyConvertManagedToExternalTable() throws Throwable {
+List dumpWithClause = Collections.singletonList(
+"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + 
"'='true'"
+);
+List loadWithClause = externalTableBasePathWithClause();
+
+WarehouseInstance.Tuple tupleBootstrapManagedTable = primary.run("use " + 
primaryDbName)
+.run("create table t1 (id int)")
+.run("insert into table t1 values (1)")
+.run("create table t2 (id int) partitioned by (key int)")
 
 Review comment:
   It is not an ACID table. It is a non-ACID managed table.
 



Issue Time Tracking
---

Worklog Id: (was: 217809)
Time Spent: 5h 40m  (was: 5.5h)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch, 
> HIVE-21471.03.patch
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> While replicating the ALTER event to convert managed table to external table, 
> the data location for the table is changed under input base directory for 
> external tables replication. But, the old location remains there and would be 
> leaked for ever.
> ALTER TABLE T1 SET TBLPROPERTIES('EXTERNAL'='true');





[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217794=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217794
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 02:54
Start Date: 25/Mar/19 02:54
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268475619
 
 

 ##
 File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/common/ReplConst.java
 ##
 @@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.common;
+
+/**
+ * A class that defines the constant strings used by the replication implementation.
+ */
+
+public class ReplConst {
 
 Review comment:
   The name can be repl common, as it's in the common folder.
 



Issue Time Tracking
---

Worklog Id: (was: 217794)
Time Spent: 5h 10m  (was: 5h)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch, 
> HIVE-21471.03.patch
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> While replicating the ALTER event to convert managed table to external table, 
> the data location for the table is changed under input base directory for 
> external tables replication. But, the old location remains there and would be 
> leaked for ever.
> ALTER TABLE T1 SET TBLPROPERTIES('EXTERNAL'='true');





[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217791=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217791
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 02:53
Start Date: 25/Mar/19 02:53
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268475390
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -209,7 +214,13 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
 boolean tableInSpecifiedLoc = !oldtRelativePath.equalsIgnoreCase(name)
 && !oldtRelativePath.equalsIgnoreCase(name + Path.SEPARATOR);
 
-if (!tableInSpecifiedLoc) {
+if (replDataLocationChanged) {
+  // If data location is changed in replication flow, then new path was already set in
+  // the newt. Also, it is as good as the data is moved and set dataWasMoved=true so that
+  // location in partitions are also updated accordingly.
+  destPath = new Path(newt.getSd().getLocation());
 
 Review comment:
   If a partition's location is not within the table location and that table 
is altered to an external table, does the partition location not need to be 
changed to the base path at the target?
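
The question turns on whether a partition's location lies under the table's old location. A minimal sketch of prefix-based rewriting follows. It assumes (this thread does not confirm it) that only partitions stored under the old table directory are moved to the new base, while partitions located elsewhere keep their path. `PartitionLocationSketch` and `relocate` are hypothetical names, and plain strings stand in for HDFS paths.

```java
/** Illustration only; these names do not come from the Hive code base. */
final class PartitionLocationSketch {

  /**
   * Returns the partition location rewritten under newTableLocation when the
   * partition lives under oldTableLocation; otherwise returns it unchanged.
   */
  static String relocate(String partitionLocation,
                         String oldTableLocation,
                         String newTableLocation) {
    // Compare against "old/" so that "/wh/t2x/..." is not mistaken for "/wh/t2/...".
    String oldPrefix = withTrailingSlash(oldTableLocation);
    if (!partitionLocation.startsWith(oldPrefix)) {
      // Assumption: a partition outside the table directory is left where it is.
      return partitionLocation;
    }
    String relative = partitionLocation.substring(oldPrefix.length());
    return withTrailingSlash(newTableLocation) + relative;
  }

  private static String withTrailingSlash(String path) {
    return path.endsWith("/") ? path : path + "/";
  }
}
```

Under that assumption, a partition at a custom location is exactly the case the comment asks about: the sketch leaves it untouched, so it would not end up under the external base path.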
 



Issue Time Tracking
---

Worklog Id: (was: 217791)
Time Spent: 4h 50m  (was: 4h 40m)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch, 
> HIVE-21471.03.patch
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> While replicating the ALTER event to convert managed table to external table, 
> the data location for the table is changed under input base directory for 
> external tables replication. But, the old location remains there and would be 
> leaked for ever.
> ALTER TABLE T1 SET TBLPROPERTIES('EXTERNAL'='true');





[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217793=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217793
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 02:53
Start Date: 25/Mar/19 02:53
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268474981
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -673,6 +674,59 @@ public void retryIncBootstrapExternalTablesFromDifferentDumpWithoutCleanTablesCo
 ErrorMsg.REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID.getErrorCode());
   }
 
+  @Test
+  public void dynamicallyConvertManagedToExternalTable() throws Throwable {
+List dumpWithClause = Collections.singletonList(
+"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'"
+);
+List loadWithClause = externalTableBasePathWithClause();
+
+WarehouseInstance.Tuple tupleBootstrapManagedTable = primary.run("use " + primaryDbName)
+.run("create table t1 (id int)")
+.run("insert into table t1 values (1)")
+.run("create table t2 (id int) partitioned by (key int)")
 
 Review comment:
   The case where a managed ACID table is converted to external does not 
exist. An ACID table is only ever converted to external through migration.
 



Issue Time Tracking
---

Worklog Id: (was: 217793)
Time Spent: 5h  (was: 4h 50m)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch, 
> HIVE-21471.03.patch
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> While replicating the ALTER event to convert managed table to external table, 
> the data location for the table is changed under input base directory for 
> external tables replication. But, the old location remains there and would be 
> leaked for ever.
> ALTER TABLE T1 SET TBLPROPERTIES('EXTERNAL'='true');





[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217790=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217790
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 02:52
Start Date: 25/Mar/19 02:52
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268475390
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -209,7 +214,13 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
 boolean tableInSpecifiedLoc = !oldtRelativePath.equalsIgnoreCase(name)
 && !oldtRelativePath.equalsIgnoreCase(name + Path.SEPARATOR);
 
-if (!tableInSpecifiedLoc) {
+if (replDataLocationChanged) {
+  // If data location is changed in replication flow, then new path was already set in
+  // the newt. Also, it is as good as the data is moved and set dataWasMoved=true so that
+  // location in partitions are also updated accordingly.
+  destPath = new Path(newt.getSd().getLocation());
 
 Review comment:
   If a partition's location is not within the table location and that table 
is altered to an external table, does the partition location not need to be 
changed to the base path?
 



Issue Time Tracking
---

Worklog Id: (was: 217790)
Time Spent: 4h 40m  (was: 4.5h)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch, 
> HIVE-21471.03.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> While replicating the ALTER event to convert managed table to external table, 
> the data location for the table is changed under input base directory for 
> external tables replication. But, the old location remains there and would be 
> leaked for ever.
> ALTER TABLE T1 SET TBLPROPERTIES('EXTERNAL'='true');





[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217789=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217789
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 02:51
Start Date: 25/Mar/19 02:51
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268475152
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -192,12 +197,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
   // 2) the table is not an external table, and
   // 3) the user didn't change the default location (or new location is empty), and
   // 4) the table was not initially created with a specified location
-  if (rename
-  && !oldt.getTableType().equals(TableType.VIRTUAL_VIEW.toString())
-  && (oldt.getSd().getLocation().compareTo(newt.getSd().getLocation()) == 0
-|| StringUtils.isEmpty(newt.getSd().getLocation()))
-  && !MetaStoreUtils.isExternalTable(oldt)) {
-Database olddb = msdb.getDatabase(catName, dbname);
+  if (replDataLocationChanged
+  || (rename
 
 Review comment:
   What I meant is that the flow can be kept the same for replication and the 
normal flow, to avoid adding extra complexity.
 



Issue Time Tracking
---

Worklog Id: (was: 217789)
Time Spent: 4.5h  (was: 4h 20m)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch, 
> HIVE-21471.03.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> While replicating the ALTER event to convert managed table to external table, 
> the data location for the table is changed under input base directory for 
> external tables replication. But, the old location remains there and would be 
> leaked for ever.
> ALTER TABLE T1 SET TBLPROPERTIES('EXTERNAL'='true');





[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217787=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217787
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 02:49
Start Date: 25/Mar/19 02:49
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268474981
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -673,6 +674,59 @@ public void retryIncBootstrapExternalTablesFromDifferentDumpWithoutCleanTablesCo
 ErrorMsg.REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID.getErrorCode());
   }
 
+  @Test
+  public void dynamicallyConvertManagedToExternalTable() throws Throwable {
+List dumpWithClause = Collections.singletonList(
+"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'"
+);
+List loadWithClause = externalTableBasePathWithClause();
+
+WarehouseInstance.Tuple tupleBootstrapManagedTable = primary.run("use " + primaryDbName)
+.run("create table t1 (id int)")
+.run("insert into table t1 values (1)")
+.run("create table t2 (id int) partitioned by (key int)")
 
 Review comment:
   The case where a managed ACID table is converted to external does not 
exist. An ACID table is only ever converted to external through migration.
 



Issue Time Tracking
---

Worklog Id: (was: 217787)
Time Spent: 4h 20m  (was: 4h 10m)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch, 
> HIVE-21471.03.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> While replicating the ALTER event to convert managed table to external table, 
> the data location for the table is changed under input base directory for 
> external tables replication. But, the old location remains there and would be 
> leaked for ever.
> ALTER TABLE T1 SET TBLPROPERTIES('EXTERNAL'='true');





[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217786=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217786
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 25/Mar/19 02:48
Start Date: 25/Mar/19 02:48
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268474795
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -99,9 +100,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
 dbname = dbname.toLowerCase();
 
 final boolean cascade = environmentContext != null
-&& environmentContext.isSetProperties()
-&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(
-StatsSetupConst.CASCADE));
+&& environmentContext.isSetProperties()
+&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(StatsSetupConst.CASCADE));
+final boolean replDataLocationChanged = environmentContext != null
 
 Review comment:
   Yes, I think the flag sent from the Hive server to the metastore using the 
environment context is not required. The code changes are redundant.
 



Issue Time Tracking
---

Worklog Id: (was: 217786)
Time Spent: 4h 10m  (was: 4h)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch, 
> HIVE-21471.03.patch
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> While replicating the ALTER event to convert managed table to external table, 
> the data location for the table is changed under input base directory for 
> external tables replication. But, the old location remains there and would be 
> leaked for ever.
> ALTER TABLE T1 SET TBLPROPERTIES('EXTERNAL'='true');





[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217591=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217591
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 23/Mar/19 11:13
Start Date: 23/Mar/19 11:13
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268381082
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -99,9 +100,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
 dbname = dbname.toLowerCase();
 
 final boolean cascade = environmentContext != null
-&& environmentContext.isSetProperties()
-&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(
-StatsSetupConst.CASCADE));
+&& environmentContext.isSetProperties()
+&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(StatsSetupConst.CASCADE));
+final boolean replDataLocationChanged = environmentContext != null
 
 Review comment:
   In the normal flow, it is not possible to change the table type from 
managed to external and set the location in the same ALTER query.
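
For readers following the hunk above: `replDataLocationChanged` appears to be derived in the same style as `cascade`, a null-safe lookup of a string property that must equal "true". A minimal sketch of that pattern follows, assuming a plain `Map` in place of the environment context's properties. The class name, method name, and key are hypothetical and are not taken from the patch.

```java
import java.util.Map;

/** Illustration only; these names do not come from the Hive code base. */
final class ReplFlagSketch {

  /** Hypothetical property key used only in this example. */
  static final String EXAMPLE_KEY = "example.repl.data.location.changed";

  /** The string value that marks a flag as set, mirroring a "true" constant. */
  static final String TRUE = "true";

  /**
   * Returns true only when the map exists and holds exactly "true" for the key.
   * A missing map, a missing key, or any other value all read as "not set".
   */
  static boolean isFlagSet(Map<String, String> properties, String key) {
    // Calling equals on the constant avoids a NullPointerException for absent keys.
    return properties != null && TRUE.equals(properties.get(key));
  }
}
```

Because an absent flag reads as false, callers that never set the property (the normal, non-replication flow) would take the existing code path unchanged.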
 



Issue Time Tracking
---

Worklog Id: (was: 217591)
Time Spent: 4h  (was: 3h 50m)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> While replicating the ALTER event to convert managed table to external table, 
> the data location for the table is changed under input base directory for 
> external tables replication. But, the old location remains there and would be 
> leaked for ever.
> ALTER TABLE T1 SET TBLPROPERTIES('EXTERNAL'='true');





[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217562=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217562
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 23/Mar/19 04:06
Start Date: 23/Mar/19 04:06
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268381343
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -357,6 +368,13 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
 }
   }
 
+  // If data location is changed in replication flow, then need to delete the old path.
+  if (replDataLocationChanged) {
+Path deleteOldDataLoc = new Path(oldt.getSd().getLocation());
+boolean isAutoPurge = "true".equalsIgnoreCase(oldt.getParameters().get("auto.purge"));
+wh.deleteDir(deleteOldDataLoc, true, isAutoPurge, olddb);
 
 Review comment:
   Good catch. Will move it to the finally block, run it only when 
success=true, and swallow any exception from deleteDir, logging a warning instead.
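
The plan described here (delete the old location only after the alter has succeeded, and never let a failed delete fail the operation) can be sketched roughly as follows. This is a minimal illustration, not the patch itself. `OldLocationCleanupSketch`, `DirDeleter`, and `alterThenCleanup` are hypothetical names, a `Runnable` stands in for the metastore transaction, and a list of strings stands in for the logger.

```java
import java.util.List;

/** Illustration only; these names do not come from the Hive code base. */
final class OldLocationCleanupSketch {

  /** Stand-in for a file-system delete that may fail. */
  interface DirDeleter {
    void delete(String path) throws Exception;
  }

  /**
   * Runs the alter step, then removes the old data location only if the
   * alter completed. A failure while deleting is recorded as a warning and
   * is not propagated to the caller.
   */
  static void alterThenCleanup(Runnable alter,
                               DirDeleter deleter,
                               String oldLocation,
                               List<String> warnings) {
    boolean success = false;
    try {
      alter.run();      // if this throws, success stays false
      success = true;
    } finally {
      if (success) {
        try {
          deleter.delete(oldLocation);
        } catch (Exception e) {
          // Swallow the exception: the metadata change already succeeded.
          warnings.add("Could not delete old location " + oldLocation
              + ": " + e.getMessage());
        }
      }
    }
  }
}
```

Keeping the delete inside `finally` but guarded by `success` means a failed alter leaves the old data in place, which seems to be the concern the reviewer raised.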
 



Issue Time Tracking
---

Worklog Id: (was: 217562)
Time Spent: 3h 50m  (was: 3h 40m)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> While replicating the ALTER event to convert managed table to external table, 
> the data location for the table is changed under input base directory for 
> external tables replication. But, the old location remains there and would be 
> leaked for ever.
> ALTER TABLE T1 SET TBLPROPERTIES('EXTERNAL'='true');





[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217561
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 23/Mar/19 04:05
Start Date: 23/Mar/19 04:05
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268381321
 
 

 ##
 File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/common/ReplConst.java
 ##
 @@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.common;
+
+/**
+ * A class that defines the constant strings used by the replication implementation.
+ */
+
+public class ReplConst {
 
 Review comment:
   ReplUtils was already taken. I used naming similar to StatsSetupConst. I 
think ReplConst is good enough for now. Will change it if any utility methods 
come up for metastore common.
 



Issue Time Tracking
---

Worklog Id: (was: 217561)
Time Spent: 3h 40m  (was: 3.5h)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> While replicating the ALTER event to convert managed table to external table, 
> the data location for the table is changed under input base directory for 
> external tables replication. But, the old location remains there and would be 
> leaked for ever.
> ALTER TABLE T1 SET TBLPROPERTIES('EXTERNAL'='true');





[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217560=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217560
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 23/Mar/19 04:04
Start Date: 23/Mar/19 04:04
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268381295
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -192,12 +197,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
   // 2) the table is not an external table, and
   // 3) the user didn't change the default location (or new location is empty), and
   // 4) the table was not initially created with a specified location
-  if (rename
-  && !oldt.getTableType().equals(TableType.VIRTUAL_VIEW.toString())
-  && (oldt.getSd().getLocation().compareTo(newt.getSd().getLocation()) == 0
-|| StringUtils.isEmpty(newt.getSd().getLocation()))
-  && !MetaStoreUtils.isExternalTable(oldt)) {
-Database olddb = msdb.getDatabase(catName, dbname);
+  if (replDataLocationChanged
+  || (rename
 
 Review comment:
   This patch does not deal with setting the location at the source. It deals 
with changing the table type to external at the source via ALTER TABLE with 
EXTERNAL=true in the table properties. So this scenario is not valid and is out of scope.
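
The conversion this patch targets is a change of the 'EXTERNAL' table property from unset or false to true. A rough sketch of detecting that transition from the old and new table parameters follows, assuming plain maps in place of the metastore table objects. `ExternalConversionSketch` and its methods are hypothetical names used only for illustration.

```java
import java.util.Map;

/** Illustration only; these names do not come from the Hive code base. */
final class ExternalConversionSketch {

  /** True when the parameters mark the table as external (case-insensitive "true"). */
  static boolean isExternal(Map<String, String> tableParams) {
    return tableParams != null
        && "TRUE".equalsIgnoreCase(tableParams.get("EXTERNAL"));
  }

  /** True only for the managed-to-external transition discussed in this thread. */
  static boolean isManagedToExternal(Map<String, String> oldParams,
                                     Map<String, String> newParams) {
    return !isExternal(oldParams) && isExternal(newParams);
  }
}
```

A table that was already external, or one that stays managed, is not a conversion in this sense, so only the one transition returns true.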
 



Issue Time Tracking
---

Worklog Id: (was: 217560)
Time Spent: 3.5h  (was: 3h 20m)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> While replicating the ALTER event to convert managed table to external table, 
> the data location for the table is changed under input base directory for 
> external tables replication. But, the old location remains there and would be 
> leaked for ever.
> ALTER TABLE T1 SET TBLPROPERTIES('EXTERNAL'='true');





[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217559
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 23/Mar/19 04:03
Start Date: 23/Mar/19 04:03
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268381269
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -209,7 +214,13 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
 boolean tableInSpecifiedLoc = !oldtRelativePath.equalsIgnoreCase(name)
 
 Review comment:
   OK
 



Issue Time Tracking
---

Worklog Id: (was: 217559)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217557
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 23/Mar/19 04:02
Start Date: 23/Mar/19 04:02
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268381242
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -192,12 +197,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
   // 2) the table is not an external table, and
 
 Review comment:
   OK
 



Issue Time Tracking
---

Worklog Id: (was: 217557)
Time Spent: 3h  (was: 2h 50m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217558&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217558
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 23/Mar/19 04:02
Start Date: 23/Mar/19 04:02
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268381258
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -209,7 +214,13 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
 boolean tableInSpecifiedLoc = !oldtRelativePath.equalsIgnoreCase(name)
 && !oldtRelativePath.equalsIgnoreCase(name + Path.SEPARATOR);
 
-if (!tableInSpecifiedLoc) {
+if (replDataLocationChanged) {
+  // If data location is changed in replication flow, then new path was already set in
+  // the newt. Also, it is as good as the data is moved and set dataWasMoved=true so that
+  // location in partitions are also updated accordingly.
+  destPath = new Path(newt.getSd().getLocation());
 
 Review comment:
   Already handled. If you look at line 279, the partition location is 
changed only if the old location is a sub-directory of the old table location.
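
A minimal sketch of the rule described in this comment, added for illustration. It is not the HiveAlterHandler code: the class name, method name, and plain string paths are assumptions, and the real code works on Hadoop Path objects. It shows only the idea that a partition location is rewritten when it sits under the old table location and is left alone otherwise.

```java
import java.util.Optional;

/** Illustrative only; names and the string-based paths are hypothetical. */
final class PartitionRelocationSketch {

  // Returns the new partition location if partLoc lies under oldTableLoc,
  // otherwise empty (the partition lives elsewhere and is left untouched).
  static Optional<String> relocate(String oldTableLoc, String newTableLoc, String partLoc) {
    String oldPrefix = stripTrailingSlash(oldTableLoc) + "/";
    if (!partLoc.startsWith(oldPrefix)) {
      return Optional.empty();
    }
    String relative = partLoc.substring(oldPrefix.length());
    return Optional.of(stripTrailingSlash(newTableLoc) + "/" + relative);
  }

  private static String stripTrailingSlash(String path) {
    return path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
  }

  public static void main(String[] args) {
    // Partition under the old table directory: moved along with the table.
    System.out.println(relocate("/warehouse/db/t2", "/ext/base/db/t2", "/warehouse/db/t2/key=10"));
    // Partition at a custom location: not a sub-directory, so not changed.
    System.out.println(relocate("/warehouse/db/t2", "/ext/base/db/t2", "/custom/t2/key=20"));
  }
}
```

Appending a separator before the prefix test keeps a sibling directory such as /warehouse/db/t2x from being treated as a sub-directory of /warehouse/db/t2.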
 



Issue Time Tracking
---

Worklog Id: (was: 217558)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217556&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217556
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 23/Mar/19 04:01
Start Date: 23/Mar/19 04:01
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268381212
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -99,9 +100,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
 dbname = dbname.toLowerCase();
 
 final boolean cascade = environmentContext != null
-&& environmentContext.isSetProperties()
-&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(
-StatsSetupConst.CASCADE));
+&& environmentContext.isSetProperties()
+&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(StatsSetupConst.CASCADE));
+final boolean replDataLocationChanged = environmentContext != null
 
 Review comment:
   OK.
 



Issue Time Tracking
---

Worklog Id: (was: 217556)
Time Spent: 2h 50m  (was: 2h 40m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217555&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217555
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 23/Mar/19 04:01
Start Date: 23/Mar/19 04:01
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268381204
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableOperation.java
 ##
 @@ -108,6 +116,12 @@ private void createTableReplaceMode(Table tbl) throws HiveException {
   }
 }
 
+// If table's data location is moved, then set the corresponding flag in environment context to
 
 Review comment:
   Nope. In this scenario the location is changed at the target, not at the 
source. At the source, only the table type is changed from managed to external, 
not the location. I will update the comment to say that this applies to the 
repl flow.
 



Issue Time Tracking
---

Worklog Id: (was: 217555)
Time Spent: 2h 40m  (was: 2.5h)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217554
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 23/Mar/19 03:59
Start Date: 23/Mar/19 03:59
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268381165
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableOperation.java
 ##
 @@ -54,17 +55,24 @@ public int execute() throws HiveException {
 Table tbl = desc.toTable(context.getConf());
 LOG.debug("creating table {} on {}", tbl.getFullyQualifiedName(), tbl.getDataLocation());
 
-if (desc.getReplicationSpec().isInReplicationScope() && (!desc.getReplaceMode())){
-  // if this is a replication spec, then replace-mode semantics might apply.
-  // if we're already asking for a table replacement, then we can skip this check.
-  // however, otherwise, if in replication scope, and we've not been explicitly asked
-  // to replace, we should check if the object we're looking at exists, and if so,
+boolean dataLocationChanged = false;
+if (desc.getReplicationSpec().isInReplicationScope()) {
+  // If in replication scope, we should check if the object we're looking at exists, and if so,
   // trigger replace-mode semantics.
   Table existingTable = context.getDb().getTable(tbl.getDbName(), tbl.getTableName(), false);
-  if (existingTable != null){
+  if (existingTable != null) {
 if (desc.getReplicationSpec().allowEventReplacementInto(existingTable.getParameters())) {
   desc.setReplaceMode(true); // we replace existing table.
   ReplicationSpec.copyLastReplId(existingTable.getParameters(), tbl.getParameters());
+
+  // If location of an existing managed table is changed, then need to delete the old location if exists.
+  // This scenario occurs when a managed table is converted into external table at source. In this case,
+  // at target, the table data would be moved to different location under base directory for external tables.
+  if (existingTable.getTableType().equals(TableType.MANAGED_TABLE)
+  && tbl.getTableType().equals(TableType.EXTERNAL_TABLE)
+  && (!existingTable.getDataLocation().equals(tbl.getDataLocation()))) {
 
 Review comment:
   It's not possible in the replication flow, but I kept the check explicit for 
better readability. There is no harm in keeping it. If you insist, I can 
change it to an assert.
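
The three-part check under discussion can be restated as a small standalone predicate. This is only a sketch for illustration: the enum, class, and method names are stand-ins, and string paths replace the table objects used in the patch.

```java
/** Illustrative only; the types here are simplified stand-ins for Hive's. */
final class LocationChangeSketch {

  enum TableType { MANAGED_TABLE, EXTERNAL_TABLE }

  // True only when a managed table is being replaced by an external table
  // whose data lives at a different location, i.e. the old location would leak.
  static boolean dataLocationChanged(TableType existingType, String existingLoc,
                                     TableType newType, String newLoc) {
    return existingType == TableType.MANAGED_TABLE
        && newType == TableType.EXTERNAL_TABLE
        && !existingLoc.equals(newLoc);
  }

  public static void main(String[] args) {
    // Managed table converted to external, data moved under the external base dir.
    System.out.println(dataLocationChanged(
        TableType.MANAGED_TABLE, "/warehouse/db/t1",
        TableType.EXTERNAL_TABLE, "/ext/base/db/t1"));
    // Already external on both sides: nothing to clean up.
    System.out.println(dataLocationChanged(
        TableType.EXTERNAL_TABLE, "/ext/base/db/t1",
        TableType.EXTERNAL_TABLE, "/ext/base/db/t1"));
  }
}
```

Keeping the location comparison explicit, as the author prefers, documents the intent even if the replication flow never produces equal locations for this conversion.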
 



Issue Time Tracking
---

Worklog Id: (was: 217554)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217553
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 23/Mar/19 03:58
Start Date: 23/Mar/19 03:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268381147
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -673,6 +674,59 @@ public void retryIncBootstrapExternalTablesFromDifferentDumpWithoutCleanTablesCo
 ErrorMsg.REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID.getErrorCode());
   }
 
+  @Test
+  public void dynamicallyConvertManagedToExternalTable() throws Throwable {
+List<String> dumpWithClause = Collections.singletonList(
+"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'"
+);
+List<String> loadWithClause = externalTableBasePathWithClause();
+
+WarehouseInstance.Tuple tupleBootstrapManagedTable = primary.run("use " + primaryDbName)
+.run("create table t1 (id int)")
+.run("insert into table t1 values (1)")
+.run("create table t2 (id int) partitioned by (key int)")
 
 Review comment:
   Nope. This is not a migration case. It is specific to external table 
replication, where the table is converted to external at the source, not at 
the target. So here the location is changed only at the target, not at the 
source.
 



Issue Time Tracking
---

Worklog Id: (was: 217553)
Time Spent: 2h 20m  (was: 2h 10m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217552
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 23/Mar/19 03:57
Start Date: 23/Mar/19 03:57
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r26838
 
 

 ##
 File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/common/ReplConst.java
 ##
 @@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.common;
+
+/**
+ * A class that defines the constant strings used by the replication implementation.
+ */
+
+public class ReplConst {
+
+  /**
+   * The constant that denotes the table data location is changed to different path. This indicates
+   * Metastore to update corresponding path in Partitions and also need to delete old path.
+   */
+  public static final String DATA_LOCATION_CHANGED = "DATA_LOCATION_CHANGED";
 
 Review comment:
   I thought the name ReplConst made it self-explanatory that the constant is 
repl-specific. Anyway, I will prefix it with REPL_.
 



Issue Time Tracking
---

Worklog Id: (was: 217552)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217551
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 23/Mar/19 03:56
Start Date: 23/Mar/19 03:56
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268381082
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -99,9 +100,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
 dbname = dbname.toLowerCase();
 
 final boolean cascade = environmentContext != null
-&& environmentContext.isSetProperties()
-&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(
-StatsSetupConst.CASCADE));
+&& environmentContext.isSetProperties()
+&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(StatsSetupConst.CASCADE));
+final boolean replDataLocationChanged = environmentContext != null
 
 Review comment:
   In the normal flow it is not possible to change the table type from managed 
to external and set the location in the same ALTER query. Also, a normal ALTER 
would not reach this code, since this is the CreateTableTask flow. Only the 
repl flow reaches here for this scenario.
 



Issue Time Tracking
---

Worklog Id: (was: 217551)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217328&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217328
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 22/Mar/19 16:37
Start Date: 22/Mar/19 16:37
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268222534
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableOperation.java
 ##
 @@ -108,6 +116,12 @@ private void createTableReplaceMode(Table tbl) throws HiveException {
   }
 }
 
+// If table's data location is moved, then set the corresponding flag in environment context to
 
 Review comment:
   The comment should say "if the location is changed at source". As written, 
it gives the wrong impression that this is also done for the normal flow.
 



Issue Time Tracking
---

Worklog Id: (was: 217328)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217331&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217331
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 22/Mar/19 16:37
Start Date: 22/Mar/19 16:37
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268223879
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -99,9 +100,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
 dbname = dbname.toLowerCase();
 
 final boolean cascade = environmentContext != null
-&& environmentContext.isSetProperties()
-&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(StatsSetupConst.CASCADE));
-StatsSetupConst.CASCADE));
+&& environmentContext.isSetProperties()
+&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(StatsSetupConst.CASCADE));
+final boolean replDataLocationChanged = environmentContext != null
 
 Review comment:
   environmentContext != null and environmentContext.isSetProperties() are 
checked twice.
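
One way to avoid repeating the two guards is to route every flag lookup through a single helper. The sketch below is an illustration only and not the change made in the patch: a plain Map stands in for the properties of Hive's EnvironmentContext, and the class name, helper name, and key strings are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in; the real code reads these from EnvironmentContext. */
final class EnvFlagSketch {
  // Hypothetical property keys, used only for this sketch.
  static final String CASCADE = "CASCADE";
  static final String REPL_DATA_LOCATION_CHANGED = "REPL_DATA_LOCATION_CHANGED";

  // One place for the null check and the "true" comparison.
  static boolean isFlagSet(Map<String, String> props, String key) {
    return props != null && "true".equals(props.get(key));
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(REPL_DATA_LOCATION_CHANGED, "true");

    boolean cascade = isFlagSet(props, CASCADE);
    boolean replDataLocationChanged = isFlagSet(props, REPL_DATA_LOCATION_CHANGED);
    System.out.println("cascade=" + cascade
        + ", replDataLocationChanged=" + replDataLocationChanged);
  }
}
```

With a helper like this, each flag is read in one line and the guards live in one place.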
 



Issue Time Tracking
---

Worklog Id: (was: 217331)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217325&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217325
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 22/Mar/19 16:37
Start Date: 22/Mar/19 16:37
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268128979
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableOperation.java
 ##
 @@ -54,17 +55,24 @@ public int execute() throws HiveException {
 Table tbl = desc.toTable(context.getConf());
 LOG.debug("creating table {} on {}", tbl.getFullyQualifiedName(), 
tbl.getDataLocation());
 
-if (desc.getReplicationSpec().isInReplicationScope() && 
(!desc.getReplaceMode())){
-  // if this is a replication spec, then replace-mode semantics might 
apply.
-  // if we're already asking for a table replacement, then we can skip 
this check.
-  // however, otherwise, if in replication scope, and we've not been 
explicitly asked
-  // to replace, we should check if the object we're looking at exists, 
and if so,
+boolean dataLocationChanged = false;
+if (desc.getReplicationSpec().isInReplicationScope()) {
+  // If in replication scope, we should check if the object we're looking 
at exists, and if so,
   // trigger replace-mode semantics.
   Table existingTable = context.getDb().getTable(tbl.getDbName(), 
tbl.getTableName(), false);
-  if (existingTable != null){
+  if (existingTable != null) {
 if 
(desc.getReplicationSpec().allowEventReplacementInto(existingTable.getParameters()))
 {
   desc.setReplaceMode(true); // we replace existing table.
   ReplicationSpec.copyLastReplId(existingTable.getParameters(), 
tbl.getParameters());
+
+  // If location of an existing managed table is changed, then need to 
delete the old location if exists.
+  // This scenario occurs when a managed table is converted into 
external table at source. In this case,
+  // at target, the table data would be moved to different location 
under base directory for external tables.
+  if (existingTable.getTableType().equals(TableType.MANAGED_TABLE)
+  && tbl.getTableType().equals(TableType.EXTERNAL_TABLE)
+  && 
(!existingTable.getDataLocation().equals(tbl.getDataLocation()))) {
 
 Review comment:
   In what scenario would the location stay the same when a table is converted 
from managed to external in the replication flow?
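
A minimal sketch of the condition under review, assuming plain strings and a small enum in place of Hive's `Table` and `TableType`. The names `ReplLocationCheck`, `TableKind` and `oldLocationNeedsCleanup` are hypothetical and not part of the patch. It makes the question concrete: the third clause only matters if a converted table can keep its old path.

```java
import java.util.Objects;

// Hypothetical stand-in for the check added to CreateTableOperation; not Hive's API.
final class ReplLocationCheck {
  enum TableKind { MANAGED, EXTERNAL }

  // True when a replicated replace turns a managed table into an external one
  // and the data directory differs, so the old directory would otherwise be left behind.
  static boolean oldLocationNeedsCleanup(TableKind existingKind, String existingLocation,
                                         TableKind newKind, String newLocation) {
    return existingKind == TableKind.MANAGED
        && newKind == TableKind.EXTERNAL
        && !Objects.equals(existingLocation, newLocation);
  }
}
```

If the location can never be equal in this flow, the last clause is redundant but harmless; if it can, the clause prevents deleting the directory the table still uses.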
 


Issue Time Tracking
---

Worklog Id: (was: 217325)
Time Spent: 1h 10m  (was: 1h)


[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217330&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217330
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 22/Mar/19 16:37
Start Date: 22/Mar/19 16:37
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268235202
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -209,7 +214,13 @@ public void alterTable(RawStore msdb, Warehouse wh, 
String catName, String dbnam
 boolean tableInSpecifiedLoc = !oldtRelativePath.equalsIgnoreCase(name)
 
 Review comment:
   These checks are not required if replDataLocationChanged is true.
 


Issue Time Tracking
---

Worklog Id: (was: 217330)
Time Spent: 1h 40m  (was: 1.5h)


[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217326&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217326
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 22/Mar/19 16:37
Start Date: 22/Mar/19 16:37
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268225105
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -192,12 +197,12 @@ public void alterTable(RawStore msdb, Warehouse wh, 
String catName, String dbnam
   // 2) the table is not an external table, and
 
 Review comment:
   Update the comment to cover the extra condition that was added.
 


Issue Time Tracking
---

Worklog Id: (was: 217326)
Time Spent: 1h 20m  (was: 1h 10m)


[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217327
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 22/Mar/19 16:37
Start Date: 22/Mar/19 16:37
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268130334
 
 

 ##
 File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/common/ReplConst.java
 ##
 @@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.common;
+
+/**
+ * A class that defines the constant strings used by the replication 
implementation.
+ */
+
+public class ReplConst {
 
 Review comment:
   Instead of ReplConst, use something more generic, such as a repl common or 
util class, so that it can be used for other purposes as well.
 


Issue Time Tracking
---

Worklog Id: (was: 217327)
Time Spent: 1.5h  (was: 1h 20m)


[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217329&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217329
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 22/Mar/19 16:37
Start Date: 22/Mar/19 16:37
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268132700
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -192,12 +197,12 @@ public void alterTable(RawStore msdb, Warehouse wh, 
String catName, String dbnam
   // 2) the table is not an external table, and
   // 3) the user didn't change the default location (or new location is 
empty), and
   // 4) the table was not initially created with a specified location
-  if (rename
-  && !oldt.getTableType().equals(TableType.VIRTUAL_VIEW.toString())
-  && (oldt.getSd().getLocation().compareTo(newt.getSd().getLocation()) 
== 0
-|| StringUtils.isEmpty(newt.getSd().getLocation()))
-  && !MetaStoreUtils.isExternalTable(oldt)) {
-Database olddb = msdb.getDatabase(catName, dbname);
+  if (replDataLocationChanged
+  || (rename
 
 Review comment:
   I think that for a non-txn table, if the location is changed, this should be 
a rename; for a txn table, the directory should be deleted in the replication 
flow. In the normal flow, control should never reach this point for a txn 
table; it should fail in the Hive server itself.
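
The suggestion above can be read as a small decision table. The sketch below is only an illustration of that reading, covering the replication branch alone; `ReplAlterAction`, `Action` and `choose` are hypothetical names, not code from the patch or from HiveAlterHandler.

```java
// Hypothetical illustration of the reviewer's suggestion; not part of the patch.
final class ReplAlterAction {
  enum Action { MOVE_DATA, DELETE_OLD_DIR, NONE }

  // Chooses what to do with the old data directory. Only the replication
  // branch is modelled; the existing rename handling is out of scope here.
  static Action choose(boolean inReplicationFlow, boolean locationChanged,
                       boolean transactionalTable) {
    if (!inReplicationFlow || !locationChanged) {
      return Action.NONE;
    }
    // Suggested split: txn tables drop the old directory, non-txn tables are renamed.
    return transactionalTable ? Action.DELETE_OLD_DIR : Action.MOVE_DATA;
  }
}
```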
 


Issue Time Tracking
---

Worklog Id: (was: 217329)

> Replicating conversion of managed to external table leaks HDFS files at 
> target.
> ---
>
> Key: HIVE-21471
> URL: https://issues.apache.org/jira/browse/HIVE-21471
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21471.01.patch, HIVE-21471.02.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h

[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217324&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217324
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 22/Mar/19 16:37
Start Date: 22/Mar/19 16:37
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268226195
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -209,7 +214,13 @@ public void alterTable(RawStore msdb, Warehouse wh, 
String catName, String dbnam
 boolean tableInSpecifiedLoc = !oldtRelativePath.equalsIgnoreCase(name)
 && !oldtRelativePath.equalsIgnoreCase(name + Path.SEPARATOR);
 
-if (!tableInSpecifiedLoc) {
+if (replDataLocationChanged) {
+  // If data location is changed in replication flow, then new path 
was already set in
+  // the newt. Also, it is as good as the data is moved and set 
dataWasMoved=true so that
+  // location in partitions are also updated accordingly.
+  destPath = new Path(newt.getSd().getLocation());
 
 Review comment:
   This needs to handle the scenario where a partition is not within the table location.
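
One way to picture this concern, as a sketch with plain string paths rather than Hadoop's `Path`: a partition location is re-rooted only when it actually lies under the old table directory, and is left untouched otherwise. The class `PartitionPathRewrite` and its method names are hypothetical.

```java
// Hypothetical helper illustrating the review concern; not Hive's implementation.
final class PartitionPathRewrite {

  // Re-roots partLoc under newTableLoc when it lies inside oldTableLoc;
  // a partition stored elsewhere keeps its location unchanged.
  static String rewrite(String partLoc, String oldTableLoc, String newTableLoc) {
    String oldRoot = stripTrailingSlash(oldTableLoc);
    String newRoot = stripTrailingSlash(newTableLoc);
    if (partLoc.equals(oldRoot)) {
      return newRoot;
    }
    // Match on "oldRoot/" so that /wh/t20 is not mistaken for a child of /wh/t2.
    if (partLoc.startsWith(oldRoot + "/")) {
      return newRoot + partLoc.substring(oldRoot.length());
    }
    return partLoc;
  }

  private static String stripTrailingSlash(String p) {
    return (p.length() > 1 && p.endsWith("/")) ? p.substring(0, p.length() - 1) : p;
  }
}
```

For example, `rewrite("/wh/t2/key=1", "/wh/t2", "/ext/t2")` yields `/ext/t2/key=1`, while a partition at `/other/p1` is returned as-is.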
 


Issue Time Tracking
---

Worklog Id: (was: 217324)
Time Spent: 1h  (was: 50m)


[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217321&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217321
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 22/Mar/19 16:37
Start Date: 22/Mar/19 16:37
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268126938
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -99,9 +100,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String 
catName, String dbnam
 dbname = dbname.toLowerCase();
 
 final boolean cascade = environmentContext != null
-&& environmentContext.isSetProperties()
-&& StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(
-StatsSetupConst.CASCADE));
+&& environmentContext.isSetProperties()
+&& 
StatsSetupConst.TRUE.equals(environmentContext.getProperties().get(StatsSetupConst.CASCADE));
+final boolean replDataLocationChanged = environmentContext != null
 
 Review comment:
   Why only for repl? If the table is changed from managed to external and the 
location is not the same, the old path should be deleted if it is owned by 
Hive, as we do for drop table.
 


Issue Time Tracking
---

Worklog Id: (was: 217321)
Time Spent: 0.5h  (was: 20m)


[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217320&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217320
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 22/Mar/19 16:37
Start Date: 22/Mar/19 16:37
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268126548
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 ##
 @@ -357,6 +368,13 @@ public void alterTable(RawStore msdb, Warehouse wh, 
String catName, String dbnam
 }
   }
 
+  // If data location is changed in replication flow, then need to delete 
the old path.
+  if (replDataLocationChanged) {
+Path deleteOldDataLoc = new Path(oldt.getSd().getLocation());
+boolean isAutoPurge = 
"true".equalsIgnoreCase(oldt.getParameters().get("auto.purge"));
+wh.deleteDir(deleteOldDataLoc, true, isAutoPurge, olddb);
 
 Review comment:
   How is the directory delete rolled back if the txn fails?
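
A filesystem delete cannot be undone by a metastore rollback, so one common way to address this kind of concern is to defer the delete until the metadata change has committed. The sketch below shows that pattern in isolation; it is an assumption about a possible approach, not what the patch does, and `DeferredCleanup` with its methods is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical pattern sketch: run irreversible cleanup only after a successful commit.
final class DeferredCleanup {
  private final List<Runnable> afterCommit = new ArrayList<>();

  // Registers a delete (or any irreversible action) to run after commit.
  void deleteAfterCommit(Runnable deleteAction) {
    afterCommit.add(deleteAction);
  }

  // Runs the metadata update; registered deletes execute only if it reports success.
  boolean run(BooleanSupplier metadataUpdate) {
    boolean committed = metadataUpdate.getAsBoolean();
    if (committed) {
      afterCommit.forEach(Runnable::run);
    }
    // Pending actions are discarded either way, so a failed txn deletes nothing.
    afterCommit.clear();
    return committed;
  }
}
```

The trade-off is that a crash between commit and cleanup leaves the old directory behind, which is a leak rather than data loss.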
 


Issue Time Tracking
---

Worklog Id: (was: 217320)
Time Spent: 20m  (was: 10m)


[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217323&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217323
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 22/Mar/19 16:37
Start Date: 22/Mar/19 16:37
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268127055
 
 

 ##
 File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/common/ReplConst.java
 ##
 @@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.common;
+
+/**
+ * A class that defines the constant strings used by the replication 
implementation.
+ */
+
+public class ReplConst {
+
+  /**
+   * The constant that denotes the table data location is changed to different 
path. This indicates
+   * Metastore to update corresponding path in Partitions and also need to 
delete old path.
+   */
+  public static final String DATA_LOCATION_CHANGED = "DATA_LOCATION_CHANGED";
 
 Review comment:
   It is used only for repl, so the name should suggest that.
 


Issue Time Tracking
---

Worklog Id: (was: 217323)
Time Spent: 50m  (was: 40m)


[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217322&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217322
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 22/Mar/19 16:37
Start Date: 22/Mar/19 16:37
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578#discussion_r268127478
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -673,6 +674,59 @@ public void 
retryIncBootstrapExternalTablesFromDifferentDumpWithoutCleanTablesCo
 ErrorMsg.REPL_BOOTSTRAP_LOAD_PATH_NOT_VALID.getErrorCode());
   }
 
+  @Test
+  public void dynamicallyConvertManagedToExternalTable() throws Throwable {
+List<String> dumpWithClause = Collections.singletonList(
+"'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + 
"'='true'"
+);
+List<String> loadWithClause = externalTableBasePathWithClause();
+
+WarehouseInstance.Tuple tupleBootstrapManagedTable = primary.run("use " + 
primaryDbName)
+.run("create table t1 (id int)")
+.run("insert into table t1 values (1)")
+.run("create table t2 (id int) partitioned by (key int)")
 
 Review comment:
   This case is for migration, so there should be a test for the migration case. 
Altering an ACID table to an external table should be avoided.
 


Issue Time Tracking
---

Worklog Id: (was: 217322)
Time Spent: 40m  (was: 0.5h)


[jira] [Work logged] (HIVE-21471) Replicating conversion of managed to external table leaks HDFS files at target.

2019-03-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21471?focusedWorklogId=217160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217160
 ]

ASF GitHub Bot logged work on HIVE-21471:
-

Author: ASF GitHub Bot
Created on: 22/Mar/19 08:39
Start Date: 22/Mar/19 08:39
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #578: HIVE-21471: 
Replicating conversion of managed to external table leaks HDFS files at target.
URL: https://github.com/apache/hive/pull/578
 
 
   
 


Issue Time Tracking
---

Worklog Id: (was: 217160)
Time Spent: 10m
Remaining Estimate: 0h
