[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-05-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=603840&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603840
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 29/May/21 16:56
Start Date: 29/May/21 16:56
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #2219:
URL: https://github.com/apache/hive/pull/2219


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 603840)
Time Spent: 7.5h  (was: 7h 20m)

> Implement rollback for hive to iceberg migration
> ------------------------------------------------
>
>                 Key: HIVE-25057
>                 URL: https://issues.apache.org/jira/browse/HIVE-25057
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table
> must be restored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=590912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-590912
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 29/Apr/21 10:01
Start Date: 29/Apr/21 10:01
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r622905232



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +263,34 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    if (Boolean.valueOf(context.getProperties().getOrDefault(MIGRATE_HIVE_TO_ICEBERG, "false"))) {
+      LOG.debug("Initiating rollback for table {} at location {}",
+          hmsTable.getTableName(), hmsTable.getSd().getLocation());
+      context.getProperties().put(INITIALIZE_ROLLBACK_MIGRATION, "true");
+      this.catalogProperties = getCatalogProperties(hmsTable);
+      try {
+        this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+      } catch (NoSuchTableException nte) {
+        // iceberg table was not yet created, no need to delete the metadata dir separately
+        return;
+      }
+
+      // we want to keep the data files but get rid of the metadata directory
+      String metadataLocation = ((BaseTable) this.icebergTable).operations().current().metadataFileLocation();

Review comment:
   As we discussed offline, we will leave the metadata dir delete as it is.
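
The guard at the top of the quoted `rollbackAlterTable` hunk reads a string-valued flag from the `EnvironmentContext` properties and treats a missing entry as "false", so the rollback path only runs for migrations. A minimal, self-contained sketch of that pattern follows. `MigrationGuard` and the key string are hypothetical stand-ins (the value of the PR's `MIGRATE_HIVE_TO_ICEBERG` constant is not shown in this thread), and a plain `Map` stands in for the context properties.

```java
import java.util.HashMap;
import java.util.Map;

final class MigrationGuard {
  // Hypothetical key, for illustration only; not the constant used in the PR.
  static final String MIGRATE_KEY = "example.migrate_hive_to_iceberg";

  private MigrationGuard() {}

  // A missing key defaults to "false"; parseBoolean is case-insensitive and
  // returns false for anything other than "true".
  static boolean isMigration(Map<String, String> properties) {
    return Boolean.parseBoolean(properties.getOrDefault(MIGRATE_KEY, "false"));
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    System.out.println(isMigration(props)); // flag absent
    props.put(MIGRATE_KEY, "true");
    System.out.println(isMigration(props)); // flag set
  }
}
```

The sketch uses `Boolean.parseBoolean` instead of the hunk's `Boolean.valueOf` only to avoid boxing; both parse the string the same way.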






Issue Time Tracking
---

Worklog Id: (was: 590912)
Time Spent: 7h 20m  (was: 7h 10m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=590478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-590478
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 28/Apr/21 15:37
Start Date: 28/Apr/21 15:37
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r622303087



##
File path: iceberg/pom.xml
##
@@ -31,7 +31,7 @@
 .
 0.11.0
 4.0.2
-1.10.19
+3.4.4

Review comment:
   Do we want to create a different PR for it?
   Would it be worth the effort?






Issue Time Tracking
---

Worklog Id: (was: 590478)
Time Spent: 7h 10m  (was: 7h)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=590477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-590477
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 28/Apr/21 15:36
Start Date: 28/Apr/21 15:36
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r622302285



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +263,34 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    if (Boolean.valueOf(context.getProperties().getOrDefault(MIGRATE_HIVE_TO_ICEBERG, "false"))) {
+      LOG.debug("Initiating rollback for table {} at location {}",
+          hmsTable.getTableName(), hmsTable.getSd().getLocation());
+      context.getProperties().put(INITIALIZE_ROLLBACK_MIGRATION, "true");
+      this.catalogProperties = getCatalogProperties(hmsTable);
+      try {
+        this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+      } catch (NoSuchTableException nte) {
+        // iceberg table was not yet created, no need to delete the metadata dir separately
+        return;
+      }
+
+      // we want to keep the data files but get rid of the metadata directory
+      String metadataLocation = ((BaseTable) this.icebergTable).operations().current().metadataFileLocation();

Review comment:
   I wanted to leave a comment here, but I'm not sure whether I did or forgot.
   Could we create some `CatalogUtil.dropTableMetaData(deleteIo, deleteMetadata);` based on `CatalogUtil.dropTableData(deleteIo, deleteMetadata);`?






Issue Time Tracking
---

Worklog Id: (was: 590477)
Time Spent: 7h  (was: 6h 50m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589804
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 27/Apr/21 15:18
Start Date: 27/Apr/21 15:18
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r621314290



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +263,35 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    if (Boolean.valueOf(context.getProperties().getOrDefault(MIGRATE_HIVE_TO_ICEBERG, "false"))) {
+      LOG.debug("Initiating rollback for table {} at location {}",
+          hmsTable.getTableName(), hmsTable.getSd().getLocation());
+      context.getProperties().put(INITIALIZE_ROLLBACK_MIGRATION, "true");
+      this.catalogProperties = getCatalogProperties(hmsTable);
+      try {
+        this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+      } catch (NoSuchTableException nte) {
+        // iceberg table was not yet created, no need to delete the metadata dir separately
+        return;
+      }
+
+      // we want to keep the data files but get rid of the metadata directory
+      hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");

Review comment:
   I'm confused too :). This shouldn't be there.






Issue Time Tracking
---

Worklog Id: (was: 589804)
Time Spent: 6h 50m  (was: 6h 40m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589760
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 27/Apr/21 14:01
Start Date: 27/Apr/21 14:01
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r621238461



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +263,35 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    if (Boolean.valueOf(context.getProperties().getOrDefault(MIGRATE_HIVE_TO_ICEBERG, "false"))) {
+      LOG.debug("Initiating rollback for table {} at location {}",
+          hmsTable.getTableName(), hmsTable.getSd().getLocation());
+      context.getProperties().put(INITIALIZE_ROLLBACK_MIGRATION, "true");
+      this.catalogProperties = getCatalogProperties(hmsTable);
+      try {
+        this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+      } catch (NoSuchTableException nte) {
+        // iceberg table was not yet created, no need to delete the metadata dir separately
+        return;
+      }
+
+      // we want to keep the data files but get rid of the metadata directory
+      hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");

Review comment:
   Maybe I'm confused, but I thought this was to be removed?






Issue Time Tracking
---

Worklog Id: (was: 589760)
Time Spent: 6h 40m  (was: 6.5h)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589757
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 27/Apr/21 13:58
Start Date: 27/Apr/21 13:58
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r621235567



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -1158,32 +1172,39 @@ private void validateBasicStats(Table icebergTable, String dbName, String tableN
   }
 
   private void validateMigration(String tableName) throws TException, InterruptedException {
-    List<Object[]> originalResult = shell.executeStatement("SELECT * FROM " + tableName);
+    List<Object[]> originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
     shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
         "('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
-    List<Object[]> alterResult = shell.executeStatement("SELECT * FROM " + tableName);
+    List<Object[]> alterResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
     Assert.assertEquals(originalResult.size(), alterResult.size());
     for (int i = 0; i < originalResult.size(); i++) {
-      Arrays.equals(originalResult.get(i), alterResult.get(i));
+      Assert.assertTrue(Arrays.equals(originalResult.get(i), alterResult.get(i)));
     }
     validateSd(tableName, "iceberg");
   }
 
   private void validateMigrationRollback(String tableName) throws TException, InterruptedException {
-    List<Object[]> originalResult = shell.executeStatement("SELECT * FROM " + tableName);
+    List<Object[]> originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
     try (MockedStatic<HiveTableUtil> mockedTableUtil = Mockito.mockStatic(HiveTableUtil.class)) {
       mockedTableUtil.when(() -> HiveTableUtil.importFiles(ArgumentMatchers.anyString(), ArgumentMatchers.anyString(),
           ArgumentMatchers.any(PartitionSpecProxy.class), ArgumentMatchers.anyList(),
           ArgumentMatchers.any(Properties.class), ArgumentMatchers.any(Configuration.class)))
           .thenThrow(new MetaException());
-      shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
-          "('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
-      List<Object[]> alterResult = shell.executeStatement("SELECT * FROM " + tableName);
-      Assert.assertEquals(originalResult.size(), alterResult.size());
-      for (int i = 0; i < originalResult.size(); i++) {
-        Arrays.equals(originalResult.get(i), alterResult.get(i));
+      try {
+        shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+            "('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+      } catch (IllegalArgumentException e) {
+        Assert.assertTrue(e.getMessage().contains("Error occurred during hive table migration to iceberg."));
+        shell.executeStatement("MSCK REPAIR TABLE " + tableName);
+        List<Object[]> alterResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+        Assert.assertEquals(originalResult.size(), alterResult.size());
+        for (int i = 0; i < originalResult.size(); i++) {
+          Assert.assertTrue(Arrays.equals(originalResult.get(i), alterResult.get(i)));
+        }
+        validateSd(tableName, fileFormat.name());

Review comment:
   nit: I think this sd validation check logically belongs before the MSCK REPAIR command. If the sd wasn't reverted correctly, the SELECT above would fail first, and the cause could be harder to figure out.
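
The `Arrays.equals` change in the hunk above matters because the pre-fix loop computed the comparison and discarded the result, so the test could not fail on a row mismatch. A small sketch in plain Java (no JUnit; `RowComparison` and its method names are made up for illustration) shows the difference:

```java
import java.util.Arrays;
import java.util.List;

final class RowComparison {
  private RowComparison() {}

  // Mirrors the pre-fix loop: the boolean result is ignored, so this never fails.
  static void compareIgnoringResult(List<Object[]> expected, List<Object[]> actual) {
    for (int i = 0; i < expected.size(); i++) {
      Arrays.equals(expected.get(i), actual.get(i));
    }
  }

  // Mirrors the fixed loop: a size or row mismatch raises an AssertionError.
  static void assertSameRows(List<Object[]> expected, List<Object[]> actual) {
    if (expected.size() != actual.size()) {
      throw new AssertionError("Row count differs: " + expected.size() + " vs " + actual.size());
    }
    for (int i = 0; i < expected.size(); i++) {
      if (!Arrays.equals(expected.get(i), actual.get(i))) {
        throw new AssertionError("Row " + i + " differs: "
            + Arrays.toString(expected.get(i)) + " vs " + Arrays.toString(actual.get(i)));
      }
    }
  }
}
```

The row-by-row comparison is only meaningful when both result lists are in the same order, which is what the added `ORDER BY a` in the quoted queries provides.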






Issue Time Tracking
---

Worklog Id: (was: 589757)
Time Spent: 6.5h  (was: 6h 20m)


[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589753
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 27/Apr/21 13:52
Start Date: 27/Apr/21 13:52
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r621229746



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -146,22 +147,23 @@ private void finalizeAlterTableWithWriteIdOp(Table table, Table oldTable, List
[...]

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589752
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 27/Apr/21 13:51
Start Date: 27/Apr/21 13:51
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r621229448



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -28,6 +28,7 @@
 import org.apache.hadoop.hive.metastore.HiveMetaHook;
 import org.apache.hadoop.hive.metastore.Msck;
 import org.apache.hadoop.hive.metastore.MsckInfo;
+import org.apache.hadoop.hive.metastore.PartitionIterable;

Review comment:
   is this needed somewhere?






Issue Time Tracking
---

Worklog Id: (was: 589752)
Time Spent: 6h 10m  (was: 6h)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589740
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 27/Apr/21 13:37
Start Date: 27/Apr/21 13:37
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r621215726



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, Table oldTable, List
[...]

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589364
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 17:38
Start Date: 26/Apr/21 17:38
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620510669



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, Table oldTable, List
[...]

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589362&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589362
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 17:37
Start Date: 26/Apr/21 17:37
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620509730



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -41,6 +39,11 @@
 import org.apache.hadoop.hive.ql.metadata.Partition;
 import org.apache.hadoop.hive.ql.metadata.Table;
 import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.thrift.TException;
+
+import java.util.ArrayList;

Review comment:
   ah, yes because it's not the `hive-iceberg` module :)






Issue Time Tracking
---

Worklog Id: (was: 589362)
Time Spent: 5h 40m  (was: 5.5h)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589317
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:57
Start Date: 26/Apr/21 15:57
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620432398



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, Table oldTable, List
[...]

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589312
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:49
Start Date: 26/Apr/21 15:49
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620403936



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+    this.catalogProperties = getCatalogProperties(hmsTable);
+    try {
+      this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+    } catch (NoSuchTableException nte) {
+      // iceberg table was not yet created, no need to delete the metadata dir separately
+      return;
+    }
+    // we want to keep the data files but get rid of the metadata directory
+    hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");

Review comment:
   No, it won't :). This is some leftover code from the previous 
implementation, which I forgot to remove. 






Issue Time Tracking
---

Worklog Id: (was: 589312)
Time Spent: 5h 20m  (was: 5h 10m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589311
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:48
Start Date: 26/Apr/21 15:48
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620424855



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+    this.catalogProperties = getCatalogProperties(hmsTable);
+    try {
+      this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+    } catch (NoSuchTableException nte) {
+      // iceberg table was not yet created, no need to delete the metadata dir separately
+      return;
+    }
+    // we want to keep the data files but get rid of the metadata directory
+    hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");
+    String metadataLocation = ((BaseTable) this.icebergTable).operations().current().metadataFileLocation();

Review comment:
   Shall we create a new version to remove only metadata files? 






Issue Time Tracking
---

Worklog Id: (was: 589311)
Time Spent: 5h 10m  (was: 5h)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589310
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:47
Start Date: 26/Apr/21 15:47
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620416625



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, 
String dbName, String tableN
   Assert.assertEquals(summary.get(entry.getValue()), 
hmsParams.get(entry.getKey()));
 }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+List originalResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+Assert.assertEquals(recordCount, originalResult.size());
+List originalDescribe = shell.executeStatement("DESCRIBE 
FORMATTED " + tableName);
+validateDescribeOutput(originalDescribe, fileFormat.name());

Review comment:
   You are absolutely right.

##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, 
String dbName, String tableN
   Assert.assertEquals(summary.get(entry.getValue()), 
hmsParams.get(entry.getKey()));
 }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+List originalResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+Assert.assertEquals(recordCount, originalResult.size());
+List originalDescribe = shell.executeStatement("DESCRIBE 
FORMATTED " + tableName);
+validateDescribeOutput(originalDescribe, fileFormat.name());
+shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+
"('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+List alterResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+Assert.assertEquals(originalResult.size(), alterResult.size());
+List alterDescribe = shell.executeStatement("DESCRIBE FORMATTED 
" + tableName);
+validateDescribeOutput(alterDescribe, "iceberg");
+  }
+
+  private void validateMigrationRollback(String tableName, int recordCount) {
+List originalResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+Assert.assertEquals(recordCount, originalResult.size());
+List originalDescribe = shell.executeStatement("DESCRIBE 
FORMATTED " + tableName);
+validateDescribeOutput(originalDescribe, fileFormat.name());
+try (MockedStatic mockedTableUtil = 
Mockito.mockStatic(HiveTableUtil.class)) {
+  mockedTableUtil.when(() -> 
HiveTableUtil.importFiles(ArgumentMatchers.anyString(), 
ArgumentMatchers.anyString(),
+  ArgumentMatchers.any(PartitionSpecProxy.class), 
ArgumentMatchers.anyList(),
+  ArgumentMatchers.any(Properties.class), 
ArgumentMatchers.any(Configuration.class)))
+  .thenThrow(new MetaException());
+  shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES 
" +
+  
"('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+  List alterResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+  Assert.assertEquals(originalResult.size(), alterResult.size());
+  List alterDescribe = shell.executeStatement("DESCRIBE 
FORMATTED " + tableName);
+  validateDescribeOutput(alterDescribe, fileFormat.name());
+}
+  }
+
+  private void validateDescribeOutput(List describe, String format) {

Review comment:
   It validates whether the contents of the SD (serde, input/output format) 
are changed or, in case of a rollback, retained. I've changed this method based on 
@marton-bod's suggestions.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, 
Table oldTable, List.
 0.11.0
 4.0.2
-1.10.19
+3.4.4

Review comment:
   To bump the version in other places as well I would need to touch 7-8 
modules. It's not a trivial change.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, 
Table oldTable, List originalResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+Assert.assertEquals(recordCount, originalResult.size());
+List originalDescribe = 

[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589305=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589305
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:32
Start Date: 26/Apr/21 15:32
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620410163



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, 
String dbName, String tableN
   Assert.assertEquals(summary.get(entry.getValue()), 
hmsParams.get(entry.getKey()));
 }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+List originalResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+Assert.assertEquals(recordCount, originalResult.size());
+List originalDescribe = shell.executeStatement("DESCRIBE 
FORMATTED " + tableName);
+validateDescribeOutput(originalDescribe, fileFormat.name());
+shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+
"('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+List alterResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");

Review comment:
   Not anymore.






Issue Time Tracking
---

Worklog Id: (was: 589305)
Time Spent: 4h 50m  (was: 4h 40m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589302=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589302
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:25
Start Date: 26/Apr/21 15:25
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620403936



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void 
commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table 
hmsTable, EnvironmentContext context)
+  throws MetaException {
+context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+this.catalogProperties = getCatalogProperties(hmsTable);
+try {
+  this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+} catch (NoSuchTableException nte) {
+  // iceberg table was not yet created, no need to delete the metadata dir 
separately
+  return;
+}
+// we want to keep the data files but get rid of the metadata directory
+hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, 
"FALSE");

Review comment:
   No, it won't :). This is some leftover code from the previous 
implementation, which I forgot to remove. 






Issue Time Tracking
---

Worklog Id: (was: 589302)
Time Spent: 4h 40m  (was: 4.5h)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589301
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:25
Start Date: 26/Apr/21 15:25
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620403357



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void 
commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table 
hmsTable, EnvironmentContext context)
+  throws MetaException {
+context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+this.catalogProperties = getCatalogProperties(hmsTable);
+try {
+  this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+} catch (NoSuchTableException nte) {
+  // iceberg table was not yet created, no need to delete the metadata dir 
separately
+  return;
+}
+// we want to keep the data files but get rid of the metadata directory
+hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, 
"FALSE");
+String metadataLocation = ((BaseTable) 
this.icebergTable).operations().current().metadataFileLocation();

Review comment:
   `CatalogUtil.dropTableData` removes everything, including the data 
files. 
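
The distinction between dropping everything and dropping only the metadata can be sketched as follows. This is a minimal, self-contained illustration that uses `java.nio.file` on a local directory instead of the Hadoop `FileSystem` API used in the PR. The class `MetadataDirCleanup`, the method `deleteMetadataDir`, and the `data`/`metadata` directory layout are assumptions made for the example, not code from the pull request.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hive or Iceberg.
class MetadataDirCleanup {

  // Recursively deletes only the directory that contains the given metadata file.
  // Sibling directories (for example the one holding the data files) are left untouched.
  static void deleteMetadataDir(Path metadataFile) throws IOException {
    Path metadataDir = metadataFile.getParent();
    if (metadataDir == null || !Files.exists(metadataDir)) {
      return; // nothing to clean up
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(metadataDir)) {
      // Reverse order so that children are deleted before their parent directories.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  public static void main(String[] args) throws IOException {
    // Assumed layout: <table>/data/... and <table>/metadata/...
    Path table = Files.createTempDirectory("tbl");
    Path dataFile =
        Files.createFile(Files.createDirectory(table.resolve("data")).resolve("part-0.orc"));
    Path metadataFile =
        Files.createFile(Files.createDirectory(table.resolve("metadata")).resolve("v1.metadata.json"));

    deleteMetadataDir(metadataFile);

    System.out.println("metadata dir exists: " + Files.exists(metadataFile.getParent()));
    System.out.println("data file exists: " + Files.exists(dataFile));
  }
}
```

With this shape, a failed migration would leave the original data files in place, which is what the rollback needs in order to restore the Hive table.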






Issue Time Tracking
---

Worklog Id: (was: 589301)
Time Spent: 4.5h  (was: 4h 20m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589300=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589300
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:23
Start Date: 26/Apr/21 15:23
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620402132



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -194,6 +203,88 @@ public void testScanTable() throws IOException {
 Assert.assertArrayEquals(new Object[] {"Alice", 0L}, descRows.get(2));
   }
 
+  @Test
+  public void testMigrateHiveTableToIceberg() {
+Assume.assumeTrue("migration is only supported for hive catalog",

Review comment:
   It works  :) (with a bit of tweaking)






Issue Time Tracking
---

Worklog Id: (was: 589300)
Time Spent: 4h 20m  (was: 4h 10m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589296=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589296
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:22
Start Date: 26/Apr/21 15:22
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620401382



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void 
commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table 
hmsTable, EnvironmentContext context)

Review comment:
   I've marked in the EnvironmentContext that we are in the middle of a 
migration, so no other alter operation type removes the metadata dir. 
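
One way to read this is sketched below. It is only an interpretation under stated assumptions: a plain `Map<String, String>` stands in for the properties of the `EnvironmentContext`, and the key `MIGRATION_FLAG`, the class `MigrationFlagSketch`, and both method names are hypothetical and do not come from the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the hook implementation from the pull request.
class MigrationFlagSketch {

  // Hypothetical property key used to mark an in-progress migration.
  static final String MIGRATION_FLAG = "example.migration.in.progress";

  // Runs before the alter operation: only a migration marks the context.
  static void preAlter(Map<String, String> contextProps, boolean isMigration) {
    if (isMigration) {
      contextProps.put(MIGRATION_FLAG, "true");
    }
  }

  // Runs when the alter operation fails: the metadata directory is removed
  // only if the context was marked as a migration beforehand.
  static boolean shouldRemoveMetadataDir(Map<String, String> contextProps) {
    return Boolean.parseBoolean(contextProps.getOrDefault(MIGRATION_FLAG, "false"));
  }

  public static void main(String[] args) {
    Map<String, String> migrationContext = new HashMap<>();
    preAlter(migrationContext, true);

    Map<String, String> otherAlterContext = new HashMap<>();
    preAlter(otherAlterContext, false);

    System.out.println(shouldRemoveMetadataDir(migrationContext));
    System.out.println(shouldRemoveMetadataDir(otherAlterContext));
  }
}
```

Carrying the marker in the context keeps the decision local to the failed operation, so a rollback of an unrelated alter never touches the metadata directory.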






Issue Time Tracking
---

Worklog Id: (was: 589296)
Time Spent: 4h 10m  (was: 4h)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589292=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589292
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:19
Start Date: 26/Apr/21 15:19
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620398260



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void 
commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table 
hmsTable, EnvironmentContext context)
+  throws MetaException {
+context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+this.catalogProperties = getCatalogProperties(hmsTable);
+try {
+  this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+} catch (NoSuchTableException nte) {
+  // iceberg table was not yet created, no need to delete the metadata dir 
separately
+  return;
+}
+// we want to keep the data files but get rid of the metadata directory
+hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, 
"FALSE");

Review comment:
   Good catch, this was leftover code from a previous version.






Issue Time Tracking
---

Worklog Id: (was: 589292)
Time Spent: 4h  (was: 3h 50m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589291=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589291
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:18
Start Date: 26/Apr/21 15:18
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620397490



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void 
commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table 
hmsTable, EnvironmentContext context)
+  throws MetaException {
+context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+this.catalogProperties = getCatalogProperties(hmsTable);
+try {
+  this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+} catch (NoSuchTableException nte) {
+  // iceberg table was not yet created, no need to delete the metadata dir 
separately
+  return;
+}
+// we want to keep the data files but get rid of the metadata directory
+hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, 
"FALSE");
+String metadataLocation = ((BaseTable) 
this.icebergTable).operations().current().metadataFileLocation();
+try {
+  Path path = new Path(metadataLocation).getParent();
+  FileSystem fileSystem = FileSystem.get(path.toUri(), conf);
+  if (fileSystem.exists(path)) {

Review comment:
   Right, removed the if






Issue Time Tracking
---

Worklog Id: (was: 589291)
Time Spent: 3h 50m  (was: 3h 40m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589243=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589243
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:34
Start Date: 26/Apr/21 13:34
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620274616



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -159,11 +162,16 @@ public void 
commitCreateTable(org.apache.hadoop.hive.metastore.api.Table hmsTabl
 
   @Override
   public void preDropTable(org.apache.hadoop.hive.metastore.api.Table 
hmsTable) {
+// do nothing

Review comment:
   When does this version of the hook get called (vs the one with 
deleteData param)?






Issue Time Tracking
---

Worklog Id: (was: 589243)
Time Spent: 3h 40m  (was: 3.5h)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589242=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589242
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:33
Start Date: 26/Apr/21 13:33
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620300318



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -41,6 +39,11 @@
 import org.apache.hadoop.hive.ql.metadata.Partition;
 import org.apache.hadoop.hive.ql.metadata.Table;
 import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.thrift.TException;
+
+import java.util.ArrayList;

Review comment:
   I think spotless will complain that these imports are separated.






Issue Time Tracking
---

Worklog Id: (was: 589242)
Time Spent: 3.5h  (was: 3h 20m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589241=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589241
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:32
Start Date: 26/Apr/21 13:32
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620299413



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, 
Table oldTable, List Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589239=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589239
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:25
Start Date: 26/Apr/21 13:25
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620293681



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, 
Table oldTable, List Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589237=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589237
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:23
Start Date: 26/Apr/21 13:23
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620291888



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, 
Table oldTable, List Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589236=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589236
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:21
Start Date: 26/Apr/21 13:21
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620290440



##
File path: iceberg/pom.xml
##
@@ -31,7 +31,7 @@
 .
 0.11.0
 4.0.2
-1.10.19
+3.4.4

Review comment:
   How big is this change? Would it be worth separating it into a different Jira?






Issue Time Tracking
---

Worklog Id: (was: 589236)
Time Spent: 2h 50m  (was: 2h 40m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589235=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589235
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:13
Start Date: 26/Apr/21 13:13
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620282921



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, String dbName, String tableN
       Assert.assertEquals(summary.get(entry.getValue()), hmsParams.get(entry.getKey()));
     }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+    List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(recordCount, originalResult.size());
+    List originalDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+    validateDescribeOutput(originalDescribe, fileFormat.name());
+    shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+        "('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+    List alterResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(originalResult.size(), alterResult.size());

Review comment:
   Would it be difficult to check the contents as well, not just the size? 
Just to make sure the data is all the same after the migration.
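
As a rough illustration of this suggestion (not code from the PR), a content check could compare the rows column by column instead of only comparing sizes. The sketch below assumes the query results are lists of `Object[]` rows, as the `descRows.get(2)` assertion elsewhere in this test class suggests; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class RowContentCheck {
  // Returns true when both results contain the same rows, in the same order,
  // with equal column values. Relies on the queries using ORDER BY so that
  // row order is deterministic.
  static boolean sameRows(List<Object[]> expected, List<Object[]> actual) {
    if (expected.size() != actual.size()) {
      return false;
    }
    for (int i = 0; i < expected.size(); i++) {
      if (!Arrays.equals(expected.get(i), actual.get(i))) {
        return false;
      }
    }
    return true;
  }
}
```

In the test this would presumably replace the size-only assertion, for example `Assert.assertTrue(RowContentCheck.sameRows(originalResult, alterResult))`. It only makes sense while the `ORDER BY` stays in both queries.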




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 589235)
Time Spent: 2h 40m  (was: 2.5h)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589234=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589234
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:08
Start Date: 26/Apr/21 13:08
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620279270



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, String dbName, String tableN
       Assert.assertEquals(summary.get(entry.getValue()), hmsParams.get(entry.getKey()));
     }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+    List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(recordCount, originalResult.size());
+    List originalDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+    validateDescribeOutput(originalDescribe, fileFormat.name());
+    shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+        "('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+    List alterResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(originalResult.size(), alterResult.size());
+    List alterDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);

Review comment:
   Can we use `TestHiveMetastore#loadTable` instead and check the contents 
of the `sd`?






Issue Time Tracking
---

Worklog Id: (was: 589234)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589232=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589232
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:07
Start Date: 26/Apr/21 13:07
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620278440



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, String dbName, String tableN
       Assert.assertEquals(summary.get(entry.getValue()), hmsParams.get(entry.getKey()));
     }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+    List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(recordCount, originalResult.size());
+    List originalDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+    validateDescribeOutput(originalDescribe, fileFormat.name());

Review comment:
   I'm not sure we need this, since this is only checking hive behaviour, 
not iceberg. What do you think?






Issue Time Tracking
---

Worklog Id: (was: 589232)
Time Spent: 2h 20m  (was: 2h 10m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589231=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589231
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:03
Start Date: 26/Apr/21 13:03
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620274616



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -159,11 +162,16 @@ public void commitCreateTable(org.apache.hadoop.hive.metastore.api.Table hmsTabl
 
   @Override
   public void preDropTable(org.apache.hadoop.hive.metastore.api.Table hmsTable) {
+    // do nothing

Review comment:
   When does this version of the hook get called (vs the one with 
deleteData param)?






Issue Time Tracking
---

Worklog Id: (was: 589231)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589176=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589176
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:36
Start Date: 26/Apr/21 12:36
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620254521



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, String dbName, String tableN
       Assert.assertEquals(summary.get(entry.getValue()), hmsParams.get(entry.getKey()));
     }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+    List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(recordCount, originalResult.size());
+    List originalDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+    validateDescribeOutput(originalDescribe, fileFormat.name());
+    shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+        "('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+    List alterResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(originalResult.size(), alterResult.size());
+    List alterDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+    validateDescribeOutput(alterDescribe, "iceberg");
+  }
+
+  private void validateMigrationRollback(String tableName, int recordCount) {
+    List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(recordCount, originalResult.size());
+    List originalDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+    validateDescribeOutput(originalDescribe, fileFormat.name());
+    try (MockedStatic mockedTableUtil = Mockito.mockStatic(HiveTableUtil.class)) {
+      mockedTableUtil.when(() -> HiveTableUtil.importFiles(ArgumentMatchers.anyString(), ArgumentMatchers.anyString(),
+          ArgumentMatchers.any(PartitionSpecProxy.class), ArgumentMatchers.anyList(),
+          ArgumentMatchers.any(Properties.class), ArgumentMatchers.any(Configuration.class)))
+          .thenThrow(new MetaException());
+      shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+          "('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+      List alterResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+      Assert.assertEquals(originalResult.size(), alterResult.size());
+      List alterDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+      validateDescribeOutput(alterDescribe, fileFormat.name());
+    }
+  }
+
+  private void validateDescribeOutput(List describe, String format) {

Review comment:
   What does this validation check?






Issue Time Tracking
---

Worklog Id: (was: 589176)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589172=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589172
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:35
Start Date: 26/Apr/21 12:35
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620253400



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, String dbName, String tableN
       Assert.assertEquals(summary.get(entry.getValue()), hmsParams.get(entry.getKey()));
     }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+    List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");

Review comment:
   Can we minimize the time spent here?
   * Do we need this query, or can we expect that the writer of the test knows 
the expected number?
   * Do we need it to be ordered?






Issue Time Tracking
---

Worklog Id: (was: 589172)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589174=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589174
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:35
Start Date: 26/Apr/21 12:35
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620253582



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, String dbName, String tableN
       Assert.assertEquals(summary.get(entry.getValue()), hmsParams.get(entry.getKey()));
     }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+    List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(recordCount, originalResult.size());
+    List originalDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+    validateDescribeOutput(originalDescribe, fileFormat.name());
+    shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+        "('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+    List alterResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");

Review comment:
   Do we need this ordered?






Issue Time Tracking
---

Worklog Id: (was: 589174)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589173=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589173
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:35
Start Date: 26/Apr/21 12:35
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620253478



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+    this.catalogProperties = getCatalogProperties(hmsTable);
+    try {
+      this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+    } catch (NoSuchTableException nte) {
+      // iceberg table was not yet created, no need to delete the metadata dir separately
+      return;
+    }
+    // we want to keep the data files but get rid of the metadata directory
+    hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");

Review comment:
   Will this change actually be saved into the HMS db? If so, what if the 
original table had this property as true? Should we change it silently here due 
to the rollback?
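
One way to avoid the silent change raised here, sketched only as an illustration and not taken from the PR, would be to remember the original value and put it back once the cleanup is done. The sketch works on a plain `Map` standing in for `hmsTable.getParameters()`. The class name, the method name and the literal key are assumptions. Whether the parameter map is persisted to the HMS at all is exactly the open question above, so this only shows the save-and-restore pattern.

```java
import java.util.Map;

class PurgeFlagGuard {
  // Assumed literal for InputFormatConfig.EXTERNAL_TABLE_PURGE; illustrative only.
  static final String PURGE_KEY = "external.table.purge";

  // Forces the purge flag to FALSE while the cleanup runs, then restores
  // whatever the table had before (including "not set at all").
  static void runWithPurgeDisabled(Map<String, String> params, Runnable cleanup) {
    boolean hadKey = params.containsKey(PURGE_KEY);
    String original = params.get(PURGE_KEY);
    params.put(PURGE_KEY, "FALSE");
    try {
      cleanup.run();
    } finally {
      if (hadKey) {
        params.put(PURGE_KEY, original);
      } else {
        params.remove(PURGE_KEY);
      }
    }
  }
}
```

The `finally` block means the original value is restored even if the cleanup itself throws.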






Issue Time Tracking
---

Worklog Id: (was: 589173)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589168=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589168
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:32
Start Date: 26/Apr/21 12:32
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620251451



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+    this.catalogProperties = getCatalogProperties(hmsTable);
+    try {
+      this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+    } catch (NoSuchTableException nte) {
+      // iceberg table was not yet created, no need to delete the metadata dir separately
+      return;
+    }
+    // we want to keep the data files but get rid of the metadata directory
+    hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");
+    String metadataLocation = ((BaseTable) this.icebergTable).operations().current().metadataFileLocation();
+    try {
+      Path path = new Path(metadataLocation).getParent();
+      FileSystem fileSystem = FileSystem.get(path.toUri(), conf);
+      if (fileSystem.exists(path)) {
+        fileSystem.delete(path, true);

Review comment:
   Can we add some logging here that the rollback is going to happen for 
`tableName` and metadata under `path`?
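
A minimal sketch of the kind of logging requested here, under stated assumptions: it uses `java.util.logging` and `java.nio.file` in place of the Hadoop `FileSystem` and the project's logger so that it runs standalone, and the class and method names are hypothetical rather than anything in the PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.logging.Logger;
import java.util.stream.Stream;

class MetadataRollback {
  private static final Logger LOG = Logger.getLogger(MetadataRollback.class.getName());

  // Logs which table and path are affected, then recursively deletes the
  // metadata directory if it exists. Returns true if a directory was deleted.
  static boolean deleteMetadataDir(String tableName, Path metadataDir) {
    if (!Files.exists(metadataDir)) {
      LOG.info("No metadata directory for table " + tableName + " at " + metadataDir + ", nothing to roll back");
      return false;
    }
    LOG.info("Rolling back migration of table " + tableName + ", deleting metadata under " + metadataDir);
    try (Stream<Path> paths = Files.walk(metadataDir)) {
      // Reverse order visits children before their parent directories.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return true;
  }
}
```

In the actual hook the equivalent log line would presumably sit just before `fileSystem.delete(path, true)`.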






Issue Time Tracking
---

Worklog Id: (was: 589168)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589166=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589166
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:32
Start Date: 26/Apr/21 12:32
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620250947



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -194,6 +203,88 @@ public void testScanTable() throws IOException {
     Assert.assertArrayEquals(new Object[] {"Alice", 0L}, descRows.get(2));
   }
 
+  @Test
+  public void testMigrateHiveTableToIceberg() {
+    Assume.assumeTrue("migration is only supported for hive catalog",

Review comment:
   What happens if a different catalog is configured?






Issue Time Tracking
---

Worklog Id: (was: 589166)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589164=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589164
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:31
Start Date: 26/Apr/21 12:31
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620250375



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+    this.catalogProperties = getCatalogProperties(hmsTable);
+    try {
+      this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+    } catch (NoSuchTableException nte) {
+      // iceberg table was not yet created, no need to delete the metadata dir separately
+      return;
+    }
+    // we want to keep the data files but get rid of the metadata directory
+    hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");
+    String metadataLocation = ((BaseTable) this.icebergTable).operations().current().metadataFileLocation();

Review comment:
   Shouldn't we just use `CatalogUtil.dropTableData(deleteIo, 
deleteMetadata);`?






Issue Time Tracking
---

Worklog Id: (was: 589164)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589163=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589163
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:30
Start Date: 26/Apr/21 12:30
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620250082



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)

Review comment:
   If this is called for any alter table op, will we be able to recognise 
the operation type and delete the metadata dir only in case of a true migration 
(and not do it for an alter table rename column, a simple alter table setprop 
case, etc)? 
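
To make the concern concrete, here is a hedged sketch of a guard that only allows the metadata cleanup when the alter operation was explicitly marked as a migration. The marker key, class and method are hypothetical and do not come from the PR. The idea is that the pre-alter hook would set such a marker in the context properties when it detects a migration, and the rollback hook would check it before deleting anything.

```java
import java.util.Map;

class RollbackGuard {
  // Hypothetical marker key; the PR may identify a migration differently.
  static final String MIGRATION_MARKER = "example.is.migration";

  // Returns true only when the context was explicitly flagged as a
  // Hive-to-Iceberg migration, so ordinary ALTER TABLE operations
  // (rename column, set property, ...) never trigger metadata deletion.
  static boolean shouldDeleteMetadata(Map<String, String> contextProperties) {
    return contextProperties != null
        && "true".equalsIgnoreCase(contextProperties.get(MIGRATION_MARKER));
  }
}
```

With a check like this at the top of `rollbackAlterTable`, a failed rename or property change would return early and leave the metadata directory untouched.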






Issue Time Tracking
---

Worklog Id: (was: 589163)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589153=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589153
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:20
Start Date: 26/Apr/21 12:20
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620242627



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+    this.catalogProperties = getCatalogProperties(hmsTable);
+    try {
+      this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+    } catch (NoSuchTableException nte) {
+      // iceberg table was not yet created, no need to delete the metadata dir separately
+      return;
+    }
+    // we want to keep the data files but get rid of the metadata directory
+    hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");

Review comment:
   Why is this needed?






Issue Time Tracking
---

Worklog Id: (was: 589153)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589151 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:19
Start Date: 26/Apr/21 12:19
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620242174



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+    this.catalogProperties = getCatalogProperties(hmsTable);
+    try {
+      this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+    } catch (NoSuchTableException nte) {
+      // iceberg table was not yet created, no need to delete the metadata dir separately
+      return;
+    }
+    // we want to keep the data files but get rid of the metadata directory

Review comment:
   nit: in iceberg code we usually put an extra line after blocks






Issue Time Tracking
---

Worklog Id: (was: 589151)
Time Spent: 0.5h  (was: 20m)






[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589150 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:19
Start Date: 26/Apr/21 12:19
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620241896



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+    this.catalogProperties = getCatalogProperties(hmsTable);
+    try {
+      this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+    } catch (NoSuchTableException nte) {
+      // iceberg table was not yet created, no need to delete the metadata dir separately
+      return;
+    }
+    // we want to keep the data files but get rid of the metadata directory
+    hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");
+    String metadataLocation = ((BaseTable) this.icebergTable).operations().current().metadataFileLocation();
+    try {
+      Path path = new Path(metadataLocation).getParent();
+      FileSystem fileSystem = FileSystem.get(path.toUri(), conf);
+      if (fileSystem.exists(path)) {

Review comment:
   I think we do not need exists. Just try / catch around it
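
The suggestion above can be illustrated with a minimal sketch. It uses plain `java.nio.file` rather than the Hadoop `FileSystem` API from the diff, purely so that the example is self-contained; the class name `MetadataDirCleanup` and the method `deleteRecursively` are hypothetical and not part of the pull request. The idea is to attempt the delete directly and treat a missing directory as a non-error, instead of calling `exists` first.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not taken from the pull request.
public class MetadataDirCleanup {

  /**
   * Deletes a directory tree without checking for existence up front.
   * Returns true if the tree was deleted, false if it was not there.
   */
  public static boolean deleteRecursively(Path dir) throws IOException {
    try {
      Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
          Files.delete(file);
          return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult postVisitDirectory(Path d, IOException exc) throws IOException {
          if (exc != null) {
            throw exc;
          }
          // The directory is empty at this point, so it can be removed.
          Files.delete(d);
          return FileVisitResult.CONTINUE;
        }
      });
      return true;
    } catch (NoSuchFileException e) {
      // Nothing to clean up: the directory was never created or is already gone.
      return false;
    }
  }
}
```

Dropping the separate existence check also avoids the window in which the directory could disappear between the check and the delete.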






Issue Time Tracking
---

Worklog Id: (was: 589150)
Time Spent: 20m  (was: 10m)






[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589145&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589145 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:04
Start Date: 26/Apr/21 12:04
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request #2219:
URL: https://github.com/apache/hive/pull/2219


   
   
   ### What changes were proposed in this pull request?
   
   If an error occurs during the table migration, the following logic is applied:
   - drop the altered table if it exists, but keep the data
   - recreate the original table
   - call `msck repair` on the new table
   
   Work performed:
   - Enhanced `HiveMetaHook` with a rollback method for the alter operation and provided an implementation in `HiveIcebergMetaHook`.
   - Added the drop/create/msck repair logic to `AbstractAlterTableOperation`.
   - The need for a rollback is signalled through the `EnvironmentContext` properties: `HiveMetaHook#INITIALIZE_ROLLBACK_ALTER` is set in `HiveIcebergMetaHook#rollbackAlterTable` and evaluated in `AbstractAlterTableOperation`.
   - Introduced a new `preDropTable` method in `HiveMetaHook` that accepts a `deleteData` parameter, so that data files can be retained while deleting iceberg tables.
   - Covered the rollback with unit tests.
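
The rollback flow described in this pull request can be sketched roughly as follows. This is an illustration under stated assumptions, not the Hive implementation: `RollbackSketch`, its `Catalog` and `MetaHook` interfaces, `RecordingCatalog`, and the string value of the context key are all hypothetical stand-ins. Only the order of the steps (drop without deleting data, recreate, `msck repair`) and the idea of signalling through a context property come from the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are the real Hive classes.
public class RollbackSketch {
  // Stand-in for HiveMetaHook#INITIALIZE_ROLLBACK_ALTER; the value is assumed.
  public static final String INITIALIZE_ROLLBACK_ALTER = "initialize_rollback_alter";

  /** Minimal, assumed view of the metastore operations the rollback needs. */
  public interface Catalog {
    boolean tableExists(String name);
    void dropTable(String name, boolean deleteData);
    void createTable(String name, String definition);
    void msckRepair(String name);
  }

  /** Assumed hook: the storage handler requests a rollback through the context. */
  public interface MetaHook {
    void rollbackAlterTable(String name, Map<String, String> context);
  }

  public static void alterWithRollback(Catalog catalog, MetaHook hook, String name,
      String originalDefinition, Runnable alter) {
    Map<String, String> context = new HashMap<>();
    try {
      alter.run();
    } catch (RuntimeException e) {
      // Let the hook clean up and decide whether a rollback is required.
      hook.rollbackAlterTable(name, context);
      if (Boolean.parseBoolean(context.get(INITIALIZE_ROLLBACK_ALTER))) {
        if (catalog.tableExists(name)) {
          catalog.dropTable(name, false); // keep the data files
        }
        catalog.createTable(name, originalDefinition);
        catalog.msckRepair(name);
      }
      throw e;
    }
  }

  /** In-memory catalog that records the calls it receives. */
  public static class RecordingCatalog implements Catalog {
    public final List<String> tables = new ArrayList<>();
    public final List<String> calls = new ArrayList<>();

    @Override
    public boolean tableExists(String name) {
      return tables.contains(name);
    }

    @Override
    public void dropTable(String name, boolean deleteData) {
      tables.remove(name);
      calls.add("drop " + name + " deleteData=" + deleteData);
    }

    @Override
    public void createTable(String name, String definition) {
      tables.add(name);
      calls.add("create " + name);
    }

    @Override
    public void msckRepair(String name) {
      calls.add("msck repair " + name);
    }
  }
}
```

In this sketch the original failure is rethrown after the restore, so the caller still sees that the migration did not succeed.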
   
   
   
   
   ### Why are the changes needed?
   If an error occurs while migrating a hive table to iceberg, the original table must be restored.
   
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Manual test and unit tests.
   
   




Issue Time Tracking
---

Worklog Id: (was: 589145)
Remaining Estimate: 0h
Time Spent: 10m



