[jira] [Work logged] (HIVE-24998) IS [NOT] DISTINCT FROM failing with SemanticException

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24998?focusedWorklogId=589567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589567
 ]

ASF GitHub Bot logged work on HIVE-24998:
-

Author: ASF GitHub Bot
Created on: 27/Apr/21 05:45
Start Date: 27/Apr/21 05:45
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #2163:
URL: https://github.com/apache/hive/pull/2163#discussion_r620876210



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRexExecutorImpl.java
##
@@ -52,6 +56,10 @@ public void reduce(RexBuilder rexBuilder, List 
constExps, List
   // initialize the converter
   ExprNodeConverter converter = new ExprNodeConverter("", null, null, null,
   new HashSet<>(), rexBuilder.getTypeFactory());
+
+  if (rexNode.getKind() == SqlKind.IS_DISTINCT_FROM) {

Review comment:
   This should be moved into the `ExprNodeConverter` itself. For instance, 
check the `visitCall` method.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java
##
@@ -788,6 +789,9 @@ public ASTNode visitCall(RexCall call) {
   }
 }
 break;
+  case IS_DISTINCT_FROM:
+// convert IS DISTINCT FROM to NOT (IS NOT DISTINCT FROM)
+return visitCall((RexCall) 
RexUtil.not(rexBuilder.makeCall(SqlStdOperatorTable.IS_NOT_DISTINCT_FROM, 
call.getOperands())));

Review comment:
   If you rely on `buildAST(SqlOperator op, List children)`, you 
would not have to declare the function in `FunctionRegistry`?
   i) Call `visitCall` on the call operands, ii) then `buildAST` with 
`SqlStdOperatorTable.IS_NOT_DISTINCT_FROM`, iii) then `buildAST` with the 
output of the previous one.
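
For context on the rewrite discussed in this review, the semantics can be sketched in plain Java, independent of Calcite and Hive. `IS NOT DISTINCT FROM` behaves as a null-safe equality and `IS DISTINCT FROM` is its negation, which is why the latter can be expressed as NOT (IS NOT DISTINCT FROM). The class and method names below are hypothetical and exist only for illustration; SQL type coercion is not modelled.

```java
import java.util.Objects;

public class DistinctFromSketch {

  // a IS NOT DISTINCT FROM b: true when both are null, or both are non-null and equal.
  static boolean isNotDistinctFrom(Object a, Object b) {
    return Objects.equals(a, b);
  }

  // a IS DISTINCT FROM b, expressed as NOT (a IS NOT DISTINCT FROM b).
  static boolean isDistinctFrom(Object a, Object b) {
    return !isNotDistinctFrom(a, b);
  }

  public static void main(String[] args) {
    System.out.println(isDistinctFrom(null, null)); // false
    System.out.println(isDistinctFrom(1, null));    // true
    System.out.println(isDistinctFrom(1, 1));       // false
  }
}
```

Unlike `=`, neither predicate ever yields NULL, so the negation is a plain boolean NOT.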




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 589567)
Time Spent: 1h 20m  (was: 1h 10m)

> IS [NOT] DISTINCT FROM failing with SemanticException
> -
>
> Key: HIVE-24998
> URL: https://issues.apache.org/jira/browse/HIVE-24998
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Manthan B Y
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Hive: INSERT statements failing with UDFArgumentException and 
> SemanticException
> Problem Statement:
> {code:java}
> CREATE TABLE t2(c0 boolean , c1 FLOAT );
> INSERT INTO t2(c0) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641));
> -- Insert failing with: Error: Error while compiling statement: FAILED: 
> UDFArgumentException UDF tables only one argument (state=42000,code=4)
> INSERT INTO t2(c0,c1) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641), 0.2);
> -- Insert failing with: SemanticException 0:0 Expected 2 columns for 
> insclause-0/database52@t2; select produces 1 columns. Error encountered near 
> token '0.2' (state=42000,code=4) {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24976) CBO: count(distinct) in a window function fails CBO

2021-04-26 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24976:
---
Fix Version/s: 4.0.0

> CBO: count(distinct) in a window function fails CBO
> ---
>
> Key: HIVE-24976
> URL: https://issues.apache.org/jira/browse/HIVE-24976
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Gopal Vijayaraghavan
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> create temporary table tmp_tbl(
> `rule_id` string,
> `severity` string,
> `alert_id` string,
> `alert_type` string);
> explain cbo
> select `k`.`rule_id`,
> count(distinct `k`.`alert_id`) over(partition by `k`.`rule_id`) `subj_cnt`
> from tmp_tbl k
> ;
> explain
> select `k`.`rule_id`,
> count(distinct `k`.`alert_id`) over(partition by `k`.`rule_id`) `subj_cnt`
> from tmp_tbl k
> ;
> {code}
> Fails CBO, because the count(distinct) is not being recognized as belonging 
> to a windowing operation.
> So it throws the following exception
> {code}
> throw new CalciteSemanticException("Distinct without an 
> aggregation.",
> UnsupportedFeature.Distinct_without_an_aggreggation);
> {code}
> https://github.com/apache/hive/blob/73c3770d858b063c69dea6c64a759f8fdacad460/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L4914
> This prevents a query like this from using a materialized view which already 
> exists in the system (the MV obviously does not contain this expression, but 
> represents a complex transform from a JSON structure into a columnar layout).
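
As a rough illustration of what the query in this report is expected to compute, the sketch below counts distinct non-null `alert_id` values per `rule_id` partition in plain Java. It says nothing about how Hive or Calcite plan the query; the class name, method name, and row layout are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WindowCountDistinctSketch {

  // Each row is {rule_id, alert_id}. Returns rule_id -> number of distinct non-null alert_ids.
  static Map<String, Integer> countDistinctPerPartition(List<String[]> rows) {
    Map<String, Set<String>> seen = new LinkedHashMap<>();
    for (String[] row : rows) {
      Set<String> ids = seen.computeIfAbsent(row[0], k -> new HashSet<>());
      if (row[1] != null) { // count(distinct ...) ignores NULLs
        ids.add(row[1]);
      }
    }
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
      counts.put(e.getKey(), e.getValue().size());
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String[]> rows = Arrays.asList(
        new String[] {"r1", "a"},
        new String[] {"r1", "a"},
        new String[] {"r1", "b"},
        new String[] {"r2", null});
    System.out.println(countDistinctPerPartition(rows)); // {r1=2, r2=0}
  }
}
```

In the windowed form, every output row carries the count of its own partition, so the three `r1` rows would each show 2.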





[jira] [Updated] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-2420:
-
Labels: pull-request-available  (was: )

> partition pruner expr is not populated due to some bug in ppd
> -
>
> Key: HIVE-2420
> URL: https://issues.apache.org/jira/browse/HIVE-2420
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-2420.reproduce.diff
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-2420?focusedWorklogId=589433&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589433
 ]

ASF GitHub Bot logged work on HIVE-2420:


Author: ASF GitHub Bot
Created on: 26/Apr/21 20:27
Start Date: 26/Apr/21 20:27
Worklog Time Spent: 10m 
  Work Description: Dawn2111 commented on a change in pull request #2065:
URL: https://github.com/apache/hive/pull/2065#discussion_r620624406



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java
##
@@ -1248,6 +1334,42 @@ private void processPoolChangesOnMasterThread(
 }
   }
 
+  private void processDelayedMovesForPool(final String poolName, final 
HashSet poolsToRedistribute, final Map 
recordMoveEvents,
+  WmThreadSyncWork syncWork, IdentityHashMap 
toReuse) {
+long currentTime = System.currentTimeMillis();
+PoolState pool = pools.get(poolName);
+int movedCount = 0;
+int queueSize = pool.queue.size();
+int remainingCapacity = pool.queryParallelism - 
pool.getTotalActiveSessions();

Review comment:
   Yes. A new request will wake up the master thread.






Issue Time Tracking
---

Worklog Id: (was: 589433)
Remaining Estimate: 0h
Time Spent: 10m

> partition pruner expr is not populated due to some bug in ppd
> -
>
> Key: HIVE-2420
> URL: https://issues.apache.org/jira/browse/HIVE-2420
> Project: Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
>Priority: Major
> Attachments: HIVE-2420.reproduce.diff
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589364
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 17:38
Start Date: 26/Apr/21 17:38
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620510669



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, 
Table oldTable, List

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 
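
The requirement above (restore the original table when the migration fails) can be sketched as a snapshot-and-restore pattern. This is only an illustration under stated assumptions, not the code from the pull request: `MigrationRollbackSketch`, `migrateOrRollback`, and the use of a plain property map in place of a real table object are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class MigrationRollbackSketch {

  /**
   * Applies a migration to the table properties and restores the original
   * properties if the migration step throws. Returns true on success.
   */
  static boolean migrateOrRollback(Map<String, String> tableProps, Runnable migrationStep) {
    // Snapshot the state we must be able to return to.
    Map<String, String> original = new HashMap<>(tableProps);
    try {
      tableProps.put("storage_handler", "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler");
      migrationStep.run(); // hypothetical stand-in for the step that may fail
      return true;
    } catch (RuntimeException e) {
      // Roll back: discard partial changes and restore the snapshot.
      tableProps.clear();
      tableProps.putAll(original);
      return false;
    }
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("format", "ORC");
    boolean ok = migrateOrRollback(props, () -> { throw new IllegalStateException("import failed"); });
    System.out.println(ok + " " + props); // false {format=ORC}
  }
}
```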





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589362&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589362
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 17:37
Start Date: 26/Apr/21 17:37
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620509730



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -41,6 +39,11 @@
 import org.apache.hadoop.hive.ql.metadata.Partition;
 import org.apache.hadoop.hive.ql.metadata.Table;
 import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.thrift.TException;
+
+import java.util.ArrayList;

Review comment:
   ah, yes because it's not the `hive-iceberg` module :)






Issue Time Tracking
---

Worklog Id: (was: 589362)
Time Spent: 5h 40m  (was: 5.5h)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Updated] (HIVE-24914) Improve LLAP scheduling by only traversing hosts with capacity

2021-04-26 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-24914:
--
Fix Version/s: 4.0.0

> Improve LLAP scheduling by only traversing hosts with capacity
> --
>
> Key: HIVE-24914
> URL: https://issues.apache.org/jira/browse/HIVE-24914
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> *schedulePendingTasks* on the LlapTaskScheduler currently goes through all 
> the pending tasks and tries to allocate them based on their Priority -- if a 
> priority can not be scheduled completely, we bail out as lower priorities 
> would not be able to get allocations either.
> An optimization here could be to only walk through the nodes with capacity 
> (if any), and not all available hosts, for scheduling these tasks based on 
> their priority and locality preferences.
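
The optimization described above can be sketched as follows: filter the hosts down to those with free capacity first, then apply the locality preference only within that smaller set. This is a minimal illustration, not the LlapTaskScheduler implementation; `Node`, `nodesWithCapacity`, and `pick` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class CapacityAwareSelectionSketch {

  /** Hypothetical stand-in for a scheduler's view of one host. */
  static final class Node {
    final String host;
    final int freeSlots;

    Node(String host, int freeSlots) {
      this.host = host;
      this.freeSlots = freeSlots;
    }
  }

  /** Keeps only the nodes that can still accept a task. */
  static List<Node> nodesWithCapacity(List<Node> allNodes) {
    List<Node> result = new ArrayList<>();
    for (Node node : allNodes) {
      if (node.freeSlots > 0) {
        result.add(node);
      }
    }
    return result;
  }

  /**
   * Returns the preferred host if it has capacity, otherwise the first node
   * with capacity, or null when no node can take the task.
   */
  static Node pick(List<Node> allNodes, String preferredHost) {
    List<Node> candidates = nodesWithCapacity(allNodes);
    for (Node node : candidates) {
      if (node.host.equals(preferredHost)) {
        return node;
      }
    }
    return candidates.isEmpty() ? null : candidates.get(0);
  }

  public static void main(String[] args) {
    List<Node> nodes = new ArrayList<>();
    nodes.add(new Node("a", 0));
    nodes.add(new Node("b", 2));
    nodes.add(new Node("c", 1));
    System.out.println(pick(nodes, "a").host); // b, because a has no free slots
    System.out.println(pick(nodes, "c").host); // c
  }
}
```

When no node has capacity, `pick` returns null, which corresponds to the case where the scheduler would stop trying to place tasks of that priority.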





[jira] [Updated] (HIVE-24914) Improve LLAP scheduling by only traversing hosts with capacity

2021-04-26 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-24914:
--
Component/s: llap

> Improve LLAP scheduling by only traversing hosts with capacity
> --
>
> Key: HIVE-24914
> URL: https://issues.apache.org/jira/browse/HIVE-24914
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> *schedulePendingTasks* on the LlapTaskScheduler currently goes through all 
> the pending tasks and tries to allocate them based on their Priority -- if a 
> priority can not be scheduled completely, we bail out as lower priorities 
> would not be able to get allocations either.
> An optimization here could be to only walk through the nodes with capacity 
> (if any), and not all available hosts, for scheduling these tasks based on 
> their priority and locality preferences.





[jira] [Resolved] (HIVE-24914) Improve LLAP scheduling by only traversing hosts with capacity

2021-04-26 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis resolved HIVE-24914.
---
Resolution: Fixed

> Improve LLAP scheduling by only traversing hosts with capacity
> --
>
> Key: HIVE-24914
> URL: https://issues.apache.org/jira/browse/HIVE-24914
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> *schedulePendingTasks* on the LlapTaskScheduler currently goes through all 
> the pending tasks and tries to allocate them based on their Priority -- if a 
> priority can not be scheduled completely, we bail out as lower priorities 
> would not be able to get allocations either.
> An optimization here could be to only walk through the nodes with capacity 
> (if any), and not all available hosts, for scheduling these tasks based on 
> their priority and locality preferences.





[jira] [Commented] (HIVE-24914) Improve LLAP scheduling by only traversing hosts with capacity

2021-04-26 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332597#comment-17332597
 ] 

Panagiotis Garefalakis commented on HIVE-24914:
---

Resolved via https://github.com/apache/hive/pull/2108
Thanks [~mustafaiman] for the review! 

> Improve LLAP scheduling by only traversing hosts with capacity
> --
>
> Key: HIVE-24914
> URL: https://issues.apache.org/jira/browse/HIVE-24914
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> *schedulePendingTasks* on the LlapTaskScheduler currently goes through all 
> the pending tasks and tries to allocate them based on their Priority -- if a 
> priority can not be scheduled completely, we bail out as lower priorities 
> would not be able to get allocations either.
> An optimization here could be to only walk through the nodes with capacity 
> (if any), and not all available hosts, for scheduling these tasks based on 
> their priority and locality preferences.





[jira] [Work logged] (HIVE-24914) Improve LLAP scheduling by only traversing hosts with capacity

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24914?focusedWorklogId=589349&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589349
 ]

ASF GitHub Bot logged work on HIVE-24914:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 17:03
Start Date: 26/Apr/21 17:03
Worklog Time Spent: 10m 
  Work Description: pgaref merged pull request #2108:
URL: https://github.com/apache/hive/pull/2108


   




Issue Time Tracking
---

Worklog Id: (was: 589349)
Time Spent: 1.5h  (was: 1h 20m)

> Improve LLAP scheduling by only traversing hosts with capacity
> --
>
> Key: HIVE-24914
> URL: https://issues.apache.org/jira/browse/HIVE-24914
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> *schedulePendingTasks* on the LlapTaskScheduler currently goes through all 
> the pending tasks and tries to allocate them based on their Priority -- if a 
> priority can not be scheduled completely, we bail out as lower priorities 
> would not be able to get allocations either.
> An optimization here could be to only walk through the nodes with capacity 
> (if any), and not all available hosts, for scheduling these tasks based on 
> their priority and locality preferences.





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589317
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:57
Start Date: 26/Apr/21 15:57
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620432398



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, 
Table oldTable, List

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589312
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:49
Start Date: 26/Apr/21 15:49
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620403936



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void 
commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table 
hmsTable, EnvironmentContext context)
+  throws MetaException {
+context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+this.catalogProperties = getCatalogProperties(hmsTable);
+try {
+  this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+} catch (NoSuchTableException nte) {
+  // iceberg table was not yet created, no need to delete the metadata dir 
separately
+  return;
+}
+// we want to keep the data files but get rid of the metadata directory
+hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, 
"FALSE");

Review comment:
   No, it won't :). This is some leftover code from the previous 
implementation, which I forgot to remove. 






Issue Time Tracking
---

Worklog Id: (was: 589312)
Time Spent: 5h 20m  (was: 5h 10m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589311
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:48
Start Date: 26/Apr/21 15:48
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620424855



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void 
commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table 
hmsTable, EnvironmentContext context)
+  throws MetaException {
+context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+this.catalogProperties = getCatalogProperties(hmsTable);
+try {
+  this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+} catch (NoSuchTableException nte) {
+  // iceberg table was not yet created, no need to delete the metadata dir 
separately
+  return;
+}
+// we want to keep the data files but get rid of the metadata directory
+hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, 
"FALSE");
+String metadataLocation = ((BaseTable) 
this.icebergTable).operations().current().metadataFileLocation();

Review comment:
   Shall we create a new version to remove only metadata files? 






Issue Time Tracking
---

Worklog Id: (was: 589311)
Time Spent: 5h 10m  (was: 5h)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589310
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:47
Start Date: 26/Apr/21 15:47
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620416625



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, 
String dbName, String tableN
   Assert.assertEquals(summary.get(entry.getValue()), 
hmsParams.get(entry.getKey()));
 }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+List originalResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+Assert.assertEquals(recordCount, originalResult.size());
+List originalDescribe = shell.executeStatement("DESCRIBE 
FORMATTED " + tableName);
+validateDescribeOutput(originalDescribe, fileFormat.name());

Review comment:
   You are absolutely right.

##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, 
String dbName, String tableN
   Assert.assertEquals(summary.get(entry.getValue()), 
hmsParams.get(entry.getKey()));
 }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+List originalResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+Assert.assertEquals(recordCount, originalResult.size());
+List originalDescribe = shell.executeStatement("DESCRIBE 
FORMATTED " + tableName);
+validateDescribeOutput(originalDescribe, fileFormat.name());
+shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+
"('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+List alterResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+Assert.assertEquals(originalResult.size(), alterResult.size());
+List alterDescribe = shell.executeStatement("DESCRIBE FORMATTED 
" + tableName);
+validateDescribeOutput(alterDescribe, "iceberg");
+  }
+
+  private void validateMigrationRollback(String tableName, int recordCount) {
+List originalResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+Assert.assertEquals(recordCount, originalResult.size());
+List originalDescribe = shell.executeStatement("DESCRIBE 
FORMATTED " + tableName);
+validateDescribeOutput(originalDescribe, fileFormat.name());
+try (MockedStatic mockedTableUtil = 
Mockito.mockStatic(HiveTableUtil.class)) {
+  mockedTableUtil.when(() -> 
HiveTableUtil.importFiles(ArgumentMatchers.anyString(), 
ArgumentMatchers.anyString(),
+  ArgumentMatchers.any(PartitionSpecProxy.class), 
ArgumentMatchers.anyList(),
+  ArgumentMatchers.any(Properties.class), 
ArgumentMatchers.any(Configuration.class)))
+  .thenThrow(new MetaException());
+  shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES 
" +
+  
"('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+  List alterResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+  Assert.assertEquals(originalResult.size(), alterResult.size());
+  List alterDescribe = shell.executeStatement("DESCRIBE 
FORMATTED " + tableName);
+  validateDescribeOutput(alterDescribe, fileFormat.name());
+}
+  }
+
+  private void validateDescribeOutput(List describe, String format) {

Review comment:
   It validates whether the contents of the SD (serde, input/output format) 
are changed or retained (in the case of a rollback). I've changed this method based on 
@marton-bod's suggestions.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, 
Table oldTable, List.
 0.11.0
 4.0.2
-1.10.19
+3.4.4

Review comment:
   To bump the version in other places as well I would need to touch 7-8 
modules. It's not a trivial change.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, 
Table oldTable, List originalResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+Assert.assertEquals(recordCount, originalResult.size());
+List originalDescribe = 

[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589305&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589305
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:32
Start Date: 26/Apr/21 15:32
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620410163



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, 
String dbName, String tableN
   Assert.assertEquals(summary.get(entry.getValue()), 
hmsParams.get(entry.getKey()));
 }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+List originalResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");
+Assert.assertEquals(recordCount, originalResult.size());
+List originalDescribe = shell.executeStatement("DESCRIBE 
FORMATTED " + tableName);
+validateDescribeOutput(originalDescribe, fileFormat.name());
+shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+
"('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+List alterResult = shell.executeStatement("SELECT * FROM " + 
tableName + " ORDER BY a");

Review comment:
   Not anymore.






Issue Time Tracking
---

Worklog Id: (was: 589305)
Time Spent: 4h 50m  (was: 4h 40m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589302
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:25
Start Date: 26/Apr/21 15:25
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620403936



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+  throws MetaException {
+context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+this.catalogProperties = getCatalogProperties(hmsTable);
+try {
+  this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+} catch (NoSuchTableException nte) {
+  // iceberg table was not yet created, no need to delete the metadata dir separately
+  return;
+}
+// we want to keep the data files but get rid of the metadata directory
+hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");

Review comment:
   No, it won't :). This is some leftover code from the previous 
implementation, which I forgot to remove.






Issue Time Tracking
---

Worklog Id: (was: 589302)
Time Spent: 4h 40m  (was: 4.5h)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589301
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:25
Start Date: 26/Apr/21 15:25
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620403357



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+  throws MetaException {
+context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+this.catalogProperties = getCatalogProperties(hmsTable);
+try {
+  this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+} catch (NoSuchTableException nte) {
+  // iceberg table was not yet created, no need to delete the metadata dir separately
+  return;
+}
+// we want to keep the data files but get rid of the metadata directory
+hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");
+String metadataLocation = ((BaseTable) this.icebergTable).operations().current().metadataFileLocation();

Review comment:
   The `CatalogUtil.dropTableData` removes everything, including the data 
files. 






Issue Time Tracking
---

Worklog Id: (was: 589301)
Time Spent: 4.5h  (was: 4h 20m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589300=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589300
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:23
Start Date: 26/Apr/21 15:23
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620402132



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -194,6 +203,88 @@ public void testScanTable() throws IOException {
 Assert.assertArrayEquals(new Object[] {"Alice", 0L}, descRows.get(2));
   }
 
+  @Test
+  public void testMigrateHiveTableToIceberg() {
+Assume.assumeTrue("migration is only supported for hive catalog",

Review comment:
   It works  :) (with a bit of tweaking)






Issue Time Tracking
---

Worklog Id: (was: 589300)
Time Spent: 4h 20m  (was: 4h 10m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589296=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589296
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:22
Start Date: 26/Apr/21 15:22
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620401382



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)

Review comment:
   I've marked in the EnvironmentContext that we are in the middle of a 
migration, so that no other alter operation type removes the metadata dir.






Issue Time Tracking
---

Worklog Id: (was: 589296)
Time Spent: 4h 10m  (was: 4h)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589292=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589292
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:19
Start Date: 26/Apr/21 15:19
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620398260



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+  throws MetaException {
+context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+this.catalogProperties = getCatalogProperties(hmsTable);
+try {
+  this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+} catch (NoSuchTableException nte) {
+  // iceberg table was not yet created, no need to delete the metadata dir separately
+  return;
+}
+// we want to keep the data files but get rid of the metadata directory
+hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");

Review comment:
   Good catch, this was leftover code from a previous version.






Issue Time Tracking
---

Worklog Id: (was: 589292)
Time Spent: 4h  (was: 3h 50m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589291=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589291
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 15:18
Start Date: 26/Apr/21 15:18
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620397490



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+  throws MetaException {
+context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+this.catalogProperties = getCatalogProperties(hmsTable);
+try {
+  this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+} catch (NoSuchTableException nte) {
+  // iceberg table was not yet created, no need to delete the metadata dir separately
+  return;
+}
+// we want to keep the data files but get rid of the metadata directory
+hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");
+String metadataLocation = ((BaseTable) this.icebergTable).operations().current().metadataFileLocation();
+try {
+  Path path = new Path(metadataLocation).getParent();
+  FileSystem fileSystem = FileSystem.get(path.toUri(), conf);
+  if (fileSystem.exists(path)) {

Review comment:
   Right, removed the if






Issue Time Tracking
---

Worklog Id: (was: 589291)
Time Spent: 3h 50m  (was: 3h 40m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Issue Comment Deleted] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2021-04-26 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-21354:
--
Comment: was deleted

(was: Closing as duplicate.)

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.2.0, 4.0.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.
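
The heuristic in the description above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `TableLockHeuristic` and `shouldLockWholeTable` are hypothetical names, not part of Hive's lock manager, and the two partition counts are assumed to be supplied by the caller.

```java
// Hypothetical helper illustrating the proposed heuristic; this is not Hive code.
final class TableLockHeuristic {
    private TableLockHeuristic() {
    }

    /**
     * Returns true when a query should take a single table-level lock instead of
     * per-partition locks, i.e. when it touches at least half of all partitions.
     */
    static boolean shouldLockWholeTable(long partitionsToLock, long totalPartitions) {
        if (partitionsToLock < 0 || totalPartitions < 0 || partitionsToLock > totalPartitions) {
            throw new IllegalArgumentException("invalid partition counts");
        }
        // "Greater than or equal to half" without integer-division rounding:
        // p >= total / 2 is checked as 2 * p >= total.
        // Assumption: a table with zero partitions falls out as a table-level lock.
        return 2 * partitionsToLock >= totalPartitions;
    }
}
```

For a table with 10 partitions, a query that locks 5 of them would take the table-level lock, while a query that locks 4 would keep per-partition locks.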





[jira] [Resolved] (HIVE-25047) Remove unused fields/methods and deprecated calls in HiveProject

2021-04-26 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis resolved HIVE-25047.
---
Resolution: Fixed

> Remove unused fields/methods and deprecated calls in HiveProject
> 
>
> Key: HIVE-25047
> URL: https://issues.apache.org/jira/browse/HIVE-25047
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Small refactoring of 
> [HiveProject|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java]
>  operator.





[jira] [Work logged] (HIVE-25047) Remove unused fields/methods and deprecated calls in HiveProject

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25047?focusedWorklogId=589273=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589273
 ]

ASF GitHub Bot logged work on HIVE-25047:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 14:49
Start Date: 26/Apr/21 14:49
Worklog Time Spent: 10m 
  Work Description: pgaref merged pull request #2206:
URL: https://github.com/apache/hive/pull/2206


   




Issue Time Tracking
---

Worklog Id: (was: 589273)
Time Spent: 20m  (was: 10m)

> Remove unused fields/methods and deprecated calls in HiveProject
> 
>
> Key: HIVE-25047
> URL: https://issues.apache.org/jira/browse/HIVE-25047
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Small refactoring of 
> [HiveProject|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java]
>  operator.





[jira] [Commented] (HIVE-25047) Remove unused fields/methods and deprecated calls in HiveProject

2021-04-26 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17332474#comment-17332474
 ] 

Panagiotis Garefalakis commented on HIVE-25047:
---

Resolved via https://github.com/apache/hive/pull/2206 
Thanks [~zabetak] ! 

> Remove unused fields/methods and deprecated calls in HiveProject
> 
>
> Key: HIVE-25047
> URL: https://issues.apache.org/jira/browse/HIVE-25047
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Small refactoring of 
> [HiveProject|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java]
>  operator.





[jira] [Commented] (HIVE-21354) Lock The Entire Table If Majority Of Partitions Are Locked

2021-04-26 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17332473#comment-17332473
 ] 

Denys Kuzmenko commented on HIVE-21354:
---

Closing as duplicate.

> Lock The Entire Table If Majority Of Partitions Are Locked
> --
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.2.0, 4.0.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.  
> When a Hive query interacts with a table which has a lot of partitions, this 
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal 
> to half the total number of partitions, simply create one ZNode lock at the 
> table level.
> This would improve performance of many queries, but in particular, a {{select 
> count(1) from table}} ... or ... {{select * from table limit 5}} where the 
> table has many partitions.





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589243=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589243
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:34
Start Date: 26/Apr/21 13:34
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620274616



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -159,11 +162,16 @@ public void commitCreateTable(org.apache.hadoop.hive.metastore.api.Table hmsTabl
 
   @Override
   public void preDropTable(org.apache.hadoop.hive.metastore.api.Table hmsTable) {
+// do nothing

Review comment:
   When does this version of the hook get called (vs the one with 
deleteData param)?






Issue Time Tracking
---

Worklog Id: (was: 589243)
Time Spent: 3h 40m  (was: 3.5h)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589242=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589242
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:33
Start Date: 26/Apr/21 13:33
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620300318



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -41,6 +39,11 @@
 import org.apache.hadoop.hive.ql.metadata.Partition;
 import org.apache.hadoop.hive.ql.metadata.Table;
 import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.thrift.TException;
+
+import java.util.ArrayList;

Review comment:
   I think spotless will complain that these imports are separated.






Issue Time Tracking
---

Worklog Id: (was: 589242)
Time Spent: 3.5h  (was: 3h 20m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589241=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589241
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:32
Start Date: 26/Apr/21 13:32
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620299413



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, Table oldTable, List

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589239=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589239
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:25
Start Date: 26/Apr/21 13:25
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620293681



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, Table oldTable, List

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589237=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589237
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:23
Start Date: 26/Apr/21 13:23
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620291888



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/AbstractAlterTableOperation.java
##
@@ -138,8 +141,32 @@ private void finalizeAlterTableWithWriteIdOp(Table table, Table oldTable, List

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589236=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589236
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:21
Start Date: 26/Apr/21 13:21
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620290440



##
File path: iceberg/pom.xml
##
@@ -31,7 +31,7 @@
 .
 0.11.0
 4.0.2
-1.10.19
+3.4.4

Review comment:
   How big is this change? Would it be worth separating it into a different Jira?






Issue Time Tracking
---

Worklog Id: (was: 589236)
Time Spent: 2h 50m  (was: 2h 40m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589235=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589235
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:13
Start Date: 26/Apr/21 13:13
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620282921



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, String dbName, String tableN
   Assert.assertEquals(summary.get(entry.getValue()), hmsParams.get(entry.getKey()));
 }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+Assert.assertEquals(recordCount, originalResult.size());
+List originalDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+validateDescribeOutput(originalDescribe, fileFormat.name());
+shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+"('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+List alterResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+Assert.assertEquals(originalResult.size(), alterResult.size());

Review comment:
   Would it be difficult to check the contents as well, not just the size? 
Just to make sure the data is all the same after the migration
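
   One possible way to do this, sketched in plain Java: since both queries use `ORDER BY a`, the rows can be compared position by position. `RowComparison` and `sameRows` are hypothetical names, and the rows are assumed to be `Object[]` arrays, as the existing `assertArrayEquals` checks in this test class suggest.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the test class under review.
final class RowComparison {
    private RowComparison() {
    }

    /** Returns true when both result sets contain equal rows in the same order. */
    static boolean sameRows(List<Object[]> expected, List<Object[]> actual) {
        if (expected.size() != actual.size()) {
            return false;
        }
        for (int i = 0; i < expected.size(); i++) {
            // Arrays.deepEquals compares the column values of one row element by element.
            if (!Arrays.deepEquals(expected.get(i), actual.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

   In the test itself this could be wrapped in `Assert.assertTrue(...)`, or replaced by a loop calling `Assert.assertArrayEquals` on each pair of rows so that a failure reports the offending row.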






Issue Time Tracking
---

Worklog Id: (was: 589235)
Time Spent: 2h 40m  (was: 2.5h)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589234=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589234
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:08
Start Date: 26/Apr/21 13:08
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620279270



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, String dbName, String tableN
       Assert.assertEquals(summary.get(entry.getValue()), hmsParams.get(entry.getKey()));
     }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+    List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(recordCount, originalResult.size());
+    List originalDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+    validateDescribeOutput(originalDescribe, fileFormat.name());
+    shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+        "('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+    List alterResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(originalResult.size(), alterResult.size());
+    List alterDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);

Review comment:
   Can we use `TestHiveMetastore#loadTable` instead and check the contents 
of the `sd`?






Issue Time Tracking
---

Worklog Id: (was: 589234)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589232
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:07
Start Date: 26/Apr/21 13:07
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620278440



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, String dbName, String tableN
       Assert.assertEquals(summary.get(entry.getValue()), hmsParams.get(entry.getKey()));
     }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+    List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(recordCount, originalResult.size());
+    List originalDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+    validateDescribeOutput(originalDescribe, fileFormat.name());

Review comment:
   I'm not sure we need this, since this is only checking hive behaviour, 
not iceberg. What do you think?






Issue Time Tracking
---

Worklog Id: (was: 589232)
Time Spent: 2h 20m  (was: 2h 10m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589231
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 13:03
Start Date: 26/Apr/21 13:03
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620274616



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -159,11 +162,16 @@ public void commitCreateTable(org.apache.hadoop.hive.metastore.api.Table hmsTabl
 
   @Override
   public void preDropTable(org.apache.hadoop.hive.metastore.api.Table hmsTable) {
+    // do nothing

Review comment:
   When does this version of the hook get called (vs the one with 
deleteData param)?






Issue Time Tracking
---

Worklog Id: (was: 589231)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589176
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:36
Start Date: 26/Apr/21 12:36
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620254521



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, String dbName, String tableN
       Assert.assertEquals(summary.get(entry.getValue()), hmsParams.get(entry.getKey()));
     }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+    List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(recordCount, originalResult.size());
+    List originalDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+    validateDescribeOutput(originalDescribe, fileFormat.name());
+    shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+        "('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+    List alterResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(originalResult.size(), alterResult.size());
+    List alterDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+    validateDescribeOutput(alterDescribe, "iceberg");
+  }
+
+  private void validateMigrationRollback(String tableName, int recordCount) {
+    List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(recordCount, originalResult.size());
+    List originalDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+    validateDescribeOutput(originalDescribe, fileFormat.name());
+    try (MockedStatic mockedTableUtil = Mockito.mockStatic(HiveTableUtil.class)) {
+      mockedTableUtil.when(() -> HiveTableUtil.importFiles(ArgumentMatchers.anyString(), ArgumentMatchers.anyString(),
+          ArgumentMatchers.any(PartitionSpecProxy.class), ArgumentMatchers.anyList(),
+          ArgumentMatchers.any(Properties.class), ArgumentMatchers.any(Configuration.class)))
+          .thenThrow(new MetaException());
+      shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+          "('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+      List alterResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+      Assert.assertEquals(originalResult.size(), alterResult.size());
+      List alterDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+      validateDescribeOutput(alterDescribe, fileFormat.name());
+    }
+  }
+
+  private void validateDescribeOutput(List describe, String format) {

Review comment:
   What does this validation check?






Issue Time Tracking
---

Worklog Id: (was: 589176)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589172
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:35
Start Date: 26/Apr/21 12:35
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620253400



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, String dbName, String tableN
       Assert.assertEquals(summary.get(entry.getValue()), hmsParams.get(entry.getKey()));
     }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+    List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");

Review comment:
   Can we minimize the time spent here?
   * Do we need this query, or can we expect that the writer of the test knows 
the expected number?
   * Do we need it to be ordered?






Issue Time Tracking
---

Worklog Id: (was: 589172)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589174
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:35
Start Date: 26/Apr/21 12:35
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620253582



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -955,4 +1046,44 @@ private void validateBasicStats(Table icebergTable, String dbName, String tableN
       Assert.assertEquals(summary.get(entry.getValue()), hmsParams.get(entry.getKey()));
     }
   }
+
+  private void validateMigration(String tableName, int recordCount) {
+    List originalResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");
+    Assert.assertEquals(recordCount, originalResult.size());
+    List originalDescribe = shell.executeStatement("DESCRIBE FORMATTED " + tableName);
+    validateDescribeOutput(originalDescribe, fileFormat.name());
+    shell.executeStatement("ALTER TABLE " + tableName + " SET TBLPROPERTIES " +
+        "('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')");
+    List alterResult = shell.executeStatement("SELECT * FROM " + tableName + " ORDER BY a");

Review comment:
   Do we need this ordered?






Issue Time Tracking
---

Worklog Id: (was: 589174)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589173&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589173
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:35
Start Date: 26/Apr/21 12:35
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620253478



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+    this.catalogProperties = getCatalogProperties(hmsTable);
+    try {
+      this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+    } catch (NoSuchTableException nte) {
+      // iceberg table was not yet created, no need to delete the metadata dir separately
+      return;
+    }
+    // we want to keep the data files but get rid of the metadata directory
+    hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");

Review comment:
   Will this change actually be saved into the HMS db? If so, what if the 
original table had this property as true? Should we change it silently here due 
to the rollback?
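
One way to avoid changing the stored property silently could be sketched as follows: work on a copy of the table parameters, so the original value stays as it was. This is only an illustration with plain maps. The class `PurgeOverrideSketch`, the helper `withPurgeDisabled`, and the literal key `external.table.purge` (assumed to be the value behind `InputFormatConfig.EXTERNAL_TABLE_PURGE`) are all assumptions, and whether the hook can operate on a copy depends on code this thread does not show:

```java
import java.util.HashMap;
import java.util.Map;

public class PurgeOverrideSketch {
  // Assumed value of InputFormatConfig.EXTERNAL_TABLE_PURGE; verify against the real constant.
  static final String EXTERNAL_TABLE_PURGE = "external.table.purge";

  // Returns a copy of the parameters with purge disabled; the caller's map is left untouched.
  static Map<String, String> withPurgeDisabled(Map<String, String> tableParams) {
    Map<String, String> copy = new HashMap<>(tableParams);
    copy.put(EXTERNAL_TABLE_PURGE, "FALSE");
    return copy;
  }

  public static void main(String[] args) {
    Map<String, String> original = new HashMap<>();
    original.put(EXTERNAL_TABLE_PURGE, "TRUE");
    Map<String, String> forRollback = withPurgeDisabled(original);
    // The original keeps its value; only the copy used for the rollback has purge disabled.
    System.out.println(original.get(EXTERNAL_TABLE_PURGE));
    System.out.println(forRollback.get(EXTERNAL_TABLE_PURGE));
  }
}
```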






Issue Time Tracking
---

Worklog Id: (was: 589173)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589168&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589168
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:32
Start Date: 26/Apr/21 12:32
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620251451



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+    this.catalogProperties = getCatalogProperties(hmsTable);
+    try {
+      this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+    } catch (NoSuchTableException nte) {
+      // iceberg table was not yet created, no need to delete the metadata dir separately
+      return;
+    }
+    // we want to keep the data files but get rid of the metadata directory
+    hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");
+    String metadataLocation = ((BaseTable) this.icebergTable).operations().current().metadataFileLocation();
+    try {
+      Path path = new Path(metadataLocation).getParent();
+      FileSystem fileSystem = FileSystem.get(path.toUri(), conf);
+      if (fileSystem.exists(path)) {
+        fileSystem.delete(path, true);

Review comment:
   Can we add some logging here that the rollback is going to happen for 
`tableName` and metadata under `path`?
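
A rough sketch of the kind of log line being asked for. It uses `java.util.logging` only to stay self-contained; the project's own logger would be the natural choice in the real hook. The class `RollbackLoggingSketch`, the helper `rollbackMessage`, and the sample table name and path are hypothetical:

```java
import java.util.logging.Logger;

public class RollbackLoggingSketch {
  private static final Logger LOG = Logger.getLogger(RollbackLoggingSketch.class.getName());

  // Builds the message separately so it can be checked without capturing log output.
  static String rollbackMessage(String tableName, String metadataPath) {
    return String.format(
        "Rolling back migration of table %s, deleting Iceberg metadata under %s", tableName, metadataPath);
  }

  public static void main(String[] args) {
    // Logged just before the metadata directory would be deleted.
    LOG.info(rollbackMessage("default.customers", "/warehouse/customers/metadata"));
  }
}
```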






Issue Time Tracking
---

Worklog Id: (was: 589168)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589166
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:32
Start Date: 26/Apr/21 12:32
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620250947



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -194,6 +203,88 @@ public void testScanTable() throws IOException {
     Assert.assertArrayEquals(new Object[] {"Alice", 0L}, descRows.get(2));
   }
 
+  @Test
+  public void testMigrateHiveTableToIceberg() {
+    Assume.assumeTrue("migration is only supported for hive catalog",

Review comment:
   What happens if a different catalog is configured?






Issue Time Tracking
---

Worklog Id: (was: 589166)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589164
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:31
Start Date: 26/Apr/21 12:31
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620250375



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+    this.catalogProperties = getCatalogProperties(hmsTable);
+    try {
+      this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+    } catch (NoSuchTableException nte) {
+      // iceberg table was not yet created, no need to delete the metadata dir separately
+      return;
+    }
+    // we want to keep the data files but get rid of the metadata directory
+    hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");
+    String metadataLocation = ((BaseTable) this.icebergTable).operations().current().metadataFileLocation();

Review comment:
   Shouldn't we just use `CatalogUtil.dropTableData(deleteIo, 
deleteMetadata);`?






Issue Time Tracking
---

Worklog Id: (was: 589164)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589163
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:30
Start Date: 26/Apr/21 12:30
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620250082



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)

Review comment:
   If this is called for any alter table op, will we be able to recognise 
the operation type and delete the metadata dir only in case of a true migration 
(and not do it for an alter table rename column, a simple alter table setprop 
case, etc)? 
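
To illustrate the distinction being asked about, here is a minimal sketch that treats an alter as a migration only when the storage handler changes from something else to the Iceberg handler. It works on plain parameter maps; the class `MigrationGuardSketch` and the helper `isMigration` are hypothetical, and how the real hook would obtain the old and new parameters is not shown in this thread:

```java
import java.util.HashMap;
import java.util.Map;

public class MigrationGuardSketch {
  static final String STORAGE_HANDLER_KEY = "storage_handler";
  static final String ICEBERG_HANDLER = "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler";

  // True only when a non-Iceberg table is being switched to the Iceberg storage handler.
  static boolean isMigration(Map<String, String> oldParams, Map<String, String> newParams) {
    return !ICEBERG_HANDLER.equals(oldParams.get(STORAGE_HANDLER_KEY)) &&
        ICEBERG_HANDLER.equals(newParams.get(STORAGE_HANDLER_KEY));
  }

  public static void main(String[] args) {
    Map<String, String> hiveTable = new HashMap<>();
    Map<String, String> icebergTable = new HashMap<>();
    icebergTable.put(STORAGE_HANDLER_KEY, ICEBERG_HANDLER);

    // Hive table altered to use the Iceberg handler: a migration.
    System.out.println(isMigration(hiveTable, icebergTable));
    // Already an Iceberg table, e.g. a plain property change: not a migration.
    System.out.println(isMigration(icebergTable, icebergTable));
  }
}
```

With a check like this, a rollback would delete the metadata directory only when the failed operation was a migration and would leave other alter operations alone.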






Issue Time Tracking
---

Worklog Id: (was: 589163)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589153
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:20
Start Date: 26/Apr/21 12:20
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620242627



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+    this.catalogProperties = getCatalogProperties(hmsTable);
+    try {
+      this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+    } catch (NoSuchTableException nte) {
+      // iceberg table was not yet created, no need to delete the metadata dir separately
+      return;
+    }
+    // we want to keep the data files but get rid of the metadata directory
+    hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, "FALSE");

Review comment:
   Why is this needed?






Issue Time Tracking
---

Worklog Id: (was: 589153)
Time Spent: 40m  (was: 0.5h)


[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589151
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:19
Start Date: 26/Apr/21 12:19
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620242174



##
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
     }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, EnvironmentContext context)
+      throws MetaException {
+    context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+    this.catalogProperties = getCatalogProperties(hmsTable);
+    try {
+      this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+    } catch (NoSuchTableException nte) {
+      // iceberg table was not yet created, no need to delete the metadata dir separately
+      return;
+    }
+    // we want to keep the data files but get rid of the metadata directory

Review comment:
   nit: in iceberg code we usually put an extra line after blocks






Issue Time Tracking
---

Worklog Id: (was: 589151)
Time Spent: 0.5h  (was: 20m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589150
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:19
Start Date: 26/Apr/21 12:19
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2219:
URL: https://github.com/apache/hive/pull/2219#discussion_r620241896



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -254,6 +262,32 @@ public void 
commitAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable
 }
   }
 
+  @Override
+  public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table 
hmsTable, EnvironmentContext context)
+  throws MetaException {
+context.getProperties().put(INITIALIZE_ROLLBACK_ALTER, "true");
+this.catalogProperties = getCatalogProperties(hmsTable);
+try {
+  this.icebergTable = Catalogs.loadTable(conf, catalogProperties);
+} catch (NoSuchTableException nte) {
+  // iceberg table was not yet created, no need to delete the metadata dir 
separately
+  return;
+}
+// we want to keep the data files but get rid of the metadata directory
+hmsTable.getParameters().put(InputFormatConfig.EXTERNAL_TABLE_PURGE, 
"FALSE");
+String metadataLocation = ((BaseTable) 
this.icebergTable).operations().current().metadataFileLocation();
+try {
+  Path path = new Path(metadataLocation).getParent();
+  FileSystem fileSystem = FileSystem.get(path.toUri(), conf);
+  if (fileSystem.exists(path)) {

Review comment:
   I think we do not need the `exists` check. Just put a try/catch around it.
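
The suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses `java.nio.file` as a stand-in so that it runs without Hadoop on the classpath, and the class and method names are hypothetical. In the PR itself the equivalent call would be Hadoop's `FileSystem.delete(Path, boolean)`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper, not part of HiveIcebergMetaHook.
final class MetadataCleanupSketch {

  /**
   * Deletes the given path without checking for existence first.
   * Returns true if something was deleted, false if the path was already gone.
   */
  static boolean deleteIgnoringMissing(Path path) {
    try {
      Files.delete(path);
      return true;
    } catch (NoSuchFileException e) {
      // Nothing to clean up; this is the case the exists() check was guarding.
      return false;
    } catch (IOException e) {
      // Any other failure is a real error and should surface to the caller.
      throw new UncheckedIOException(e);
    }
  }
}
```

Dropping the separate existence check saves one file system round trip and avoids the window in which the path could disappear between the check and the delete.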






Issue Time Tracking
---

Worklog Id: (was: 589150)
Time Spent: 20m  (was: 10m)

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Updated] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25057:
--
Labels: pull-request-available  (was: )

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Work logged] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?focusedWorklogId=589145&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589145
 ]

ASF GitHub Bot logged work on HIVE-25057:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 12:04
Start Date: 26/Apr/21 12:04
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request #2219:
URL: https://github.com/apache/hive/pull/2219


   
   
   ### What changes were proposed in this pull request?
   
   In case of an issue during the table migration this logic is followed:
   - drop altered table if it exists but keep the data
   - recreate the original table 
   - call `msck repair` on new table
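
   The three steps above could be sketched as below. This is a minimal
   illustration, not the code in `AbstractAlterTableOperation`: the `TableStore`
   interface and the `RecordingStore` fake are hypothetical stand-ins for the
   metastore calls, introduced only to show the ordering and the
   `deleteData=false` choice.

```java
import java.util.ArrayList;
import java.util.List;

final class MigrationRollbackSketch {

  /** Hypothetical stand-in for the metastore calls; not Hive's real API. */
  interface TableStore {
    boolean exists(String table);
    void drop(String table, boolean deleteData);
    void create(String table);
    void msckRepair(String table);
  }

  /** Runs the three rollback steps from the description, in order. */
  static void rollback(TableStore store, String table) {
    if (store.exists(table)) {
      store.drop(table, false); // drop the altered table but keep the data
    }
    store.create(table);        // recreate the original table
    store.msckRepair(table);    // call msck repair on the new table
  }

  /** In-memory fake that records the calls it receives. */
  static final class RecordingStore implements TableStore {
    final List<String> calls = new ArrayList<>();
    private boolean present;

    RecordingStore(boolean present) {
      this.present = present;
    }

    @Override public boolean exists(String table) {
      return present;
    }

    @Override public void drop(String table, boolean deleteData) {
      present = false;
      calls.add("drop " + table + " deleteData=" + deleteData);
    }

    @Override public void create(String table) {
      present = true;
      calls.add("create " + table);
    }

    @Override public void msckRepair(String table) {
      calls.add("msck repair " + table);
    }
  }
}
```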
   
   Work performed:
   - Enhance `HiveMetaHook` with rollback method for alter operation and 
provide implementation in `HiveIcebergMetaHook`
   - add drop/create/msck repair logic to `AbstractAlterTableOperation`
   - the need for rollback is signalled through the `EnvironmentContext` 
properties. The `HiveMetaHook#INITIALIZE_ROLLBACK_ALTER` is set in 
`HiveIcebergMetaHook#rollbackAlterTable` and evaluated in 
`AbstractAlterTableOperation`
   - Introduced a new `preDropTable` method to `HiveMetaHook` which accepts the 
`deleteData` parameter in order to retain data files while deleting iceberg 
tables.
   - covered rollback with unit tests.
   
   
   
   
   ### Why are the changes needed?
   In case of an error during the migration of a hive table to iceberg the 
original table must be restored.
   
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Manual test and unit tests.
   
   




Issue Time Tracking
---

Worklog Id: (was: 589145)
Remaining Estimate: 0h
Time Spent: 10m

> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Assigned] (HIVE-25057) Implement rollback for hive to iceberg migration

2021-04-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-25057:



> Implement rollback for hive to iceberg migration
> 
>
> Key: HIVE-25057
> URL: https://issues.apache.org/jira/browse/HIVE-25057
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> This is a follow-up Jira of HIVE-25008.
> In case of an error during hive to iceberg migration, the original hive table 
> must be restored. 





[jira] [Assigned] (HIVE-25050) Disable 'hive.metastore.acid.truncate.usebase' config as it's introducing backward incompatible change

2021-04-26 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-25050:
-

Assignee: Denys Kuzmenko

> Disable 'hive.metastore.acid.truncate.usebase' config as it's introducing 
> backward incompatible change
> --
>
> Key: HIVE-25050
> URL: https://issues.apache.org/jira/browse/HIVE-25050
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-20137 introduces new file metadata and writes it into base files during 
> truncate table operations.
> If there is an older version of HMS running in the cluster that doesn't 
> contain HIVE-20137, it won't be able to handle the new file metadata and will 
> fail to inspect tables that have been truncated, so it will skip compacting 
> them.
> We should disable 'hive.metastore.acid.truncate.usebase' until we have 
> a fix that addresses the backward incompatibility.





[jira] [Resolved] (HIVE-25050) Disable 'hive.metastore.acid.truncate.usebase' config as it's introducing backward incompatible change

2021-04-26 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-25050.
---
Resolution: Fixed

> Disable 'hive.metastore.acid.truncate.usebase' config as it's introducing 
> backward incompatible change
> --
>
> Key: HIVE-25050
> URL: https://issues.apache.org/jira/browse/HIVE-25050
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-20137 introduces new file metadata and writes it into base files during 
> truncate table operations.
> If there is an older version of HMS running in the cluster that doesn't 
> contain HIVE-20137, it won't be able to handle the new file metadata and will 
> fail to inspect tables that have been truncated, so it will skip compacting 
> them.
> We should disable 'hive.metastore.acid.truncate.usebase' until we have 
> a fix that addresses the backward incompatibility.





[jira] [Work logged] (HIVE-24201) WorkloadManager can support delayed move if destination pool does not have enough sessions

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24201?focusedWorklogId=589120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589120
 ]

ASF GitHub Bot logged work on HIVE-24201:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 10:26
Start Date: 26/Apr/21 10:26
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #2065:
URL: https://github.com/apache/hive/pull/2065#discussion_r620083627



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3749,6 +3749,17 @@ private static void 
populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 new TimeValidator(TimeUnit.SECONDS),
 "The timeout for AM registry registration, after which (on attempting 
to use the\n" +
 "session), we kill it and try to get another one."),
+HIVE_SERVER2_WM_DELAYED_MOVE("hive.server2.wm.delayed.move", false,
+"Determines behavior of the wm move trigger when destination pool is 
full.\n" +
+"If true, the query will run in source pool as long as possible if 
destination pool is full;\n" +
+"if false, the query will be killed if destination pool is full."),
+
HIVE_SERVER2_WM_DELAYED_MOVE_TIMEOUT("hive.server2.wm.delayed.move.timeout", 
"600",
+new TimeValidator(TimeUnit.SECONDS),
+"The amount of time a delayed move is allowed to run in the source 
pool,\n" +
+"when a delayed move session times out, the session is moved to the 
destination pool.\n"),
+
HIVE_SERVER2_WM_DELAYED_MOVE_VALIDATOR_INTERVAL("hive.server2.wm.delayed.move.validator.interval",
 "10",
+new TimeValidator(TimeUnit.SECONDS),
+"Interval for checking for expired delayed moves and retries. Value of 
0 indicates no checks."),

Review comment:
   Does "0" mean no timeout check, or no support for delayed move at all? I 
think this creates confusion either way. We shouldn't allow 0; this config 
should be > 0.
   hive.server2.wm.delayed.move.timeout=0 can be used for the no-timeout case.
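
The semantics proposed in this comment could look roughly like the sketch below. It is an illustration of the reviewer's suggestion only, not the behaviour of the pull request: the class and method names are hypothetical, and times are plain seconds rather than HiveConf time values.

```java
// Hypothetical helper showing the suggested meaning of the two settings.
final class DelayedMoveConfigSketch {

  /** The validator interval must be strictly positive; 0 is rejected. */
  static long validateIntervalSeconds(long intervalSeconds) {
    if (intervalSeconds <= 0) {
      throw new IllegalArgumentException(
          "hive.server2.wm.delayed.move.validator.interval must be > 0, got " + intervalSeconds);
    }
    return intervalSeconds;
  }

  /** A timeout of 0 means the delayed move never expires. */
  static boolean isExpired(long startedAtSeconds, long nowSeconds, long timeoutSeconds) {
    if (timeoutSeconds == 0) {
      return false;
    }
    return nowSeconds - startedAtSeconds >= timeoutSeconds;
  }
}
```

With this split, only the timeout carries a special value, so the interval setting has a single, unambiguous meaning.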

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3749,6 +3749,17 @@ private static void 
populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 new TimeValidator(TimeUnit.SECONDS),
 "The timeout for AM registry registration, after which (on attempting 
to use the\n" +
 "session), we kill it and try to get another one."),
+HIVE_SERVER2_WM_DELAYED_MOVE("hive.server2.wm.delayed.move", false,
+"Determines behavior of the wm move trigger when destination pool is 
full.\n" +
+"If true, the query will run in source pool as long as possible if 
destination pool is full;\n" +
+"if false, the query will be killed if destination pool is full."),
+
HIVE_SERVER2_WM_DELAYED_MOVE_TIMEOUT("hive.server2.wm.delayed.move.timeout", 
"600",
+new TimeValidator(TimeUnit.SECONDS),
+"The amount of time a delayed move is allowed to run in the source 
pool,\n" +
+"when a delayed move session times out, the session is moved to the 
destination pool.\n"),

Review comment:
   If the value 0 has a special meaning such as "doesn't expire", then we need 
to capture it here.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java
##
@@ -790,45 +842,72 @@ private void dumpPoolState(PoolState ps, List 
set) {
 }
   }
 
-  private void handleMoveSessionOnMasterThread(final MoveSession moveSession,
-final WmThreadSyncWork syncWork,
-final HashSet poolsToRedistribute,
-final Map toReuse,
-final Map recordMoveEvents) {
+  private static enum MoveSessionResult {
+OK, // Normal case - the session was moved.
+KILLED, // Killed because destination pool was full and delayed move is 
false.
+CONVERTED_TO_DELAYED_MOVE, // the move session was added to the pool's 
delayed moves as the dest. pool was full
+// and delayed move is true.
+ERROR
+  }
+
+  private MoveSessionResult handleMoveSessionOnMasterThread(final MoveSession 
moveSession,
+  final WmThreadSyncWork syncWork,
+  final HashSet poolsToRedistribute,
+  final Map toReuse,
+  final Map recordMoveEvents,
+  final boolean convertToDelayedMove) {
 String destPoolName = moveSession.destPool;
-LOG.info("Handling move session event: {}", moveSession);
+LOG.info("Handling move session event: {}, Convert to Delayed Move: {}", 
moveSession, convertToDelayedMove);
 if (validMove(moveSession.srcSession, destPoolName)) {
+  String srcPoolName = moveSession.srcSession.getPoolName();
+  PoolState srcPool = pools.get(srcPoolName);
+  boolean capacityAvailableInDest = capacityAvailable(destPoolName);
+  // If delayed move is set to true and if destination pool doesn't have 
enough capacity, don't kill the query.
+  // Let the query run in source 

[jira] [Assigned] (HIVE-25056) cast ('000-00-00 00:00:00' as timestamp/datetime) results in wrong conversion

2021-04-26 Thread Anurag Shekhar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anurag Shekhar reassigned HIVE-25056:
-


> cast ('000-00-00 00:00:00' as timestamp/datetime) results in wrong conversion 
> --
>
> Key: HIVE-25056
> URL: https://issues.apache.org/jira/browse/HIVE-25056
> Project: Hive
>  Issue Type: Bug
>Reporter: Anurag Shekhar
>Assignee: Anurag Shekhar
>Priority: Minor
>
> select cast ('-00-00' as date) , cast ('000-00-00 00:00:00' as timestamp) 
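> +-------------+------------------------+
> | _c0         | _c1                    |
> +-------------+------------------------+
> | 0002-11-30  | 0002-11-30 00:00:00.0  |
> +-------------+------------------------+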
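
One possible explanation for the `0002-11-30` result, which this thread does not confirm, is lenient date arithmetic of the kind `java.text.SimpleDateFormat` performs by default. The sketch below is standalone Java, not Hive's cast implementation, and the class name is hypothetical. It parses an all-zero date leniently: year 0 rolls to 1 BC, month 0 rolls back to December of 2 BC, and day 0 rolls back to 30 November.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; it does not call any Hive code.
final class LenientZeroDateSketch {

  /** Parses the text leniently and formats the result with the same pattern. */
  static String roundTrip(String text) throws ParseException {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    format.setLenient(true); // lenient is already the default; set for clarity
    return format.format(format.parse(text));
  }
}
```

Under these assumptions, `roundTrip("0000-00-00")` yields `0002-11-30`, which matches the value in the table above. A strict parser (`setLenient(false)`) would reject the input instead.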





[jira] [Work logged] (HIVE-25033) HPL/SQL thrift call fails when returning null

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25033?focusedWorklogId=589028&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589028
 ]

ASF GitHub Bot logged work on HIVE-25033:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 07:42
Start Date: 26/Apr/21 07:42
Worklog Time Spent: 10m 
  Work Description: zeroflag commented on a change in pull request #2194:
URL: https://github.com/apache/hive/pull/2194#discussion_r620046903



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -10548,13 +10565,20 @@ public void 
drop_stored_procedure(StoredProcedureRequest request) throws MetaExc
 }
   }
 
-public Package find_package(GetPackageRequest request) throws MetaException {
+public Package find_package(GetPackageRequest request) throws MetaException, 
NoSuchObjectException {
 startFunction("find_package");
 Exception ex = null;
 try {
-  return getMS().findPackage(request);
+  Package pkg = getMS().findPackage(request);
+  if (pkg == null) {
+throw new NoSuchObjectException(
+"HPL/SQL package " + request.getDbName() + "." + 
request.getPackageName() + " does not exist");
+  }
+  return pkg;
 } catch (Exception e) {
-  LOG.error("Caught exception", e);
+  if (!(e instanceof NoSuchObjectException)) {
+LOG.error("Caught exception", e);

Review comment:
   No, but this happens on the HMS side, whereas we log the exception on the 
client side. The former is written to the metastore logs while the latter goes 
to the HS2 logs, and the stack traces are not the same. Many of the other 
methods of the HMS handler do this as well.
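
The pattern in the diff above (treat "not found" as an expected outcome, and log only unexpected failures at error level on the server side) can be sketched as follows. This is a simplified illustration, not the `HMSHandler` code: the exception class, the map-backed store and the list-backed log are hypothetical stand-ins.

```java
import java.util.List;
import java.util.Map;

final class FindPackageSketch {

  /** Hypothetical stand-in for the metastore's NoSuchObjectException. */
  static final class NoSuchObjectException extends Exception {
    NoSuchObjectException(String message) {
      super(message);
    }
  }

  /**
   * Looks up a package by name. A missing package becomes a checked exception
   * and is not written to the error log; anything else is logged and rethrown.
   */
  static String findPackage(Map<String, String> store, String name, List<String> errorLog)
      throws NoSuchObjectException {
    try {
      String pkg = store.get(name);
      if (pkg == null) {
        throw new NoSuchObjectException("HPL/SQL package " + name + " does not exist");
      }
      return pkg;
    } catch (Exception e) {
      if (!(e instanceof NoSuchObjectException)) {
        errorLog.add("Caught exception: " + e);
      }
      throw e; // precise rethrow: NoSuchObjectException or an unchecked exception
    }
  }
}
```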






Issue Time Tracking
---

Worklog Id: (was: 589028)
Time Spent: 0.5h  (was: 20m)

> HPL/SQL thrift call fails when returning null
> -
>
> Key: HIVE-25033
> URL: https://issues.apache.org/jira/browse/HIVE-25033
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25050) Disable 'hive.metastore.acid.truncate.usebase' config as it's introducing backward incompatible change

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25050?focusedWorklogId=589025&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589025
 ]

ASF GitHub Bot logged work on HIVE-25050:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 07:28
Start Date: 26/Apr/21 07:28
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #2212:
URL: https://github.com/apache/hive/pull/2212


   




Issue Time Tracking
---

Worklog Id: (was: 589025)
Time Spent: 20m  (was: 10m)

> Disable 'hive.metastore.acid.truncate.usebase' config as it's introducing 
> backward incompatible change
> --
>
> Key: HIVE-25050
> URL: https://issues.apache.org/jira/browse/HIVE-25050
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-20137 introduces new file metadata and writes it into base files during 
> truncate table operations.
> If there is an older version of HMS running in the cluster that doesn't 
> contain HIVE-20137, it won't be able to handle the new file metadata and will 
> fail to inspect tables that have been truncated, so it will skip compacting 
> them.
> We should disable 'hive.metastore.acid.truncate.usebase' until we have 
> a fix that addresses the backward incompatibility.





[jira] [Resolved] (HIVE-25052) Writing to Iceberg tables can fail when inserting empty result set

2021-04-26 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25052.
---
Resolution: Fixed

> Writing to Iceberg tables can fail when inserting empty result set
> --
>
> Key: HIVE-25052
> URL: https://issues.apache.org/jira/browse/HIVE-25052
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25052) Writing to Iceberg tables can fail when inserting empty result set

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25052?focusedWorklogId=589023&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-589023
 ]

ASF GitHub Bot logged work on HIVE-25052:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 07:21
Start Date: 26/Apr/21 07:21
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #2213:
URL: https://github.com/apache/hive/pull/2213


   




Issue Time Tracking
---

Worklog Id: (was: 589023)
Time Spent: 20m  (was: 10m)

> Writing to Iceberg tables can fail when inserting empty result set
> --
>
> Key: HIVE-25052
> URL: https://issues.apache.org/jira/browse/HIVE-25052
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=588954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588954
 ]

ASF GitHub Bot logged work on HIVE-25035:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 07:06
Start Date: 26/Apr/21 07:06
Worklog Time Spent: 10m 
  Work Description: aasha merged pull request #2197:
URL: https://github.com/apache/hive/pull/2197


   




Issue Time Tracking
---

Worklog Id: (was: 588954)
Time Spent: 1h  (was: 50m)

> Allow creating single copy tasks for configured paths during external table 
> replication
> ---
>
> Key: HIVE-25035
> URL: https://issues.apache.org/jira/browse/HIVE-25035
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As of now, one task per table is created for external table replication. In 
> case there are multiple tables under one common directory, provide a way to 
> create a single task for all those tables.





[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=588957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588957
 ]

ASF GitHub Bot logged work on HIVE-25055:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 07:06
Start Date: 26/Apr/21 07:06
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #2218:
URL: https://github.com/apache/hive/pull/2218








Issue Time Tracking
---

Worklog Id: (was: 588957)
Time Spent: 1h 10m  (was: 1h)

> Improve the exception handling in HMSHandler
> 
>
> Key: HIVE-25055
> URL: https://issues.apache.org/jira/browse/HIVE-25055
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=588854&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588854
 ]

ASF GitHub Bot logged work on HIVE-25055:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 06:54
Start Date: 26/Apr/21 06:54
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #2218:
URL: https://github.com/apache/hive/pull/2218








Issue Time Tracking
---

Worklog Id: (was: 588854)
Time Spent: 1h  (was: 50m)

> Improve the exception handling in HMSHandler
> 
>
> Key: HIVE-25055
> URL: https://issues.apache.org/jira/browse/HIVE-25055
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25035) Allow creating single copy tasks for configured paths during external table replication

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25035?focusedWorklogId=588780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588780
 ]

ASF GitHub Bot logged work on HIVE-25035:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 06:45
Start Date: 26/Apr/21 06:45
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2197:
URL: https://github.com/apache/hive/pull/2197#discussion_r618959413



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -671,10 +672,14 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData 
dmd, Path cmRoot, Hive
 Path dbRootData = new Path(bootstrapRoot, EximUtil.DATA_PATH_NAME + 
File.separator + dbName);
 boolean dataCopyAtLoad = 
conf.getBoolVar(HiveConf.ConfVars.REPL_RUN_DATA_COPY_TASKS_ON_TARGET);
 ReplExternalTables externalTablesWriter = new ReplExternalTables(conf);
-Path dbPath = null;
 boolean isSingleCopyTaskForExternalTables =
-conf.getBoolVar(REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK)
-&& work.replScope.includeAllTables();
+conf.getBoolVar(REPL_EXTERNAL_WAREHOUSE_SINGLE_COPY_TASK) && 
work.replScope.includeAllTables();
+ArrayList singleCopyPaths = new ArrayList<>();
+if (db != null && isSingleCopyTaskForExternalTables) {

Review comment:
   can be added as a util

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
##
@@ -114,11 +112,27 @@ void dataLocationDump(Table table, FileList fileList,
 }
   }
 
-  void dbLocationDump(String dbName, Path dbLocation, FileList fileList,
-  HiveConf conf) throws Exception {
-Path fullyQualifiedDataLocation = PathBuilder
-.fullyQualifiedHDFSUri(dbLocation, FileSystem.get(hiveConf));
-dirLocationToCopy(dbName, fileList, fullyQualifiedDataLocation, conf);
+  void singleLocationsDump(List singlePathLocations, FileList 
fileList, HiveConf conf) throws Exception {

Review comment:
   nit: rename the method to something more intuitive, or add comments

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -1680,4 +1682,104 @@ public void testDataCopyEndLog(boolean 
runCopyTasksOnTarget) throws Throwable {
 ctx.updateLoggers();
 appender.removeFromLogger(logger.getName());
   }
+
+  @Test
+  public void testSingleCopyTasksAtSource() throws Throwable {
+testDataCopyEndLog(false);
+  }
+
+  @Test
+  public void testSingleCopyTasksAtTarget() throws Throwable {
+testDataCopyEndLog(true);
+  }
+
+  public void testSingleCopyTasks(boolean runCopyTasksOnTarget)

Review comment:
   What happens if there are extra paths in the parent path apart from the 
table locations?






Issue Time Tracking
---

Worklog Id: (was: 588780)
Time Spent: 50m  (was: 40m)

> Allow creating single copy tasks for configured paths during external table 
> replication
> ---
>
> Key: HIVE-25035
> URL: https://issues.apache.org/jira/browse/HIVE-25035
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> As of now, one task per table is created for external table replication. In 
> case there are multiple tables under one common directory, provide a way to 
> create a single task for all those tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23820) [HS2] Send tableId in request for get_table_request API

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23820?focusedWorklogId=588731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-588731
 ]

ASF GitHub Bot logged work on HIVE-23820:
-

Author: ASF GitHub Bot
Created on: 26/Apr/21 06:40
Start Date: 26/Apr/21 06:40
Worklog Time Spent: 10m 
  Work Description: sankarh merged pull request #2153:
URL: https://github.com/apache/hive/pull/2153


   




Issue Time Tracking
---

Worklog Id: (was: 588731)
Time Spent: 2h  (was: 1h 50m)

> [HS2] Send tableId in request for get_table_request API
> ---
>
> Key: HIVE-23820
> URL: https://issues.apache.org/jira/browse/HIVE-23820
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Standalone Metastore
>Reporter: Kishen Das
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>



