[jira] [Assigned] (HIVE-24233) except subquery throws nullpointer with cbo disabled

2020-10-05 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga reassigned HIVE-24233:
--


> except subquery throws nullpointer with cbo disabled
> 
>
> Key: HIVE-24233
> URL: https://issues.apache.org/jira/browse/HIVE-24233
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>
> Except and intersect were only implemented with Calcite in HIVE-12764. If CBO 
> is disabled, the query just throws a NullPointerException. We should at least 
> throw a SemanticException stating that this is not supported.
> Repro:
> set hive.cbo.enable=false;
> create table test(id int);
> insert into table test values(1);
> select id from test except select id from test;
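> 
> A minimal sketch of such a guard (illustrative only; the exact call site in 
> the analyzer and the error text are assumptions):
> {code:java}
> // Hypothetical check: fail fast with a SemanticException instead of an NPE
> // when EXCEPT/INTERSECT is used while CBO is disabled.
> if (!conf.getBoolVar(HiveConf.ConfVars.HIVE_CBO_ENABLED)) {
>   throw new SemanticException(
>       "EXCEPT and INTERSECT are only supported with CBO enabled (hive.cbo.enable=true)");
> }
> {code}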



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-10-05 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23851:
-
Comment: was deleted

(was: [~kgyrtkirk] As per your comments, I have changed the implementation. 
Please review the PR.
Thanks.)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering, we expect the expression proxy 
> class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping partitions, we serialize the drop-partition filter 
> expression as in ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ), which is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ); hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done (a rough 
> sketch follows below).
> # The other solution is to make the serialization of the msck drop-partition 
> filter expression compatible with the one used by 
> PartitionExpressionForMetastore. We can do this via reflection, since the drop 
> partition serialization happens in the Msck class (standalone-metastore); this 
> way we can completely remove the need for the MsckPartitionExpressionProxy class.
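> 
> A rough sketch of the first approach (the restore point and exact call are 
> assumptions, not the final implementation):
> {code:java}
> // Hypothetical: once partition pruning (which needs
> // PartitionExpressionForMetastore) is done, switch the metastore expression
> // proxy back so the drop-partition filter expression round-trips correctly.
> import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
> 
> MetastoreConf.setVar(conf, MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS,
>     "org.apache.hadoop.hive.metastore.MsckPartitionExpressionProxy");
> {code}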

[jira] [Issue Comment Deleted] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-10-05 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23851:
-
Comment: was deleted

(was: [~kgyrtkirk] Does the new approach make sense?)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering, we expect the expression proxy 
> class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping partitions, we serialize the drop-partition filter 
> expression as in ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ), which is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ); hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization of the msck drop-partition 
> filter expression compatible with the one used by 
> PartitionExpressionForMetastore. We can do this via reflection, since the drop 
> partition serialization happens in the Msck class (standalone-metastore); this 
> way we can completely remove the need for the MsckPartitionExpressionProxy class.

[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-10-05 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208490#comment-17208490
 ] 

Syed Shameerur Rahman commented on HIVE-23851:
--

[~kgyrtkirk] Could you please review the PR?

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering, we expect the expression proxy 
> class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping partitions, we serialize the drop-partition filter 
> expression as in ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ), which is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ); hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization of the msck drop-partition 
> filter expression compatible with the one used by 
> PartitionExpressionForMetastore. We can do this via reflection, since the drop 
> partition serialization happens in the Msck class (standalone-metastore); this 
> way we can completely remove the need for the MsckPartitionExpressionProxy class.

[jira] [Comment Edited] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-10-05 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208486#comment-17208486
 ] 

Syed Shameerur Rahman edited comment on HIVE-18284 at 10/6/20, 5:15 AM:


[~kgyrtkirk] [~jcamachorodriguez] [~ashutoshc] ping for review request!


was (Author: srahman):
[~kgyrtkirk] [~jcamachorodriguez] ping for review request!

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A NullPointerException occurs when inserting data with the 'distribute by' 
> clause. The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I removed Distribute By or 
> used the Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might still be re-used when 
> we use Distribute By (a toy model of this follows below):
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
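> 
> A toy model of that behavior (all names and structure here are assumptions, 
> not the actual FileSinkOperator code); the third row hits the NPE because the 
> writer for datekey=1 was already dropped:
> {code:java}
> import java.util.HashMap;
> import java.util.HashSet;
> import java.util.Map;
> import java.util.Set;
> 
> public class FspToy {
>   public static void main(String[] args) {
>     // Under dynpart sort optimization the operator opens a writer ("fsp")
>     // on the first occurrence of a partition key and drops the previous
>     // writer whenever the key changes, assuming each key arrives as one
>     // contiguous run. DISTRIBUTE BY alone does not sort within a reducer.
>     Map<Integer, StringBuilder> fsp = new HashMap<>();
>     Set<Integer> seen = new HashSet<>();
>     Integer prev = null;
>     for (int key : new int[]{1, 2, 1}) {     // datekey order 1, 2, 1
>       if (seen.add(key)) {
>         fsp.put(key, new StringBuilder());   // first occurrence: open writer
>       }
>       if (prev != null && key != prev) {
>         fsp.remove(prev);                    // key changed: drop previous writer
>       }
>       fsp.get(key).append("row\n");          // NullPointerException when key=1 reappears
>       prev = key;
>     }
>   }
> }
> {code}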
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ... 17 more
> {code}

[jira] [Work logged] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24203?focusedWorklogId=495716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495716
 ]

ASF GitHub Bot logged work on HIVE-24203:
-

Author: ASF GitHub Bot
Created on: 06/Oct/20 05:00
Start Date: 06/Oct/20 05:00
Worklog Time Spent: 10m 
  Work Description: okumin commented on a change in pull request #1531:
URL: https://github.com/apache/hive/pull/1531#discussion_r56396



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
##########
@@ -2921,6 +2920,97 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
 }
   }
 
+  /**
+   * LateralViewJoinOperator changes the data size and column level statistics.
+   *
+   * A diagram of LATERAL VIEW.
+   *
+   *   [Lateral View Forward]
+   *       /            \
+   *   [Select]      [Select]
+   *      |              |
+   *      |           [UDTF]
+   *       \            /
+   *    [Lateral View Join]
+   *
+   * For each row of the source, the left branch just picks columns and the right branch processes the UDTF.
+   * And then LVJ joins a row from the left branch with rows from the right branch.
+   * The join has a one-to-many relationship since the UDTF can generate multiple rows.
+   *
+   * This rule multiplies the stats from the left branch by T(right) / T(left) and sums up both sides.
+   */
+  public static class LateralViewJoinStatsRule extends DefaultStatsRule implements SemanticNodeProcessor {
+    @Override
+    public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
+        Object... nodeOutputs) throws SemanticException {
+      final LateralViewJoinOperator lop = (LateralViewJoinOperator) nd;
+      final AnnotateStatsProcCtx aspCtx = (AnnotateStatsProcCtx) procCtx;
+      final HiveConf conf = aspCtx.getConf();
+
+      if (!isAllParentsContainStatistics(lop)) {
+        return null;
+      }
+
+      final List<Operator<? extends OperatorDesc>> parents = lop.getParentOperators();
+      if (parents.size() != 2) {
+        LOG.warn("LateralViewJoinOperator should have just two parents but actually has "
+            + parents.size() + " parents.");
+        return null;
+      }
+
+      final Statistics selectStats = parents.get(LateralViewJoinOperator.SELECT_TAG).getStatistics().clone();

Review comment:
   As for `udtfStats`, we can totally avoid the clone.
   As for `udtfStats`, its column stats will be updated. However, it looks like 
`StatsUtils.getColStatisticsFromExprMap` clones them?
   Anyway, I think we can remove them if CI passes. I will try it.
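
A rough numeric illustration of the scaling rule described in the Javadoc 
above (the row counts and data sizes are made-up numbers):
{code:java}
// Suppose the select branch (left) carries T(left) = 100 rows / 10,000 bytes
// and the UDTF branch (right) produces T(right) = 300 rows / 6,000 bytes.
long leftRows = 100, rightRows = 300;
long leftSize = 10_000, rightSize = 6_000;

double factor = (double) rightRows / leftRows;     // 3.0: each left row matches 3 UDTF rows
long scaledLeftRows = (long) (leftRows * factor);  // 300
long scaledLeftSize = (long) (leftSize * factor);  // 30,000

long lvjRows = scaledLeftRows;                     // the one-to-many join keeps the UDTF cardinality
long lvjSize = scaledLeftSize + rightSize;         // 36,000: "sums up both sides"
{code}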





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495716)
Time Spent: 1h  (was: 50m)

> Implement stats annotation rule for the LateralViewJoinOperator
> ---
>
> Key: HIVE-24203
> URL: https://issues.apache.org/jira/browse/HIVE-24203
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 4.0.0, 3.1.2, 2.3.7
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> StatsRulesProcFactory doesn't have any rules to handle a JOIN by LATERAL VIEW.
> This can cause an underestimation in case that UDTF in LATERAL VIEW generates 
> multiple rows.
> HIVE-20262 has already added the rule for UDTF.
> This issue would add the rule for LateralViewJoinOperator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-10-05 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208486#comment-17208486
 ] 

Syed Shameerur Rahman commented on HIVE-18284:
--

[~kgyrtkirk] [~jcamachorodriguez] ping for review request!

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A NullPointerException occurs when inserting data with the 'distribute by' 
> clause. The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I removed Distribute By or 
> used the Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might still be re-used when 
> we use Distribute By:
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ... 17 more
> {code}

[jira] [Issue Comment Deleted] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-10-05 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-18284:
-
Comment: was deleted

(was: [~jcamachorodriguez] Could you please review the PR?)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A NullPointerException occurs when inserting data with the 'distribute by' 
> clause. The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I removed Distribute By or 
> used the Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might still be re-used when 
> we use Distribute By:
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ... 17 more
> {code}




[jira] [Issue Comment Deleted] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-10-05 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-18284:
-
Comment: was deleted

(was: [~kgyrtkirk]  I have addressed your comments. Please take a look!)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A NullPointerException occurs when inserting data with the 'distribute by' 
> clause. The following snippet query reproduces this issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I removed Distribute By or 
> used the Cluster By clause.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might still be re-used when 
> we use Distribute By:
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>   ... 17 more
> {code}

[jira] [Updated] (HIVE-24209) Incorrect search argument conversion for NOT BETWEEN operation when vectorization is enabled

2020-10-05 Thread Ashutosh Chauhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-24209:

Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Ganesha!

> Incorrect search argument conversion for NOT BETWEEN operation when 
> vectorization is enabled
> 
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-24209.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for the NOT 
> BETWEEN operation when vectorization is enabled, because of the improvement 
> done as part of HIVE-15884. However, this is not handled during the conversion 
> of the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results.
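> 
> For illustration, a NOT BETWEEN predicate should translate to a search 
> argument that keeps the BETWEEN leaf wrapped in NOT; dropping the negation 
> pushes the complement predicate down to the storage layer. A sketch using the 
> SearchArgument builder (column name and type are made up):
> {code:java}
> import org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf;
> import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
> import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;
> 
> // Expected search argument for "WHERE c NOT BETWEEN 1 AND 10".
> SearchArgument sarg = SearchArgumentFactory.newBuilder()
>     .startNot()
>     .between("c", PredicateLeaf.Type.LONG, 1L, 10L)
>     .end()
>     .build();
> {code}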



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24069) HiveHistory should log the task that ends abnormally

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24069?focusedWorklogId=495713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495713
 ]

ASF GitHub Bot logged work on HIVE-24069:
-

Author: ASF GitHub Bot
Created on: 06/Oct/20 04:29
Start Date: 06/Oct/20 04:29
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1429:
URL: https://github.com/apache/hive/pull/1429#issuecomment-704020199


   @ashutoshc Could you please take a look? thanks much!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495713)
Time Spent: 40m  (was: 0.5h)

> HiveHistory should log the task that ends abnormally
> 
>
> Key: HIVE-24069
> URL: https://issues.apache.org/jira/browse/HIVE-24069
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When a task returns with an exitVal not equal to 0, the executor skips marking 
> the task return code and calling endTask. This may leave the history log 
> incomplete for such tasks.
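> 
> A minimal sketch of the idea (the call site is an assumption; the method 
> names follow the existing HiveHistory API):
> {code:java}
> // Hypothetical: also record the return code and close out the task in the
> // history log when exitVal != 0, instead of only on success.
> if (SessionState.get() != null) {
>   SessionState.get().getHiveHistory().setTaskProperty(queryId, task.getId(),
>       HiveHistory.Keys.TASK_RET_CODE, String.valueOf(exitVal));
>   SessionState.get().getHiveHistory().endTask(queryId, task);
> }
> {code}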



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files

2020-10-05 Thread Ashutosh Chauhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-24224.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks, Panos!

> Fix skipping header/footer for Hive on Tez on compressed files
> --
>
> Key: HIVE-24224
> URL: https://issues.apache.org/jira/browse/HIVE-24224
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A compressed file with Hive on Tez returns headers and footers, for both 
> select * and select count ( * ):
> {noformat}
> printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 
> 1357\",123\nrst,rst,rst" > data.csv
> hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/
> bzip2 -f data.csv 
> hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/
> beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 (
>   sequence   int,
>   id string,
>   other  string) 
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
> LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' 
> TBLPROPERTIES (
>   'skip.header.line.count'='1',
>   'skip.footer.line.count'='1');"
> beeline -e "
>   SET hive.fetch.task.conversion = none;
>   SELECT * FROM default.bz2tst2;"
> +-------------------+---------------------+----------------+
> | bz2tst2.sequence  | bz2tst2.id          | bz2tst2.other  |
> +-------------------+---------------------+----------------+
> | offset            | id                  | other          |
> | 9                 | 20200315 X00 1356   | 123            |
> | 17                | 20200315 X00 1357   | 123            |
> | rst               | rst                 | rst            |
> +-------------------+---------------------+----------------+
> {noformat}
> PS: HIVE-22769 addressed the issue for Hive on LLAP.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24205) Optimise CuckooSetBytes

2020-10-05 Thread Ashutosh Chauhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-24205:

Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Mustafa!

> Optimise CuckooSetBytes
> ---
>
> Key: HIVE-24205
> URL: https://issues.apache.org/jira/browse/HIVE-24205
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2020-09-28 at 4.29.24 PM.png, bench.png, 
> vectorized.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{FilterStringColumnInList, StringColumnInList}} etc. use CuckooSetBytes for 
> lookups.
> !Screenshot 2020-09-28 at 4.29.24 PM.png|width=714,height=508!
> One option to optimize would be to add boundary conditions on "length", using 
> the min/max lengths stored alongside the hashes (ref: 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CuckooSetBytes.java#L85]).
> This would significantly reduce the number of hash computations that need to 
> happen. E.g., 
> [TPCH-Q12|https://github.com/hortonworks/hive-testbench/blob/hdp3/sample-queries-tpch/tpch_query12.sql#L20]
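> 
> A sketch of the suggested boundary check (field and helper names assumed; not 
> the actual CuckooSetBytes code):
> {code:java}
> // Track the min/max key length at insert time, then reject impossible
> // lengths in lookup() before computing any hash.
> private int minKeyLen = Integer.MAX_VALUE;
> private int maxKeyLen = -1;
> 
> public void insert(byte[] key) {
>   minKeyLen = Math.min(minKeyLen, key.length);
>   maxKeyLen = Math.max(maxKeyLen, key.length);
>   insertIntoTables(key);              // hypothetical: the existing cuckoo insert
> }
> 
> public boolean lookup(byte[] b, int start, int len) {
>   if (len < minKeyLen || len > maxKeyLen) {
>     return false;                     // skip both hash computations entirely
>   }
>   return probeTables(b, start, len);  // hypothetical: the existing two-hash probe
> }
> {code}
> The two extra int fields cost nothing on the hot path, while strings whose 
> length falls outside the stored range never touch the hash functions at all.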



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24224?focusedWorklogId=495707&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495707
 ]

ASF GitHub Bot logged work on HIVE-24224:
-

Author: ASF GitHub Bot
Created on: 06/Oct/20 04:02
Start Date: 06/Oct/20 04:02
Worklog Time Spent: 10m 
  Work Description: ashutoshc commented on pull request #1546:
URL: https://github.com/apache/hive/pull/1546#issuecomment-704013804


   +1 LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495707)
Time Spent: 0.5h  (was: 20m)

> Fix skipping header/footer for Hive on Tez on compressed files
> --
>
> Key: HIVE-24224
> URL: https://issues.apache.org/jira/browse/HIVE-24224
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A compressed file with Hive on Tez returns headers and footers, for both 
> select * and select count ( * ):
> {noformat}
> printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 
> 1357\",123\nrst,rst,rst" > data.csv
> hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/
> bzip2 -f data.csv 
> hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/
> beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 (
>   sequence   int,
>   id string,
>   other  string) 
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
> LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' 
> TBLPROPERTIES (
>   'skip.header.line.count'='1',
>   'skip.footer.line.count'='1');"
> beeline -e "
>   SET hive.fetch.task.conversion = none;
>   SELECT * FROM default.bz2tst2;"
> +-------------------+---------------------+----------------+
> | bz2tst2.sequence  | bz2tst2.id          | bz2tst2.other  |
> +-------------------+---------------------+----------------+
> | offset            | id                  | other          |
> | 9                 | 20200315 X00 1356   | 123            |
> | 17                | 20200315 X00 1357   | 123            |
> | rst               | rst                 | rst            |
> +-------------------+---------------------+----------------+
> {noformat}
> PS: HIVE-22769 addressed the issue for Hive on LLAP.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24232) Incorrect translation of rollup expression from Calcite

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24232:
--
Labels: pull-request-available  (was: )

> Incorrect translation of rollup expression from Calcite
> ---
>
> Key: HIVE-24232
> URL: https://issues.apache.org/jira/browse/HIVE-24232
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In Calcite, it is not necessary that the columns in the group set are in the 
> same order as the rollup. For instance, this is the Calcite representation of 
> a rollup for a given query:
> {code}
> HiveAggregate(group=[{1, 6, 7}], groups=[[{1, 6, 7}, {1, 7}, {1}, {}]], 
> agg#0=[sum($12)], agg#1=[count($12)], agg#2=[sum($4)], agg#3=[count($4)], 
> agg#4=[sum($15)], agg#5=[count($15)])
> {code}
> When we generate the Hive plan from the Calcite operator, we incorrectly make 
> that assumption.
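> 
> For illustration, the rollup ordering can be recovered from the group sets 
> instead of being assumed from the full group set (sketch only; the variable 
> names and surrounding translation code are assumptions):
> {code:java}
> import java.util.ArrayList;
> import java.util.Comparator;
> import java.util.List;
> import org.apache.calcite.util.ImmutableBitSet;
> 
> // For groups [{1, 6, 7}, {1, 7}, {1}, {}], each set adds one column to the
> // previous one, so walking them from smallest to largest yields the rollup
> // order 1, 7, 6 -- which differs from the group set's natural order 1, 6, 7.
> List<ImmutableBitSet> sets = new ArrayList<>(aggregate.getGroupSets());
> sets.sort(Comparator.comparingInt(ImmutableBitSet::cardinality));
> List<Integer> rollupOrder = new ArrayList<>();
> for (ImmutableBitSet set : sets) {
>   for (int col : set) {
>     if (!rollupOrder.contains(col)) {
>       rollupOrder.add(col);
>     }
>   }
> }
> {code}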



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24232) Incorrect translation of rollup expression from Calcite

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24232?focusedWorklogId=495704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495704
 ]

ASF GitHub Bot logged work on HIVE-24232:
-

Author: ASF GitHub Bot
Created on: 06/Oct/20 03:34
Start Date: 06/Oct/20 03:34
Worklog Time Spent: 10m 
  Work Description: jcamachor opened a new pull request #1554:
URL: https://github.com/apache/hive/pull/1554


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495704)
Remaining Estimate: 0h
Time Spent: 10m

> Incorrect translation of rollup expression from Calcite
> ---
>
> Key: HIVE-24232
> URL: https://issues.apache.org/jira/browse/HIVE-24232
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In Calcite, it is not necessary that the columns in the group set are in the 
> same order as the rollup. For instance, this is the Calcite representation of 
> a rollup for a given query:
> {code}
> HiveAggregate(group=[{1, 6, 7}], groups=[[{1, 6, 7}, {1, 7}, {1}, {}]], 
> agg#0=[sum($12)], agg#1=[count($12)], agg#2=[sum($4)], agg#3=[count($4)], 
> agg#4=[sum($15)], agg#5=[count($15)])
> {code}
> When we generate the Hive plan from the Calcite operator, we incorrectly make 
> that assumption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24232) Incorrect translation of rollup expression from Calcite

2020-10-05 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24232:
---
Status: Patch Available  (was: Open)

> Incorrect translation of rollup expression from Calcite
> ---
>
> Key: HIVE-24232
> URL: https://issues.apache.org/jira/browse/HIVE-24232
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> In Calcite, it is not necessary that the columns in the group set are in the 
> same order as the rollup. For instance, this is the Calcite representation of 
> a rollup for a given query:
> {code}
> HiveAggregate(group=[{1, 6, 7}], groups=[[{1, 6, 7}, {1, 7}, {1}, {}]], 
> agg#0=[sum($12)], agg#1=[count($12)], agg#2=[sum($4)], agg#3=[count($4)], 
> agg#4=[sum($15)], agg#5=[count($15)])
> {code}
> When we generate the Hive plan from the Calcite operator, we incorrectly make 
> that assumption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24232) Incorrect translation of rollup expression from Calcite

2020-10-05 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208469#comment-17208469
 ] 

Jesus Camacho Rodriguez commented on HIVE-24232:


The PR also adds printing of grouping sets for the Hive Group By operators in 
the Hive plan.

> Incorrect translation of rollup expression from Calcite
> ---
>
> Key: HIVE-24232
> URL: https://issues.apache.org/jira/browse/HIVE-24232
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> In Calcite, it is not necessary that the columns in the group set are in the 
> same order as the rollup. For instance, this is the Calcite representation of 
> a rollup for a given query:
> {code}
> HiveAggregate(group=[{1, 6, 7}], groups=[[{1, 6, 7}, {1, 7}, {1}, {}]], 
> agg#0=[sum($12)], agg#1=[count($12)], agg#2=[sum($4)], agg#3=[count($4)], 
> agg#4=[sum($15)], agg#5=[count($15)])
> {code}
> When we generate the Hive plan from the Calcite operator, we incorrectly make 
> that assumption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24232) Incorrect translation of rollup expression from Calcite

2020-10-05 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-24232:
--


> Incorrect translation of rollup expression from Calcite
> ---
>
> Key: HIVE-24232
> URL: https://issues.apache.org/jira/browse/HIVE-24232
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> In Calcite, it is not necessary that the columns in the group set are in the 
> same order as the rollup. For instance, this is the Calcite representation of 
> a rollup for a given query:
> {code}
> HiveAggregate(group=[{1, 6, 7}], groups=[[{1, 6, 7}, {1, 7}, {1}, {}]], 
> agg#0=[sum($12)], agg#1=[count($12)], agg#2=[sum($4)], agg#3=[count($4)], 
> agg#4=[sum($15)], agg#5=[count($15)])
> {code}
> When we generate the Hive plan from the Calcite operator, we incorrectly make 
> that assumption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24203) Implement stats annotation rule for the LateralViewJoinOperator

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24203?focusedWorklogId=495700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495700
 ]

ASF GitHub Bot logged work on HIVE-24203:
-

Author: ASF GitHub Bot
Created on: 06/Oct/20 03:14
Start Date: 06/Oct/20 03:14
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1531:
URL: https://github.com/apache/hive/pull/1531#discussion_r497866947



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
##########
@@ -2921,6 +2920,97 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
 }
   }
 
+  /**
+   * LateralViewJoinOperator changes the data size and column level statistics.
+   *
+   * A diagram of LATERAL VIEW.
+   *
+   *   [Lateral View Forward]
+   *       /            \
+   *   [Select]      [Select]
+   *      |              |
+   *      |           [UDTF]
+   *       \            /
+   *    [Lateral View Join]
+   *
+   * For each row of the source, the left branch just picks columns and the right branch processes the UDTF.
+   * And then LVJ joins a row from the left branch with rows from the right branch.
+   * The join has a one-to-many relationship since the UDTF can generate multiple rows.
+   *
+   * This rule multiplies the stats from the left branch by T(right) / T(left) and sums up both sides.
+   */
+  public static class LateralViewJoinStatsRule extends DefaultStatsRule implements SemanticNodeProcessor {
+    @Override
+    public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
+        Object... nodeOutputs) throws SemanticException {
+      final LateralViewJoinOperator lop = (LateralViewJoinOperator) nd;
+      final AnnotateStatsProcCtx aspCtx = (AnnotateStatsProcCtx) procCtx;
+      final HiveConf conf = aspCtx.getConf();
+
+      if (!isAllParentsContainStatistics(lop)) {
+        return null;
+      }
+
+      final List<Operator<? extends OperatorDesc>> parents = lop.getParentOperators();
+      if (parents.size() != 2) {
+        LOG.warn("LateralViewJoinOperator should have just two parents but actually has "
+            + parents.size() + " parents.");
+        return null;
+      }
+
+      final Statistics selectStats = parents.get(LateralViewJoinOperator.SELECT_TAG).getStatistics().clone();

Review comment:
   Do you need to clone them? Are you modifying them? (Same for next line)

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
##
@@ -2921,6 +2920,97 @@ public Object process(Node nd, Stack stack, 
NodeProcessorCtx procCtx,
 }
   }
 
+  /**
+   * LateralViewJoinOperator changes the data size and column level statistics.
+   *
+   * A diagram of LATERAL VIEW.
+   *
+   *   [Lateral View Forward]
+   *  / \
+   *[Select]  [Select]
+   *||
+   *| [UDTF]
+   *\   /
+   *   [Lateral View Join]
+   *
+   * For each row of the source, the left branch just picks columns and the 
right branch processes UDTF.
+   * And then LVJ joins a row from the left branch with rows from the right 
branch.
+   * The join has a one-to-many relationship since UDTF can generate multiple 
rows.

Review comment:
   Just leaving a note. I took a quick look at the UDTF logic and it seems 
the selectivity is hardcoded via config. It seems the outer flag is not taken 
into account either, which could be a straightforward improvement for the 
estimates, i.e., UDTF will produce at least as many rows as it receives.
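   For example (hypothetical query, illustrating the OUTER case):
   {code}
   -- With OUTER, a source row whose array is empty still yields one output
   -- row (NULLs for the UDTF columns), so the join emits at least as many
   -- rows as it receives:
   select t.id, e.item
   from t lateral view outer explode(t.arr) e as item;
   {code}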
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495700)
Time Spent: 50m  (was: 40m)

> Implement stats annotation rule for the LateralViewJoinOperator
> ---
>
> Key: HIVE-24203
> URL: https://issues.apache.org/jira/browse/HIVE-24203
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 4.0.0, 3.1.2, 2.3.7
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> StatsRulesProcFactory doesn't have any rules to handle a JOIN by LATERAL VIEW.
> This can cause an underestimation in case that UDTF in LATERAL VIEW generates 
> multiple rows.
> HIVE-20262 has already added the rule for UDTF.
> This issue would add the rule for LateralViewJoinOperator.
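> For illustration (hypothetical table, not from the ticket):
> {code}
> -- t has 100 rows and arr holds ~3 elements per row, so the lateral view
> -- join emits ~300 rows; without a dedicated rule this can be estimated
> -- far lower.
> select t.id, e.item
> from t lateral view explode(t.arr) e as item;
> {code}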

[jira] [Work logged] (HIVE-24202) Clean up local HS2 HMS cache code (II)

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24202?focusedWorklogId=495682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495682
 ]

ASF GitHub Bot logged work on HIVE-24202:
-

Author: ASF GitHub Bot
Created on: 06/Oct/20 01:47
Start Date: 06/Oct/20 01:47
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #1543:
URL: https://github.com/apache/hive/pull/1543#issuecomment-703980374


   @vineetgarg02, could you take a look? Thanks



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495682)
Time Spent: 20m  (was: 10m)

> Clean up local HS2 HMS cache code (II)
> --
>
> Key: HIVE-24202
> URL: https://issues.apache.org/jira/browse/HIVE-24202
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Follow-up for HIVE-24183 (split into different JIRAs).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23712) metadata-only queries return incorrect results with empty acid partition

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23712?focusedWorklogId=495661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495661
 ]

ASF GitHub Bot logged work on HIVE-23712:
-

Author: ASF GitHub Bot
Created on: 06/Oct/20 00:52
Start Date: 06/Oct/20 00:52
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1182:
URL: https://github.com/apache/hive/pull/1182#issuecomment-703966215


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495661)
Time Spent: 20m  (was: 10m)

> metadata-only queries return incorrect results with empty acid partition
> 
>
> Key: HIVE-23712
> URL: https://issues.apache.org/jira/browse/HIVE-23712
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Similarly to HIVE-15397, queries can return incorrect results for 
> metadata-only queries, here is a repro scenario which affects master:
> {code}
> set hive.support.concurrency=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.optimize.metadataonly=true;
> create table test1 (id int, val string) partitioned by (val2 string) STORED 
> AS ORC TBLPROPERTIES ('transactional'='true');
> describe formatted test1;
> alter table test1 add partition (val2='foo');
> alter table test1 add partition (val2='bar');
> insert into test1 partition (val2='foo') values (1, 'abc');
> select distinct val2, current_timestamp from test1;
> insert into test1 partition (val2='bar') values (1, 'def');
> delete from test1 where val2 = 'bar';
> select '--> hive.optimize.metadataonly=true';
> select distinct val2, current_timestamp from test1;
> set hive.optimize.metadataonly=false;
> select '--> hive.optimize.metadataonly=false';
> select distinct val2, current_timestamp from test1;
> select current_timestamp, * from test1;
> {code}
> in this case 2 rows are returned instead of 1 after a delete with the 
> metadata-only optimization:
> https://github.com/abstractdog/hive/commit/a7f03513564d01f7c3ba4aa61c4c6537100b4d3f#diff-cb23043000831f41fe7041cb38f82224R114-R128



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23757) Pushing TopN Key operator through MAPJOIN

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23757?focusedWorklogId=495662&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495662
 ]

ASF GitHub Bot logged work on HIVE-23757:
-

Author: ASF GitHub Bot
Created on: 06/Oct/20 00:52
Start Date: 06/Oct/20 00:52
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1181:
URL: https://github.com/apache/hive/pull/1181


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495662)
Time Spent: 40m  (was: 0.5h)

> Pushing TopN Key operator through MAPJOIN
> -
>
> Key: HIVE-23757
> URL: https://issues.apache.org/jira/browse/HIVE-23757
> Project: Hive
>  Issue Type: Improvement
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> So far only MERGEJOIN + JOIN cases are handled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-24209) Incorrect search argument conversion for NOT BETWEEN operation when vectorization is enabled

2020-10-05 Thread Ganesha Shreedhara (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ganesha Shreedhara updated HIVE-24209:
--
Comment: was deleted

(was: [~ashutoshc] Thanks for reviewing. Please help with pushing this fix to 
master. )

> Incorrect search argument conversion for NOT BETWEEN operation when 
> vectorization is enabled
> 
>
> Key: HIVE-24209
> URL: https://issues.apache.org/jira/browse/HIVE-24209
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24209.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We skipped adding the GenericUDFOPNot UDF to the filter expression for the NOT 
> BETWEEN operation when vectorization is enabled, because of the improvement 
> done as part of HIVE-15884. But this is not handled during the conversion of 
> the filter expression to a search argument, so an incorrect predicate gets 
> pushed down to the storage layer, leading to incorrect split generation and 
> incorrect results. 
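> For illustration (hypothetical query, not from the report):
> {code}
> select * from t where c not between 5 and 10;
> {code}
> With vectorization on, the filter is built without the NOT wrapper, so the 
> search argument conversion has to negate the interval itself (c < 5 or c > 
> 10); converting it as a plain BETWEEN pushes the opposite range down to the 
> storage layer.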



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24120) Plugin for external DatabaseProduct in standalone HMS

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24120?focusedWorklogId=495617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495617
 ]

ASF GitHub Bot logged work on HIVE-24120:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 22:22
Start Date: 05/Oct/20 22:22
Worklog Time Spent: 10m 
  Work Description: gatorblue commented on a change in pull request #1470:
URL: https://github.com/apache/hive/pull/1470#discussion_r499904277



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java
##
@@ -20,71 +20,666 @@
 
 import java.sql.SQLException;
 import java.sql.SQLTransactionRollbackException;
+import java.sql.Timestamp;
+import java.util.ArrayList;
+import java.util.EnumMap;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
 
-/** Database product infered via JDBC. */
-public enum DatabaseProduct {
-  DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER, OTHER;
+import org.apache.hadoop.conf.Configurable;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
+import com.google.common.base.Preconditions;
+
+/** Database product inferred via JDBC. Encapsulates all SQL logic associated 
with
+ * the database product.
+ * This class is a singleton, which is instantiated the first time
+ * method determineDatabaseProduct is invoked.
+ * Tests that need to create multiple instances can use the reset method
+ * */
+public class DatabaseProduct implements Configurable {
+  static final private Logger LOG = 
LoggerFactory.getLogger(DatabaseProduct.class.getName());
+
+  private static enum DbType {DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER, 
CUSTOM, UNDEFINED};
+  public DbType dbType;
+  
+  // Singleton instance
+  private static DatabaseProduct theDatabaseProduct;
+
+  Configuration myConf;
+  /**
+   * Protected constructor for singleton class
+   * @param id
+   */
+  protected DatabaseProduct() {}
+
+  public static final String DERBY_NAME = "derby";
+  public static final String SQL_SERVER_NAME = "microsoft sql server";
+  public static final String MYSQL_NAME = "mysql";
+  public static final String POSTGRESQL_NAME = "postgresql";
+  public static final String ORACLE_NAME = "oracle";
+  public static final String UNDEFINED_NAME = "other";
+  
   /**
* Determine the database product type
* @param productName string to defer database connection
* @return database product type
*/
-  public static DatabaseProduct determineDatabaseProduct(String productName) 
throws SQLException {
-if (productName == null) {
-  return OTHER;
+  public static DatabaseProduct determineDatabaseProduct(String productName, 
Configuration c) {
+DbType dbt;
+
+if (theDatabaseProduct != null) {
+  Preconditions.checkState(theDatabaseProduct.dbType == 
getDbType(productName));
+  return theDatabaseProduct;
 }
+
+// This method may be invoked by concurrent connections
+synchronized (DatabaseProduct.class) {
+
+  if (productName == null) {
+productName = UNDEFINED_NAME;
+  }
+
+  dbt = getDbType(productName);
+
+  // Check for null again in case of race condition
+  if (theDatabaseProduct == null) {
+final Configuration conf = c!= null ? c : 
MetastoreConf.newMetastoreConf();
+// Check if we are using an external database product
+boolean isExternal = MetastoreConf.getBoolVar(conf, 
ConfVars.USE_CUSTOM_RDBMS);
+
+if (isExternal) {
+  // The DatabaseProduct will be created by instantiating an external 
class via
+  // reflection. The external class can override any method in the 
current class
+  String className = MetastoreConf.getVar(conf, 
ConfVars.CUSTOM_RDBMS_CLASSNAME);
+  
+  if (className != null) {
+try {
+  theDatabaseProduct = (DatabaseProduct)
+  ReflectionUtils.newInstance(Class.forName(className), conf);
+  
+  LOG.info(String.format("Using custom RDBMS %s. Overriding 
DbType: %s", className, dbt));

Review comment:
   Yeah, I put this for my own unit testing. Removed it now.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495617)
Time Spent: 1h 40m  (was: 1.5h)

> Plugin for external Datab

[jira] [Work logged] (HIVE-19253) HMS ignores tableType property for external tables

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19253?focusedWorklogId=495600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495600
 ]

ASF GitHub Bot logged work on HIVE-19253:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 21:49
Start Date: 05/Oct/20 21:49
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1537:
URL: https://github.com/apache/hive/pull/1537#discussion_r499890809



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestObjectStore.java
##
@@ -931,6 +924,64 @@ public void testNotificationOps() throws 
InterruptedException, MetaException {
 Assert.assertEquals(0, eventResponse.getEventsSize());
   }
 
+  /**
+   * Verify that table type is set correctly based on input table properties.
+   * Two things are verified:
+   * 
+   *   When EXTERNAL property is set to true, table type 
should be external
+   *   When table type is set to external it should remain external
+   * 
+   * @throws Exception
+   */
+  @Test
+  public void testExternalTable() throws Exception {
+Database db1 = new DatabaseBuilder()
+.setName(DB1)
+.setDescription("description")
+.setLocation("locationurl")
+.build(conf);
+objectStore.createDatabase(db1);
+
+List tables = new ArrayList<>(4);
+Map expectedValues = new HashMap<>();
+
+int i = 1;
+// Case 1: EXTERNAL = true, tableType == MANAGED_TABLE
+// The result should be external table
+Table tbl1 = buildTable(conf, db1, "t" + i++, true, null);
+tables.add(tbl1);
+expectedValues.put(tbl1.getTableName(), true);
+// Case 2: EXTERNAL = false, tableType == EXTERNAL_TABLE
+// The result should be external table
+Table tbl2 = buildTable(conf, db1, "t" + i++, false, 
TableType.EXTERNAL_TABLE.name());
+tables.add(tbl2);
+expectedValues.put(tbl2.getTableName(), true);
+// Case 3: EXTERNAL = false, tableType == EXTERNAL_TABLE

Review comment:
   the comment should state EXTERNAL = true

##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestObjectStore.java
##
@@ -84,15 +85,7 @@
 import java.sql.ResultSet;
 import java.sql.SQLException;
 import java.sql.Statement;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.HashMap;
-import java.util.HashSet;
-import java.util.LinkedList;
-import java.util.List;
-import java.util.Random;
-import java.util.Set;
-import java.util.UUID;
+import java.util.*;

Review comment:
   this change can be reverted since we don't use wildcard imports.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495600)
Time Spent: 20m  (was: 10m)

> HMS ignores tableType property for external tables
> --
>
> Key: HIVE-19253
> URL: https://issues.apache.org/jira/browse/HIVE-19253
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0, 3.1.0, 4.0.0
>Reporter: Alex Kolbasov
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: newbie, pull-request-available
> Attachments: HIVE-19253.01.patch, HIVE-19253.02.patch, 
> HIVE-19253.03.patch, HIVE-19253.03.patch, HIVE-19253.04.patch, 
> HIVE-19253.05.patch, HIVE-19253.06.patch, HIVE-19253.07.patch, 
> HIVE-19253.08.patch, HIVE-19253.09.patch, HIVE-19253.10.patch, 
> HIVE-19253.11.patch, HIVE-19253.12.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When someone creates a table using Thrift API they may think that setting 
> tableType to {{EXTERNAL_TABLE}} creates an external table. And boom - their 
> table is gone later because HMS will silently change it to managed table.
> here is the offending code:
> {code:java}
>   private MTable convertToMTable(Table tbl) throws InvalidObjectException,
>   MetaException {
> ...
> // If the table has property EXTERNAL set, update table type
> // accordingly
> String tableType = tbl.getTableType();
> boolean isExternal = 
> Boolean.parseBoolean(tbl.getParameters().get("EXTERNAL"));
> if (TableType.MANAGED_TABLE.toString().equals(tableType)) {
>   if (isExternal) {
> tableType = TableType.EXTERNAL_TABLE.toString();
>   }
> }
> if (TableType.EXTERNAL_TABLE.toString().equals(tableType)) {
>   if (!isExternal) { // Here!
> tableType = TableType.MANAGED

[jira] [Work logged] (HIVE-24120) Plugin for external DatabaseProduct in standalone HMS

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24120?focusedWorklogId=495585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495585
 ]

ASF GitHub Bot logged work on HIVE-24120:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 21:18
Start Date: 05/Oct/20 21:18
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1470:
URL: https://github.com/apache/hive/pull/1470#discussion_r499869697



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java
##
@@ -20,71 +20,666 @@
 
 import java.sql.SQLException;
 import java.sql.SQLTransactionRollbackException;
+import java.sql.Timestamp;
+import java.util.ArrayList;
+import java.util.EnumMap;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
 
-/** Database product infered via JDBC. */
-public enum DatabaseProduct {
-  DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER, OTHER;
+import org.apache.hadoop.conf.Configurable;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
+import com.google.common.base.Preconditions;
+
+/** Database product inferred via JDBC. Encapsulates all SQL logic associated 
with
+ * the database product.
+ * This class is a singleton, which is instantiated the first time
+ * method determineDatabaseProduct is invoked.
+ * Tests that need to create multiple instances can use the reset method
+ * */
+public class DatabaseProduct implements Configurable {
+  static final private Logger LOG = 
LoggerFactory.getLogger(DatabaseProduct.class.getName());
+
+  private static enum DbType {DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER, 
CUSTOM, UNDEFINED};
+  public DbType dbType;
+  
+  // Singleton instance
+  private static DatabaseProduct theDatabaseProduct;
+
+  Configuration myConf;
+  /**
+   * Protected constructor for singleton class
+   * @param id
+   */
+  protected DatabaseProduct() {}
+
+  public static final String DERBY_NAME = "derby";
+  public static final String SQL_SERVER_NAME = "microsoft sql server";
+  public static final String MYSQL_NAME = "mysql";
+  public static final String POSTGRESQL_NAME = "postgresql";
+  public static final String ORACLE_NAME = "oracle";
+  public static final String UNDEFINED_NAME = "other";
+  
   /**
* Determine the database product type
* @param productName string to defer database connection
* @return database product type
*/
-  public static DatabaseProduct determineDatabaseProduct(String productName) 
throws SQLException {
-if (productName == null) {
-  return OTHER;
+  public static DatabaseProduct determineDatabaseProduct(String productName, 
Configuration c) {
+DbType dbt;
+
+if (theDatabaseProduct != null) {
+  Preconditions.checkState(theDatabaseProduct.dbType == 
getDbType(productName));
+  return theDatabaseProduct;
 }
+
+// This method may be invoked by concurrent connections
+synchronized (DatabaseProduct.class) {
+
+  if (productName == null) {
+productName = UNDEFINED_NAME;
+  }
+
+  dbt = getDbType(productName);
+
+  // Check for null again in case of race condition
+  if (theDatabaseProduct == null) {
+final Configuration conf = c!= null ? c : 
MetastoreConf.newMetastoreConf();
+// Check if we are using an external database product
+boolean isExternal = MetastoreConf.getBoolVar(conf, 
ConfVars.USE_CUSTOM_RDBMS);
+
+if (isExternal) {
+  // The DatabaseProduct will be created by instantiating an external 
class via
+  // reflection. The external class can override any method in the 
current class
+  String className = MetastoreConf.getVar(conf, 
ConfVars.CUSTOM_RDBMS_CLASSNAME);
+  
+  if (className != null) {
+try {
+  theDatabaseProduct = (DatabaseProduct)
+  ReflectionUtils.newInstance(Class.forName(className), conf);
+  
+  LOG.info(String.format("Using custom RDBMS %s. Overriding 
DbType: %s", className, dbt));
+  dbt = DbType.CUSTOM;
+}catch (Exception e) {
+  LOG.warn("Caught exception instantiating custom database 
product. Reverting to " + dbt, e);
+}
+  }
+  else {

Review comment:
   nit: the else could go on the same line as 113, as per the coding 
conventions.

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java
##
@@ -20,71 +20,666 @@
 
 import java.sql.SQLException;
 import java.sql.SQLT

[jira] [Work logged] (HIVE-24120) Plugin for external DatabaseProduct in standalone HMS

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24120?focusedWorklogId=495584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495584
 ]

ASF GitHub Bot logged work on HIVE-24120:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 21:12
Start Date: 05/Oct/20 21:12
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on a change in pull request #1470:
URL: https://github.com/apache/hive/pull/1470#discussion_r499869127



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java
##
@@ -20,71 +20,666 @@
 
 import java.sql.SQLException;
 import java.sql.SQLTransactionRollbackException;
+import java.sql.Timestamp;
+import java.util.ArrayList;
+import java.util.EnumMap;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
 
-/** Database product infered via JDBC. */
-public enum DatabaseProduct {
-  DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER, OTHER;
+import org.apache.hadoop.conf.Configurable;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
+import com.google.common.base.Preconditions;
+
+/** Database product inferred via JDBC. Encapsulates all SQL logic associated 
with
+ * the database product.
+ * This class is a singleton, which is instantiated the first time
+ * method determineDatabaseProduct is invoked.
+ * Tests that need to create multiple instances can use the reset method
+ * */
+public class DatabaseProduct implements Configurable {
+  static final private Logger LOG = 
LoggerFactory.getLogger(DatabaseProduct.class.getName());
+
+  private static enum DbType {DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER, 
CUSTOM, UNDEFINED};
+  public DbType dbType;
+  
+  // Singleton instance
+  private static DatabaseProduct theDatabaseProduct;
+
+  Configuration myConf;
+  /**
+   * Protected constructor for singleton class
+   * @param id
+   */
+  protected DatabaseProduct() {}
+
+  public static final String DERBY_NAME = "derby";
+  public static final String SQL_SERVER_NAME = "microsoft sql server";
+  public static final String MYSQL_NAME = "mysql";
+  public static final String POSTGRESQL_NAME = "postgresql";
+  public static final String ORACLE_NAME = "oracle";
+  public static final String UNDEFINED_NAME = "other";
+  
   /**
* Determine the database product type
* @param productName string to defer database connection
* @return database product type
*/
-  public static DatabaseProduct determineDatabaseProduct(String productName) 
throws SQLException {
-if (productName == null) {
-  return OTHER;
+  public static DatabaseProduct determineDatabaseProduct(String productName, 
Configuration c) {
+DbType dbt;
+
+if (theDatabaseProduct != null) {
+  Preconditions.checkState(theDatabaseProduct.dbType == 
getDbType(productName));
+  return theDatabaseProduct;
 }
+
+// This method may be invoked by concurrent connections
+synchronized (DatabaseProduct.class) {
+
+  if (productName == null) {
+productName = UNDEFINED_NAME;
+  }
+
+  dbt = getDbType(productName);
+
+  // Check for null again in case of race condition
+  if (theDatabaseProduct == null) {
+final Configuration conf = c!= null ? c : 
MetastoreConf.newMetastoreConf();
+// Check if we are using an external database product
+boolean isExternal = MetastoreConf.getBoolVar(conf, 
ConfVars.USE_CUSTOM_RDBMS);
+
+if (isExternal) {
+  // The DatabaseProduct will be created by instantiating an external 
class via
+  // reflection. The external class can override any method in the 
current class
+  String className = MetastoreConf.getVar(conf, 
ConfVars.CUSTOM_RDBMS_CLASSNAME);
+  
+  if (className != null) {
+try {
+  theDatabaseProduct = (DatabaseProduct)
+  ReflectionUtils.newInstance(Class.forName(className), conf);
+  
+  LOG.info(String.format("Using custom RDBMS %s. Overriding 
DbType: %s", className, dbt));

Review comment:
   The "Overriding DbType" message is a bit confusing. Why is that log useful?

##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##
@@ -1337,6 +1337,15 @@ public static ConfVars getMetaConf(String name) {
 HIVE_TXN_STATS_ENABLED("hive.txn.stats.enabled", "hive.txn.stats.enabled", 
true,
 "Whether Hive supports transactional stats (accurate stats for 
transactional tables)"),
 
+// External RDBMS support
+USE_CUSTOM_RDBMS

[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495528
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 19:39
Start Date: 05/Oct/20 19:39
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499827644



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
##
@@ -237,6 +237,7 @@ void run(HiveConf conf, String jobName, Table t, Partition 
p, StorageDescriptor
 }
 
 JobConf job = createBaseJobConf(conf, jobName, t, sd, writeIds, ci);
+QueryCompactor.Util.removeAbortedDirsForAcidTable(conf, dir);

Review comment:
   removed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495528)
Time Spent: 5.5h  (was: 5h 20m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]
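> A rough sketch of the proposed bookkeeping (assumed column names and marker 
> values, not the actual patch):
> {code}
> -- at openTxn: placeholder so an abort before addPartitions stays visible
> insert into TXN_COMPONENTS (TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION, TC_OPERATION_TYPE)
> values (:txnId, :db, :tbl, null, 'p');  -- 'p': hypothetical marker type
> -- at addPartitions: swap the marker for the real partition entries
> delete from TXN_COMPONENTS where TC_TXNID = :txnId and TC_OPERATION_TYPE = 'p';
> insert into TXN_COMPONENTS (TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION, TC_OPERATION_TYPE)
> values (:txnId, :db, :tbl, :partName, 'i');
> {code}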



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495463&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495463
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 17:21
Start Date: 05/Oct/20 17:21
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499754946



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Look for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List deleteDeltaDirectories(Path rootPartition, 
Configuration conf, Set writeIds)
+  throws IOException {
+FileSystem fs = rootPartition.getFileSystem(conf);
+
+PathFilter filter = (p) -> {
+  String name = p.getName();
+  for (Long wId : writeIds) {
+if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+  return true;
+} else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+  return true;
+}
+  }
+  return false;
+};
+List deleted = new ArrayList<>();
+deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, 
PathFilter filter, List deleted)
+  throws IOException {
+RemoteIterator it = listIterator(fs, root, null);
+
+while (it.hasNext()) {
+  FileStatus fStatus = it.next();
+  if (fStatus.isDirectory()) {
+if (filter.accept(fStatus.getPath())) {
+  fs.delete(fStatus.getPath(), true);
+  deleted.add(fStatus);
+} else {
+  deleteDeltaDirectoriesAux(fStatus.getPath(), fs, filter, deleted);
+  if (isDirectoryEmpty(fs, fStatus.getPath())) {
+fs.delete(fStatus.getPath(), false);
+deleted.add(fStatus);
+  }
+}
+  }
+}
+  }
+
+  private static boolean isDirectoryEmpty(FileSystem fs, Path path) throws 
IOException {
+RemoteIterator it = listIterator(fs, path, null);
+return !it.hasNext();
+  }
+
+  private static RemoteIterator listIterator(FileSystem fs, Path 
path, PathFilter filter)
+  throws IOException {
+try {
+  return new ToFileStatusIterator(SHIMS.listLocatedHdfsStatusIterator(fs, 
path, filter));
+} catch (Throwable t) {

Review comment:
   removed it





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495463)
Time Spent: 5h 20m  (was: 5h 10m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> 

[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495460&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495460
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 17:20
Start Date: 05/Oct/20 17:20
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499754142



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Look for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List deleteDeltaDirectories(Path rootPartition, 
Configuration conf, Set writeIds)
+  throws IOException {
+FileSystem fs = rootPartition.getFileSystem(conf);
+
+PathFilter filter = (p) -> {
+  String name = p.getName();
+  for (Long wId : writeIds) {
+if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+  return true;
+} else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+  return true;
+}
+  }
+  return false;
+};
+List deleted = new ArrayList<>();
+deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, 
PathFilter filter, List deleted)

Review comment:
   changed to use getHdfsDirSnapshots.
   @pvargacl, do you know if I should access cached data somehow?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495460)
Time Spent: 5h  (was: 4h 50m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495461&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495461
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 17:20
Start Date: 05/Oct/20 17:20
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499754423



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Look for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List deleteDeltaDirectories(Path rootPartition, 
Configuration conf, Set writeIds)
+  throws IOException {
+FileSystem fs = rootPartition.getFileSystem(conf);
+
+PathFilter filter = (p) -> {
+  String name = p.getName();
+  for (Long wId : writeIds) {
+if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment:
   changed, also excluded base directory from listing





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495461)
Time Spent: 5h 10m  (was: 5h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495459&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495459
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 17:12
Start Date: 05/Oct/20 17:12
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499749748



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Look for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List deleteDeltaDirectories(Path rootPartition, 
Configuration conf, Set writeIds)
+  throws IOException {
+FileSystem fs = rootPartition.getFileSystem(conf);
+
+PathFilter filter = (p) -> {
+  String name = p.getName();
+  for (Long wId : writeIds) {
+if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+  return true;
+} else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+  return true;
+}
+  }
+  return false;
+};
+List deleted = new ArrayList<>();
+deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, 
PathFilter filter, List deleted)
+  throws IOException {
+RemoteIterator it = listIterator(fs, root, null);
+
+while (it.hasNext()) {
+  FileStatus fStatus = it.next();
+  if (fStatus.isDirectory()) {
+if (filter.accept(fStatus.getPath())) {
+  fs.delete(fStatus.getPath(), true);
+  deleted.add(fStatus);
+} else {
+  deleteDeltaDirectoriesAux(fStatus.getPath(), fs, filter, deleted);
+  if (isDirectoryEmpty(fs, fStatus.getPath())) {

Review comment:
   Also, partitions are not removed in HMS.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495459)
Time Spent: 4h 50m  (was: 4h 40m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495457
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 17:08
Start Date: 05/Oct/20 17:08
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499747334



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -107,11 +107,12 @@ public CompactionTxnHandler() {
 // Check for aborted txns: number of aborted txns past threshold and 
age of aborted txns
 // past time threshold
 boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", 
\"TC_PARTITION\","
-+ "MIN(\"TXN_STARTED\"), COUNT(*)"
+String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", 
\"TC_PARTITION\", "
++ "MIN(\"TXN_STARTED\"), COUNT(*), "
++ "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART 
+ " THEN 1 ELSE 0 END) AS \"IS_DP\" "

Review comment:
   Why is that? Aborted dynPart is just a special case that would be 
handled separately (IS_DP=1). 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495457)
Time Spent: 4h 40m  (was: 4.5h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds and entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24231) Enhance shared work optimizer to merge scans with semijoin filters on both sides

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?focusedWorklogId=495441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495441
 ]

ASF GitHub Bot logged work on HIVE-24231:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 16:49
Start Date: 05/Oct/20 16:49
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request #1553:
URL: https://github.com/apache/hive/pull/1553


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495441)
Remaining Estimate: 0h
Time Spent: 10m

> Enhance shared work optimizer to merge scans with semijoin filters on both 
> sides
> 
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24231) Enhance shared work optimizer to merge scans with semijoin filters on both sides

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24231:
--
Labels: pull-request-available  (was: )

> Enhance shared work optimizer to merge scans with semijoin filters on both 
> sides
> 
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24231) Enhance shared work optimizer to merge scans with semijoin filters on both sides

2020-10-05 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24231:
---


> Enhance shared work optimizer to merge scans with semijoin filters on both 
> sides
> 
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23867) Truncate table fail with AccessControlException if doAs enabled and tbl database has source of replication

2020-10-05 Thread Anishek Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208129#comment-17208129
 ] 

Anishek Agarwal commented on HIVE-23867:


All managed table locations should be owned by hive. I don't think we should 
support otherwise. cc [~thejas]

> Truncate table fail with AccessControlException if doAs enabled and tbl 
> database has source of replication
> --
>
> Key: HIVE-23867
> URL: https://issues.apache.org/jira/browse/HIVE-23867
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, repl
>Affects Versions: 3.1.1
>Reporter: Rajkumar Singh
>Priority: Major
>
> Steps to repro:
> 1. Enable doAs.
> 2. With some user (not a super user) create a database: 
> create database sampledb with dbproperties('repl.source.for'='1,2,3');
> 3. Create a table: create table sampledb.sampletble (id int);
> 4. Insert some data into it: insert into sampledb.sampletble values (1), 
> (2),(3);
> 5. Run the truncate command on the table, which fails with the following error:
> {code:java}
>  org.apache.hadoop.ipc.RemoteException: User username is not a super user 
> (non-super user cannot change owner).
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:85)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1907)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:866)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:531)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>  
>  at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1498) 
> ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?]
>  at org.apache.hadoop.ipc.Client.call(Client.java:1444) 
> ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?]
>  at org.apache.hadoop.ipc.Client.call(Client.java:1354) 
> ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?]
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>  ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?]
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>  ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?]
>  at com.sun.proxy.$Proxy31.setOwner(Unknown Source) ~[?:?]
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setOwner(ClientNamenodeProtocolTranslatorPB.java:470)
>  ~[hadoop-hdfs-client-3.1.1.3.1.5.0-152.jar:?]
>  at sun.reflect.GeneratedMethodAccessor151.invoke(Unknown Source) ~[?:?]
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_232]
>  at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_232]
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>  [hadoop-common-3.1.1.3.1.5.0-152.jar:?]
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>  ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?]
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>  ~[hadoop-common-3.1.1.3.1.5.0-152.jar:?]
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>  [hadoop-common-3.1.1.3.1.5.0-152.jar:?]
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>  [hadoop-common-3.1.1.3.1.5.0-152.jar:?]
>  at com.sun.proxy.$Proxy32.setOwner(Unknown Source) [?:?]
>  at org.apache.hadoop.hdfs.DFSClient.setOwner(DFSClient.java:1914) 
> [hadoop-hdfs-client-3.1.1.3.1.5.0-152.jar:?]
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$36.doCall(DistributedFileSystem.java:1764)
>  [hadoop-hdfs-client-3.1.1.3.1.5.0-152.jar:?]
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$36.doCall(DistributedFileSystem.java:1761)
>  [hadoop-hdfs-client-3.1.1.3.1.5.0-152.jar:

[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495383&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495383
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 15:10
Start Date: 05/Oct/20 15:10
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499671848



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Looks for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment:
   You are right, I got confused; the 'p' entry will solve this.
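
For readers following the thread, here is a self-contained sketch of the name 
matching under discussion (the delta_%07d_%07d layout mirrors Hive's usual ACID 
directory naming; the standalone class and its names are illustrative, not part 
of the patch):

```
import java.util.Set;

public class DeltaNameFilterSketch {
  // Assumed to mirror AcidUtils.deltaSubdir(min, max), e.g. "delta_0000005_0000005".
  static String deltaSubdir(long min, long max) {
    return String.format("delta_%07d_%07d", min, max);
  }

  // Accepts a directory name iff it is a delta for one of the given writeIds
  // and is not a partition directory (those contain '=', e.g. "ds=2020-10-05").
  static boolean accepts(String name, Set<Long> writeIds) {
    for (Long wId : writeIds) {
      if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Set<Long> ids = Set.of(5L);
    System.out.println(accepts("delta_0000005_0000005", ids)); // true
    System.out.println(accepts("ds=2020-10-05", ids));         // false
    System.out.println(accepts("delta_0000006_0000006", ids)); // false
  }
}
```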





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495383)
Time Spent: 4.5h  (was: 4h 20m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add the 
> corresponding partition entries to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]
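
For illustration, a rough sketch of the proposed marker lifecycle (the 
TXN_COMPONENTS column names follow the metastore backend schema, but the 'p' 
operation type, the method names, and the exact SQL are assumptions drawn from 
this discussion, not the committed patch):

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class TxnMarkerSketch {
  /** At openTxn: write the special marker row (illustrative only). */
  static void writeOpenTxnMarker(Connection db, long txnId, String dbName, String tbl)
      throws SQLException {
    try (PreparedStatement ps = db.prepareStatement(
        "INSERT INTO TXN_COMPONENTS (TC_TXNID, TC_DATABASE, TC_TABLE, TC_OPERATION_TYPE)"
            + " VALUES (?, ?, ?, 'p')")) {
      ps.setLong(1, txnId);
      ps.setString(2, dbName);
      ps.setString(3, tbl);
      ps.executeUpdate();
    }
  }

  /**
   * At addPartitions: drop the marker; real per-partition entries replace it,
   * so an abort after this point gets cleaned per partition as usual.
   */
  static void clearOpenTxnMarker(Connection db, long txnId) throws SQLException {
    try (PreparedStatement ps = db.prepareStatement(
        "DELETE FROM TXN_COMPONENTS WHERE TC_TXNID = ? AND TC_OPERATION_TYPE = 'p'")) {
      ps.setLong(1, txnId);
      ps.executeUpdate();
    }
  }
}
{code}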



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495373&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495373
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 14:53
Start Date: 05/Oct/20 14:53
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499659807



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLTask.java
##
@@ -82,8 +89,32 @@ public int execute() {
         throw new IllegalArgumentException("Unknown DDL request: " + ddlDesc.getClass());
       }
     } catch (Throwable e) {
+      LOG.error("DDLTask failed", e);
+      int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+      try {
+        ReplicationMetricCollector metricCollector = work.getMetricCollector();
+        if (errorCode > 4) {
+          // in case of replication related task, dumpDirectory should not be null
+          if (work.dumpDirectory != null) {
+            Path nonRecoverableMarker = new Path(work.dumpDirectory, ReplAck.NON_RECOVERABLE_MARKER.toString());
+            org.apache.hadoop.hive.ql.parse.repl.dump.Utils.writeStackTrace(e, nonRecoverableMarker, conf);
+            if (metricCollector != null) {
+              metricCollector.reportStageEnd(getName(), Status.FAILED_ADMIN, nonRecoverableMarker.toString());
+            }
+          }
+          if (metricCollector != null) {

Review comment:
   In replication flows, dumpDirectory and metricCollector should both be 
non-null. This line handles the corner case where metricCollector might have 
been configured but dumpDirectory was not. It is still a replication case, 
since only replication tasks can initialise and pass metricCollector, so we 
should at least indicate the FAILED_ADMIN state (the non-recoverable path is 
null).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495373)
Time Spent: 2.5h  (was: 2h 20m)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495369&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495369
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 14:49
Start Date: 05/Oct/20 14:49
Worklog Time Spent: 10m 
  Work Description: vpnvishv commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499656397



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Looks for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment:
   I was also wondering the same, as this code was there in the earlier 
patches so I have just kept it. We can remove this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495369)
Time Spent: 4h 20m  (was: 4h 10m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add the 
> corresponding partition entries to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495368
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 14:47
Start Date: 05/Oct/20 14:47
Worklog Time Spent: 10m 
  Work Description: vpnvishv commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499655306



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Looks for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment:
   @pvargacl Sorry, I may be missing something here, but with this change, 
how can the compactor read the data of an aborted delta? It should be in the 
aborted list, right, due to this dummy 'p'-type entry in TXN_COMPONENTS?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495368)
Time Spent: 4h 10m  (was: 4h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add the 
> corresponding partition entries to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495367
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 14:46
Start Date: 05/Oct/20 14:46
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499654760



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/AlterDatabaseHandler.java
##
@@ -77,9 +79,22 @@
         alterDbDesc = new AlterDatabaseSetOwnerDesc(actualDbName, new PrincipalDesc(newDb.getOwnerName(),
             newDb.getOwnerType()), context.eventOnlyReplicationSpec());
       }
+      Path metricPath = null;
+      ReplicationMetricCollector metricCollector = null;
+      try {
+        metricPath = ReplUtils.getMetricPath(context, context.hiveConf);

Review comment:
   hiveConf has default (package-private) access in Context, so it can't be 
accessed by ReplUtils.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495367)
Time Spent: 2h 20m  (was: 2h 10m)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24197) Check for write transactions for the db under replication at a frequent interval

2020-10-05 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24197:
---
Attachment: HIVE-24197.04.patch
Status: Patch Available  (was: In Progress)

> Check for write transactions for the db under replication at a frequent 
> interval
> 
>
> Key: HIVE-24197
> URL: https://issues.apache.org/jira/browse/HIVE-24197
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-24197.01.patch, HIVE-24197.02.patch, 
> HIVE-24197.03.patch, HIVE-24197.04.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24197) Check for write transactions for the db under replication at a frequent interval

2020-10-05 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24197:
---
Status: In Progress  (was: Patch Available)

> Check for write transactions for the db under replication at a frequent 
> interval
> 
>
> Key: HIVE-24197
> URL: https://issues.apache.org/jira/browse/HIVE-24197
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-24197.01.patch, HIVE-24197.02.patch, 
> HIVE-24197.03.patch, HIVE-24197.04.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495366
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 14:40
Start Date: 05/Oct/20 14:40
Worklog Time Spent: 10m 
  Work Description: vpnvishv commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499649919



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
##
@@ -237,6 +237,7 @@ void run(HiveConf conf, String jobName, Table t, Partition 
p, StorageDescriptor
 }
 
 JobConf job = createBaseJobConf(conf, jobName, t, sd, writeIds, ci);
+QueryCompactor.Util.removeAbortedDirsForAcidTable(conf, dir);

Review comment:
   @pvargacl You are right, this is not required, as the compactor now runs in 
a transaction and the cleaner has a validTxnList with the aborted bits set. We 
added this wrt Hive 3, in which the cleaner doesn't have the aborted bits set, 
as we create the validWriteIdList for the cleaner based on the highestWriteId.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495366)
Time Spent: 4h  (was: 3h 50m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add the 
> corresponding partition entries to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24230) Integrate HPL/SQL into HiveServer2

2020-10-05 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208111#comment-17208111
 ] 

Attila Magyar commented on HIVE-24230:
--

cc: [~kgyrtkirk]

> Integrate HPL/SQL into HiveServer2
> --
>
> Key: HIVE-24230
> URL: https://issues.apache.org/jira/browse/HIVE-24230
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> HPL/SQL is a standalone command-line program that can store and load scripts 
> from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
> depends on Hive and not the other way around.
> Changing the dependency order between HPL/SQL and HiveServer would open up 
> some possibilities which are currently not feasible to implement. For example, 
> one might want to use a third-party SQL tool to run selects on stored 
> procedure (or rather function, in this case) outputs.
> {code:java}
> SELECT * from myStoredProcedure(1, 2); {code}
> HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not 
> work with the current architecture.
> Another important factor is performance. Declarative SQL commands are sent to 
> Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
> and use HiveServer’s internal API for compilation and execution.
> The third factor is that existing tools like Beeline or Hue cannot be used 
> with HPL/SQL since it has its own, separate CLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24230) Integrate HPL/SQL into HiveServer2

2020-10-05 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24230:



> Integrate HPL/SQL into HiveServer2
> --
>
> Key: HIVE-24230
> URL: https://issues.apache.org/jira/browse/HIVE-24230
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> HPL/SQL is a standalone command-line program that can store and load scripts 
> from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
> depends on Hive and not the other way around.
> Changing the dependency order between HPL/SQL and HiveServer would open up 
> some possibilities which are currently not feasible to implement. For example, 
> one might want to use a third-party SQL tool to run selects on stored 
> procedure (or rather function, in this case) outputs.
> {code:java}
> SELECT * from myStoredProcedure(1, 2); {code}
> HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not 
> work with the current architecture.
> Another important factor is performance. Declarative SQL commands are sent to 
> Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
> and use HiveServer’s internal API for compilation and execution.
> The third factor is that existing tools like Beeline or Hue cannot be used 
> with HPL/SQL since it has its own, separate CLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=495360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495360
 ]

ASF GitHub Bot logged work on HIVE-24217:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 14:33
Start Date: 05/Oct/20 14:33
Worklog Time Spent: 10m 
  Work Description: zeroflag commented on a change in pull request #1542:
URL: https://github.com/apache/hive/pull/1542#discussion_r499644273



##
File path: standalone-metastore/metastore-server/src/main/resources/package.jdo
##
@@ -1549,6 +1549,83 @@
 
   
 
+  [XML class/field definitions for the new stored-procedure metadata were stripped from this archived message]

Review comment:
   I changed it to CLOB, that is already used at multiple places.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495360)
Time Spent: 1.5h  (was: 1h 20m)

> HMS storage backend for HPL/SQL stored procedures
> -
>
> Key: HIVE-24217
> URL: https://issues.apache.org/jira/browse/HIVE-24217
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Attachments: HPL_SQL storedproc HMS storage.pdf
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> HPL/SQL procedures are currently stored in text files. The goal of this Jira 
> is to implement a Metastore backend for storing and loading these procedures. 
> This is an incremental step towards having fully capable stored procedures in 
> Hive.
>  
> See the attached design for more information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495358
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 14:27
Start Date: 05/Oct/20 14:27
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499639831



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Looks for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+          return true;
+        } else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+          return true;
+        }
+      }
+      return false;
+    };
+    List<FileStatus> deleted = new ArrayList<>();
+    deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+    return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, PathFilter filter, List<FileStatus> deleted)
+      throws IOException {
+    RemoteIterator<FileStatus> it = listIterator(fs, root, null);
+
+    while (it.hasNext()) {
+      FileStatus fStatus = it.next();
+      if (fStatus.isDirectory()) {
+        if (filter.accept(fStatus.getPath())) {
+          fs.delete(fStatus.getPath(), true);
+          deleted.add(fStatus);
+        } else {
+          deleteDeltaDirectoriesAux(fStatus.getPath(), fs, filter, deleted);
+          if (isDirectoryEmpty(fs, fStatus.getPath())) {

Review comment:
   Agreed, that would simplify the re-use of getHdfsDirSnapshots.
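
A minimal sketch of the single-pass recursive listing alluded to here 
(FileSystem.listFiles(path, true) is standard Hadoop API; the helper name and 
the parent-directory collection are illustrative assumptions):

```
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.fs.RemoteIterator;

class SinglePassDeltaScan {
  // One recursive listing pass over the partition root, in the spirit of
  // AcidUtils.getHdfsDirSnapshots, instead of re-listing every subdirectory.
  // Collects the parent directories whose names the filter accepts.
  static Set<Path> matchingDirs(FileSystem fs, Path root, PathFilter filter)
      throws IOException {
    Set<Path> dirs = new HashSet<>();
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(root, true);
    while (it.hasNext()) {
      Path dir = it.next().getPath().getParent();
      if (dir != null && filter.accept(dir)) {
        dirs.add(dir);
      }
    }
    return dirs;
  }
}
```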





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495358)
Time Spent: 3h 50m  (was: 3h 40m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add the 
> corresponding partition entries to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495356
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 14:25
Start Date: 05/Oct/20 14:25
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499623864



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Looks for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment:
   Why would it read the aborted data as valid if the txn is still in the 
aborted state? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495356)
Time Spent: 3h 40m  (was: 3.5h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add the 
> corresponding partition entries to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495355
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 14:25
Start Date: 05/Oct/20 14:25
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499638418



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java
##
@@ -140,7 +142,23 @@ public int execute() {
 }
   });
 } catch (Exception e) {
-  throw new SecurityException(ErrorMsg.REPL_RETRY_EXHAUSTED.format(e.getMessage()), e);

Review comment:
   This check is being done in the following lines.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495355)
Time Spent: 2h 10m  (was: 2h)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24229) DirectSql fails in case of OracleDB

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24229:
--
Labels: pull-request-available  (was: )

> DirectSql fails in case of OracleDB
> ---
>
> Key: HIVE-24229
> URL: https://issues.apache.org/jira/browse/HIVE-24229
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Direct Sql fails due to a different data type mapping in case of Oracle DB



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24229) DirectSql fails in case of OracleDB

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24229?focusedWorklogId=495351&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495351
 ]

ASF GitHub Bot logged work on HIVE-24229:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 14:15
Start Date: 05/Oct/20 14:15
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request #1552:
URL: https://github.com/apache/hive/pull/1552


   https://issues.apache.org/jira/browse/HIVE-24229
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495351)
Remaining Estimate: 0h
Time Spent: 10m

> DirectSql fails in case of OracleDB
> ---
>
> Key: HIVE-24229
> URL: https://issues.apache.org/jira/browse/HIVE-24229
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Direct Sql fails due to a different data type mapping in case of Oracle DB



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=495349&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495349
 ]

ASF GitHub Bot logged work on HIVE-22826:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 14:14
Start Date: 05/Oct/20 14:14
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on a change in pull 
request #1528:
URL: https://github.com/apache/hive/pull/1528#discussion_r498902147



##
File path: 
ql/src/test/queries/clientpositive/alter_numbuckets_partitioned_table_h23.q
##
@@ -52,6 +52,12 @@ alter table tst1_n1 clustered by (value) into 12 buckets;
 
 describe formatted tst1_n1;
 
+-- Test changing name of bucket column
+
+alter table tst1_n1 change key keys string;
+
+describe formatted tst1_n1;

Review comment:
   After adding show create table, the test started failing because the result 
expected "### masked information", which led to multiple test failures. The 
information shown by show create table is the same as describe table's, as we 
are only changing the column name. Hence, after multiple retries, I decided to 
remove it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495349)
Time Spent: 2h 50m  (was: 2h 40m)

>  ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
> ---
>
> Key: HIVE-22826
> URL: https://issues.apache.org/jira/browse/HIVE-22826
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: unitTest.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Compaction for tables where a bucketed column has been renamed fails, since 
> the list of bucketed columns in the StorageDescriptor doesn't get updated 
> when the column is renamed, so we can't recreate the table correctly 
> during compaction.
> Attached a unit test that fails.
> NO PRECOMMIT TESTS
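
One plausible shape of a fix, as a sketch only (getBucketCols/setBucketCols 
are the standard metastore StorageDescriptor accessors; the helper itself and 
where it would be called from are assumptions):

{code:java}
import java.util.List;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;

class BucketColRenameSketch {
  /** Keep sd.bucketCols consistent when a column is renamed (illustrative). */
  static void renameBucketCol(StorageDescriptor sd, String oldName, String newName) {
    List<String> bucketCols = sd.getBucketCols();
    if (bucketCols == null) {
      return;
    }
    for (int i = 0; i < bucketCols.size(); i++) {
      if (bucketCols.get(i).equalsIgnoreCase(oldName)) {
        bucketCols.set(i, newName);
      }
    }
    sd.setBucketCols(bucketCols);
  }
}
{code}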



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495341
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 14:05
Start Date: 05/Oct/20 14:05
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499623864



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Looks for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment:
   Why would it read the aborted data as valid if the txn is in the aborted 
state? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495341)
Time Spent: 3.5h  (was: 3h 20m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add the 
> corresponding partition entries to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495338
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 14:00
Start Date: 05/Oct/20 14:00
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499620518



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Looks for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+          return true;
+        } else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+          return true;
+        }
+      }
+      return false;
+    };
+    List<FileStatus> deleted = new ArrayList<>();
+    deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+    return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, PathFilter filter, List<FileStatus> deleted)

Review comment:
   getHdfsDirSnapshots does the same recursive listing, doesn't it?
   ```
   RemoteIterator<LocatedFileStatus> itr = fs.listFiles(path, true);
   while (itr.hasNext()) {
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495338)
Time Spent: 3h 20m  (was: 3h 10m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add the 
> corresponding partition entries to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24229) DirectSql fails in case of OracleDB

2020-10-05 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-24229:
---


> DirectSql fails in case of OracleDB
> ---
>
> Key: HIVE-24229
> URL: https://issues.apache.org/jira/browse/HIVE-24229
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>
> Direct Sql fails due to a different data type mapping in case of Oracle DB



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495288
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:59
Start Date: 05/Oct/20 11:59
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499545127



##
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
##
@@ -2128,6 +2128,395 @@ public void testCleanerForTxnToWriteId() throws Exception {
 0, TxnDbUtil.countQueryAgent(hiveConf, "select count(*) from TXN_TO_WRITE_ID"));
   }
 
+  @Test
+  public void testMmTableAbortWithCompaction() throws Exception {
+// 1. Insert some rows into MM table
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(1,2)");
+// There should be 1 delta directory
+int [][] resultData1 =  new int[][] {{1,2}};
+verifyDeltaDirAndResult(1, Table.MMTBL.toString(), "", resultData1);
+List<String> r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+Assert.assertEquals("1", r1.get(0));
+
+// 2. Let a transaction be aborted
+hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, true);
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(3,4)");
+hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, false);
+// There should be 1 delta and 1 base directory. The base one is the 
aborted one.
+verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData1);
+
+r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+Assert.assertEquals("1", r1.get(0));
+
+// Verify query result
+int [][] resultData2 = new int[][] {{1,2}, {5,6}};
+
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(5,6)");
+verifyDeltaDirAndResult(3, Table.MMTBL.toString(), "", resultData2);
+r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+Assert.assertEquals("2", r1.get(0));
+
+// 4. Perform a MINOR compaction, expectation is it should remove aborted 
base dir
+runStatementOnDriver("alter table "+ Table.MMTBL + " compact 'MINOR'");
+// The worker should remove the subdir for aborted transaction
+runWorker(hiveConf);
+verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData2);
+verifyBaseDirAndResult(0, Table.MMTBL.toString(), "", resultData2);
+// 5. Run Cleaner. Shouldn't impact anything.
+runCleaner(hiveConf);
+// 6. Run initiator remove aborted entry from TXNS table
+runInitiator(hiveConf);
+
+// Verify query result
+List<String> rs = runStatementOnDriver("select a,b from " + Table.MMTBL + " order by a");
+Assert.assertEquals(stringifyValues(resultData2), rs);
+
+int [][] resultData3 = new int[][] {{1,2}, {5,6}, {7,8}};
+// 7. add few more rows
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(7,8)");
+// 8. add one more aborted delta
+hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, true);
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(9,10)");
+hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, false);
+
+// 9. Perform a MAJOR compaction, expectation is it should remove aborted 
base dir
+runStatementOnDriver("alter table "+ Table.MMTBL + " compact 'MAJOR'");
+verifyDeltaDirAndResult(4, Table.MMTBL.toString(), "", resultData3);
+runWorker(hiveConf);
+verifyDeltaDirAndResult(3, Table.MMTBL.toString(), "", resultData3);
+verifyBaseDirAndResult(1, Table.MMTBL.toString(), "", resultData3);
+runCleaner(hiveConf);
+verifyDeltaDirAndResult(0, Table.MMTBL.toString(), "", resultData3);
+verifyBaseDirAndResult(1, Table.MMTBL.toString(), "", resultData3);
+runInitiator(hiveConf);
+verifyDeltaDirAndResult(0, Table.MMTBL.toString(), "", resultData3);
+verifyBaseDirAndResult(1, Table.MMTBL.toString(), "", resultData3);
+
+// Verify query result
+rs = runStatementOnDriver("select a,b from " + Table.MMTBL + " order by 
a");
+Assert.assertEquals(stringifyValues(resultData3), rs);
+  }
+  @Test
+  public void testMmTableAbortWithCompactionNoCleanup() throws Exception {
+// 1. Insert some rows into MM table
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(1,2)");
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(5,6)");
+// There should be 1 delta directory
+int [][] resultData1 =  new int[][] {{1,2}, {5,6}};
+verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData1);
+List<String> r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+Assert.assertEquals("2", r1.get(0));
+
+// 2. Let a transaction be aborted
+hiveConf.set

[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495289
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:59
Start Date: 05/Oct/20 11:59
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on pull request #1548:
URL: https://github.com/apache/hive/pull/1548#issuecomment-703585092


   > @deniskuzZ Overall change LGTM.
   > Looked into the test failures; one of the tests requires a change in 
expected values wrt the master branch. The other two look like genuine 
failures to me. Please check the inline comments.
   
   @vpnvishv, thank you for the review! 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495289)
Time Spent: 3h 10m  (was: 3h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions, and data has 
> been written to the table, the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman], this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn; when 
> addPartitions is called, remove this entry from TXN_COMPONENTS and add the 
> corresponding partition entries to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495287&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495287
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:58
Start Date: 05/Oct/20 11:58
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499544579



##
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
##
@@ -2128,6 +2128,395 @@ public void testCleanerForTxnToWriteId() throws 
Exception {
 0, TxnDbUtil.countQueryAgent(hiveConf, "select count(*) from 
TXN_TO_WRITE_ID"));
   }
 
+  @Test
+public void testMmTableAbortWithCompaction() throws Exception {
+// 1. Insert some rows into MM table
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(1,2)");
+// There should be 1 delta directory
+int [][] resultData1 =  new int[][] {{1,2}};
+verifyDeltaDirAndResult(1, Table.MMTBL.toString(), "", resultData1);
+List<String> r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+Assert.assertEquals("1", r1.get(0));
+
+// 2. Let a transaction be aborted
+hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, true);
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(3,4)");
+hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, false);
+// There should be 1 delta and 1 base directory. The base one is the 
aborted one.
+verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData1);
+
+r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+Assert.assertEquals("1", r1.get(0));
+
+// Verify query result
+int [][] resultData2 = new int[][] {{1,2}, {5,6}};
+
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(5,6)");
+verifyDeltaDirAndResult(3, Table.MMTBL.toString(), "", resultData2);
+r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+Assert.assertEquals("2", r1.get(0));

Review comment:
   fixed, turned off StatsOptimizer
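
(For reference, a minimal sketch of what turning off the StatsOptimizer in such a test usually amounts to, assuming the standard hive.compute.query.using.stats knob:)

{code:java}
// Force count(*) to scan the data instead of being answered from metastore
// statistics, so the aborted delta directories are actually exercised.
hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_COMPUTE_QUERY_USING_STATS, false);
{code}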

##
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
##
@@ -2128,6 +2128,395 @@ public void testCleanerForTxnToWriteId() throws 
Exception {
 0, TxnDbUtil.countQueryAgent(hiveConf, "select count(*) from 
TXN_TO_WRITE_ID"));
   }
 
+  @Test
+public void testMmTableAbortWithCompaction() throws Exception {
+// 1. Insert some rows into MM table
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(1,2)");
+// There should be 1 delta directory
+int [][] resultData1 =  new int[][] {{1,2}};
+verifyDeltaDirAndResult(1, Table.MMTBL.toString(), "", resultData1);
+List<String> r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+Assert.assertEquals("1", r1.get(0));
+
+// 2. Let a transaction be aborted
+hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, true);
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(3,4)");
+hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, false);
+// There should be 1 delta and 1 base directory. The base one is the 
aborted one.
+verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData1);
+
+r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+Assert.assertEquals("1", r1.get(0));
+
+// Verify query result
+int [][] resultData2 = new int[][] {{1,2}, {5,6}};
+
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(5,6)");
+verifyDeltaDirAndResult(3, Table.MMTBL.toString(), "", resultData2);
+r1 = runStatementOnDriver("select count(*) from " + Table.MMTBL);
+Assert.assertEquals("2", r1.get(0));
+
+// 4. Perform a MINOR compaction, expectation is it should remove aborted 
base dir
+runStatementOnDriver("alter table "+ Table.MMTBL + " compact 'MINOR'");
+// The worker should remove the subdir for aborted transaction
+runWorker(hiveConf);
+verifyDeltaDirAndResult(2, Table.MMTBL.toString(), "", resultData2);
+verifyBaseDirAndResult(0, Table.MMTBL.toString(), "", resultData2);
+// 5. Run Cleaner. Shouldn't impact anything.
+runCleaner(hiveConf);
+// 6. Run initiator remove aborted entry from TXNS table
+runInitiator(hiveConf);
+
+// Verify query result
+List<String> rs = runStatementOnDriver("select a,b from " + Table.MMTBL + " order by a");
+Assert.assertEquals(stringifyValues(resultData2), rs);
+
+int [][] resultData3 = new int[][] {{1,2}, {5,6}, {7,8}};
+// 7. add few more rows
+runStatementOnDriver("insert into " + Table.MMTBL + "(a,b) values(7,8)");
+// 8. add one more aborted delta
+hiveConf.setBoolVar(HiveConf.ConfVars.HIVETESTMODEROLLBACKTXN, true);
+runSt

[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495283
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:50
Start Date: 05/Oct/20 11:50
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499539993



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -280,6 +299,49 @@ public static PathFilter getBootstrapDirectoryFilter(final 
FileSystem fs) {
 };
   }
 
+  public static Path getMetricPath(MessageHandler.Context context, HiveConf 
hiveConf) throws Exception{
+DumpType dumpType;
+Path metricPath = null;
+String dumpMetaFile = DumpMetaData.getDmdFileName();
+FileSystem fs = null;
+if(context.dmd != null) {
+  dumpType = context.dmd.getDumpType();
+  fs = context.dmd.getDumpFilePath().getFileSystem(hiveConf);
+  metricPath = context.dmd.getDumpFilePath().getParent();
+}
+else {
+  dumpType = null;
+  if(context.location != null){
+metricPath = (new Path(context.location)).getParent();
+fs = (new Path(context.location)).getFileSystem(hiveConf);
+  }
+}
+//traverse to hiveDumpRoot required by metric-collector
+while (metricPath != null && fs != null && dumpType != DumpType.BOOTSTRAP 
&& dumpType != DumpType.INCREMENTAL) {
+  metricPath = metricPath.getParent();
+  if (fs.exists(new Path(metricPath, dumpMetaFile))) {
+dumpType = (new DumpMetaData(metricPath, hiveConf)).getDumpType();
+  }
+}
+return metricPath;
+  }
+
+  public static ReplicationMetricCollector 
getMetricCollector(MessageHandler.Context context, String dbName,
+   Path metricPath, 
HiveConf hiveConf) throws Exception {
+if (metricPath != null) {
+  DumpType dumpType = (new DumpMetaData(metricPath, 
hiveConf)).getDumpType();
+  //for using this, dumpType should be either INCREMENTAL or BOOTSTRAP.
+  if (dumpType == DumpType.BOOTSTRAP) {
+return new BootstrapLoadMetricCollector(dbName, metricPath.toString(),
+context.dmd.getDumpExecutionId(), hiveConf);

Review comment:
   You can pass just the DumpExecutionId; no need to pass the entire context.
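
(A sketch of the suggested narrowing; the reshaped signature is hypothetical, and IncrementalLoadMetricCollector is assumed as the non-bootstrap counterpart of the collector shown in the diff:)

{code:java}
// Hypothetical variant of getMetricCollector that takes only the dump
// execution id rather than the whole MessageHandler.Context.
public static ReplicationMetricCollector getMetricCollector(String dbName, Path metricPath,
    long dumpExecutionId, HiveConf hiveConf) throws Exception {
  DumpType dumpType = (new DumpMetaData(metricPath, hiveConf)).getDumpType();
  if (dumpType == DumpType.BOOTSTRAP) {
    return new BootstrapLoadMetricCollector(dbName, metricPath.toString(), dumpExecutionId, hiveConf);
  }
  return new IncrementalLoadMetricCollector(dbName, metricPath.toString(), dumpExecutionId, hiveConf);
}
{code}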





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495283)
Time Spent: 1h 50m  (was: 1h 40m)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495284&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495284
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:50
Start Date: 05/Oct/20 11:50
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499540414



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -280,6 +299,49 @@ public static PathFilter getBootstrapDirectoryFilter(final 
FileSystem fs) {
 };
   }
 
+  public static Path getMetricPath(MessageHandler.Context context, HiveConf 
hiveConf) throws Exception{
+DumpType dumpType;
+Path metricPath = null;
+String dumpMetaFile = DumpMetaData.getDmdFileName();
+FileSystem fs = null;
+if(context.dmd != null) {
+  dumpType = context.dmd.getDumpType();
+  fs = context.dmd.getDumpFilePath().getFileSystem(hiveConf);
+  metricPath = context.dmd.getDumpFilePath().getParent();
+}
+else {
+  dumpType = null;
+  if(context.location != null){
+metricPath = (new Path(context.location)).getParent();
+fs = (new Path(context.location)).getFileSystem(hiveConf);
+  }
+}
+//traverse to hiveDumpRoot required by metric-collector
+while (metricPath != null && fs != null && dumpType != DumpType.BOOTSTRAP 
&& dumpType != DumpType.INCREMENTAL) {

Review comment:
   This may be error-prone: if no dump metadata file is ever found on the way up, the loop walks all the way to the filesystem root.
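
(One possible hardening, sketched under the assumption that the walk should stop explicitly at the filesystem root:)

{code:java}
// Sketch: bound the upward walk instead of relying on getParent()
// eventually returning null near the root.
while (metricPath != null && !metricPath.isRoot()
    && dumpType != DumpType.BOOTSTRAP && dumpType != DumpType.INCREMENTAL) {
  metricPath = metricPath.getParent();
  if (metricPath != null && fs.exists(new Path(metricPath, dumpMetaFile))) {
    dumpType = (new DumpMetaData(metricPath, hiveConf)).getDumpType();
  }
}
{code}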





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495284)
Time Spent: 2h  (was: 1h 50m)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20137) Truncate for Transactional tables should use base_x

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20137?focusedWorklogId=495282&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495282
 ]

ASF GitHub Bot logged work on HIVE-20137:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:49
Start Date: 05/Oct/20 11:49
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1532:
URL: https://github.com/apache/hive/pull/1532#discussion_r499536263



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2999,6 +2980,10 @@ Seems much cleaner if each stmt is identified as a 
particular HiveOperation (whi
 compBuilder.setExclusive();
 compBuilder.setOperationType(DataOperationType.NO_TXN);
 break;
+  case DDL_EXCL_WRITE:
+compBuilder.setExclWrite();

Review comment:
   ExclWrite is going to block concurrent reads. Is that expected?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495282)
Time Spent: 1h  (was: 50m)

> Truncate for Transactional tables should use base_x
> ---
>
> Key: HIVE-20137
> URL: https://issues.apache.org/jira/browse/HIVE-20137
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is a follow up to HIVE-19387.
> Once we have a lock that blocks writers but not readers (HIVE-19369), it 
> would make sense to make truncate create a new base_x, where is x is a 
> writeId in current txn - the same as Insert Overwrite does.
> This would mean it can work w/o interfering with existing writers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495281
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:41
Start Date: 05/Oct/20 11:41
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499535302



##
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ReplTxnWork.java
##
@@ -92,6 +120,18 @@ public ReplTxnWork(String dbName, String tableName, List<String> partNames,
 this.operation = type;
   }
 
+  public ReplTxnWork(String dbName, String tableName, List<String> partNames,

Review comment:
   Have two constructors: one with the dumpDirectory and metricCollector, and 
one without. That way you don't need to change existing code.
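
(A compact sketch of the suggested pair of constructors; the field names and OperationType are assumed from the diff:)

{code:java}
// Existing call sites outside replication keep using this constructor.
public ReplTxnWork(String dbName, String tableName, List<String> partNames, OperationType type) {
  this(dbName, tableName, partNames, type, null, null);
}

// New replication-aware overload carrying the extra state.
public ReplTxnWork(String dbName, String tableName, List<String> partNames, OperationType type,
    String dumpDirectory, ReplicationMetricCollector metricCollector) {
  this.dbName = dbName;
  this.tableName = tableName;
  this.partNames = partNames;
  this.operation = type;
  this.dumpDirectory = dumpDirectory;
  this.metricCollector = metricCollector;
}
{code}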





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495281)
Time Spent: 1h 40m  (was: 1.5h)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495280
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:38
Start Date: 05/Oct/20 11:38
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499533749



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/AlterDatabaseHandler.java
##
@@ -77,9 +79,22 @@
 alterDbDesc = new AlterDatabaseSetOwnerDesc(actualDbName, new 
PrincipalDesc(newDb.getOwnerName(),
 newDb.getOwnerType()), context.eventOnlyReplicationSpec());
   }
+  Path metricPath = null;
+  ReplicationMetricCollector metricCollector = null;
+  try{
+metricPath = ReplUtils.getMetricPath(context, context.hiveConf);

Review comment:
   You are passing the context already; hiveConf is part of that.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495280)
Time Spent: 1.5h  (was: 1h 20m)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495277&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495277
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:30
Start Date: 05/Oct/20 11:30
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499529820



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
##
@@ -355,8 +360,32 @@ public int execute() {
 } catch (Exception e) {
   setException(e);
   LOG.info("Failed to persist stats in metastore", e);
+  int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+  try {
+ReplicationMetricCollector metricCollector = work.getMetricCollector();
+if (errorCode > 4) {

Review comment:
   Same applies to all the tasks





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495277)
Time Spent: 1h 20m  (was: 1h 10m)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495276
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:30
Start Date: 05/Oct/20 11:30
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499529679



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -242,14 +250,25 @@ public static String getNonEmpty(String configParam, 
HiveConf hiveConf, String e
 return taskList;
   }
 
+  public static List<Task<?>> addTasksForLoadingColStats(ColumnStatistics colStats,
+ HiveConf conf,
+ 
UpdatedMetaDataTracker updatedMetadata,
+ 
org.apache.hadoop.hive.metastore.api.Table tableObj,
+ long writeId) throws 
IOException, TException{
+return addTasksForLoadingColStats(colStats, conf, updatedMetadata, 
tableObj,

Review comment:
   Same applies to other places as well





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495276)
Time Spent: 1h 10m  (was: 1h)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495272&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495272
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:25
Start Date: 05/Oct/20 11:25
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499526784



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -242,14 +250,25 @@ public static String getNonEmpty(String configParam, 
HiveConf hiveConf, String e
 return taskList;
   }
 
+  public static List<Task<?>> addTasksForLoadingColStats(ColumnStatistics colStats,
+ HiveConf conf,
+ 
UpdatedMetaDataTracker updatedMetadata,
+ 
org.apache.hadoop.hive.metastore.api.Table tableObj,
+ long writeId) throws 
IOException, TException{
+return addTasksForLoadingColStats(colStats, conf, updatedMetadata, 
tableObj,

Review comment:
   Create an overloaded method; needn't pass null.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495272)
Time Spent: 1h  (was: 50m)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495270&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495270
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:17
Start Date: 05/Oct/20 11:17
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499522906



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java
##
@@ -140,7 +142,23 @@ public int execute() {
 }
   });
 } catch (Exception e) {
-  throw new 
SecurityException(ErrorMsg.REPL_RETRY_EXHAUSTED.format(e.getMessage()), e);

Review comment:
   need to check why this task was throwing the exception initially





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495270)
Time Spent: 50m  (was: 40m)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495268&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495268
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:15
Start Date: 05/Oct/20 11:15
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499521480



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/CopyTask.java
##
@@ -103,7 +108,32 @@ protected int copyOnePath(Path fromPath, Path toPath) {
 } catch (Exception e) {
   console.printError("Failed with exception " + e.getMessage(), "\n"
   + StringUtils.stringifyException(e));
-  return (1);
+  LOG.error("CopyTask failed", e);

Review comment:
   exception is not set at the task level





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495268)
Time Spent: 40m  (was: 0.5h)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495267&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495267
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:14
Start Date: 05/Oct/20 11:14
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499521480



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/CopyTask.java
##
@@ -103,7 +108,32 @@ protected int copyOnePath(Path fromPath, Path toPath) {
 } catch (Exception e) {
   console.printError("Failed with exception " + e.getMessage(), "\n"
   + StringUtils.stringifyException(e));
-  return (1);
+  LOG.error("CopyTask failed", e);

Review comment:
   exception is not set





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495267)
Time Spent: 0.5h  (was: 20m)

> sys.replication_metrics table shows incorrect status for failed policies
> 
>
> Key: HIVE-24227
> URL: https://issues.apache.org/jira/browse/HIVE-24227
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24227) sys.replication_metrics table shows incorrect status for failed policies

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24227?focusedWorklogId=495266&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495266
 ]

ASF GitHub Bot logged work on HIVE-24227:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 11:13
Start Date: 05/Oct/20 11:13
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1550:
URL: https://github.com/apache/hive/pull/1550#discussion_r499519404



##
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLTask.java
##
@@ -82,8 +89,32 @@ public int execute() {
 throw new IllegalArgumentException("Unknown DDL request: " + 
ddlDesc.getClass());
   }
 } catch (Throwable e) {
+  LOG.error("DDLTask failed", e);
+  int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+  try {
+ReplicationMetricCollector metricCollector = work.getMetricCollector();
+if (errorCode > 4) {
+  //in case of replication related task, dumpDirectory should not be 
null
+  if(work.dumpDirectory != null) {
+Path nonRecoverableMarker = new Path(work.dumpDirectory, 
ReplAck.NON_RECOVERABLE_MARKER.toString());
+org.apache.hadoop.hive.ql.parse.repl.dump.Utils.writeStackTrace(e, 
nonRecoverableMarker, conf);
+if(metricCollector != null){
+  metricCollector.reportStageEnd(getName(), Status.FAILED_ADMIN, 
nonRecoverableMarker.toString());
+}
+  }
+  if(metricCollector != null){

Review comment:
   this is needed only in the replication case
   

##
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLTask.java
##
@@ -82,8 +89,32 @@ public int execute() {
 throw new IllegalArgumentException("Unknown DDL request: " + 
ddlDesc.getClass());
   }
 } catch (Throwable e) {
+  LOG.error("DDLTask failed", e);
+  int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+  try {
+ReplicationMetricCollector metricCollector = work.getMetricCollector();
+if (errorCode > 4) {
+  //in case of replication related task, dumpDirectory should not be 
null
+  if(work.dumpDirectory != null) {
+Path nonRecoverableMarker = new Path(work.dumpDirectory, 
ReplAck.NON_RECOVERABLE_MARKER.toString());
+org.apache.hadoop.hive.ql.parse.repl.dump.Utils.writeStackTrace(e, 
nonRecoverableMarker, conf);
+if(metricCollector != null){
+  metricCollector.reportStageEnd(getName(), Status.FAILED_ADMIN, 
nonRecoverableMarker.toString());
+}
+  }
+  if(metricCollector != null){
+metricCollector.reportStageEnd(getName(), Status.FAILED_ADMIN, 
null);
+  }
+} else {
+  if(metricCollector != null){
+work.getMetricCollector().reportStageEnd(getName(), Status.FAILED);

Review comment:
   use metricCollector directly

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
##
@@ -355,8 +360,32 @@ public int execute() {
 } catch (Exception e) {
   setException(e);
   LOG.info("Failed to persist stats in metastore", e);
+  int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+  try {
+ReplicationMetricCollector metricCollector = work.getMetricCollector();
+if (errorCode > 4) {

Review comment:
   All this code can be part of a util method.
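
   (A sketch of such a util method, pulling the repeated catch-block logic from DDLTask, ColumnStatsUpdateTask and MoveTask into one place; the helper name and its throws clause are assumptions, while the individual calls are the ones visible in the diff:)

{code:java}
// Hypothetical ReplUtils helper for the failure-metric reporting.
public static void reportReplFailure(ReplicationMetricCollector metricCollector, String stageName,
    String dumpDirectory, Throwable e, HiveConf conf) throws Exception {
  if (metricCollector == null) {
    return; // not a replication flow, nothing to report
  }
  int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
  if (errorCode > 4) {
    // non-recoverable: persist the stack trace and flag for admin intervention
    String marker = null;
    if (dumpDirectory != null) {
      Path nonRecoverableMarker = new Path(dumpDirectory, ReplAck.NON_RECOVERABLE_MARKER.toString());
      org.apache.hadoop.hive.ql.parse.repl.dump.Utils.writeStackTrace(e, nonRecoverableMarker, conf);
      marker = nonRecoverableMarker.toString();
    }
    metricCollector.reportStageEnd(stageName, Status.FAILED_ADMIN, marker);
  } else {
    metricCollector.reportStageEnd(stageName, Status.FAILED);
  }
}
{code}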

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##
@@ -464,14 +468,63 @@ public int execute() {
   console.printInfo("\n", StringUtils.stringifyException(he),false);
 }
   }
-
   setException(he);
+  LOG.error("MoveTask failed", he);
+  errorCode = ErrorMsg.getErrorMsg(he.getMessage()).getErrorCode();
+  try {
+ReplicationMetricCollector metricCollector = work.getMetricCollector();

Review comment:
   util method

##
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLTask.java
##
@@ -82,8 +89,32 @@ public int execute() {
 throw new IllegalArgumentException("Unknown DDL request: " + 
ddlDesc.getClass());
   }
 } catch (Throwable e) {
+  LOG.error("DDLTask failed", e);

Review comment:
   Print the DDL operation too; DDLTask can be called for different operations.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
##
@@ -355,8 +360,32 @@ public int execute() {
 } catch (Exception e) {
   setException(e);
   LOG.info("Failed to persist stats in metastore", e);
+  int errorCode = ErrorMsg.getErrorMsg(e.getMessage()).getErrorCode();
+  try {
+ReplicationMetricCollector metricCollector = work.getMetricCollector();
+if (errorCode > 4) {
+  //in case of repl

[jira] [Commented] (HIVE-24205) Optimise CuckooSetBytes

2020-10-05 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207989#comment-17207989
 ] 

Rajesh Balamohan commented on HIVE-24205:
-

Thanks [~mustafaiman]. With repeated runs (i.e. without any data miss), I see 
around a 9-10% improvement with the PR. This is based on a small 5-node LLAP 
cluster with TPCH12 (43.82 seconds vs 39.01 seconds). 

Also tried "select count(*) from lineitem where l_shipmode in ('REG AIR', 
'MAIL');", which showed an even bigger improvement between the runs without and 
with the PR (10.94 seconds vs 8.49 seconds).

> Optimise CuckooSetBytes
> ---
>
> Key: HIVE-24205
> URL: https://issues.apache.org/jira/browse/HIVE-24205
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-09-28 at 4.29.24 PM.png, bench.png, 
> vectorized.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{FilterStringColumnInList, StringColumnInList}} etc. use CuckooSetBytes for 
> lookup.
> !Screenshot 2020-09-28 at 4.29.24 PM.png|width=714,height=508!
> One option to optimize would be to add boundary conditions on "length" with 
> the min/max length stored in the hashes (ref: 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CuckooSetBytes.java#L85])
> . This would significantly reduce the number of hash computations that need 
> to happen. E.g.
> [TPCH-Q12|https://github.com/hortonworks/hive-testbench/blob/hdp3/sample-queries-tpch/tpch_query12.sql#L20]
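
A sketch of the boundary-condition idea described above. minLen/maxLen are hypothetical new fields maintained at insert time; insertInternal/lookupInternal stand in for the existing insert and double-hash probe in CuckooSetBytes:

{code:java}
private int minLen = Integer.MAX_VALUE;
private int maxLen = 0;

public void insert(byte[] x) {
  // track the shortest and longest key ever inserted
  minLen = Math.min(minLen, x.length);
  maxLen = Math.max(maxLen, x.length);
  insertInternal(x);
}

public boolean lookup(byte[] b, int start, int len) {
  if (len < minLen || len > maxLen) {
    return false; // no inserted key has this length, so both hash passes can be skipped
  }
  return lookupInternal(b, start, len);
}
{code}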



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495235
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 09:49
Start Date: 05/Oct/20 09:49
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499475796



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -107,11 +107,12 @@ public CompactionTxnHandler() {
 // Check for aborted txns: number of aborted txns past threshold and 
age of aborted txns
 // past time threshold
 boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-final String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", 
\"TC_PARTITION\","
-+ "MIN(\"TXN_STARTED\"), COUNT(*)"
+String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", 
\"TC_PARTITION\", "
++ "MIN(\"TXN_STARTED\"), COUNT(*), "
++ "MAX(CASE WHEN \"TC_OPERATION_TYPE\" = " + OperationType.DYNPART 
+ " THEN 1 ELSE 0 END) AS \"IS_DP\" "

Review comment:
   I might be mistaken here, but does this mean that if we have many 
"normal" aborted txns and 1 aborted dynpart txn, we will not initiate a normal 
compaction until the dynpart stuff is cleaned up? Is this ok? Shouldn't we be 
doing both?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495235)
Time Spent: 2h 40m  (was: 2.5h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495226&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495226
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 09:20
Start Date: 05/Oct/20 09:20
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499457494



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
##
@@ -237,6 +237,7 @@ void run(HiveConf conf, String jobName, Table t, Partition 
p, StorageDescriptor
 }
 
 JobConf job = createBaseJobConf(conf, jobName, t, sd, writeIds, ci);
+QueryCompactor.Util.removeAbortedDirsForAcidTable(conf, dir);

Review comment:
   @vpnvishv Why do we do this here? I understand we can, but why don't we 
let the Cleaner delete the files? This just makes the compactor slower. Do 
we have a functional reason for this?
   After this change it will run in CompactorMR and in MMQueryCompactors, but 
not in normal QueryCompactors?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495226)
Time Spent: 2.5h  (was: 2h 20m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495221
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 09:08
Start Date: 05/Oct/20 09:08
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499449636



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -97,9 +100,9 @@ public void run() {
   long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
   LOG.info("Cleaning based on min open txn id: " + minOpenTxnId);
   List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
-  for(CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
+  for (CompactionInfo compactionInfo : txnHandler.findReadyToClean()) {
 
cleanerList.add(CompletableFuture.runAsync(CompactorUtil.ThrowingRunnable.unchecked(()
 ->
-clean(compactionInfo, minOpenTxnId)), cleanerExecutor));
+  clean(compactionInfo, minOpenTxnId)), cleanerExecutor));

Review comment:
   Two questions here:
   1. In the original Jira there was discussion about not allowing concurrent 
cleanings of the same stuff (partition / table). Should we worry about this?
   2. The slow cleanAborted will clog the executor service; we should do 
something about this, either in this patch or in a follow-up like 
https://issues.apache.org/jira/browse/HIVE-21150 immediately after this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495221)
Time Spent: 2h 20m  (was: 2h 10m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24193) Select query on renamed hive acid table does not produce any output

2020-10-05 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24193.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged into master. Thank you [~Rajkumar Singh] for fixing this, and Peter for 
reviewing the changes!

> Select query on renamed hive acid table does not produce any output
> ---
>
> Key: HIVE-24193
> URL: https://issues.apache.org/jira/browse/HIVE-24193
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.2
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> During onRename, HMS updates COMPLETED_TXN_COMPONENTS, which fails with 
> "CTC_DATABASE column does not exist". Upon investigation I found that enclosing 
> quotes are missing for the columns, so the db query fails with this exception.
> Steps to repro:
> 1. create table test(id int);
> 2. insert into table test values(1);
> 3. alter table test rename to test1;
> 4. select * from test1 produces no output
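
(To illustrate the quoting issue, using the identifier-quoting style already visible elsewhere in this digest in CompactionTxnHandler; the statements below are illustrative, not the exact ones from the patch:)

{code:java}
// Without enclosing quotes the backing RDBMS may case-fold the identifier
// and report that the column does not exist:
String unquoted = "UPDATE COMPLETED_TXN_COMPONENTS SET CTC_DATABASE = ? WHERE CTC_DATABASE = ?";
// Quoting the identifiers keeps the names intact on case-sensitive databases:
String quoted = "UPDATE \"COMPLETED_TXN_COMPONENTS\" SET \"CTC_DATABASE\" = ? WHERE \"CTC_DATABASE\" = ?";
{code}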



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24193) Select query on renamed hive acid table does not produce any output

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24193:
--
Labels: pull-request-available  (was: )

> Select query on renamed hive acid table does not produce any output
> ---
>
> Key: HIVE-24193
> URL: https://issues.apache.org/jira/browse/HIVE-24193
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.2
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> During onRename, HMS updates COMPLETED_TXN_COMPONENTS, which fails with 
> "CTC_DATABASE column does not exist". Upon investigation I found that enclosing 
> quotes are missing for the columns, so the db query fails with this exception.
> Steps to repro:
> 1. create table test(id int);
> 2. insert into table test values(1);
> 3. alter table test rename to test1;
> 4. select * from test1 produces no output



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24193) Select query on renamed hive acid table does not produce any output

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24193?focusedWorklogId=495207&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495207
 ]

ASF GitHub Bot logged work on HIVE-24193:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 08:02
Start Date: 05/Oct/20 08:02
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1520:
URL: https://github.com/apache/hive/pull/1520


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495207)
Remaining Estimate: 0h
Time Spent: 10m

> Select query on renamed hive acid table does not produce any output
> ---
>
> Key: HIVE-24193
> URL: https://issues.apache.org/jira/browse/HIVE-24193
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.2
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> During onRename, HMS updates COMPLETED_TXN_COMPONENTS, which fails with 
> "CTC_DATABASE column does not exist". Upon investigation I found that enclosing 
> quotes are missing for the columns, so the db query fails with this exception.
> Steps to repro:
> 1. create table test(id int);
> 2. insert into table test values(1);
> 3. alter table test rename to test1;
> 4. select * from test1 produces no output



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495204
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 07:56
Start Date: 05/Oct/20 07:56
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499405728



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Look for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+  throws IOException {
+FileSystem fs = rootPartition.getFileSystem(conf);
+
+PathFilter filter = (p) -> {
+  String name = p.getName();
+  for (Long wId : writeIds) {
+if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+  return true;
+} else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+  return true;
+}
+  }
+  return false;
+};
+List<FileStatus> deleted = new ArrayList<>();
+deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, PathFilter filter, List<FileStatus> deleted)
+  throws IOException {
+RemoteIterator<FileStatus> it = listIterator(fs, root, null);
+
+while (it.hasNext()) {
+  FileStatus fStatus = it.next();
+  if (fStatus.isDirectory()) {
+if (filter.accept(fStatus.getPath())) {
+  fs.delete(fStatus.getPath(), true);
+  deleted.add(fStatus);
+} else {
+  deleteDeltaDirectoriesAux(fStatus.getPath(), fs, filter, deleted);
+  if (isDirectoryEmpty(fs, fStatus.getPath())) {
+fs.delete(fStatus.getPath(), false);
+deleted.add(fStatus);
+  }
+}
+  }
+}
+  }
+
+  private static boolean isDirectoryEmpty(FileSystem fs, Path path) throws 
IOException {
+RemoteIterator<FileStatus> it = listIterator(fs, path, null);
+return !it.hasNext();
+  }
+
+  private static RemoteIterator<FileStatus> listIterator(FileSystem fs, Path path, PathFilter filter)
+  throws IOException {
+try {
+  return new ToFileStatusIterator(SHIMS.listLocatedHdfsStatusIterator(fs, 
path, filter));
+} catch (Throwable t) {

Review comment:
   This should be similar to tryListLocatedHdfsStatus: don't catch all 
Throwables. And maybe add all this to the HdfsUtils class.
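
(A sketch of the narrower handling, assuming the fallback is only meant for filesystems that do not support the shim iterator; note the fallback ignores the filter, which the caller would then have to apply:)

{code:java}
private static RemoteIterator<FileStatus> listIterator(FileSystem fs, Path path, PathFilter filter)
    throws IOException {
  try {
    return new ToFileStatusIterator(SHIMS.listLocatedHdfsStatusIterator(fs, path, filter));
  } catch (UnsupportedOperationException e) {
    // fall back only for the unsupported-API case; real I/O errors propagate
    return fs.listStatusIterator(path);
  }
}
{code}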





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495204)
Time Spent: 2h 10m  (was: 2h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition

[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495203
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 07:56
Start Date: 05/Oct/20 07:56
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499405728



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Look for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+  throws IOException {
+FileSystem fs = rootPartition.getFileSystem(conf);
+
+PathFilter filter = (p) -> {
+  String name = p.getName();
+  for (Long wId : writeIds) {
+if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+  return true;
+} else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+  return true;
+}
+  }
+  return false;
+};
+List<FileStatus> deleted = new ArrayList<>();
+deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, PathFilter filter, List<FileStatus> deleted)
+  throws IOException {
+RemoteIterator<FileStatus> it = listIterator(fs, root, null);
+
+while (it.hasNext()) {
+  FileStatus fStatus = it.next();
+  if (fStatus.isDirectory()) {
+if (filter.accept(fStatus.getPath())) {
+  fs.delete(fStatus.getPath(), true);
+  deleted.add(fStatus);
+} else {
+  deleteDeltaDirectoriesAux(fStatus.getPath(), fs, filter, deleted);
+  if (isDirectoryEmpty(fs, fStatus.getPath())) {
+fs.delete(fStatus.getPath(), false);
+deleted.add(fStatus);
+  }
+}
+  }
+}
+  }
+
+  private static boolean isDirectoryEmpty(FileSystem fs, Path path) throws 
IOException {
+RemoteIterator<FileStatus> it = listIterator(fs, path, null);
+return !it.hasNext();
+  }
+
+  private static RemoteIterator<FileStatus> listIterator(FileSystem fs, Path path, PathFilter filter)
+  throws IOException {
+try {
+  return new ToFileStatusIterator(SHIMS.listLocatedHdfsStatusIterator(fs, 
path, filter));
+} catch (Throwable t) {

Review comment:
   This should be similar to tryListLocatedHdfsStatus: don't catch all 
Throwables.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495203)
Time Spent: 2h  (was: 1h 50m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds 

[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495202&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495202
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 07:53
Start Date: 05/Oct/20 07:53
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499404466



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
 tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Look for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List deleteDeltaDirectories(Path rootPartition, 
Configuration conf, Set writeIds)
+  throws IOException {
+FileSystem fs = rootPartition.getFileSystem(conf);
+
+PathFilter filter = (p) -> {
+  String name = p.getName();
+  for (Long wId : writeIds) {
+if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+  return true;
+} else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+  return true;
+}
+  }
+  return false;
+};
+List deleted = new ArrayList<>();
+deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, 
PathFilter filter, List deleted)
+  throws IOException {
+RemoteIterator it = listIterator(fs, root, null);
+
+while (it.hasNext()) {
+  FileStatus fStatus = it.next();
+  if (fStatus.isDirectory()) {
+if (filter.accept(fStatus.getPath())) {
+  fs.delete(fStatus.getPath(), true);
+  deleted.add(fStatus);
+} else {
+  deleteDeltaDirectoriesAux(fStatus.getPath(), fs, filter, deleted);
+  if (isDirectoryEmpty(fs, fStatus.getPath())) {

Review comment:
   Are we doing this to delete newly created partitions when there are no 
other writes? Is this OK? What if we find a valid empty partition that is 
registered in the HMS? We should not delete that. I think this can be skipped 
altogether; an empty partition dir will not bother anybody.
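
   If the empty-directory cleanup were kept, a hedged sketch of the guard this 
implies: check the metastore before deleting. The PartitionRegistry interface 
and its isRegisteredPartition method are hypothetical, invented for this 
example:

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EmptyDirCleanup {
  /** Hypothetical lookup against the HMS; not an existing API in the patch. */
  interface PartitionRegistry {
    boolean isRegisteredPartition(Path dir);
  }

  // Delete an empty directory only when the metastore does not know it as a
  // partition location; a registered-but-empty partition must be kept.
  static boolean deleteIfUnregisteredAndEmpty(FileSystem fs, Path dir,
      PartitionRegistry registry) throws IOException {
    boolean empty = !fs.listStatusIterator(dir).hasNext();
    if (empty && !registry.isRegisteredPartition(dir)) {
      return fs.delete(dir, false); // non-recursive: the directory is empty
    }
    return false;
  }
}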





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495202)
Time Spent: 1h 50m  (was: 1h 40m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]
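
A rough JDBC sketch of the proposed bookkeeping. The TC_TXNID, TC_DATABASE, 
TC_TABLE and TC_PARTITION columns follow the metastore schema, but the 
"_MARKER_" value and both helper methods are invented for illustration; this 
is not the committed implementation:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TxnComponentsMarker {
  // At openTxn, record a placeholder row so the cleaner can tell an
  // aborted-but-dirty transaction from a truly empty one.
  static void writeOpenTxnMarker(Connection dbConn, long txnId, String db, String table)
      throws SQLException {
    String sql = "INSERT INTO TXN_COMPONENTS "
        + "(TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION) VALUES (?, ?, ?, ?)";
    try (PreparedStatement ps = dbConn.prepareStatement(sql)) {
      ps.setLong(1, txnId);
      ps.setString(2, db);
      ps.setString(3, table);
      ps.setString(4, "_MARKER_"); // placeholder removed when addPartitions runs
      ps.executeUpdate();
    }
  }

  // When addPartitions succeeds, the marker is swapped for real partition rows.
  static void replaceMarker(Connection dbConn, long txnId, String db, String table,
      Iterable<String> partNames) throws SQLException {
    try (PreparedStatement del = dbConn.prepareStatement(
        "DELETE FROM TXN_COMPONENTS WHERE TC_TXNID = ? AND TC_PARTITION = '_MARKER_'")) {
      del.setLong(1, txnId);
      del.executeUpdate();
    }
    try (PreparedStatement ins = dbConn.prepareStatement(
        "INSERT INTO TXN_COMPONENTS (TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION)"
        + " VALUES (?, ?, ?, ?)")) {
      for (String part : partNames) {
        ins.setLong(1, txnId);
        ins.setString(2, db);
        ins.setString(3, table);
        ins.setString(4, part);
        ins.addBatch();
      }
      ins.executeBatch();
    }
  }
}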



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495201&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495201
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 07:50
Start Date: 05/Oct/20 07:50
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499402826



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Look for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment:
   I am wondering whether we are covering all the use cases here. Is it 
possible that this dynamic-partition query was writing to an existing 
partition with older writes, and that a compaction ran before we managed to 
delete the aborted delta? In that case, sadly, I think we are still going to 
read the aborted data as valid. Could you add a test case to check whether it 
is indeed a problem or not? (I do not have an idea for a solution...)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495201)
Time Spent: 1h 40m  (was: 1.5h)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495199
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 07:44
Start Date: 05/Oct/20 07:44
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499399709



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Look for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {

Review comment:
   Why the contains("=") check? Are we guarding against a partition where the 
user named the column exactly like a valid delta dir? I don't think we should 
support that.
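
   For context, a small sketch of the naming the check separates: Hive lays 
out partitions as "col=value" directories, while delta/base directories are 
derived from the writeId (the %07d zero-padding here is an assumption for the 
example, mirroring AcidUtils' delta digits):

public class DeltaNameFilter {
  // Partition directories always contain '=', so the contains("=") guard
  // excludes them even when a partition column is named to mimic a delta
  // directory prefix.
  static boolean looksLikeAbortedDelta(String name, long writeId) {
    String delta = String.format("delta_%07d_%07d", writeId, writeId);
    String base = String.format("base_%07d", writeId);
    return (name.startsWith(delta) || name.startsWith(base)) && !name.contains("=");
  }

  public static void main(String[] args) {
    System.out.println(looksLikeAbortedDelta("delta_0000001_0000001", 1)); // true
    System.out.println(looksLikeAbortedDelta("ds=2020-10-05", 1));         // false
  }
}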





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495199)
Time Spent: 1.5h  (was: 1h 20m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted, it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24228) Support complex types in LLAP

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24228?focusedWorklogId=495194&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495194
 ]

ASF GitHub Bot logged work on HIVE-24228:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 07:26
Start Date: 05/Oct/20 07:26
Worklog Time Spent: 10m 
  Work Description: bymm opened a new pull request #1551:
URL: https://github.com/apache/hive/pull/1551


   
   
   ### What changes were proposed in this pull request?
   
   
   The idea of this improvement is to support complex types (arrays, maps, 
structs) returned from the LLAP data reader. This is useful when consuming 
LLAP data later in Spark.
   
   ### Why are the changes needed?
   
   
   When consuming data from LLAP, all Hive types should be supported, 
including the complex ones.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   It was tested on tables with complex types. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495194)
Remaining Estimate: 0h
Time Spent: 10m

> Support complex types in LLAP
> -
>
> Key: HIVE-24228
> URL: https://issues.apache.org/jira/browse/HIVE-24228
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Yuriy Baltovskyy
>Assignee: Yuriy Baltovskyy
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The idea of this improvement is to support complex types (arrays, maps, 
> structs) returned from LLAP data reader. This is useful when consuming LLAP 
> data later in Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24228) Support complex types in LLAP

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24228:
--
Labels: pull-request-available  (was: )

> Support complex types in LLAP
> -
>
> Key: HIVE-24228
> URL: https://issues.apache.org/jira/browse/HIVE-24228
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Yuriy Baltovskyy
>Assignee: Yuriy Baltovskyy
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The idea of this improvement is to support complex types (arrays, maps, 
> structs) returned from LLAP data reader. This is useful when consuming LLAP 
> data later in Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called

2020-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=495191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495191
 ]

ASF GitHub Bot logged work on HIVE-21052:
-

Author: ASF GitHub Bot
Created on: 05/Oct/20 07:21
Start Date: 05/Oct/20 07:21
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1548:
URL: https://github.com/apache/hive/pull/1548#discussion_r499388356



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2839,6 +2848,87 @@ public static void setNonTransactional(Map<String, String> tblProps) {
     tblProps.remove(hive_metastoreConstants.TABLE_TRANSACTIONAL_PROPERTIES);
   }
 
+  /**
+   * Look for delta directories matching the list of writeIds and deletes them.
+   * @param rootPartition root partition to look for the delta directories
+   * @param conf configuration
+   * @param writeIds list of writeIds to look for in the delta directories
+   * @return list of deleted directories.
+   * @throws IOException
+   */
+  public static List<FileStatus> deleteDeltaDirectories(Path rootPartition, Configuration conf, Set<Long> writeIds)
+      throws IOException {
+    FileSystem fs = rootPartition.getFileSystem(conf);
+
+    PathFilter filter = (p) -> {
+      String name = p.getName();
+      for (Long wId : writeIds) {
+        if (name.startsWith(deltaSubdir(wId, wId)) && !name.contains("=")) {
+          return true;
+        } else if (name.startsWith(baseDir(wId)) && !name.contains("=")) {
+          return true;
+        }
+      }
+      return false;
+    };
+    List<FileStatus> deleted = new ArrayList<>();
+    deleteDeltaDirectoriesAux(rootPartition, fs, filter, deleted);
+    return deleted;
+  }
+
+  private static void deleteDeltaDirectoriesAux(Path root, FileSystem fs, PathFilter filter, List<FileStatus> deleted)

Review comment:
   This is going to issue many filesystem listings on a table with many 
partitions, which will be very slow on S3. I think you should consider 
changing this logic to be similar to getHdfsDirSnapshots: do one recursive 
listing, iterate over all the files, collect the deltas that need to be 
deleted, and delete them at the end (possibly concurrently).
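
   A hedged sketch of that suggestion: one recursive listFiles call over the 
partition root, collect the matching directories, and delete them at the end. 
Matching a delta directory via each file's parent is a simplification assumed 
for this example:

import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.fs.RemoteIterator;

public class SingleListingDelete {
  // One recursive listing instead of a listing per directory (far fewer round
  // trips on S3); deletes are deferred to a single pass at the end, where they
  // could also be issued concurrently.
  static Set<Path> deleteMatchingDeltas(FileSystem fs, Path root, PathFilter deltaFilter)
      throws IOException {
    Set<Path> toDelete = new LinkedHashSet<>();
    RemoteIterator<LocatedFileStatus> files = fs.listFiles(root, true);
    while (files.hasNext()) {
      Path parent = files.next().getPath().getParent(); // directory holding the file
      if (parent != null && deltaFilter.accept(parent)) {
        toDelete.add(parent);
      }
    }
    for (Path dir : toDelete) {
      fs.delete(dir, true);
    }
    return toDelete;
  }
}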





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 495191)
Time Spent: 1h 20m  (was: 1h 10m)

> Make sure transactions get cleaned if they are aborted before addPartitions 
> is called
> -
>
> Key: HIVE-21052
> URL: https://issues.apache.org/jira/browse/HIVE-21052
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, 
> HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, 
> HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, 
> HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, 
> HIVE-21052.8.patch, HIVE-21052.9.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> If the transaction is aborted between openTxn and addPartitions and data has 
> been written on the table the transaction manager will think it's an empty 
> transaction and no cleaning will be done.
> This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by:
> * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and 
> when addPartitions is called remove this entry from TXN_COMPONENTS and add 
> the corresponding partition entry to TXN_COMPONENTS.
> * If the cleaner finds and entry with a special marker in TXN_COMPONENTS that 
> specifies that a transaction was opened and it was aborted it must generate 
> jobs for the worker for every possible partition available.
> cc [~ewohlstadter]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)