[jira] [Work logged] (HIVE-24895) Add a DataCopyEnd stage in ReplStateLogTask for external table replication

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24895?focusedWorklogId=574578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574578
 ]

ASF GitHub Bot logged work on HIVE-24895:
-

Author: ASF GitHub Bot
Created on: 31/Mar/21 04:47
Start Date: 31/Mar/21 04:47
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2083:
URL: https://github.com/apache/hive/pull/2083#discussion_r604590780



##
File path: 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/StageType.java
##
@@ -31,7 +31,8 @@
   RANGER_DUMP(19),
   RANGER_LOAD(20),
   ATLAS_DUMP(21),
-  ATLAS_LOAD(22);
+  ATLAS_LOAD(22),
+  COPY_LOG(23);
 
   private final int value;

Review comment:
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574578)
Time Spent: 50m  (was: 40m)

> Add a DataCopyEnd stage in ReplStateLogTask for external table replication
> --
>
> Key: HIVE-24895
> URL: https://issues.apache.org/jira/browse/HIVE-24895
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add a task to mark the end of external table copy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24895) Add a DataCopyEnd stage in ReplStateLogTask for external table replication

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24895?focusedWorklogId=574577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574577
 ]

ASF GitHub Bot logged work on HIVE-24895:
-

Author: ASF GitHub Bot
Created on: 31/Mar/21 04:46
Start Date: 31/Mar/21 04:46
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2083:
URL: https://github.com/apache/hive/pull/2083#discussion_r604590431



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java
##
@@ -182,6 +183,13 @@ public int execute() {
   Exception ex = new 
SecurityException(ErrorMsg.REPL_RETRY_EXHAUSTED.format(e.getMessage()), e);
   setException(ex);
   return ReplUtils.handleException(true, ex, work.getDumpDirectory(), 
work.getMetricCollector(), getName(), conf);
+} finally {
+  String jobId = conf.get("distcp.job.id", "UNAVAILABLE");
+  LOG.info("Completed DirCopyTask for source: {} to  target: {}. Took {}"

Review comment:
   Yep, for the Dir task. As discussed, I have added a marker at the start as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574577)
Time Spent: 40m  (was: 0.5h)

> Add a DataCopyEnd stage in ReplStateLogTask for external table replication
> --
>
> Key: HIVE-24895
> URL: https://issues.apache.org/jira/browse/HIVE-24895
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add a task to mark the end of external table copy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24802) Show operation log at webui

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24802?focusedWorklogId=574559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574559
 ]

ASF GitHub Bot logged work on HIVE-24802:
-

Author: ASF GitHub Bot
Created on: 31/Mar/21 03:46
Start Date: 31/Mar/21 03:46
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-810736869


   @yongzhi @saihemanth-cloudera any thoughts or comments? thanks in advance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574559)
Time Spent: 1h  (was: 50m)

> Show operation log at webui
> ---
>
> Key: HIVE-24802
> URL: https://issues.apache.org/jira/browse/HIVE-24802
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Attachments: operationlog.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently we provide getQueryLog in HiveStatement to fetch the operation log,
> and the operation log is deleted when the operation is closed (with a delay for
> canceled operations). This makes it hard for JDBC users or administrators to dig
> into the details of a finished (or failed) operation, so we present the operation
> log on the web UI and keep it for some time for later analysis.
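> A minimal JDBC sketch of the existing client-side path the description refers to
> (the connection URL and credentials below are placeholders; in practice the log is
> usually polled from a separate thread while the query is still running):
> {code:java}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import org.apache.hive.jdbc.HiveStatement;
>
> public class OperationLogExample {
>   public static void main(String[] args) throws Exception {
>     // Placeholder HiveServer2 connection details.
>     try (Connection conn = DriverManager.getConnection(
>              "jdbc:hive2://localhost:10000/default", "hive", "");
>          HiveStatement stmt = (HiveStatement) conn.createStatement()) {
>       stmt.execute("SELECT 1");
>       // Operation log lines are only reachable while the operation is still open.
>       for (String line : stmt.getQueryLog()) {
>         System.out.println(line);
>       }
>     }
>   }
> }
> {code}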



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24901) Re-enable tests in TestBeeLineWithArgs

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24901?focusedWorklogId=574557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574557
 ]

ASF GitHub Bot logged work on HIVE-24901:
-

Author: ASF GitHub Bot
Created on: 31/Mar/21 03:44
Start Date: 31/Mar/21 03:44
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2087:
URL: https://github.com/apache/hive/pull/2087#issuecomment-810736345


   @jcamachor  @kgyrtkirk could you please take a look if available? thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574557)
Time Spent: 40m  (was: 0.5h)

> Re-enable tests in TestBeeLineWithArgs
> --
>
> Key: HIVE-24901
> URL: https://issues.apache.org/jira/browse/HIVE-24901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Re-enable the tests in TestBeeLineWithArgs, since they are stable on master
> now:
> http://ci.hive.apache.org/job/hive-flaky-check/219/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24591) Move Beeline To SLF4J Simple Logger

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24591?focusedWorklogId=574470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574470
 ]

ASF GitHub Bot logged work on HIVE-24591:
-

Author: ASF GitHub Bot
Created on: 31/Mar/21 00:16
Start Date: 31/Mar/21 00:16
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1833:
URL: https://github.com/apache/hive/pull/1833#issuecomment-810660406


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574470)
Time Spent: 2.5h  (was: 2h 20m)

> Move Beeline To SLF4J Simple Logger
> ---
>
> Key: HIVE-24591
> URL: https://issues.apache.org/jira/browse/HIVE-24591
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> To make beeline as simple as possible, move its SLF4J logger implementation
> to the SLF4J Simple logger. This will allow users to change the logging level
> simply on the command line. Currently users must create a Log4j configuration
> file, which is far too advanced/cumbersome for a data analyst who just wants
> to use SQL (and do some minor troubleshooting).
> {code:none}
> export HADOOP_CLIENT_OPTS="-Dorg.slf4j.simpleLogger.defaultLogLevel=debug"
> beeline ...
> {code}
> http://www.slf4j.org/api/org/slf4j/impl/SimpleLogger.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24606) Multi-stage materialized CTEs can lose intermediate data

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24606?focusedWorklogId=574471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574471
 ]

ASF GitHub Bot logged work on HIVE-24606:
-

Author: ASF GitHub Bot
Created on: 31/Mar/21 00:16
Start Date: 31/Mar/21 00:16
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1873:
URL: https://github.com/apache/hive/pull/1873


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574471)
Time Spent: 50m  (was: 40m)

> Multi-stage materialized CTEs can lose intermediate data
> 
>
> Key: HIVE-24606
> URL: https://issues.apache.org/jira/browse/HIVE-24606
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> With complex multi-stage CTEs, Hive can start a later stage before its
> previous stage finishes.
> That's because `SemanticAnalyzer#toRealRootTasks` can fail to resolve
> dependencies between multi-stage materialized CTEs when a non-materialized CTE
> cuts in.
>  
> [https://github.com/apache/hive/blob/425e1ff7c054f87c4db87e77d004282d529599ae/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1414]
>  
> For example, when submitting this query,
> {code:sql}
> SET hive.optimize.cte.materialize.threshold=2;
> SET hive.optimize.cte.materialize.full.aggregate.only=false;
> WITH x AS ( SELECT 'x' AS id ), -- not materialized
> a1 AS ( SELECT 'a1' AS id ), -- materialized by a2 and the root
> a2 AS ( SELECT 'a2 <- ' || id AS id FROM a1) -- materialized by the root
> SELECT * FROM a1
> UNION ALL
> SELECT * FROM x
> UNION ALL
> SELECT * FROM a2
> UNION ALL
> SELECT * FROM a2;
> {code}
> `toRealRootTask` will traverse the CTEs in order of `a1`, `x`, and `a2`. It 
> means the dependency between `a1` and `a2` will be ignored and `a2` can start 
> without waiting for `a1`. As a result, the above query returns the following 
> result.
> {code:java}
> +-----+
> | id  |
> +-----+
> | a1  |
> | x   |
> +-----+
> {code}
> For your information, I ran this test with revision = 
> 425e1ff7c054f87c4db87e77d004282d529599ae.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24915) Distribute by with sort by clause when used with constant parameter for sort produces wrong result.

2021-03-30 Thread Suprith Chandrashekharachar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311933#comment-17311933
 ] 

Suprith Chandrashekharachar commented on HIVE-24915:


[~kgyrtkirk] Could you please take a look at this one, or assign it to someone
who is familiar with the part of the code base relevant to this change?

> Distribute by with sort by clause when used with constant parameter for sort 
> produces wrong result.
> ---
>
> Key: HIVE-24915
> URL: https://issues.apache.org/jira/browse/HIVE-24915
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.4
>Reporter: Suprith Chandrashekharachar
>Assignee: Suprith Chandrashekharachar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Distribute by with sort by clause when used with constant parameter for sort 
> produces wrong result.
> Example: 
> {code:java}
>  SELECT 
> t.time,
> 'a' as const
>   FROM
> (SELECT 1591819264 as time
> UNION ALL
> SELECT 1591819265 as time) t
>   DISTRIBUTE by const
>   sort by const, t.time
> {code}
> Produces
>
> |*time*|*const*|
> |NULL|a|
> |NULL|a|
> Instead it should produce (Hive 0.13 produces this):
> |*time*|*const*|
> |*1591819264*|a|
> |*1591819265*|a|
> Incorrect sort columns are used while creating the ReduceSink here:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L9066]
> With the constant propagation optimizer enabled, the incorrect folding of the
> constant operator then produces wrong results.
>  
> More examples for incorrect behavior:
> {code:java}
>   SELECT 
> t.time,
> 'a' as const,
> t.id
>   FROM
> (SELECT 1591819264 as time, 1 as id
> UNION ALL
> SELECT 1591819265 as time, 2 as id) t
>   DISTRIBUTE by t.time
>   sort by t.time, const, t.id
> {code}
> produces
> |*time*|*const*|*id*|
> |*1591819264*|a|NULL|
> |*1591819265*|a|NULL|
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24895) Add a DataCopyEnd stage in ReplStateLogTask for external table replication

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24895?focusedWorklogId=574464&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574464
 ]

ASF GitHub Bot logged work on HIVE-24895:
-

Author: ASF GitHub Bot
Created on: 31/Mar/21 00:01
Start Date: 31/Mar/21 00:01
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2083:
URL: https://github.com/apache/hive/pull/2083#discussion_r603191387



##
File path: 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/StageType.java
##
@@ -31,7 +31,8 @@
   RANGER_DUMP(19),
   RANGER_LOAD(20),
   ATLAS_DUMP(21),
-  ATLAS_LOAD(22);
+  ATLAS_LOAD(22),
+  COPY_LOG(23);
 
   private final int value;

Review comment:
   This is a Thrift-generated file. You should modify the Thrift definition and
regenerate these generated files.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java
##
@@ -182,6 +183,13 @@ public int execute() {
   Exception ex = new 
SecurityException(ErrorMsg.REPL_RETRY_EXHAUSTED.format(e.getMessage()), e);
   setException(ex);
   return ReplUtils.handleException(true, ex, work.getDumpDirectory(), 
work.getMetricCollector(), getName(), conf);
+} finally {
+  String jobId = conf.get("distcp.job.id", "UNAVAILABLE");

Review comment:
   Print the number of retries as well?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574464)
Time Spent: 0.5h  (was: 20m)

> Add a DataCopyEnd stage in ReplStateLogTask for external table replication
> --
>
> Key: HIVE-24895
> URL: https://issues.apache.org/jira/browse/HIVE-24895
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add a task to mark the end of external table copy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24886) Support simple equality operations between MAP/LIST/STRUCT data types

2021-03-30 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24886.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks [~zabetak]!

> Support simple equality operations between MAP/LIST/STRUCT data types
> -
>
> Key: HIVE-24886
> URL: https://issues.apache.org/jira/browse/HIVE-24886
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning, Query Processor
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently equality operations between non-primitive data types (MAP, LIST,
> STRUCT) work in some very limited cases, e.g.:
> {code:sql}
> create table table_map_types (id int, c1 map<int,int>, c2 map<int,int>);
> select id from table_map_types where map(1,1) IN (map(1,1), map(1,2), 
> map(1,3)); 
> {code}
> but this feature was never introduced explicitly (zero tests & JIRAs around 
> the subject) and the vast majority of queries involving comparisons between 
> non primitive data types now fail at compile time.
> The goal of this issue is to support simple equality operations:
> * EQUALS(=)
> * NOT_EQUALS(<>),
> * IN,
> * IS DISTINCT FROM,
> * IS NOT DISTINCT FROM
> between MAP/LIST/STRUCT data types when the compared types are identical 
> (same type category and identical component types). The following examples 
> illustrate the idea of types being identical:
> {noformat}
> MAP<INT,INT> EQUALS MAP<INT,INT> OK
> MAP<INT,INT> EQUALS MAP<INT,STRING> KO
> STRUCT<f1:INT,f2:STRING> EQUALS STRUCT<f1:INT,f2:INT> KO
> STRUCT<f1:INT,f2:STRING> EQUALS STRUCT<f1:INT,f2:STRING> OK
> LIST<STRING> EQUALS LIST<STRING> OK
> LIST<STRING> EQUALS LIST<INT> KO
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24886) Support simple equality operations between MAP/LIST/STRUCT data types

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24886?focusedWorklogId=574455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574455
 ]

ASF GitHub Bot logged work on HIVE-24886:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 23:17
Start Date: 30/Mar/21 23:17
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #2107:
URL: https://github.com/apache/hive/pull/2107


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574455)
Time Spent: 20m  (was: 10m)

> Support simple equality operations between MAP/LIST/STRUCT data types
> -
>
> Key: HIVE-24886
> URL: https://issues.apache.org/jira/browse/HIVE-24886
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning, Query Processor
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently equality operations between non-primitive data types (MAP, LIST,
> STRUCT) work in some very limited cases, e.g.:
> {code:sql}
> create table table_map_types (id int, c1 map<int,int>, c2 map<int,int>);
> select id from table_map_types where map(1,1) IN (map(1,1), map(1,2), 
> map(1,3)); 
> {code}
> but this feature was never introduced explicitly (zero tests & JIRAs around 
> the subject) and the vast majority of queries involving comparisons between 
> non primitive data types now fail at compile time.
> The goal of this issue is to support simple equality operations:
> * EQUALS(=)
> * NOT_EQUALS(<>),
> * IN,
> * IS DISTINCT FROM,
> * IS NOT DISTINCT FROM
> between MAP/LIST/STRUCT data types when the compared types are identical 
> (same type category and identical component types). The following examples 
> illustrate the idea of types being identical:
> {noformat}
> MAP<INT,INT> EQUALS MAP<INT,INT> OK
> MAP<INT,INT> EQUALS MAP<INT,STRING> KO
> STRUCT<f1:INT,f2:STRING> EQUALS STRUCT<f1:INT,f2:INT> KO
> STRUCT<f1:INT,f2:STRING> EQUALS STRUCT<f1:INT,f2:STRING> OK
> LIST<STRING> EQUALS LIST<STRING> OK
> LIST<STRING> EQUALS LIST<INT> KO
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24851) resources leak on exception in AvroGenericRecordReader constructor

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24851?focusedWorklogId=574418&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574418
 ]

ASF GitHub Bot logged work on HIVE-24851:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 21:07
Start Date: 30/Mar/21 21:07
Worklog Time Spent: 10m 
  Work Description: losipiuk commented on pull request #2129:
URL: https://github.com/apache/hive/pull/2129#issuecomment-810578166


   @pvary it looks like CI failed. How can I check whether that is related (I very
much doubt it)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574418)
Time Spent: 6h 40m  (was: 6.5h)

> resources leak on exception in AvroGenericRecordReader constructor
> --
>
> Key: HIVE-24851
> URL: https://issues.apache.org/jira/browse/HIVE-24851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Lukasz Osipiuk
>Assignee: Lukasz Osipiuk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 4.0.0
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> The AvroGenericRecordReader constructor creates an instance of FileReader but
> lacks proper exception handling, and the reader is not closed on the failure path.
> This results in leaking the underlying resources (e.g. S3 connections).
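> A minimal sketch of the close-on-failure pattern (illustrative class and method
> names only, not the actual Avro reader code):
> {code:java}
> import java.io.Closeable;
> import java.io.IOException;
>
> // If constructor work after opening a resource throws, close the resource
> // before propagating so file handles / S3 connections are not leaked.
> class RecordReaderLike {
>   private final Closeable reader;
>
>   RecordReaderLike(Closeable reader) throws IOException {
>     this.reader = reader;
>     try {
>       initialize(); // may throw, e.g. while reading the schema
>     } catch (IOException | RuntimeException e) {
>       try {
>         reader.close(); // release the underlying resource on the failure path
>       } catch (IOException suppressed) {
>         e.addSuppressed(suppressed);
>       }
>       throw e;
>     }
>   }
>
>   private void initialize() throws IOException {
>     // schema lookup etc. would go here
>   }
> }
> {code}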
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24895) Add a DataCopyEnd stage in ReplStateLogTask for external table replication

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24895?focusedWorklogId=574384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574384
 ]

ASF GitHub Bot logged work on HIVE-24895:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 20:11
Start Date: 30/Mar/21 20:11
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2083:
URL: https://github.com/apache/hive/pull/2083#discussion_r604340386



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java
##
@@ -182,6 +183,13 @@ public int execute() {
   Exception ex = new 
SecurityException(ErrorMsg.REPL_RETRY_EXHAUSTED.format(e.getMessage()), e);
   setException(ex);
   return ReplUtils.handleException(true, ex, work.getDumpDirectory(), 
work.getMetricCollector(), getName(), conf);
+} finally {
+  String jobId = conf.get("distcp.job.id", "UNAVAILABLE");

Review comment:
   Can we define constants for these (the key and the default value)?
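   For instance (class and constant names here are illustrative only, not
necessarily what the patch ends up using):
   {code:java}
   // Illustrative constants for the config key and its fallback value used above.
   public final class DistCpJobIdConstants {
     public static final String DISTCP_JOB_ID_CONF = "distcp.job.id";
     public static final String JOB_ID_UNAVAILABLE = "UNAVAILABLE";

     private DistCpJobIdConstants() {
     }
   }
   {code}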

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java
##
@@ -182,6 +183,13 @@ public int execute() {
   Exception ex = new 
SecurityException(ErrorMsg.REPL_RETRY_EXHAUSTED.format(e.getMessage()), e);
   setException(ex);
   return ReplUtils.handleException(true, ex, work.getDumpDirectory(), 
work.getMetricCollector(), getName(), conf);
+} finally {
+  String jobId = conf.get("distcp.job.id", "UNAVAILABLE");
+  LOG.info("Completed DirCopyTask for source: {} to  target: {}. Took {}"
+  + ". DistCp JobId {}", work.getFullyQualifiedSourcePath(),
+  work.getFullyQualifiedTargetPath(), ReplUtils
+  .convertToHumanReadableTime(

Review comment:
   nit: could you please format the code

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -269,12 +270,20 @@ private Path getCurrentDumpPath(Path dumpRoot, boolean 
isBootstrap) throws IOExc
 }
   }
 
-  private void initiateDataCopyTasks() throws SemanticException, IOException {
+  private void initiateDataCopyTasks(ReplLogger replLogger) throws 
SemanticException, IOException {
 TaskTracker taskTracker = new 
TaskTracker(conf.getIntVar(HiveConf.ConfVars.REPL_APPROX_MAX_LOAD_TASKS));
 if (childTasks == null) {
   childTasks = new ArrayList<>();
 }
-childTasks.addAll(work.externalTableCopyTasks(taskTracker, conf));
+List<Task<?>> externalTableCopyTasks =

Review comment:
   nit: Please format the code; the lines fit within the line-length limit as they are.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##
@@ -376,7 +379,14 @@ private void addLazyDataCopyTask(TaskTracker 
loadTaskTracker) throws IOException
   if (childTasks == null) {
 childTasks = new ArrayList<>();
   }
-  childTasks.addAll(work.externalTableCopyTasks(loadTaskTracker, conf));
+  List<Task<?>> externalTableCopyTasks =
+  work.externalTableCopyTasks(loadTaskTracker, conf);

Review comment:
   nit: please format

##
File path: 
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
##
@@ -1184,6 +1186,14 @@ public boolean runDistCp(List<Path> srcPaths, Path dst, 
Configuration conf) thro
 } catch (Exception e) {
   throw new IOException("Cannot execute DistCp process: " + e, e);
 } finally {
+  // Set the job id from distCp conf to the callers configuration.
+  if (distcp != null) {
+String jobId = distcp.getConf().get(CONF_LABEL_DISTCP_JOB_ID);
+if (jobId != null) {

Review comment:
   When would job id be null? 

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplStateLogWork.java
##
@@ -55,7 +56,8 @@
 TABLE,
 FUNCTION,
 EVENT,
-END
+END,
+DATACOPYEND

Review comment:
   Also, why not add DATA_COPY_START as well?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/DirCopyTask.java
##
@@ -182,6 +183,13 @@ public int execute() {
   Exception ex = new 
SecurityException(ErrorMsg.REPL_RETRY_EXHAUSTED.format(e.getMessage()), e);
   setException(ex);
   return ReplUtils.handleException(true, ex, work.getDumpDirectory(), 
work.getMetricCollector(), getName(), conf);
+} finally {
+  String jobId = conf.get("distcp.job.id", "UNAVAILABLE");
+  LOG.info("Completed DirCopyTask for source: {} to  target: {}. Took {}"

Review comment:
   Wouldn't this always be printed, even in case of failure?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -120,6 +119,7 @@
 import static 
org.apache.hadoop.hive.metastore.ReplChangeManager.getReplPolicyIdString;
 import static org.apache.hadoop.hive.ql.exec.repl.ReplAck.LOAD_ACKNOWLEDGEMENT;
 import static 

[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=574276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574276
 ]

ASF GitHub Bot logged work on HIVE-24928:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 17:32
Start Date: 30/Mar/21 17:32
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request #2111:
URL: https://github.com/apache/hive/pull/2111


   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574276)
Time Spent: 2h  (was: 1h 50m)

> In case of non-native tables use basic statistics from HiveStorageHandler
> -
>
> Key: HIVE-24928
> URL: https://issues.apache.org/jira/browse/HIVE-24928
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE 
> ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by 
> the BasicStatsTask class. This class tries to estimate the statistics by 
> scanning the directory of the table. 
> In the case of non-native tables (iceberg, hbase), the table directory might 
> contain metadata files as well, which would be counted by the BasicStatsTask 
> when calculating basic stats. 
> Instead of having this logic, the HiveStorageHandler implementation should 
> provide basic statistics.
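> A minimal sketch of the idea, using a hypothetical interface/method name (the
> actual hook added to HiveStorageHandler may differ):
> {code:java}
> import java.util.Map;
> import org.apache.hadoop.hive.ql.metadata.Table;
>
> // Hypothetical hook, for illustration only: the storage handler reports basic
> // stats (numFiles, numRows, totalSize) from its own metadata instead of
> // BasicStatsTask scanning the table directory.
> public interface StorageHandlerBasicStats {
>   Map<String, String> getBasicStatistics(Table table);
> }
> {code}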



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=574275&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574275
 ]

ASF GitHub Bot logged work on HIVE-24928:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 17:32
Start Date: 30/Mar/21 17:32
Worklog Time Spent: 10m 
  Work Description: lcspinter closed pull request #2111:
URL: https://github.com/apache/hive/pull/2111


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574275)
Time Spent: 1h 50m  (was: 1h 40m)

> In case of non-native tables use basic statistics from HiveStorageHandler
> -
>
> Key: HIVE-24928
> URL: https://issues.apache.org/jira/browse/HIVE-24928
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE 
> ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by 
> the BasicStatsTask class. This class tries to estimate the statistics by 
> scanning the directory of the table. 
> In the case of non-native tables (iceberg, hbase), the table directory might 
> contain metadata files as well, which would be counted by the BasicStatsTask 
> when calculating basic stats. 
> Instead of having this logic, the HiveStorageHandler implementation should 
> provide basic statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24958) Create Iceberg catalog module in Hive

2021-03-30 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-24958:
-


> Create Iceberg catalog module in Hive
> -
>
> Key: HIVE-24958
> URL: https://issues.apache.org/jira/browse/HIVE-24958
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> * Create a new iceberg-catalog module in Hive, with the code currently 
> contained in Iceberg's iceberg-hive-metastore module
>  * Make sure all tests pass (including static analysis and checkstyle)
>  * Make iceberg-handler depend on this module instead of 
> iceberg-hive-metastore



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24889) Hive CLI not working after upgrading from Oracle JDK 8u112 to 8u281

2021-03-30 Thread Norbert Kiam Maclang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Kiam Maclang updated HIVE-24889:

Description: 
After upgrading Oracle JDK version from jdk-8u112 to jdk-8u281, Hive CLI is not 
working anymore and gives below error when logging in.
{code:java}
WARNING: Use "yarn jar" to launch YARN applications.
21/03/09 11:00:04 WARN conf.HiveConf: HiveConf of name 
hive.server2.enable.impersonation does not existLogging initialized using 
configuration in file:/etc/hive/2.4.3.0-227/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: 
Previous writer likely failed to write 
hdfs://ppcontent-nn1.pp-content.dataplatform.com:8020/tmp/hive/hive/_tez_session_dir/96b21825-63f4-4316-9c43-20ebe641d9c9/hive-hcatalog-core.jar.
 Failing because I am unlikely to write too.
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:544)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Previous writer likely failed to write 
hdfs://ppcontent-nn1.pp-content.dataplatform.com:8020/tmp/hive/hive/_tez_session_dir/96b21825-63f4-4316-9c43-20ebe641d9c9/hive-hcatalog-core.jar.
 Failing because I am unlikely to write too.
at 
org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeResource(DagUtils.java:982)
at 
org.apache.hadoop.hive.ql.exec.tez.DagUtils.addTempResources(DagUtils.java:862)
at 
org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeTempFilesFromConf(DagUtils.java:805)
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.refreshLocalResourcesFromConf(TezSessionState.java:233)
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:158)
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:541)
... 8 more
{code}
Version we are using:
 * Ambari 2.2.2
 * Hive 1.2.1
 * Hadoop 2.7
 * Spark 1.6
 * HDP 2.4
 * Tez 0.7.0.2.4

  was:
After upgrading Oracle JDK version from jdk-8u112 to jdk-8u281, Hive CLI is not 
working anymore and gives below error when logging in.
{code:java}
WARNING: Use "yarn jar" to launch YARN applications.
21/03/09 11:00:04 WARN conf.HiveConf: HiveConf of name 
hive.server2.enable.impersonation does not existLogging initialized using 
configuration in file:/etc/hive/2.4.3.0-227/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: 
Previous writer likely failed to write 
hdfs://ppcontent-nn1.pp-content.dataplatform.com:8020/tmp/hive/hive/_tez_session_dir/96b21825-63f4-4316-9c43-20ebe641d9c9/hive-hcatalog-core.jar.
 Failing because I am unlikely to write too.
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:544)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Previous writer likely failed to write 
hdfs://ppcontent-nn1.pp-content.dataplatform.com:8020/tmp/hive/hive/_tez_session_dir/96b21825-63f4-4316-9c43-20ebe641d9c9/hive-hcatalog-core.jar.
 Failing because I am unlikely to write too.
at 
org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeResource(DagUtils.java:982)
at 
org.apache.hadoop.hive.ql.exec.tez.DagUtils.addTempResources(DagUtils.java:862)
at 
org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeTempFilesFromConf(DagUtils.java:805)
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.refreshLocalResourcesFromConf(TezSessionState.java:233)
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:158)
at 
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)
at 

[jira] [Comment Edited] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

2021-03-30 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311521#comment-17311521
 ] 

Stamatis Zampetakis edited comment on HIVE-24957 at 3/30/21, 1:21 PM:
--

The problem lies in the query plan and more specifically in the 
{{HiveRelDecorrelator}}. 
{noformat}
2021-03-30T06:16:57,682 DEBUG [d8fca83a-1e2a-4864-8730-f496318a0e47 main] 
rules.RelFieldTrimmer: Plan after trimming unused fields
HiveProject(b_title=[$0])
  HiveFilter(condition=[EXISTS({
HiveProject(a_authorkey=[$0])
  HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), 
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
})])
HiveProject(b_title=[$1], b_authorkey=[$2])
  HiveTableScan(table=[[default, book]], table:alias=[b])

2021-03-30T06:16:57,682 DEBUG [d8fca83a-1e2a-4864-8730-f496318a0e47 main] 
parse.CalcitePlanner: Plan before removing subquery:
HiveProject(b_title=[$1])
  HiveFilter(condition=[EXISTS({
HiveProject(a_authorkey=[$0])
  HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), 
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
})])
HiveTableScan(table=[[default, book]], table:alias=[b])

2021-03-30T06:16:57,690 DEBUG [d8fca83a-1e2a-4864-8730-f496318a0e47 main] 
parse.CalcitePlanner: Plan just after removing subquery:
HiveProject(b_title=[$1])
  LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{2}])
HiveTableScan(table=[[default, book]], table:alias=[b])
HiveProject(literalTrue=[true])
  HiveProject(a_authorkey=[$0])
HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), 
$cor0.b_authorkey, 300), $0)])
  HiveTableScan(table=[[default, author]], table:alias=[a])

2021-03-30T06:16:57,796 DEBUG [d8fca83a-1e2a-4864-8730-f496318a0e47 main] 
parse.CalcitePlanner: Plan after decorrelation:
HiveProject(b_title=[$1])
  HiveSemiJoin(condition=[=($8, $2)], joinType=[semi])
HiveTableScan(table=[[default, book]], table:alias=[b])
HiveProject(literalTrue=[true], b_authorkey=[$1])
  HiveProject(a_authorkey=[$0], b_authorkey=[$6])
HiveJoin(condition=[=(CASE(IS NOT NULL($6), $6, 300), $0)], 
joinType=[inner], algorithm=[none], cost=[not available])
  HiveTableScan(table=[[default, author]], table:alias=[a])
  HiveAggregate(group=[{0}])
HiveProject(b_authorkey=[$2])
  HiveTableScan(table=[[default, book]], table:alias=[b])
{noformat}
The problem starts with the introduction of the {{HiveSemiJoin}}. Due to that,
books with a NULL {{b_authorkey}} are removed from the result set.


was (Author: zabetak):
The problem lies in the query plan and more specifically in the 
{{HiveRelDecorrelator}}. 
{noformat}
2021-03-30T06:07:50,279 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] 
rules.RelFieldTrimmer: Plan after trimming unused fields
HiveProject(b_title=[$0])
  HiveFilter(condition=[EXISTS({
HiveProject(_o__c0=[1])
  HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), 
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
})])
HiveProject(b_title=[$1], b_authorkey=[$2])
  HiveTableScan(table=[[default, book]], table:alias=[b])

2021-03-30T06:07:50,279 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] 
parse.CalcitePlanner: Plan before removing subquery:
HiveProject(b_title=[$1])
  HiveFilter(condition=[EXISTS({
HiveProject(_o__c0=[1])
  HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), 
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
})])
HiveTableScan(table=[[default, book]], table:alias=[b])

2021-03-30T06:07:50,280 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] 
parse.CalcitePlanner: Plan just after removing subquery:
HiveProject(b_title=[$1])
  LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{2}])
HiveTableScan(table=[[default, book]], table:alias=[b])
HiveProject(literalTrue=[true])
  HiveProject(_o__c0=[1])
HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), 
$cor0.b_authorkey, 300), $0)])
  HiveTableScan(table=[[default, author]], table:alias=[a])

2021-03-30T06:07:50,282 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] 
parse.CalcitePlanner: Plan after decorrelation:
HiveProject(b_title=[$1])
  HiveSemiJoin(condition=[=($8, $2)], joinType=[semi])
HiveTableScan(table=[[default, book]], table:alias=[b])
HiveProject(literalTrue=[true], b_authorkey=[$1])
  HiveProject(_o__c0=[1], b_authorkey=[$6])
HiveJoin(condition=[=(CASE(IS NOT NULL($6), $6, 300), $0)], 
joinType=[inner], algorithm=[none], cost=[not available])
  HiveTableScan(table=[[default, author]], table:alias=[a])
  HiveAggregate(group=[{0}])
HiveProject(b_authorkey=[$2])
  

[jira] [Commented] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

2021-03-30 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311521#comment-17311521
 ] 

Stamatis Zampetakis commented on HIVE-24957:


The problem lies in the query plan and more specifically in the 
{{HiveRelDecorrelator}}. 
{noformat}
2021-03-30T06:07:50,279 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] 
rules.RelFieldTrimmer: Plan after trimming unused fields
HiveProject(b_title=[$0])
  HiveFilter(condition=[EXISTS({
HiveProject(_o__c0=[1])
  HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), 
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
})])
HiveProject(b_title=[$1], b_authorkey=[$2])
  HiveTableScan(table=[[default, book]], table:alias=[b])

2021-03-30T06:07:50,279 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] 
parse.CalcitePlanner: Plan before removing subquery:
HiveProject(b_title=[$1])
  HiveFilter(condition=[EXISTS({
HiveProject(_o__c0=[1])
  HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), 
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
})])
HiveTableScan(table=[[default, book]], table:alias=[b])

2021-03-30T06:07:50,280 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] 
parse.CalcitePlanner: Plan just after removing subquery:
HiveProject(b_title=[$1])
  LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{2}])
HiveTableScan(table=[[default, book]], table:alias=[b])
HiveProject(literalTrue=[true])
  HiveProject(_o__c0=[1])
HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey), 
$cor0.b_authorkey, 300), $0)])
  HiveTableScan(table=[[default, author]], table:alias=[a])

2021-03-30T06:07:50,282 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main] 
parse.CalcitePlanner: Plan after decorrelation:
HiveProject(b_title=[$1])
  HiveSemiJoin(condition=[=($8, $2)], joinType=[semi])
HiveTableScan(table=[[default, book]], table:alias=[b])
HiveProject(literalTrue=[true], b_authorkey=[$1])
  HiveProject(_o__c0=[1], b_authorkey=[$6])
HiveJoin(condition=[=(CASE(IS NOT NULL($6), $6, 300), $0)], 
joinType=[inner], algorithm=[none], cost=[not available])
  HiveTableScan(table=[[default, author]], table:alias=[a])
  HiveAggregate(group=[{0}])
HiveProject(b_authorkey=[$2])
  HiveTableScan(table=[[default, book]], table:alias=[b])
{noformat}
The problem starts with the introduction of the {{HiveSemiJoin}}. Due to that,
books with a NULL {{b_authorkey}} are removed from the result set.

> Wrong results when subquery has COALESCE in correlation predicate
> -
>
> Key: HIVE-24957
> URL: https://issues.apache.org/jira/browse/HIVE-24957
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> Consider the following example:
> {code:sql}
> create table author (
> a_authorkey   int,
> a_name varchar(50));
> create table book (
> b_bookkey   int,
> b_title varchar(50),
> b_authorkey int);
> insert into author values (10, 'Victor Hugo');
> insert into author values (20, 'Alexandre Dumas');
> insert into author values (300, 'UNKNOWN');
> insert into book values (1, 'Les Miserables', 10);
> insert into book values (2, 'The Count of Monte Cristo', 20);
> insert into book values (3, 'Men Without Women', 30);
> insert into book values (4, 'Odyssey', null);
> select b.b_title
> from book b
> where exists
>   (select a_authorkey
>from author a
>where coalesce(b.b_authorkey, 300) = a.a_authorkey);
> {code}
> *Expected results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> |Odyssey|
> *Actual results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> {{Odyssey}} is missing from the result set, and it shouldn't be: with the
> COALESCE operator applied, it should match the UNKNOWN author.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

2021-03-30 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-24957:
---
Description: 
Consider the following example:
{code:sql}
create table author (
a_authorkey   int,
a_name varchar(50));

create table book (
b_bookkey   int,
b_title varchar(50),
b_authorkey int);

insert into author values (10, 'Victor Hugo');
insert into author values (20, 'Alexandre Dumas');
insert into author values (300, 'UNKNOWN');

insert into book values (1, 'Les Miserables', 10);
insert into book values (2, 'The Count of Monte Cristo', 20);
insert into book values (3, 'Men Without Women', 30);
insert into book values (4, 'Odyssey', null);

select b.b_title
from book b
where exists
  (select a_authorkey
   from author a
   where coalesce(b.b_authorkey, 300) = a.a_authorkey);
{code}

*Expected results*
||B_TITLE||
|Les Miserables|
|The Count of Monte Cristo|
|Odyssey|

*Actual results*
||B_TITLE||
|Les Miserables|
|The Count of Monte Cristo|

{{Odyssey}} is missing from the result set, and it shouldn't be: with the
COALESCE operator applied, it should match the UNKNOWN author.

  was:
Consider the following example:
{code:sql}
create table author (
a_authorkey   int,
a_name varchar(50));

create table book (
b_bookkey   int,
b_title varchar(50),
b_authorkey int);

insert into author values (10, 'Victor Hugo');
insert into author values (20, 'Alexandre Dumas');
insert into author values (300, 'UNKNOWN');

insert into book values (1, 'Les Miserables', 10);
insert into book values (2, 'The Count of Monte Cristo', 20);
insert into book values (3, 'Men Without Women', 30);
insert into book values (4, 'Odyssey', null);

select b.b_title
from book b
where exists
  (select 1
   from author a
   where coalesce(b.b_authorkey, 300) = a.a_authorkey);
{code}

*Expected results*
||B_TITLE||
|Les Miserables|
|The Count of Monte Cristo|
|Odyssey|

*Actual results*
||B_TITLE||
|Les Miserables|
|The Count of Monte Cristo|

{{Odyssey}} is missing from the result set, and it shouldn't be: with the
COALESCE operator applied, it should match the UNKNOWN author.


> Wrong results when subquery has COALESCE in correlation predicate
> -
>
> Key: HIVE-24957
> URL: https://issues.apache.org/jira/browse/HIVE-24957
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> Consider the following example:
> {code:sql}
> create table author (
> a_authorkey   int,
> a_name varchar(50));
> create table book (
> b_bookkey   int,
> b_title varchar(50),
> b_authorkey int);
> insert into author values (10, 'Victor Hugo');
> insert into author values (20, 'Alexandre Dumas');
> insert into author values (300, 'UNKNOWN');
> insert into book values (1, 'Les Miserables', 10);
> insert into book values (2, 'The Count of Monte Cristo', 20);
> insert into book values (3, 'Men Without Women', 30);
> insert into book values (4, 'Odyssey', null);
> select b.b_title
> from book b
> where exists
>   (select a_authorkey
>from author a
>where coalesce(b.b_authorkey, 300) = a.a_authorkey);
> {code}
> *Expected results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> |Odyssey|
> *Actual results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> {{Odyssey}} is missing from the result set, and it shouldn't be: with the
> COALESCE operator applied, it should match the UNKNOWN author.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate

2021-03-30 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-24957:
--


> Wrong results when subquery has COALESCE in correlation predicate
> -
>
> Key: HIVE-24957
> URL: https://issues.apache.org/jira/browse/HIVE-24957
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> Consider the following example:
> {code:sql}
> create table author (
> a_authorkey   int,
> a_name varchar(50));
> create table book (
> b_bookkey   int,
> b_title varchar(50),
> b_authorkey int);
> insert into author values (10, 'Victor Hugo');
> insert into author values (20, 'Alexandre Dumas');
> insert into author values (300, 'UNKNOWN');
> insert into book values (1, 'Les Miserables', 10);
> insert into book values (2, 'The Count of Monte Cristo', 20);
> insert into book values (3, 'Men Without Women', 30);
> insert into book values (4, 'Odyssey', null);
> select b.b_title
> from book b
> where exists
>   (select 1
>from author a
>where coalesce(b.b_authorkey, 300) = a.a_authorkey);
> {code}
> *Expected results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> |Odyssey|
> *Actual results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> {{Odyssey}} is missing from the result set, and it shouldn't be: with the
> COALESCE operator applied, it should match the UNKNOWN author.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24955) New metrics about aborted transactions

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24955?focusedWorklogId=574140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574140
 ]

ASF GitHub Bot logged work on HIVE-24955:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 12:56
Start Date: 30/Mar/21 12:56
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2132:
URL: https://github.com/apache/hive/pull/2132#discussion_r604070671



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/MetricsConstants.java
##
@@ -27,6 +27,11 @@
   public static final String COMPACTION_WORKER_CYCLE = 
"compaction_worker_cycle";
   public static final String OLDEST_OPEN_TXN_ID = "oldest_open_txn_id";
   public static final String OLDEST_OPEN_TXN_AGE = 
"oldest_open_txn_age_in_sec";
+  // number of aborted txns in TXNS table
+  public static final String NUM_ABORTED_TXNS_IN_TXNS = 
COMPACTION_STATUS_PREFIX + "aborted_txns_in_txns";

Review comment:
   yes, I would go with NUM_TOTAL_ABORTED_TXNS and NUM_ABORTED_TXNS. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574140)
Time Spent: 1h 40m  (was: 1.5h)

> New metrics about aborted transactions
> --
>
> Key: HIVE-24955
> URL: https://issues.apache.org/jira/browse/HIVE-24955
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> 5 new metrics:
>  * Number of aborted transactions in the TXNS table (collected in 
> AcidMetricsService)
>  * Oldest aborted transaction (collected in AcidMetricsService)
>  * Number of aborted write transaction (incremented counter at 
> abortTransaction)
>  * Number of committed write transaction (incremented counter at 
> commitTransaction)
>  * Number of timed out transactions (cleaner removed them after heartbeat 
> time out)
> The latter 3 will restart as 0 after every HMS restart 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24955) New metrics about aborted transactions

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24955?focusedWorklogId=574132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574132
 ]

ASF GitHub Bot logged work on HIVE-24955:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 12:46
Start Date: 30/Mar/21 12:46
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2132:
URL: https://github.com/apache/hive/pull/2132#discussion_r604062554



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/MetricsConstants.java
##
@@ -27,6 +27,11 @@
   public static final String COMPACTION_WORKER_CYCLE = 
"compaction_worker_cycle";
   public static final String OLDEST_OPEN_TXN_ID = "oldest_open_txn_id";
   public static final String OLDEST_OPEN_TXN_AGE = 
"oldest_open_txn_age_in_sec";
+  // number of aborted txns in TXNS table
+  public static final String NUM_ABORTED_TXNS_IN_TXNS = 
COMPACTION_STATUS_PREFIX + "aborted_txns_in_txns";

Review comment:
   I have to differentiate between 2 metrics:
   1. count in TXNs where status='a' (current name: NUM_ABORTED_TXNS_IN_TXNS)
   2. total number of aborted txns since hms was started (current name: 
NUM_ABORTED_WRITE_TXNS but I'd change this to NUM_ABORTED_TXNS)
   
   It would also make sense to change #2 to NUM_TOTAL_ABORTED_TXNS and #1 to 
NUM_CURRENT_ABORTED_TXNS. Does that make sense to you too?
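   Sketched out, that naming proposal would look roughly like this (class name and
string values are illustrative only; the patch keeps the prefix pattern of the
existing constants):
   {code:java}
   import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;

   // Illustrative only; the final names/values would live in MetricsConstants itself.
   public final class AbortedTxnMetricNames {
     // Gauge: rows currently in TXNS with status 'a' (collected by AcidMetricsService).
     public static final String NUM_CURRENT_ABORTED_TXNS =
         MetricsConstants.COMPACTION_STATUS_PREFIX + "current_aborted_txns";
     // Counter: aborts observed since this HMS instance started (incremented at abortTransaction).
     public static final String NUM_TOTAL_ABORTED_TXNS =
         MetricsConstants.COMPACTION_STATUS_PREFIX + "total_aborted_txns";

     private AbortedTxnMetricNames() {
     }
   }
   {code}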
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574132)
Time Spent: 1.5h  (was: 1h 20m)

> New metrics about aborted transactions
> --
>
> Key: HIVE-24955
> URL: https://issues.apache.org/jira/browse/HIVE-24955
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 5 new metrics:
>  * Number of aborted transactions in the TXNS table (collected in 
> AcidMetricsService)
>  * Oldest aborted transaction (collected in AcidMetricsService)
>  * Number of aborted write transaction (incremented counter at 
> abortTransaction)
>  * Number of committed write transaction (incremented counter at 
> commitTransaction)
>  * Number of timed out transactions (cleaner removed them after heartbeat 
> time out)
> The latter 3 will restart as 0 after every HMS restart 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24955) New metrics about aborted transactions

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24955?focusedWorklogId=574127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574127
 ]

ASF GitHub Bot logged work on HIVE-24955:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 12:44
Start Date: 30/Mar/21 12:44
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2132:
URL: https://github.com/apache/hive/pull/2132#discussion_r604061403



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetrics.java
##
@@ -444,6 +457,36 @@ public void testDBMetrics() throws Exception {
 
 Assert.assertEquals(1,
 Metrics.getOrCreateGauge(MetricsConstants.COMPACTION_STATUS_PREFIX + "txn_to_writeid").intValue());
+
+start = System.currentTimeMillis() - 1000L;

Review comment:
   Leftover from a copy-paste; I removed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574127)
Time Spent: 1h 20m  (was: 1h 10m)

> New metrics about aborted transactions
> --
>
> Key: HIVE-24955
> URL: https://issues.apache.org/jira/browse/HIVE-24955
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> 5 new metrics:
>  * Number of aborted transactions in the TXNS table (collected in 
> AcidMetricsService)
>  * Oldest aborted transaction (collected in AcidMetricsService)
>  * Number of aborted write transactions (counter incremented at 
> abortTransaction)
>  * Number of committed write transactions (counter incremented at 
> commitTransaction)
>  * Number of timed-out transactions (the cleaner removes them after the 
> heartbeat times out)
> The latter 3 will reset to 0 after every HMS restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24955) New metrics about aborted transactions

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24955?focusedWorklogId=574122=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574122
 ]

ASF GitHub Bot logged work on HIVE-24955:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 12:39
Start Date: 30/Mar/21 12:39
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2132:
URL: https://github.com/apache/hive/pull/2132#discussion_r604058028



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetrics.java
##
@@ -444,6 +457,36 @@ public void testDBMetrics() throws Exception {
 
 Assert.assertEquals(1,
 Metrics.getOrCreateGauge(MetricsConstants.COMPACTION_STATUS_PREFIX + "txn_to_writeid").intValue());
+
+start = System.currentTimeMillis() - 1000L;

Review comment:
   Hmm, why do you need to subtract 1s here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574122)
Time Spent: 1h 10m  (was: 1h)

> New metrics about aborted transactions
> --
>
> Key: HIVE-24955
> URL: https://issues.apache.org/jira/browse/HIVE-24955
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> 5 new metrics:
>  * Number of aborted transactions in the TXNS table (collected in 
> AcidMetricsService)
>  * Oldest aborted transaction (collected in AcidMetricsService)
>  * Number of aborted write transactions (counter incremented at 
> abortTransaction)
>  * Number of committed write transactions (counter incremented at 
> commitTransaction)
>  * Number of timed-out transactions (the cleaner removes them after the 
> heartbeat times out)
> The latter 3 will reset to 0 after every HMS restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24955) New metrics about aborted transactions

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24955?focusedWorklogId=574118=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574118
 ]

ASF GitHub Bot logged work on HIVE-24955:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 12:37
Start Date: 30/Mar/21 12:37
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2132:
URL: https://github.com/apache/hive/pull/2132#discussion_r604055831



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/MetricsConstants.java
##
@@ -27,6 +27,11 @@
   public static final String COMPACTION_WORKER_CYCLE = "compaction_worker_cycle";
   public static final String OLDEST_OPEN_TXN_ID = "oldest_open_txn_id";
   public static final String OLDEST_OPEN_TXN_AGE = "oldest_open_txn_age_in_sec";
+  // number of aborted txns in TXNS table
+  public static final String NUM_ABORTED_TXNS_IN_TXNS = COMPACTION_STATUS_PREFIX + "aborted_txns_in_txns";

Review comment:
   Could this be just NUM_ABORTED_TXNS?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574118)
Time Spent: 1h  (was: 50m)

> New metrics about aborted transactions
> --
>
> Key: HIVE-24955
> URL: https://issues.apache.org/jira/browse/HIVE-24955
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> 5 new metrics:
>  * Number of aborted transactions in the TXNS table (collected in 
> AcidMetricsService)
>  * Oldest aborted transaction (collected in AcidMetricsService)
>  * Number of aborted write transactions (counter incremented at 
> abortTransaction)
>  * Number of committed write transactions (counter incremented at 
> commitTransaction)
>  * Number of timed-out transactions (the cleaner removes them after the 
> heartbeat times out)
> The latter 3 will reset to 0 after every HMS restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23583) Upgrade to ant 1.10.9 due to CVEs

2021-03-30 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311486#comment-17311486
 ] 

Naveen Gangam commented on HIVE-23583:
--

PR 1599 has been auto-closed. We will need an active PR (if there is a way to 
re-open it, that would be great); otherwise, a new PR will be needed for this 
change. Thanks

> Upgrade to ant 1.10.9 due to CVEs
> -
>
> Key: HIVE-23583
> URL: https://issues.apache.org/jira/browse/HIVE-23583
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Renukaprasad C
>Assignee: Kevin Risden
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23583.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Update ANT to fix:
> CVE-2020-1945: Apache Ant insecure temporary file vulnerability
> Severity: Medium
> Vendor:
> The Apache Software Foundation
> Versions Affected:
> Apache Ant 1.1 to 1.9.14 and 1.10.0 to 1.10.7
> Description:
> Apache Ant uses the default temporary directory identified by the Java
> system property java.io.tmpdir for several tasks and may thus leak
> sensitive information. The fixcrlf and replaceregexp tasks also copy
> files from the temporary directory back into the build tree allowing an
> attacker to inject modified source files into the build process.
> Mitigation:
> Ant users of versions 1.1 to 1.9.14 and 1.10.0 to 1.10.7 should set the
> java.io.tmpdir system property to point to a directory only readable and
> writable by the current user prior to running Ant.
> Users of versions 1.9.15 and 1.10.8 can use the Ant property ant.tmpfile
> instead. Users of Ant 1.10.8 can rely on Ant protecting the temporary
> files if the underlying filesystem allows it, but we still recommend
> using a private temporary directory instead.
> References:
> https://cve.mitre.org/cgi-bin/cvename.cgi?name=2020-1945
> https://nvd.nist.gov/vuln/detail/CVE-2020-1945



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24955) New metrics about aborted transactions

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24955?focusedWorklogId=574112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574112
 ]

ASF GitHub Bot logged work on HIVE-24955:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 12:28
Start Date: 30/Mar/21 12:28
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2132:
URL: https://github.com/apache/hive/pull/2132#discussion_r604049457



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/MetricsConstants.java
##
@@ -27,6 +27,11 @@
   public static final String COMPACTION_WORKER_CYCLE = "compaction_worker_cycle";
   public static final String OLDEST_OPEN_TXN_ID = "oldest_open_txn_id";
   public static final String OLDEST_OPEN_TXN_AGE = "oldest_open_txn_age_in_sec";
+  // number of aborted txns in TXNS table
+  public static final String NUM_ABORTED_TXNS_IN_TXNS = COMPACTION_STATUS_PREFIX + "aborted_txns_in_txns";
+  public static final String OLDEST_ABORTED_TXN_ID = "oldest_aborted_txn_id";
+  public static final String OLDEST_ABORTED_TXN_AGE_IN_SEC = "oldest_aborted_txn_age_in_sec";

Review comment:
   Could we please use consistent naming? Should we rename 
OLDEST_OPEN_TXN_AGE -> OLDEST_OPEN_TXN_AGE_IN_SEC?
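
   For illustration only, the rename suggested here would make the two age metrics line 
   up; these are proposed constants in a hypothetical file, not what is in the patch:

{code:java}
// Sketch of the consistent naming proposed in this thread; hypothetical class.
public final class MetricsConstantsNamingSketch {
  public static final String OLDEST_OPEN_TXN_AGE_IN_SEC    = "oldest_open_txn_age_in_sec";
  public static final String OLDEST_ABORTED_TXN_AGE_IN_SEC = "oldest_aborted_txn_age_in_sec";

  private MetricsConstantsNamingSketch() {}
}
{code}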




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574112)
Time Spent: 50m  (was: 40m)

> New metrics about aborted transactions
> --
>
> Key: HIVE-24955
> URL: https://issues.apache.org/jira/browse/HIVE-24955
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> 5 new metrics:
>  * Number of aborted transactions in the TXNS table (collected in 
> AcidMetricsService)
>  * Oldest aborted transaction (collected in AcidMetricsService)
>  * Number of aborted write transactions (counter incremented at 
> abortTransaction)
>  * Number of committed write transactions (counter incremented at 
> commitTransaction)
>  * Number of timed-out transactions (the cleaner removes them after the 
> heartbeat times out)
> The latter 3 will reset to 0 after every HMS restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24955) New metrics about aborted transactions

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24955?focusedWorklogId=574106=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574106
 ]

ASF GitHub Bot logged work on HIVE-24955:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 12:25
Start Date: 30/Mar/21 12:25
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2132:
URL: https://github.com/apache/hive/pull/2132#discussion_r604047096



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -4578,6 +4584,7 @@ private int abortTxns(Connection dbConn, List<Long> txnids, boolean checkHeartbe
   prefix.append("DELETE FROM \"HIVE_LOCKS\" WHERE ");
   TxnUtils.buildQueryWithINClause(conf, queries, prefix, suffix, txnids, "\"HL_TXNID\"", false, false);
 
+  Metrics.getOrCreateCounter(MetricsConstants.NUM_ABORTED_WRITE_TXNS).inc(txnids.size());

Review comment:
   I don't think you can differentiate here between aborted read and write 
txns
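
   For illustration, one hedged way to make the counter write-only would be to skip 
   read-only transactions before incrementing. Whether the transaction type is actually 
   available at this point in abortTxns() is an assumption of this sketch, and 
   countAbortedWriteTxn is a hypothetical helper, not the PR's code:

{code:java}
// Hypothetical helper: bump the aborted-write counter only for non-read-only
// transactions. Assumes the caller can supply the TxnType for each aborted txn.
import org.apache.hadoop.hive.metastore.api.TxnType;
import org.apache.hadoop.hive.metastore.metrics.Metrics;
import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;

public class AbortedWriteTxnCounterSketch {
  static void countAbortedWriteTxn(TxnType txnType) {
    if (txnType != TxnType.READ_ONLY) {
      Metrics.getOrCreateCounter(MetricsConstants.NUM_ABORTED_WRITE_TXNS).inc();
    }
  }
}
{code}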




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574106)
Time Spent: 40m  (was: 0.5h)

> New metrics about aborted transactions
> --
>
> Key: HIVE-24955
> URL: https://issues.apache.org/jira/browse/HIVE-24955
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> 5 new metrics:
>  * Number of aborted transactions in the TXNS table (collected in 
> AcidMetricsService)
>  * Oldest aborted transaction (collected in AcidMetricsService)
>  * Number of aborted write transactions (counter incremented at 
> abortTransaction)
>  * Number of committed write transactions (counter incremented at 
> commitTransaction)
>  * Number of timed-out transactions (the cleaner removes them after the 
> heartbeat times out)
> The latter 3 will reset to 0 after every HMS restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24955) New metrics about aborted transactions

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24955?focusedWorklogId=574103=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574103
 ]

ASF GitHub Bot logged work on HIVE-24955:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 12:21
Start Date: 30/Mar/21 12:21
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2132:
URL: https://github.com/apache/hive/pull/2132#discussion_r604044332



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -4578,6 +4584,7 @@ private int abortTxns(Connection dbConn, List<Long> txnids, boolean checkHeartbe
   prefix.append("DELETE FROM \"HIVE_LOCKS\" WHERE ");
   TxnUtils.buildQueryWithINClause(conf, queries, prefix, suffix, txnids, "\"HL_TXNID\"", false, false);
 
+  Metrics.getOrCreateCounter(MetricsConstants.NUM_ABORTED_WRITE_TXNS).inc(txnids.size());

Review comment:
   Same as above: what would happen if the query below fails?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574103)
Time Spent: 0.5h  (was: 20m)

> New metrics about aborted transactions
> --
>
> Key: HIVE-24955
> URL: https://issues.apache.org/jira/browse/HIVE-24955
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> 5 new metrics:
>  * Number of aborted transactions in the TXNS table (collected in 
> AcidMetricsService)
>  * Oldest aborted transaction (collected in AcidMetricsService)
>  * Number of aborted write transactions (counter incremented at 
> abortTransaction)
>  * Number of committed write transactions (counter incremented at 
> commitTransaction)
>  * Number of timed-out transactions (the cleaner removes them after the 
> heartbeat times out)
> The latter 3 will reset to 0 after every HMS restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24955) New metrics about aborted transactions

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24955?focusedWorklogId=574102=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-574102
 ]

ASF GitHub Bot logged work on HIVE-24955:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 12:17
Start Date: 30/Mar/21 12:17
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2132:
URL: https://github.com/apache/hive/pull/2132#discussion_r604041401



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -1453,6 +1455,7 @@ public void commitTxn(CommitTxnRequest rqst)
 }
 
 createCommitNotificationEvent(dbConn, txnid , txnType);
+Metrics.getOrCreateCounter(MetricsConstants.NUM_COMMITTED_TXNS).inc();

Review comment:
   What if db.commit fails? The metric counter won't be decremented.
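
   For illustration, a small sketch of the ordering concern raised here, assuming the 
   metric should only move once the JDBC commit has succeeded; the wrapper class and 
   method below are illustrative, not the actual TxnHandler flow:

{code:java}
// Illustrative only: increment the committed-txn counter after dbConn.commit()
// returns, so a failed commit cannot inflate the metric.
import java.sql.Connection;
import java.sql.SQLException;
import org.apache.hadoop.hive.metastore.metrics.Metrics;
import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;

public class CommitMetricOrderingSketch {
  static void commitAndCount(Connection dbConn) throws SQLException {
    dbConn.commit();  // if this throws, the counter is never touched
    Metrics.getOrCreateCounter(MetricsConstants.NUM_COMMITTED_TXNS).inc();
  }
}
{code}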




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 574102)
Time Spent: 20m  (was: 10m)

> New metrics about aborted transactions
> --
>
> Key: HIVE-24955
> URL: https://issues.apache.org/jira/browse/HIVE-24955
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 5 new metrics:
>  * Number of aborted transactions in the TXNS table (collected in 
> AcidMetricsService)
>  * Oldest aborted transaction (collected in AcidMetricsService)
>  * Number of aborted write transactions (counter incremented at 
> abortTransaction)
>  * Number of committed write transactions (counter incremented at 
> commitTransaction)
>  * Number of timed-out transactions (the cleaner removes them after the 
> heartbeat times out)
> The latter 3 will reset to 0 after every HMS restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24625) CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect directory

2021-03-30 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311400#comment-17311400
 ] 

Zoltan Haindrich commented on HIVE-24625:
-

Wouldn't it be an option to fix the MoveTask to put the data in the right 
place?

This transformer stuff is getting more and more complex and obscure...

> CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect 
> directory
> -
>
> Key: HIVE-24625
> URL: https://issues.apache.org/jira/browse/HIVE-24625
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> MetastoreDefaultTransformer in HMS converts a managed non-transactional table 
> to an external table. MoveTask still uses the managed path when loading the 
> data, resulting in an always-empty table.
> {code:java}
> create table tbl1 TBLPROPERTIES ('transactional'='false') as select * from 
> other;{code}
> After the conversion, the table location points to an external directory:
> Location: 
> hdfs://c670-node2.coelab.cloudera.com:8020/warehouse/tablespace/external/hive/tbl1
> The MoveTask uses the managed location:
> {code:java}
> INFO : Moving data to directory 
> hdfs://...:8020/warehouse/tablespace/managed/hive/tbl1 from 
> hdfs://...:8020/warehouse/tablespace/managed/hive/.hive-staging_hive_2021-01-05_16-10-39_973_41005081081760609-4/-ext-1000
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24918) Handle failover case during Repl Dump

2021-03-30 Thread Haymant Mangla (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haymant Mangla updated HIVE-24918:
--
Description: 
To handle:
 a) Whenever the user wants to go ahead with failover, the next or subsequent 
repl dump operation, upon confirming that there are no pending open 
transaction events, should create a _failover_ready marker file in the dump 
dir. This marker file would contain the name of the scheduled query that 
generated the dump.

b) Skip subsequent repl dump instances once the marker file is in place.

  was:
To handle:
a) Whenever user wants to go ahead with failover, during the next or subsequent 
repl dump operation upon confirming that there are no pending open transaction 
events, in should create a _failover_ready marker file in the dump dir.
b) Skip next repl dump instances once we have the marker file placed.
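
A minimal sketch of the flow in the updated description, assuming the marker is a plain 
file named _failover_ready in the dump directory whose contents are the scheduled query 
name; the class and method names are illustrative, not the actual repl dump code:

{code:java}
// Hypothetical sketch of the marker-file handling described above.
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FailoverMarkerSketch {
  static final String FAILOVER_READY_MARKER = "_failover_ready";

  // b) Skip the dump if a previous run already declared failover readiness.
  static boolean shouldSkipDump(FileSystem fs, Path dumpDir) throws Exception {
    return fs.exists(new Path(dumpDir, FAILOVER_READY_MARKER));
  }

  // a) Create the marker once there are no pending open-transaction events,
  //    recording which scheduled query produced the dump.
  static void markFailoverReady(FileSystem fs, Path dumpDir, String scheduledQueryName)
      throws Exception {
    try (FSDataOutputStream out = fs.create(new Path(dumpDir, FAILOVER_READY_MARKER))) {
      out.write(scheduledQueryName.getBytes(StandardCharsets.UTF_8));
    }
  }
}
{code}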


> Handle failover case during Repl Dump
> -
>
> Key: HIVE-24918
> URL: https://issues.apache.org/jira/browse/HIVE-24918
> Project: Hive
>  Issue Type: New Feature
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> To handle:
>  a) Whenever the user wants to go ahead with failover, the next or 
> subsequent repl dump operation, upon confirming that there are no pending 
> open transaction events, should create a _failover_ready marker file in the 
> dump dir. This marker file would contain the name of the scheduled query 
> that generated the dump.
> b) Skip subsequent repl dump instances once the marker file is in place.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24729) Implement strategy for llap cache hydration

2021-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24729?focusedWorklogId=573940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-573940
 ]

ASF GitHub Bot logged work on HIVE-24729:
-

Author: ASF GitHub Bot
Created on: 30/Mar/21 08:18
Start Date: 30/Mar/21 08:18
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #2106:
URL: https://github.com/apache/hive/pull/2106


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 573940)
Time Spent: 1h  (was: 50m)

> Implement strategy for llap cache hydration
> ---
>
> Key: HIVE-24729
> URL: https://issues.apache.org/jira/browse/HIVE-24729
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24729) Implement strategy for llap cache hydration

2021-03-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita resolved HIVE-24729.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master. Thanks [~asinkovits]

> Implement strategy for llap cache hydration
> ---
>
> Key: HIVE-24729
> URL: https://issues.apache.org/jira/browse/HIVE-24729
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24956) Add debug logs for time taken in the incremental event processing

2021-03-30 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma reassigned HIVE-24956:
--


> Add debug logs for time taken in the incremental event processing
> -
>
> Key: HIVE-24956
> URL: https://issues.apache.org/jira/browse/HIVE-24956
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>
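
Since the ticket has no description yet, the following is only a generic sketch of the 
kind of debug timing log being asked for; the event-handling callback, logger, and class 
name are hypothetical, not the actual replication code:

{code:java}
// Generic pattern only: measure and log the time spent handling one
// incremental event at DEBUG level.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class EventTimingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(EventTimingSketch.class);

  static void handleEventWithTiming(Runnable handleEvent, String eventId) {
    long start = System.currentTimeMillis();
    handleEvent.run();
    LOG.debug("Processed incremental event {} in {} ms", eventId,
        System.currentTimeMillis() - start);
  }
}
{code}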




--
This message was sent by Atlassian Jira
(v8.3.4#803005)