[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.1

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=560230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560230
 ]

ASF GitHub Bot logged work on HIVE-21737:
-

Author: ASF GitHub Bot
Created on: 03/Mar/21 00:48
Start Date: 03/Mar/21 00:48
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1635:
URL: https://github.com/apache/hive/pull/1635


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 560230)
Time Spent: 8h 40m  (was: 8.5h)

> Upgrade Avro to version 1.10.1
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API or Guava as a dependency. Worth the update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24502) Store table level regular expression used during dump for table level replication

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24502?focusedWorklogId=560229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560229
 ]

ASF GitHub Bot logged work on HIVE-24502:
-

Author: ASF GitHub Bot
Created on: 03/Mar/21 00:48
Start Date: 03/Mar/21 00:48
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1759:
URL: https://github.com/apache/hive/pull/1759


   





Issue Time Tracking
---

Worklog Id: (was: 560229)
Time Spent: 40m  (was: 0.5h)

> Store table level regular expression used during dump for table level 
> replication
> -
>
> Key: HIVE-24502
> URL: https://issues.apache.org/jira/browse/HIVE-24502
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24502.01.patch, HIVE-24502.02.patch, 
> HIVE-24502.03.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Store the include table list and the exclude table list as part of the dump metadata file





[jira] [Work logged] (HIVE-15444) tez.queue.name is invalid after tez job running on CLI

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15444?focusedWorklogId=560228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560228
 ]

ASF GitHub Bot logged work on HIVE-15444:
-

Author: ASF GitHub Bot
Created on: 03/Mar/21 00:48
Start Date: 03/Mar/21 00:48
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1815:
URL: https://github.com/apache/hive/pull/1815


   





Issue Time Tracking
---

Worklog Id: (was: 560228)
Time Spent: 0.5h  (was: 20m)

> tez.queue.name is invalid after tez job running on CLI
> --
>
> Key: HIVE-15444
> URL: https://issues.apache.org/jira/browse/HIVE-15444
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Hui Fei
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
> Attachments: HIVE-15444.1.patch, HIVE-15444.2.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> hive> set tez.queue.name;
> tez.queue.name is undefined
> hive> set tez.queue.name=HQ_OLPS;
> hive> set tez.queue.name;
> tez.queue.name=HQ_OLPS
> {code}
> {code}
> hive> insert into abc values(2,2);
> Query ID = hadoop_20161216181208_6c382e49-ac4a-4f52-ba1e-3ed962733fc1
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id application_1481877998678_0011)
> --
> VERTICES      MODE       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 1 ..      container  SUCCEEDED      1          1        0        0       0       0
> --
> VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 6.57 s
> --
> Loading data to table default.abc
> OK
> Time taken: 19.983 seconds
> {code}
> {code}
> hive> set tez.queue.name;
> tez.queue.name is undefined
> hive> set hive.execution.engine;
> hive.execution.engine=tez
> {code}





[jira] [Work logged] (HIVE-24837) Upgrade httpclient to 4.5.13+

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24837?focusedWorklogId=560218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560218
 ]

ASF GitHub Bot logged work on HIVE-24837:
-

Author: ASF GitHub Bot
Created on: 03/Mar/21 00:09
Start Date: 03/Mar/21 00:09
Worklog Time Spent: 10m 
  Work Description: hsnusonic closed pull request #2032:
URL: https://github.com/apache/hive/pull/2032


   





Issue Time Tracking
---

Worklog Id: (was: 560218)
Time Spent: 20m  (was: 10m)

> Upgrade httpclient to 4.5.13+
> -
>
> Key: HIVE-24837
> URL: https://issues.apache.org/jira/browse/HIVE-24837
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  
> Hive is using httpclient 4.5.6. We will need to upgrade httpclient and 
> httpcore.
> {quote}CVSSv2:
>  Base Score: MEDIUM (5.0)
>  Vector: /AV:N/AC:L/Au:N/C:N/I:P/A:N
>  CVSSv3:
>  Base Score: MEDIUM (5.3)
>  Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:N
> CVE-2020-13956: Apache HttpClient incorrect handling of malformed
>  authority component in request URIs
> Severity: Medium
> Vendor:
>  The Apache Software Foundation
> Versions Affected:
>  Apache HttpClient 4.5.12 and prior 
>  Apache HttpClient 5.0.2 and prior
> Description:
> Apache HttpClient versions prior to version 4.5.13 and 5.0.3 can
>  misinterpret malformed authority component in request URIs passed to
>  the library as java.net.URI object and pick the wrong target host for
>  request execution.
> Mitigation:
> As of release 4.5.13 and 5.0.3 HttpClient will reject URIs with
>  ambiguous malformed authority component as invalid. Users of HttpClient
>  are advised to upgrade to version 4.5.13 or 5.0.3 and sanitize request
>  URIs when using java.net.URI as input.
> Credit:
>  This issue was discovered and reported by Priyank Nigam
> {quote}
> Reference:
>  * [https://www.openwall.com/lists/oss-security/2020/10/08/4]
>  * [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13956]
>  * [https://nvd.nist.gov/vuln/detail/CVE-2020-13956]
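The mitigation quoted above advises sanitizing request URIs when java.net.URI is the input. A minimal sketch of one such check follows, assuming it is acceptable for the caller to reject any URI whose authority cannot be parsed as a server-based authority. The class and method names are hypothetical and are not part of HttpClient or Hive.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of HttpClient or Hive.
final class UriSanitizer {

    /**
     * Returns true only if the text parses as a URI whose authority is a
     * well-formed server-based authority ([user-info@]host[:port]).
     */
    static boolean hasServerAuthority(String raw) {
        try {
            URI uri = new URI(raw);
            // Throws URISyntaxException when an authority is present but
            // cannot be parsed as a server-based authority.
            uri.parseServerAuthority();
            // URIs without any authority (for example mailto:) have no host.
            return uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

A check like this complements the upgrade to 4.5.13 and does not replace it; it only filters out inputs for which java.net.URI itself reports no usable host.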





[jira] [Commented] (HIVE-24783) Store currentNotificationID on target during repl load operation

2021-03-02 Thread Haymant Mangla (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294033#comment-17294033
 ] 

Haymant Mangla commented on HIVE-24783:
---

My Pleasure.

> Store currentNotificationID on target during repl load operation
> 
>
> Key: HIVE-24783
> URL: https://issues.apache.org/jira/browse/HIVE-24783
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-24783) Store currentNotificationID on target during repl load operation

2021-03-02 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha resolved HIVE-24783.
-
Resolution: Fixed

Committed to master.
Thank you for the patch, [~haymant]

> Store currentNotificationID on target during repl load operation
> 
>
> Key: HIVE-24783
> URL: https://issues.apache.org/jira/browse/HIVE-24783
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24783) Store currentNotificationID on target during repl load operation

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24783?focusedWorklogId=560163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560163
 ]

ASF GitHub Bot logged work on HIVE-24783:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 22:15
Start Date: 02/Mar/21 22:15
Worklog Time Spent: 10m 
  Work Description: pkumarsinha merged pull request #2005:
URL: https://github.com/apache/hive/pull/2005


   





Issue Time Tracking
---

Worklog Id: (was: 560163)
Time Spent: 1h 20m  (was: 1h 10m)

> Store currentNotificationID on target during repl load operation
> 
>
> Key: HIVE-24783
> URL: https://issues.apache.org/jira/browse/HIVE-24783
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (HIVE-24783) Store currentNotificationID on target during repl load operation

2021-03-02 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293311#comment-17293311
 ] 

Pravin Sinha edited comment on HIVE-24783 at 3/2/21, 10:05 PM:
---

+1


was (Author: pkumarsinha):
+1 Pending test

> Store currentNotificationID on target during repl load operation
> 
>
> Key: HIVE-24783
> URL: https://issues.apache.org/jira/browse/HIVE-24783
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-24837) Upgrade httpclient to 4.5.13+

2021-03-02 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-24837.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been committed to master. Closing the jira. Thanks for the contribution, 
[~hsnusonic]

> Upgrade httpclient to 4.5.13+
> -
>
> Key: HIVE-24837
> URL: https://issues.apache.org/jira/browse/HIVE-24837
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> Hive is using httpclient 4.5.6. We will need to upgrade httpclient and 
> httpcore.





[jira] [Updated] (HIVE-24837) Upgrade httpclient to 4.5.13+

2021-03-02 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-24837:
-
Summary: Upgrade httpclient to 4.5.13+  (was: Upgrade httpclient to 4.5.13+ 
due to CVE-2020-13956)

> Upgrade httpclient to 4.5.13+
> -
>
> Key: HIVE-24837
> URL: https://issues.apache.org/jira/browse/HIVE-24837
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> Hive is using httpclient 4.5.6. We will need to upgrade httpclient and 
> httpcore.





[jira] [Commented] (HIVE-24596) Explain ddl for debugging

2021-03-02 Thread Harshit Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293868#comment-17293868
 ] 

Harshit Gupta commented on HIVE-24596:
--

Yeah Sure!!

Let's assume the following [^table_definitions] and the following [^query]. The 
explain ddl output for the query will look like [^explain_ddl_output]

 

 

> Explain ddl for debugging
> -
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Attachments: explain_ddl_output, query, table_definitions
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For debugging query issues, basic details like table schema, statistics, 
> partition details, query plans are needed.
> It would be good to have "explain ddl" support, which can generate these 
> details. This can help in recreating the schema and planner issues without 
> sample data.
>  





[jira] [Updated] (HIVE-24596) Explain ddl for debugging

2021-03-02 Thread Harshit Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshit Gupta updated HIVE-24596:
-
Attachment: table_definitions
query
explain_ddl_output

> Explain ddl for debugging
> -
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Attachments: explain_ddl_output, query, table_definitions
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For debugging query issues, basic details like table schema, statistics, 
> partition details, query plans are needed.
> It would be good to have "explain ddl" support, which can generate these 
> details. This can help in recreating the schema and planner issues without 
> sample data.
>  





[jira] [Work logged] (HIVE-24841) Parallel edge fixer may run into NPE when RS is missing a duplicate column from the output schema

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24841?focusedWorklogId=560025&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560025
 ]

ASF GitHub Bot logged work on HIVE-24841:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 16:33
Start Date: 02/Mar/21 16:33
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2035:
URL: https://github.com/apache/hive/pull/2035#discussion_r585719471



##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java
##
@@ -256,9 +257,20 @@ private static String extractColumnName(ExprNodeDesc expr) throws SemanticExcept
   public static Optional<Map<String, String>> colMappingInverseKeys(ReduceSinkOperator rs) {
     Map<String, String> ret = new HashMap<String, String>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
+    Set<String> neededColumns = new HashSet<String>();
     try {
       for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
-        ret.put(extractColumnName(e.getValue()), e.getKey());
+        String columnName = extractColumnName(e.getValue());
+        if (rs.getSchema().getColumnInfo(e.getKey()) == null) {
+          // ignore incorrectly mapped columns (if there's any) - but require its input to be present
+          neededColumns.add(columnName);
+        } else {
+          ret.put(columnName, e.getKey());
+        }
+      }
+      neededColumns.removeAll(ret.keySet());
+      if (!neededColumns.isEmpty()) {
+        throw new SemanticException("There is no way to compute: " + neededColumns);

Review comment:
   It would be useful to log the exception in the catch clause, at least at 
debug level.
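The logic in the diff above, together with the logging suggested in the review comment, can be sketched outside of Hive as follows. This is a simplified illustration under stated assumptions: plain strings stand in for ExprNodeDesc, a set of column names stands in for the operator schema, the catch clause is assumed to return an empty Optional (the diff does not show it), and java.util.logging at FINE stands in for debug-level logging. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the colMappingInverseKeys logic; not Hive code.
final class InverseColumnMapping {
    private static final Logger LOG = Logger.getLogger(InverseColumnMapping.class.getName());

    /**
     * exprMap maps an output column name to the input column it is computed from.
     * schemaColumns holds the output columns that really exist in the schema.
     */
    static Optional<Map<String, String>> inverseKeys(Map<String, String> exprMap,
                                                     Set<String> schemaColumns) {
        Map<String, String> ret = new HashMap<>();
        Set<String> neededColumns = new HashSet<>();
        try {
            for (Map.Entry<String, String> e : exprMap.entrySet()) {
                String inputColumn = e.getValue();
                if (!schemaColumns.contains(e.getKey())) {
                    // Output column is missing from the schema: skip the mapping,
                    // but remember that its input must be reachable some other way.
                    neededColumns.add(inputColumn);
                } else {
                    ret.put(inputColumn, e.getKey());
                }
            }
            neededColumns.removeAll(ret.keySet());
            if (!neededColumns.isEmpty()) {
                throw new IllegalStateException("There is no way to compute: " + neededColumns);
            }
            return Optional.of(ret);
        } catch (IllegalStateException ex) {
            // Assumed behaviour: report failure to the caller, and log the cause
            // at debug (FINE) level so it is not silently lost.
            LOG.log(Level.FINE, "Could not invert the column mapping", ex);
            return Optional.empty();
        }
    }
}
```

With a duplicate column (two output names computed from the same input, only one of which is in the schema) the mapping still succeeds; it fails only when some input has no surviving output column.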







Issue Time Tracking
---

Worklog Id: (was: 560025)
Time Spent: 20m  (was: 10m)

> Parallel edge fixer may run into NPE when RS is missing a duplicate column 
> from the output schema
> -
>
> Key: HIVE-24841
> URL: https://issues.apache.org/jira/browse/HIVE-24841
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This may mean that the RS has an incorrect schema - but that will be 
> investigated separately





[jira] [Resolved] (HIVE-24346) Store HPL/SQL packages into HMS

2021-03-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman resolved HIVE-24346.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thank you [~amagyar]

> Store HPL/SQL packages into HMS
> ---
>
> Key: HIVE-24346
> URL: https://issues.apache.org/jira/browse/HIVE-24346
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24685) Remove HiveSubQRemoveRelBuilder

2021-03-02 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24685:
---
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Remove HiveSubQRemoveRelBuilder
> ---
>
> Key: HIVE-24685
> URL: https://issues.apache.org/jira/browse/HIVE-24685
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Fix For: 4.0.0
>
>
> The class seems to be a close clone of {{RelBuilder}} created due to some 
> bugs existing in original implementation. Those issues seem to be fixed now 
> and we should be able to get rid of the copy. In the worst case scenario, if 
> we need to keep it for the time being, we could try to make it extend 
> {{RelBuilder}} and override only necessary methods.





[jira] [Resolved] (HIVE-24823) Fix ide error in BasePartitionEvaluator

2021-03-02 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24823.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged into master. Thank you Rajesh for reviewing the changes!

> Fix ide error in BasePartitionEvaluator
> ---
>
> Key: HIVE-24823
> URL: https://issues.apache.org/jira/browse/HIVE-24823
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-24596) Explain ddl for debugging

2021-03-02 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293773#comment-17293773
 ] 

Zoltan Haindrich commented on HIVE-24596:
-

Yes, reducing the round-trips in problematic cases would be very useful!

I'm still a bit confused; based on the description I've seen so far I (somehow) 
was expecting SQL statements as output...and I kinda still feel like that 
should be the case.
Note that in the PR the 'explain ddl' outputs seem more like standard explains...so 
I feel like I'm missing something basic about the concept.
Maybe it will be easier to understand your idea through an example:

{code}
$ create table t (a integer);
$ create table t2 (a integer);
$ explain ddl select 1 from t join t2 where t.a=t2.a;
{code}

For the above I would expect to see a "create table t (a integer)" in the 
output...could you give a theoretical transcript?

> Explain ddl for debugging
> -
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For debugging query issues, basic details like table schema, statistics, 
> partition details, query plans are needed.
> It would be good to have "explain ddl" support, which can generate these 
> details. This can help in recreating the schema and planner issues without 
> sample data.
>  





[jira] [Work logged] (HIVE-24814) Harmonize Hive Date-Time Formats

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24814?focusedWorklogId=559978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559978
 ]

ASF GitHub Bot logged work on HIVE-24814:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 14:15
Start Date: 02/Mar/21 14:15
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #2009:
URL: https://github.com/apache/hive/pull/2009#issuecomment-788939546


   @pgaref Thanks for your interest (as always).
   
   The Time parsing/formatting code is all over the place in Hive.  Did you 
know there are some areas of the code that allow for 10 digits of nanos, by 
truncating the last digit, but not others?
   
   I am trying to consolidate all that stuff and bring it into one place for 
visibility and trying to harmonize it with the pre-canned ISO formats already 
included in the JDK.  The less Hive-specific stuff regarding time handling, the 
better.  I also plan on adding copious documentation once I get the unit tests 
all working.  I know some of them are broken.  I'm having a pretty hard time 
figuring out where they are going wrong.
   
   And yes `ISO_LOCAL_TIME` includes an optional NANO field.  This will be 
mentioned in comments.
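The point about `ISO_LOCAL_TIME` can be checked directly against the JDK. The sketch below is only an illustration with a hypothetical class name; it relies on nothing beyond `java.time`. The pre-canned formatter accepts between zero and nine fractional digits, so a tenth digit of nanos is rejected by the JDK formatter and is not truncated.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical demo class; uses only the JDK's pre-canned ISO formatter.
final class IsoLocalTimeCheck {

    /** Parses HH:mm[:ss[.fraction]] using the JDK's ISO_LOCAL_TIME formatter. */
    static LocalTime parse(String text) {
        return LocalTime.parse(text, DateTimeFormatter.ISO_LOCAL_TIME);
    }

    /** Returns true if the text is accepted by ISO_LOCAL_TIME. */
    static boolean accepts(String text) {
        try {
            parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

For example, `IsoLocalTimeCheck.parse("10:15:30.123456789").getNano()` yields 123456789, `IsoLocalTimeCheck.parse("10:15:30").getNano()` yields 0, and `IsoLocalTimeCheck.accepts("10:15:30.1234567891")` is false.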





Issue Time Tracking
---

Worklog Id: (was: 559978)
Time Spent: 1h 10m  (was: 1h)

> Harmonize Hive Date-Time Formats
> 
>
> Key: HIVE-24814
> URL: https://issues.apache.org/jira/browse/HIVE-24814
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Harmonize Hive on JDK date-time formats courtesy of {{DateTimeFormatter}}





[jira] [Work logged] (HIVE-24758) Log Tez Task DAG ID, DAG Session ID, HS2 Hostname

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24758?focusedWorklogId=559954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559954
 ]

ASF GitHub Bot logged work on HIVE-24758:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 13:33
Start Date: 02/Mar/21 13:33
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1963:
URL: https://github.com/apache/hive/pull/1963#discussion_r585566958



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##
@@ -236,6 +239,10 @@ public int execute() {
   throw new HiveException("Operation cancelled");
 }
 
+// Log all the info required to find the various logs for this query
+LOG.info("HS2 Host: [{}], Query ID: [{}], Dag ID: [{}], DAG Session ID: [{}]", getHostNameIP(), queryId,

Review comment:
   Hey @belugabehr -- taking another look here, and it seems like SessionState 
could do the job.
   What about SessionState.getHiveServer2Host()?
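The shape of the proposed log line can be sketched without any Hive dependency. In the sketch below the class and method names are hypothetical, `String.format` stands in for the logger's `{}` placeholders, and the host lookup through `InetAddress` is only an assumed stand-in for whichever source of the HS2 host name the PR settles on.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper; mirrors the message layout in the diff above.
final class QueryLogLine {

    /** Best-effort local host name; an assumed stand-in for the HS2 host lookup. */
    static String localHostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown";
        }
    }

    /** Builds one line carrying every identifier needed to locate the query's logs. */
    static String format(String host, String queryId, String dagId, String dagSessionId) {
        return String.format(
            "HS2 Host: [%s], Query ID: [%s], Dag ID: [%s], DAG Session ID: [%s]",
            host, queryId, dagId, dagSessionId);
    }
}
```

Keeping all identifiers on a single line means one search of the HS2 log is enough to find the related YARN and Tez logs.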







Issue Time Tracking
---

Worklog Id: (was: 559954)
Time Spent: 1h 40m  (was: 1.5h)

> Log Tez Task DAG ID, DAG Session ID, HS2 Hostname
> -
>
> Key: HIVE-24758
> URL: https://issues.apache.org/jira/browse/HIVE-24758
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In order to get the logs for a particular query, submitted to Tez on YARN, 
> the following pieces of information are required:
> * YARN Application ID
> * TEZ DAG ID
> * HS2 Host that ran the job
> Include this information in TezTask output.





[jira] [Commented] (HIVE-24596) Explain ddl for debugging

2021-03-02 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293707#comment-17293707
 ] 

Rajesh Balamohan commented on HIVE-24596:
-

A lot of the time, when debugging CBO and other aspects of query plans, users 
have to provide a lot of details like query plans, table details, logs, etc.
At times custom dev-jars are shipped to gather additional information, 
and in some cases sample data is also requested from users.

{{explain ddl <query>}} helps in identifying the tables/views of the specific 
query and generates the schema, partitions, stats, and views for that 
query. This way, a dev just has to get this SQL output and run it in 
their local environment to reproduce the issue (i.e. {{explain cbo <query>}} or 
{{explain <query>}} should generate the same result without having real data). 
This is targeted towards easier debugging. I haven't gone through the test cases 
in the PR yet. I believe it should cover examples and test cases, if they are not 
present already.

> Explain ddl for debugging
> -
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For debugging query issues, basic details like table schema, statistics, 
> partition details, query plans are needed.
> It would be good to have "explain ddl" support, which can generate these 
> details. This can help in recreating the schema and planner issues without 
> sample data.
>  





[jira] [Work logged] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24839?focusedWorklogId=559944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559944
 ]

ASF GitHub Bot logged work on HIVE-24839:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 13:08
Start Date: 02/Mar/21 13:08
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2034:
URL: https://github.com/apache/hive/pull/2034#discussion_r585549664



##
File path: ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java
##
@@ -174,8 +174,10 @@ public StatEstimator getStatEstimator() {
     }
 
     private Optional<Double> getRangeWidth(Range range) {
-      if (range.minValue != null && range.maxValue != null) {
-        return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
+      if (range != null) {

Review comment:
   Could you please also add the test case from the Jira?
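The null guard in the diff above can be illustrated with a self-contained sketch. The `Range` class here is a hypothetical stand-in for Hive's statistics range type, and the body after `if (range != null)` is an assumption, since the diff is cut off at that line.

```java
import java.util.Optional;

// Hypothetical stand-in types; not the Hive classes.
final class RangeWidth {

    /** Minimal range holder; either bound may be unknown (null). */
    static final class Range {
        final Number minValue;
        final Number maxValue;

        Range(Number minValue, Number maxValue) {
            this.minValue = minValue;
            this.maxValue = maxValue;
        }
    }

    /** Width of the range, or empty when the range or one of its bounds is unknown. */
    static Optional<Double> getRangeWidth(Range range) {
        if (range != null) {
            if (range.minValue != null && range.maxValue != null) {
                return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
            }
        }
        return Optional.empty();
    }
}
```

Returning an empty Optional lets the estimator fall back to a default when column statistics carry no range, which is the situation the reported query appears to trigger.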







Issue Time Tracking
---

Worklog Id: (was: 559944)
Time Spent: 20m  (was: 10m)

> SubStrStatEstimator.estimate throws NullPointerException
> 
>
> Key: HIVE-24839
> URL: https://issues.apache.org/jira/browse/HIVE-24839
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This issue can be reproduced by running the following queries:
> {code:java}
> create table t0 (s string);
> create table t1 (s string, i int);
> insert into t0 select "abc";
> insert into t1 select "abc", 4;
> select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s;
> {code}
> The select query fails with error:
> {code:java}
> Error: Error while compiling statement: FAILED: NullPointerException null 
> (state=42000,code=4)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24841) Parallel edge fixer may run into NPE when RS is missing a duplicate column from the output schema

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24841:
--
Labels: pull-request-available  (was: )

> Parallel edge fixer may run into NPE when RS is missing a duplicate column 
> from the output schema
> -
>
> Key: HIVE-24841
> URL: https://issues.apache.org/jira/browse/HIVE-24841
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This may mean that the RS has an incorrect schema - but that will be 
> investigated separately



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24841) Parallel edge fixer may run into NPE when RS is missing a duplicate column from the output schema

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24841?focusedWorklogId=559942=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559942
 ]

ASF GitHub Bot logged work on HIVE-24841:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 13:06
Start Date: 02/Mar/21 13:06
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request #2035:
URL: https://github.com/apache/hive/pull/2035


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





Issue Time Tracking
---

Worklog Id: (was: 559942)
Remaining Estimate: 0h
Time Spent: 10m

> Parallel edge fixer may run into NPE when RS is missing a duplicate column 
> from the output schema
> -
>
> Key: HIVE-24841
> URL: https://issues.apache.org/jira/browse/HIVE-24841
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This may mean that the RS has an incorrect schema - but that will be 
> investigated separately



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24814) Harmonize Hive Date-Time Formats

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24814?focusedWorklogId=559940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559940
 ]

ASF GitHub Bot logged work on HIVE-24814:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 13:06
Start Date: 02/Mar/21 13:06
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #2009:
URL: https://github.com/apache/hive/pull/2009#issuecomment-788895303


   Hey @belugabehr can you please take another look at the .out diffs?





Issue Time Tracking
---

Worklog Id: (was: 559940)
Time Spent: 1h  (was: 50m)

> Harmonize Hive Date-Time Formats
> 
>
> Key: HIVE-24814
> URL: https://issues.apache.org/jira/browse/HIVE-24814
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Harmonize Hive on JDK date-time formats courtesy of {{DateTimeFormatter}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24814) Harmonize Hive Date-Time Formats

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24814?focusedWorklogId=559939=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559939
 ]

ASF GitHub Bot logged work on HIVE-24814:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 13:05
Start Date: 02/Mar/21 13:05
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2009:
URL: https://github.com/apache/hive/pull/2009#discussion_r585545605



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastTimestampToString.java
##
@@ -28,20 +28,12 @@
 import java.time.ZoneOffset;
 import java.time.format.DateTimeFormatter;
 import java.time.format.DateTimeFormatterBuilder;
-import java.time.temporal.ChronoField;
 
 public class CastTimestampToString extends TimestampToStringUnaryUDF {
   private static final long serialVersionUID = 1L;
-  private static final DateTimeFormatter PRINT_FORMATTER;
-
-  static {
-DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder();
-// Date and time parts
-builder.append(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
-// Fractional part
-builder.optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd();
-PRINT_FORMATTER = builder.toFormatter();
-  }
+  private static final DateTimeFormatter PRINT_FORMATTER =
+  new DateTimeFormatterBuilder().append(DateTimeFormatter.ISO_LOCAL_DATE).appendLiteral(' ')
+  .append(DateTimeFormatter.ISO_LOCAL_TIME).toFormatter();

Review comment:
   Is the optional nano-of-second part included in ISO_LOCAL_TIME? Looks like it's 
already there: ``.appendFraction(NANO_OF_SECOND, 0, 9, true)``
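
For anyone who wants to check this locally, here is a minimal sketch. It builds the formatter the same way as the new `PRINT_FORMATTER` in the diff, but the class name `PrintFormatterSketch` and the `format` helper are made up for illustration and are not part of the patch. Per the JDK documentation, `ISO_LOCAL_TIME` prints the fraction only when nano-of-second is non-zero, and drops trailing zeros.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

public class PrintFormatterSketch {
  // Built the same way as the new PRINT_FORMATTER in the diff:
  // ISO date, a single space, then ISO local time.
  static final DateTimeFormatter PRINT_FORMATTER =
      new DateTimeFormatterBuilder()
          .append(DateTimeFormatter.ISO_LOCAL_DATE)
          .appendLiteral(' ')
          .append(DateTimeFormatter.ISO_LOCAL_TIME)
          .toFormatter();

  // Hypothetical helper, only here to make the sketch easy to call.
  static String format(LocalDateTime ts) {
    return PRINT_FORMATTER.format(ts);
  }

  public static void main(String[] args) {
    // Zero nanos: no fractional part is printed.
    System.out.println(format(LocalDateTime.of(2021, 3, 2, 13, 5, 0)));
    // 123 ms: the fraction is printed without trailing zeros.
    System.out.println(format(LocalDateTime.of(2021, 3, 2, 13, 5, 0, 123_000_000)));
  }
}
```

This only covers formatting of an ordinary `LocalDateTime`; whether the output matches the old pattern-based formatter for every input Hive sees would still need the existing `.out` files to confirm.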







Issue Time Tracking
---

Worklog Id: (was: 559939)
Time Spent: 50m  (was: 40m)

> Harmonize Hive Date-Time Formats
> 
>
> Key: HIVE-24814
> URL: https://issues.apache.org/jira/browse/HIVE-24814
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Harmonize Hive on JDK date-time formats courtesy of {{DateTimeFormatter}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24596) Explain ddl for debugging

2021-03-02 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293684#comment-17293684
 ] 

Zoltan Haindrich commented on HIVE-24596:
-

[~harshit.gupta] or [~rajesh.balamohan]: Could you please give a sample usage 
of this feature?

> Explain ddl for debugging
> -
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For debugging query issues, basic details like table schema, statistics, 
> partition details, query plans are needed.
> It would be good to have "explain ddl" support, which can generate these 
> details. This can help in recreating the schema and planner issues without 
> sample data.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24814) Harmonize Hive Date-Time Formats

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24814?focusedWorklogId=559938=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559938
 ]

ASF GitHub Bot logged work on HIVE-24814:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 13:03
Start Date: 02/Mar/21 13:03
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2009:
URL: https://github.com/apache/hive/pull/2009#discussion_r585545605



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastTimestampToString.java
##
@@ -28,20 +28,12 @@
 import java.time.ZoneOffset;
 import java.time.format.DateTimeFormatter;
 import java.time.format.DateTimeFormatterBuilder;
-import java.time.temporal.ChronoField;
 
 public class CastTimestampToString extends TimestampToStringUnaryUDF {
   private static final long serialVersionUID = 1L;
-  private static final DateTimeFormatter PRINT_FORMATTER;
-
-  static {
-DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder();
-// Date and time parts
-builder.append(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
-// Fractional part
-builder.optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd();
-PRINT_FORMATTER = builder.toFormatter();
-  }
+  private static final DateTimeFormatter PRINT_FORMATTER =
+  new DateTimeFormatterBuilder().append(DateTimeFormatter.ISO_LOCAL_DATE).appendLiteral(' ')
+  .append(DateTimeFormatter.ISO_LOCAL_TIME).toFormatter();

Review comment:
   Is the optional nano-of-second part included in ISO_LOCAL_TIME?







Issue Time Tracking
---

Worklog Id: (was: 559938)
Time Spent: 40m  (was: 0.5h)

> Harmonize Hive Date-Time Formats
> 
>
> Key: HIVE-24814
> URL: https://issues.apache.org/jira/browse/HIVE-24814
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Harmonize Hive on JDK date-time formats courtesy of {{DateTimeFormatter}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException

2021-03-02 Thread Robbie Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293681#comment-17293681
 ] 

Robbie Zhang commented on HIVE-24839:
-

This bug can be worked around by setting hive.stats.estimators.enable to false.

> SubStrStatEstimator.estimate throws NullPointerException
> 
>
> Key: HIVE-24839
> URL: https://issues.apache.org/jira/browse/HIVE-24839
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue can be reproduced by running the following queries:
> {code:java}
> create table t0 (s string);
> create table t1 (s string, i int);
> insert into t0 select "abc";
> insert into t1 select "abc", 4;
> select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s;
> {code}
> The select query fails with error:
> {code:java}
> Error: Error while compiling statement: FAILED: NullPointerException null 
> (state=42000,code=4)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24839:
--
Labels: pull-request-available  (was: )

> SubStrStatEstimator.estimate throws NullPointerException
> 
>
> Key: HIVE-24839
> URL: https://issues.apache.org/jira/browse/HIVE-24839
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue can be reproduced by running the following queries:
> {code:java}
> create table t0 (s string);
> create table t1 (s string, i int);
> insert into t0 select "abc";
> insert into t1 select "abc", 4;
> select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s;
> {code}
> The select query fails with error:
> {code:java}
> Error: Error while compiling statement: FAILED: NullPointerException null 
> (state=42000,code=4)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24839?focusedWorklogId=559936=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559936
 ]

ASF GitHub Bot logged work on HIVE-24839:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 13:00
Start Date: 02/Mar/21 13:00
Worklog Time Spent: 10m 
  Work Description: ujc714 opened a new pull request #2034:
URL: https://github.com/apache/hive/pull/2034


   ### What changes were proposed in this pull request?
   It fixes a bug in UDFSubstr.SubStrStatEstimator. 
   
   ### Why are the changes needed?
   The method getRangeWidth did not check whether range is null before 
dereferencing its fields. When Hive estimates the stats for a substr function 
with a child UDF, compilation can fail with a NullPointerException.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Start MiniHS2Cluster then run the following queries manually:
   ```
   create table t0 (s string);
   create table t1 (s string, i int);
   insert into t0 select "abc";
   insert into t1 select "abc", 4;
   select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s;
   ```
   





Issue Time Tracking
---

Worklog Id: (was: 559936)
Remaining Estimate: 0h
Time Spent: 10m

> SubStrStatEstimator.estimate throws NullPointerException
> 
>
> Key: HIVE-24839
> URL: https://issues.apache.org/jira/browse/HIVE-24839
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue can be reproduced by running the following queries:
> {code:java}
> create table t0 (s string);
> create table t1 (s string, i int);
> insert into t0 select "abc";
> insert into t1 select "abc", 4;
> select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s;
> {code}
> The select query fails with error:
> {code:java}
> Error: Error while compiling statement: FAILED: NullPointerException null 
> (state=42000,code=4)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24596) Explain ddl for debugging

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24596?focusedWorklogId=559933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559933
 ]

ASF GitHub Bot logged work on HIVE-24596:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 12:53
Start Date: 02/Mar/21 12:53
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2033:
URL: https://github.com/apache/hive/pull/2033#discussion_r585539220



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
##
@@ -541,12 +606,12 @@ JSONObject collectAuthRelatedEntities(PrintStream out, ExplainWork work)
 if (delegate != null) {
   Class itface = SessionState.get().getAuthorizerInterface();
   Object authorizer = AuthorizationFactory.create(delegate, itface,
-  new AuthorizationFactory.AuthorizationExceptionHandler() {
-@Override
-public void exception(Exception exception) {
-  exceptions.add(exception.getMessage());
-}
-  });
+  new AuthorizationFactory.AuthorizationExceptionHandler() {
+@Override

Review comment:
   there are lots of pure indentation changes in this patch - are we using 
the same formatter settings?
   `dev-support/eclipse-styles.xml`







Issue Time Tracking
---

Worklog Id: (was: 559933)
Time Spent: 20m  (was: 10m)

> Explain ddl for debugging
> -
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For debugging query issues, basic details like table schema, statistics, 
> partition details, query plans are needed.
> It would be good to have "explain ddl" support, which can generate these 
> details. This can help in recreating the schema and planner issues without 
> sample data.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException

2021-03-02 Thread Robbie Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293674#comment-17293674
 ] 

Robbie Zhang commented on HIVE-24839:
-

We can see the following backtrace in the HS2 log file:
{code:java}
java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.getRangeWidth(UDFSubstr.java:177)
        at org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.estimate(UDFSubstr.java:156)
        at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExpression(StatsUtils.java:1576)
        at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExprMap(StatsUtils.java:1435)
        at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$SelectStatsRule.process(StatsRulesProcFactory.java:197)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
        at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
        at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
        at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
        at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:447)
        at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:185)
        at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:158)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12823)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:422)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
        at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:221)
        at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:188)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:598)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:544)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:538)
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
        at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
        at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
        at org.apache.hive.service.cli.operation.Operation.run(Operation.java:274)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:565)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:551)
        at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}
 The expression "substr(t0.s, t1.i-1)" contains a nested function: the second 
parameter of substr is actually a GenericUDFOPMinus. The ColStatistics on it 
doesn't have a valid range, but getRangeWidth doesn't check for that:
{code:java}
    private Optional getRangeWidth(Range range) {
      if (range.minValue != null && range.maxValue != null) {
        return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
      }
      return Optional.empty();
    }
{code}
Only 4 UDF classes implement StatEstimatorProvider and only UDFSubstr has this 
bug.
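
A null-safe variant could look roughly like the sketch below. This is only an illustration of the guard, not necessarily the change in the PR: the nested `Range` class is a hypothetical stand-in for Hive's range type with just the two fields used here, and the `Optional<Double>` return type is assumed.

```java
import java.util.Optional;

public class RangeWidthSketch {
  // Hypothetical stand-in for the range carried by ColStatistics;
  // only the two fields read by getRangeWidth are modelled.
  static final class Range {
    final Number minValue;
    final Number maxValue;

    Range(Number minValue, Number maxValue) {
      this.minValue = minValue;
      this.maxValue = maxValue;
    }
  }

  // Returns the width only when the range and both bounds are known,
  // so a missing range yields Optional.empty() instead of an NPE.
  static Optional<Double> getRangeWidth(Range range) {
    if (range != null && range.minValue != null && range.maxValue != null) {
      return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(getRangeWidth(null));            // no range at all
    System.out.println(getRangeWidth(new Range(1, 4))); // both bounds known
  }
}
```

With a guard like this, an expression such as "t1.i-1" that has no range simply produces no width estimate.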

> SubStrStatEstimator.estimate throws NullPointerException
> 
>
> Key: HIVE-24839
> URL: https://issues.apache.org/jira/browse/HIVE-24839
>  

[jira] [Updated] (HIVE-24596) Explain ddl for debugging

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24596:
--
Labels: pull-request-available  (was: )

> Explain ddl for debugging
> -
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For debugging query issues, basic details like table schema, statistics, 
> partition details, query plans are needed.
> It would be good to have "explain ddl" support, which can generate these 
> details. This can help in recreating the schema and planner issues without 
> sample data.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24596) Explain ddl for debugging

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24596?focusedWorklogId=559927=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559927
 ]

ASF GitHub Bot logged work on HIVE-24596:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 12:31
Start Date: 02/Mar/21 12:31
Worklog Time Spent: 10m 
  Work Description: HarshitGupta11 opened a new pull request #2033:
URL: https://github.com/apache/hive/pull/2033


   https://issues.apache.org/jira/browse/HIVE-24596
   For debugging query issues, basic details like table schema, statistics, 
partition details, query plans are needed.
   It would be good to have "explain ddl" support, which can generate these 
details. This can help in recreating the schema and planner issues without 
sample data.
   
   ### What changes were proposed in this pull request?
   Added an "explain ddl" option that emits all the DDL plans for the 
given query.
   
   
   ### Why are the changes needed?
   To improve the debugging process in clusters.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   The patch was tested on the local cluster.
   





Issue Time Tracking
---

Worklog Id: (was: 559927)
Remaining Estimate: 0h
Time Spent: 10m

> Explain ddl for debugging
> -
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Harshit Gupta
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For debugging query issues, basic details like table schema, statistics, 
> partition details, query plans are needed.
> It would be good to have "explain ddl" support, which can generate these 
> details. This can help in recreating the schema and planner issues without 
> sample data.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24841) Parallel edge fixer may run into NPE when RS is missing a duplicate column from the output schema

2021-03-02 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24841:
---


> Parallel edge fixer may run into NPE when RS is missing a duplicate column 
> from the output schema
> -
>
> Key: HIVE-24841
> URL: https://issues.apache.org/jira/browse/HIVE-24841
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> This may mean that the RS has an incorrect schema - but that will be 
> investigated separately



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24840) Materialized View incremental rebuild produces wrong result set after compaction

2021-03-02 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-24840:
-


> Materialized View incremental rebuild produces wrong result set after 
> compaction
> 
>
> Key: HIVE-24840
> URL: https://issues.apache.org/jira/browse/HIVE-24840
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Critical
>
> {code}
> create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, 
> NULL);
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as 
> select a,b,c from t1 where a > 0 or a is null;
> delete from t1 where a = 1;
> alter table t1 compact 'major';
> -- Wait until compaction finished.
> alter materialized view mat1 rebuild;
> {code}
> Expected result of query
> {code}
> select * from mat1;
> {code}
> {code}
> 2 two 2
> NULL NULL NULL
> {code}
> but if incremental rebuild is enabled the result is
> {code}
> 1 one 1
> 2 two 2
> NULL NULL NULL
> {code}
> Cause: Incremental rebuild checks whether the source tables of a 
> materialized view have had delete or update transactions since the last 
> rebuild by querying the COMPLETED_TXN_COMPONENTS table in the metastore. 
> However, when a major compaction is performed on the source tables, the 
> records related to these tables are deleted from COMPLETED_TXN_COMPONENTS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24827) Hive aggregation query returns incorrect results for non text files

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24827?focusedWorklogId=559914=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559914
 ]

ASF GitHub Bot logged work on HIVE-24827:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 12:14
Start Date: 02/Mar/21 12:14
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2018:
URL: https://github.com/apache/hive/pull/2018#discussion_r585508226



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
##
@@ -4028,6 +4030,8 @@ public static int getFooterCount(TableDesc table, JobConf job) throws IOExceptio
 int footerCount;
 try {
   footerCount = Integer.parseInt(table.getProperties().getProperty(serdeConstants.FOOTER_COUNT, "0"));
+  footerCount =
+  validateHeaderFooter(table, footerCount, "skip.footer.line.count");

Review comment:
   since `FOOTER_COUNT = "skip.footer.line.count"`, I think you could also 
move this `Integer.parseInt` into your method as well
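
A rough sketch of what I mean, with made-up names: `FooterCountSketch`, `footerCount` and the `isTextFormat` flag are hypothetical, and how the patch actually detects a text table is not shown here. The helper takes the table properties, parses the count itself, and ignores it for non-text tables.

```java
import java.util.Properties;

public class FooterCountSketch {
  // Same key as serdeConstants.FOOTER_COUNT, repeated here to keep the sketch standalone.
  static final String FOOTER_COUNT = "skip.footer.line.count";

  // Hypothetical helper: parses the property (default "0") and returns 0
  // when the table is not a text table, so callers need no parseInt of their own.
  // A malformed value still surfaces as NumberFormatException.
  static int footerCount(Properties tableProps, boolean isTextFormat) {
    int count = Integer.parseInt(tableProps.getProperty(FOOTER_COUNT, "0"));
    return isTextFormat ? count : 0;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(FOOTER_COUNT, "2");
    System.out.println(footerCount(props, true));  // text table: value is honoured
    System.out.println(footerCount(props, false)); // non-text table: value is ignored
  }
}
```

The same shape would work for the header count, with the key passed in as a parameter.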







Issue Time Tracking
---

Worklog Id: (was: 559914)
Time Spent: 40m  (was: 0.5h)

> Hive aggregation query returns incorrect results for non text files
> ---
>
> Key: HIVE-24827
> URL: https://issues.apache.org/jira/browse/HIVE-24827
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When header and footer are configured for non-text files, aggregation queries 
> return wrong results.
> Proposal: ignore this property for non-text files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24723) Use ExecutorService in TezSessionPool

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24723?focusedWorklogId=559892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559892
 ]

ASF GitHub Bot logged work on HIVE-24723:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 12:12
Start Date: 02/Mar/21 12:12
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1939:
URL: https://github.com/apache/hive/pull/1939#issuecomment-788782335


   Thanks for the PR @belugabehr and the time to polish this! :) 





Issue Time Tracking
---

Worklog Id: (was: 559892)
Time Spent: 3h 20m  (was: 3h 10m)

> Use ExecutorService in TezSessionPool
> -
>
> Key: HIVE-24723
> URL: https://issues.apache.org/jira/browse/HIVE-24723
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently there is some wonky home-made thread pooling going on in 
> {{TezSessionPool}}.  Replace it with some JDK/Guava goodness.
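
As a rough illustration of the kind of replacement the description asks for, the sketch below starts several placeholder "sessions" through a JDK ExecutorService instead of hand-managed threads. It is not the TezSessionPool code: the class name SessionStarterSketch, the method openSessions and the "session-N" strings are all hypothetical stand-ins chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names come from TezSessionPool.
final class SessionStarterSketch {

    /** Starts `count` placeholder sessions on `threads` worker threads. */
    static List<String> openSessions(int count, int threads) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int id = i;
                // Stand-in for the real (expensive) session start-up work.
                tasks.add(() -> "session-" + id);
            }
            List<String> opened = new ArrayList<>();
            // invokeAll blocks until every task has completed and returns the
            // futures in the same order as the submitted tasks.
            for (Future<String> future : executor.invokeAll(tasks)) {
                // get() rethrows a task failure wrapped in ExecutionException.
                opened.add(future.get());
            }
            return opened;
        } finally {
            // Always release the worker threads, even if a task failed.
            executor.shutdown();
            executor.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(openSessions(3, 2));
    }
}
```

The executor takes over thread creation, task queuing and failure propagation, which is the bookkeeping a home-made pool has to reimplement by hand.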





[jira] [Work logged] (HIVE-24835) Replace HiveSubQueryFinder with RexUtil.SubQueryFinder

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24835?focusedWorklogId=559879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559879
 ]

ASF GitHub Bot logged work on HIVE-24835:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 11:53
Start Date: 02/Mar/21 11:53
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #2026:
URL: https://github.com/apache/hive/pull/2026


   





Issue Time Tracking
---

Worklog Id: (was: 559879)
Time Spent: 20m  (was: 10m)

> Replace HiveSubQueryFinder with RexUtil.SubQueryFinder
> --
>
> Key: HIVE-24835
> URL: https://issues.apache.org/jira/browse/HIVE-24835
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HiveSubQueryFinder has been copied from RexUtil::SubQueryFinder due to 
> CALCITE-1726. Currently, Hive is in calcite-1.21.0 and this bug is resolved 
> so the duplicated code can be removed.





[jira] [Work logged] (HIVE-24817) "not in" clause returns incorrect data when there is coercion

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24817?focusedWorklogId=559871=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559871
 ]

ASF GitHub Bot logged work on HIVE-24817:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 11:50
Start Date: 02/Mar/21 11:50
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #2027:
URL: https://github.com/apache/hive/pull/2027#discussion_r585391616



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java
##
@@ -1007,17 +1001,12 @@ protected T getXpathOrFuncExprNodeDesc(ASTNode node,
 T columnDesc = children.get(0);
 T valueDesc = interpretNode(columnDesc, children.get(i));
 if (valueDesc == null) {
-  if (hasNullValue) {
-// Skip if null value has already been added
-continue;
-  }
-  TypeInfo targetType = exprFactory.getTypeInfo(columnDesc);
+  // Keep original
+  TypeInfo targetType = exprFactory.getTypeInfo(children.get(i));
   if (!expressions.containsKey(targetType)) {
 expressions.put(targetType, columnDesc);
   }
-  T nullConst = exprFactory.createConstantExpr(targetType, null);
-  expressions.put(targetType, nullConst);
-  hasNullValue = true;
+  expressions.put(targetType, children.get(i));
 } else {

Review comment:
   I was looking around here and I think there might be another way 
around this problem which could retain this optimization as well:
   * introduce a new `NOT` operator which can be controlled to return 
true/false in case of null values
   * in case of filter expressions, start using the new `NOT` operator, and switch 
mode below every `NOT` operator
   
   but this feels like a more complicated change - we should only do it if we 
lose important optimizations

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java
##
@@ -1007,17 +1001,12 @@ protected T getXpathOrFuncExprNodeDesc(ASTNode node,
 T columnDesc = children.get(0);
 T valueDesc = interpretNode(columnDesc, children.get(i));
 if (valueDesc == null) {
-  if (hasNullValue) {
-// Skip if null value has already been added
-continue;
-  }
-  TypeInfo targetType = exprFactory.getTypeInfo(columnDesc);
+  // Keep original
+  TypeInfo targetType = exprFactory.getTypeInfo(children.get(i));
   if (!expressions.containsKey(targetType)) {
 expressions.put(targetType, columnDesc);
   }
-  T nullConst = exprFactory.createConstantExpr(targetType, null);
-  expressions.put(targetType, nullConst);
-  hasNullValue = true;
+  expressions.put(targetType, children.get(i));
 } else {
   TypeInfo targetType = exprFactory.getTypeInfo(valueDesc);
   if (!expressions.containsKey(targetType)) {

Review comment:
   this `if` statement has no effect - the map value will be overwritten 
anyway; I wonder if we have a bug here
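
The point of the review comment above can be checked in isolation. The sketch below assumes `expressions` behaves like a plain java.util.Map, which the hunk does not show, and uses hypothetical names (GuardedPutSketch, String keys and values) in place of the TypeInfo and expression objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; assumes plain java.util.Map semantics for `expressions`.
final class GuardedPutSketch {

    /** Mirrors the shape in the hunk: a guarded put followed by an unconditional put. */
    static String guardedThenUnconditional(Map<String, String> map, String key,
                                           String first, String second) {
        if (!map.containsKey(key)) {
            map.put(key, first);   // only runs when the key is absent
        }
        map.put(key, second);      // always runs and replaces whatever is there
        return map.get(key);
    }

    /** The same code with the guarded put removed. */
    static String unconditionalOnly(Map<String, String> map, String key, String second) {
        map.put(key, second);
        return map.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> empty = new HashMap<>();
        Map<String, String> seeded = new HashMap<>();
        seeded.put("int", "earlier");
        // Both calls end with the second value, whether or not the key existed.
        System.out.println(guardedThenUnconditional(empty, "int", "columnDesc", "child"));
        System.out.println(guardedThenUnconditional(seeded, "int", "columnDesc", "child"));
    }
}
```

With a Map the final value for the key is always the second argument, so the guard is dead code. If the container were instead a multimap whose `put` appends, the guard would change what is stored, which may be what the reviewer's "I wonder if we have a bug here" is getting at.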

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java
##
@@ -1007,17 +1001,12 @@ protected T getXpathOrFuncExprNodeDesc(ASTNode node,
 T columnDesc = children.get(0);
 T valueDesc = interpretNode(columnDesc, children.get(i));
 if (valueDesc == null) {
-  if (hasNullValue) {
-// Skip if null value has already been added
-continue;
-  }
-  TypeInfo targetType = exprFactory.getTypeInfo(columnDesc);
+  // Keep original
+  TypeInfo targetType = exprFactory.getTypeInfo(children.get(i));
   if (!expressions.containsKey(targetType)) {
 expressions.put(targetType, columnDesc);
   }
-  T nullConst = exprFactory.createConstantExpr(targetType, null);
-  expressions.put(targetType, nullConst);
-  hasNullValue = true;
+  expressions.put(targetType, children.get(i));

Review comment:
   for `IN` the original logic is valid as long as it's in 
`UnknownAs.FALSE` mode... but for `NOT IN` the correct interpretation would be 
`UnknownAs.TRUE`.
   
   I think we might be better off not coping with the `UnknownAs` devils here - 
and retain the original expressions as in the current proposed patch; I'm not 
sure how many optimization opportunities / how much performance we will lose that way.
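
To make the `IN` versus `NOT IN` asymmetry concrete, here is a small model of SQL three-valued logic in which Java `null` stands for UNKNOWN. It is only a sketch of standard SQL semantics, not Hive's evaluator; the class and method names (NotInNullSketch, sqlIn, filterKeeps and so on) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of SQL three-valued logic; null stands for UNKNOWN.
final class NotInNullSketch {

    /** SQL equality: any NULL operand yields UNKNOWN. */
    static Boolean sqlEquals(Integer a, Integer b) {
        if (a == null || b == null) {
            return null;
        }
        return a.equals(b);
    }

    /** x IN (values): TRUE on a match, else UNKNOWN if any comparison was UNKNOWN, else FALSE. */
    static Boolean sqlIn(Integer x, List<Integer> values) {
        boolean sawUnknown = false;
        for (Integer v : values) {
            Boolean eq = sqlEquals(x, v);
            if (eq == null) {
                sawUnknown = true;
            } else if (eq) {
                return Boolean.TRUE;
            }
        }
        return sawUnknown ? null : Boolean.FALSE;
    }

    /** SQL NOT: UNKNOWN stays UNKNOWN. */
    static Boolean sqlNot(Boolean b) {
        return b == null ? null : !b;
    }

    /** A filter keeps a row only when the predicate is TRUE (UNKNOWN treated as FALSE). */
    static boolean filterKeeps(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }

    public static void main(String[] args) {
        List<Integer> plain = Arrays.asList(1);
        List<Integer> withNull = Arrays.asList(1, null);
        // IN: adding a NULL turns FALSE into UNKNOWN, but the filter drops the row either way.
        System.out.println(filterKeeps(sqlIn(5, plain)) + " " + filterKeeps(sqlIn(5, withNull)));
        // NOT IN: adding a NULL turns TRUE into UNKNOWN, so a row that was kept is now dropped.
        System.out.println(filterKeeps(sqlNot(sqlIn(5, plain))) + " "
            + filterKeeps(sqlNot(sqlIn(5, withNull))));
    }
}
```

Substituting a NULL constant into the value list is therefore harmless for a filter on `IN`, where UNKNOWN and FALSE both drop the row, but it changes the result under `NOT`, which matches the reviewer's remark that `NOT IN` would need the opposite treatment of UNKNOWN.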
   
   

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java
##
@@ -1007,17 

[jira] [Resolved] (HIVE-24835) Replace HiveSubQueryFinder with RexUtil.SubQueryFinder

2021-03-02 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-24835.
---
Resolution: Fixed

Pushed to master. Thanks [~zabetak].

> Replace HiveSubQueryFinder with RexUtil.SubQueryFinder
> --
>
> Key: HIVE-24835
> URL: https://issues.apache.org/jira/browse/HIVE-24835
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HiveSubQueryFinder has been copied from RexUtil::SubQueryFinder due to 
> CALCITE-1726. Currently, Hive is in calcite-1.21.0 and this bug is resolved 
> so the duplicated code can be removed.





[jira] [Assigned] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException

2021-03-02 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang reassigned HIVE-24839:
---


> SubStrStatEstimator.estimate throws NullPointerException
> 
>
> Key: HIVE-24839
> URL: https://issues.apache.org/jira/browse/HIVE-24839
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>
> This issue can be reproduced by running the following queries:
> {code:java}
> create table t0 (s string);
> create table t1 (s string, i int);
> insert into t0 select "abc";
> insert into t1 select "abc", 4;
> select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s;
> {code}
> The select query fails with error:
> {code:java}
> Error: Error while compiling statement: FAILED: NullPointerException null 
> (state=42000,code=4)
> {code}
>  
>  





[jira] [Commented] (HIVE-24343) Table partition operations (create, drop, select) fail when the number of partitions is greater than 32767 (signed int)

2021-03-02 Thread Narayanan Venkateswaran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293600#comment-17293600
 ] 

Narayanan Venkateswaran commented on HIVE-24343:


Please note that although this issue exists presently in Hive, it is 
exposed through the repair done in the partition management task thread and is 
fixed with the backports of the following JIRAs:

 
 * HIVE-23111
 * HIVE-23851
 * HIVE-24584

 

 

> Table partition operations (create, drop, select) fail when the number of 
> partitions is greater than 32767 (signed int)
> ---
>
> Key: HIVE-24343
> URL: https://issues.apache.org/jira/browse/HIVE-24343
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The table partition operations (create, drop, select) access the underlying 
> relational database using JDO, which internally routes the operations through 
> the JDBC driver. Most of the underlying JDBC driver implementations place a 
> limit on the number of parameters that can be passed through a statement 
> implementation. The limits are as follows:
> PostgreSQL - 32767
> (https://www.postgresql.org/message-id/16832734.post%40talk.nabble.com)
> MySQL - 32767 - 2-byte integer - number of params
> (https://dev.mysql.com/doc/internals/en/com-stmt-prepare-response.html#packet-COM_STMT_PREPARE_OK)
> Oracle - 32767 -
> https://www.jooq.org/doc/3.12/manual/sql-building/dsl-context/custom-settings/settings-inline-threshold/
> Derby - 32767 - stored in an unsigned integer - note the PreparedStatement
> implementation here -
> [https://svn.apache.org/repos/asf/db/derby/code/branches/10.1/java/client/org/apache/derby/client/am/PreparedStatement.java]
>  
> These limits should be taken into account when querying the underlying 
> metastore.
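
One common way to respect such a limit is to split the parameter list into batches and issue one statement per batch. The sketch below shows only that splitting step and is not taken from the Hive patch; ParameterBatchSketch, toBatches and ASSUMED_MAX_PARAMS are hypothetical names, and 32767 is simply the figure quoted in the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of parameter batching; not the code from the Hive fix.
final class ParameterBatchSketch {

    // Limit quoted in the issue description; treated here as an assumption.
    static final int ASSUMED_MAX_PARAMS = 32767;

    /** Splits `params` into consecutive batches of at most `maxPerBatch` elements. */
    static <T> List<List<T>> toBatches(List<T> params, int maxPerBatch) {
        if (maxPerBatch <= 0) {
            throw new IllegalArgumentException("maxPerBatch must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < params.size(); start += maxPerBatch) {
            int end = Math.min(start + maxPerBatch, params.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(params.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> partitionNames = new ArrayList<>();
        for (int i = 0; i < 70000; i++) {
            partitionNames.add("p=" + i);
        }
        for (List<String> batch : toBatches(partitionNames, ASSUMED_MAX_PARAMS)) {
            // Each batch would back one statement with batch.size() parameters.
            System.out.println(batch.size());
        }
    }
}
```

Each batch can then be bound to its own query and the results combined by the caller, so no single statement carries more parameters than the driver accepts.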





[jira] [Work logged] (HIVE-24718) Moving to file based iteration for copying data

2021-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24718?focusedWorklogId=559852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559852
 ]

ASF GitHub Bot logged work on HIVE-24718:
-

Author: ASF GitHub Bot
Created on: 02/Mar/21 08:02
Start Date: 02/Mar/21 08:02
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1936:
URL: https://github.com/apache/hive/pull/1936#discussion_r585302812



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
##
@@ -1836,6 +1837,64 @@ public void testHdfsNameserviceWithDataCopy() throws 
Throwable {
 .verifyResults(new String[]{"2", "3"});
   }
 
+  @Test
+  public void testReplWithRetryDisabledIterators() throws Throwable {
+List<String> clause = new ArrayList<>();
+//NS replacement parameters has no effect when data is also copied to 
staging
+clause.add("'" + HiveConf.ConfVars.REPL_RUN_DATA_COPY_TASKS_ON_TARGET + 
"'='false'");
+clause.add("'" + HiveConf.ConfVars.REPL_COPY_ITERATOR_RETRY + "'='false'");
+WarehouseInstance.Tuple tuple = primary.run("use " + primaryDbName)
+.run("create table  acid_table (key int, value int) partitioned by 
(load_date date) " +
+"clustered by(key) into 2 buckets stored as orc 
tblproperties ('transactional'='true')")
+.run("create table table1 (i String)")
+.run("insert into table1 values (1)")
+.run("insert into table1 values (2)")
+.dump(primaryDbName, clause);
+assertFalseExternalFileList(new Path(new Path(tuple.dumpLocation,

Review comment:
   nit: If you just pass dumpLocation to the method and do the path 
creation inside the method, this would look cleaner. Anyway, the method 
assertFalseExternalFileList isn't doing much. So, alternatively, you can call 
fs.exists() right there and get rid of the method.

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -653,6 +649,8 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
   new TimeValidator(TimeUnit.HOURS),
   "Total allowed retry duration in hours inclusive of all retries. Once 
this is exhausted, " +
 "the policy instance will be marked as failed and will need manual 
intervention to restart."),
+REPL_COPY_ITERATOR_RETRY("hive.repl.copy.iterator.retry", true,

Review comment:
   REPL_COPY_FILE_LIST_ITERATOR_RETRY ?

##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/exec/repl/util/TestFileList.java
##
@@ -18,147 +18,266 @@
 
 package org.apache.hadoop.hive.ql.exec.repl.util;
 
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.util.Retryable;
+import org.junit.Assert;
 import org.junit.Test;
 import org.junit.runner.RunWith;
 import org.mockito.ArgumentCaptor;
-import org.mockito.Mock;
 import org.mockito.Mockito;
+import org.mockito.junit.MockitoJUnitRunner;
 import org.powermock.core.classloader.annotations.PrepareForTest;
-import org.powermock.modules.junit4.PowerMockRunner;
 import org.slf4j.LoggerFactory;
 
-import java.io.BufferedWriter;
+import java.io.File;
+import java.io.IOException;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
-import java.util.concurrent.LinkedBlockingQueue;
 import java.util.concurrent.TimeUnit;
 
-import static org.junit.Assert.assertFalse;
-import static org.junit.Assert.assertTrue;
-
-
 /**
  * Tests the File List implementation.
  */
 
-@RunWith(PowerMockRunner.class)
+@RunWith(MockitoJUnitRunner.class)
 @PrepareForTest({LoggerFactory.class})
 public class TestFileList {
 
-  @Mock
-  private BufferedWriter bufferedWriter;
-
-
-  @Test
-  public void testNoStreaming() throws Exception {
-Object tuple[] =  setupAndGetTuple(100, false);
-FileList fileList = (FileList) tuple[0];
-FileListStreamer fileListStreamer = (FileListStreamer) tuple[1];
-fileList.add("Entry1");
-fileList.add("Entry2");
-assertFalse(isStreamingToFile(fileListStreamer));
-  }
+  HiveConf conf = new HiveConf();
+  private FSDataOutputStream outStream;
+  private FSDataOutputStream testFileStream;
+  final String TEST_DATA_DIR = new File(System.getProperty("java.io.tmpdir") +
+  File.separator + TestFileList.class.getCanonicalName() + "-" + 
System.currentTimeMillis()
+  ).getPath().replaceAll("\\\\", "/");
+  private Exception testException = new IOException("test");
 
   @Test
-  public void testAlwaysStreaming() throws Exception {
-Object tuple[] =  setupAndGetTuple(100, true);
-