date:20220617

[jira] [Work logged] (HIVE-26291) Ranger client file descriptor leak

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26291?focusedWorklogId=782565=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782565
 ]

ASF GitHub Bot logged work on HIVE-26291:
-

Author: ASF GitHub Bot
Created on: 18/Jun/22 02:56
Start Date: 18/Jun/22 02:56
Worklog Time Spent: 10m 
  Work Description: adrian-wang commented on code in PR #3345:
URL: https://github.com/apache/hive/pull/3345#discussion_r900688205


##
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ranger/RangerRestClientImpl.java:
##
@@ -428,10 +428,7 @@ public RangerExportPolicyList readRangerPoliciesFromJsonFile(Path filePath,
 HiveConf conf) throws SemanticException {
 RangerExportPolicyList rangerExportPolicyList = null;
 Gson gsonBuilder = new GsonBuilder().setDateFormat("MMdd-HH:mm:ss.SSS-Z").setPrettyPrinting().create();
-try {
-  FileSystem fs = filePath.getFileSystem(conf);
-  InputStream inputStream = fs.open(filePath);
-  Reader reader = new InputStreamReader(inputStream, Charset.forName("UTF-8"));
+try (Reader reader = new InputStreamReader(filePath.getFileSystem(conf).open(filePath), Charset.forName("UTF-8"))) {

Review Comment:
   @slachiewicz Thanks for your review, I've updated the patch
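The patch above relies on Java's try-with-resources statement: the `Reader` declared in the `try (...)` header is closed automatically when the block exits, whether normally or through an exception, and closing an `InputStreamReader` also closes the stream it wraps. The stand-alone sketch below shows the same pattern with only the JDK. It is an illustration, not the Hive code: it reads a local file through `java.nio.file.Files` instead of a Hadoop `FileSystem`, uses `StandardCharsets.UTF_8` (equivalent to `Charset.forName("UTF-8")`), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClosingReaderSketch {
    // Reads a whole file as UTF-8. The Reader (and the InputStream it wraps) is closed
    // when the try block exits, even if read() throws, so no file descriptor is leaked.
    static String readAll(Path path) throws IOException {
        try (Reader reader = new InputStreamReader(Files.newInputStream(path), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("policies", ".json");
        Files.write(tmp, "{\"policies\":[]}".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(tmp)); // prints {"policies":[]}
        Files.delete(tmp);
    }
}
```

Declaring the resource in the try header, rather than closing it manually, also means that if `close()` fails after the body has already thrown, the close failure is attached as a suppressed exception instead of masking the original one.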





Issue Time Tracking
---

Worklog Id: (was: 782565)
Time Spent: 50m  (was: 40m)

> Ranger client file descriptor leak
> --
>
> Key: HIVE-26291
> URL: https://issues.apache.org/jira/browse/HIVE-26291
> Project: Hive
>  Issue Type: Improvement
>Reporter: Adrian Wang
>Assignee: Adrian Wang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Ranger Client has an fd leak



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

[jira] [Work logged] (HIVE-26123) Introduce test coverage for sysdb for the different metastores

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26123?focusedWorklogId=782551=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782551
 ]

ASF GitHub Bot logged work on HIVE-26123:
-

Author: ASF GitHub Bot
Created on: 18/Jun/22 00:21
Start Date: 18/Jun/22 00:21
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #3196: 
HIVE-26123: Introduce test coverage for sysdb for the different metas…
URL: https://github.com/apache/hive/pull/3196




Issue Time Tracking
---

Worklog Id: (was: 782551)
Time Spent: 1h 20m  (was: 1h 10m)

> Introduce test coverage for sysdb for the different metastores
> --
>
> Key: HIVE-26123
> URL: https://issues.apache.org/jira/browse/HIVE-26123
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> _sysdb_ exposes (some of) the Hive metastore tables via JDBC queries.
> Existing tests run only against Derby, which means that changes to the sysdb 
> query mappings are not covered by CI for the other metastore databases.
> This ticket aims to bridge this gap by introducing test coverage for sysdb on 
> the different supported metastore databases.




[jira] [Work logged] (HIVE-26130) Incorrect matching of external table when validating NOT NULL constraints

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26130?focusedWorklogId=782550=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782550
 ]

ASF GitHub Bot logged work on HIVE-26130:
-

Author: ASF GitHub Bot
Created on: 18/Jun/22 00:21
Start Date: 18/Jun/22 00:21
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #3199:
URL: https://github.com/apache/hive/pull/3199#issuecomment-1159316220

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 782550)
Time Spent: 50m  (was: 40m)

> Incorrect matching of external table when validating NOT NULL constraints
> -
>
> Key: HIVE-26130
> URL: https://issues.apache.org/jira/browse/HIVE-26130
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> _AbstractAlterTablePropertiesAnalyzer.validate_ uses an incorrect check to decide 
> whether a table is external:
> {code:java}
> else if (entry.getKey().equals("external") && entry.getValue().equals("true") 
> {code}
> In the current Hive code, an external table is identified by the table property 
> 'EXTERNAL'='true' (or 'EXTERNAL'='TRUE'), which the case-sensitive comparisons 
> against "external" and "true" above do not match.
>  
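A minimal sketch of a case-insensitive check, assuming the goal is to accept both 'EXTERNAL'='true' and 'EXTERNAL'='TRUE' as described above. `isExternalTableProperty` is a hypothetical helper, not the actual fix that went into Hive; it compares both key and value with `String#equalsIgnoreCase`, which also tolerates other casings such as `External`.

```java
import java.util.Map;

public class ExternalTablePropertySketch {
    // Hypothetical helper: true when a table-property entry marks the table as external,
    // regardless of the casing used for the key ("EXTERNAL") or the value ("true"/"TRUE").
    // equalsIgnoreCase returns false for a null argument, so a null value is simply rejected.
    static boolean isExternalTableProperty(Map.Entry<String, String> entry) {
        return "EXTERNAL".equalsIgnoreCase(entry.getKey())
            && "TRUE".equalsIgnoreCase(entry.getValue());
    }

    public static void main(String[] args) {
        System.out.println(isExternalTableProperty(Map.entry("EXTERNAL", "TRUE")));  // true
        System.out.println(isExternalTableProperty(Map.entry("EXTERNAL", "true")));  // true
        System.out.println(isExternalTableProperty(Map.entry("external", "false"))); // false
    }
}
```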




[jira] [Work logged] (HIVE-26079) Upgrade protobuf to 3.16.1

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26079?focusedWorklogId=782552=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782552
 ]

ASF GitHub Bot logged work on HIVE-26079:
-

Author: ASF GitHub Bot
Created on: 18/Jun/22 00:21
Start Date: 18/Jun/22 00:21
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #3144: 
HIVE-26079: Upgrade protobuf to 3.16.1
URL: https://github.com/apache/hive/pull/3144




Issue Time Tracking
---

Worklog Id: (was: 782552)
Time Spent: 1h 20m  (was: 1h 10m)

> Upgrade protobuf to 3.16.1
> --
>
> Key: HIVE-26079
> URL: https://issues.apache.org/jira/browse/HIVE-26079
> Project: Hive
>  Issue Type: Task
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Upgrade com.google.protobuf:protobuf-java from 2.5.0 to 3.16.1 to fix 
> CVE-2021-22569




[jira] [Work logged] (HIVE-25996) Backport HIVE-25098

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25996?focusedWorklogId=782445=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782445
 ]

ASF GitHub Bot logged work on HIVE-25996:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 15:51
Start Date: 17/Jun/22 15:51
Worklog Time Spent: 10m 
  Work Description: pan3793 commented on PR #3066:
URL: https://github.com/apache/hive/pull/3066#issuecomment-1159011704

   Thanks for the work done by @wangyum. Hive 2.3 is widely used in the industry 
and adopted by many downstream projects such as Apache Spark, so a security patch 
release would be greatly appreciated.




Issue Time Tracking
---

Worklog Id: (was: 782445)
Time Spent: 50m  (was: 40m)

> Backport HIVE-25098
> ---
>
> Key: HIVE-25996
> URL: https://issues.apache.org/jira/browse/HIVE-25996
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.9
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-25787) Prevent duplicate paths in the fileList while adding an entry to NotifcationLog

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25787?focusedWorklogId=782443=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782443
 ]

ASF GitHub Bot logged work on HIVE-25787:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 15:45
Start Date: 17/Jun/22 15:45
Worklog Time Spent: 10m 
  Work Description: hmangla98 closed pull request #3170: HIVE-25787: 
Prevent duplicate paths in the fileList while adding an entry to NotifcationLog
URL: https://github.com/apache/hive/pull/3170




Issue Time Tracking
---

Worklog Id: (was: 782443)
Time Spent: 50m  (was: 40m)

> Prevent duplicate paths in the fileList while adding an entry to 
> NotifcationLog
> ---
>
> Key: HIVE-25787
> URL: https://issues.apache.org/jira/browse/HIVE-25787
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, when entries are added to the notification log and a retry occurs, 
> the same path is sometimes added to the notification log entry more than once, 
> which later causes copy failures during replication.
> The same path should not appear more than once for a single transaction.
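One way to keep a path from being recorded twice when an entry is retried is to collect the paths in an insertion-ordered set before they are written out. The sketch below is hypothetical: `addFiles`, `fileList`, and the list-of-strings representation are assumptions for illustration, not Hive's notification-log API, and the example paths are made up. `LinkedHashSet` drops duplicates while preserving first-seen order, which keeps the resulting file list stable.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DedupFileListSketch {
    // Paths already recorded for one transaction, in first-seen order.
    private final Set<String> files = new LinkedHashSet<>();

    // Adding the same paths again (e.g. on a retry) is a no-op for paths already present.
    void addFiles(Collection<String> paths) {
        files.addAll(paths);
    }

    // Snapshot of the de-duplicated paths, in the order they were first added.
    List<String> fileList() {
        return new ArrayList<>(files);
    }

    public static void main(String[] args) {
        DedupFileListSketch txnFiles = new DedupFileListSketch();
        txnFiles.addFiles(List.of("hdfs://nn/warehouse/t/delta_1/bucket_0"));
        // A retry re-sends the first path together with a new one.
        txnFiles.addFiles(List.of("hdfs://nn/warehouse/t/delta_1/bucket_0",
                                  "hdfs://nn/warehouse/t/delta_1/bucket_1"));
        System.out.println(txnFiles.fileList()); // two distinct paths, in insertion order
    }
}
```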




[jira] [Updated] (HIVE-24884) Move top level dump metadata content to be in JSON format

2022-06-17 Thread Haymant Mangla (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haymant Mangla updated HIVE-24884:
--
Assignee: Haymant Mangla  (was: Pravin Sinha)
  Status: Open  (was: Patch Available)

> Move top level dump metadata content to be in JSON format
> -
>
> Key: HIVE-24884
> URL: https://issues.apache.org/jira/browse/HIVE-24884
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The _dumpmetadata file currently uses TAB-separated content, which is not very 
> flexible to extend. A more flexible format such as JSON would make it easier to 
> add to the content.
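To illustrate the extensibility argument: with TAB-separated content a reader must know each field's position, so inserting a field shifts every later column, whereas a JSON object lets readers look fields up by name and ignore ones they do not know. The sketch below serializes a flat map of string fields both ways. The field names (`dumpType`, `fromEventId`, `toEventId`) are hypothetical, and the hand-written serializer only exists to keep the example dependency-free; a real implementation would more likely use a JSON library such as Gson, which the Ranger client code above already uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DumpMetadataJsonSketch {
    // Escapes quotes, backslashes, newlines and tabs inside a JSON string value.
    // Other control characters are not handled in this sketch.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\t': sb.append("\\t"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Serializes a flat map of string fields as a JSON object, preserving insertion order.
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("dumpType", "INCREMENTAL");
        meta.put("fromEventId", "100");
        meta.put("toEventId", "200");
        // Positional: readers must know that column 2 is fromEventId.
        System.out.println(String.join("\t", meta.values()));
        // Named: {"dumpType":"INCREMENTAL","fromEventId":"100","toEventId":"200"}
        System.out.println(toJson(meta));
    }
}
```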




[jira] [Resolved] (HIVE-24884) Move top level dump metadata content to be in JSON format

2022-06-17 Thread Haymant Mangla (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haymant Mangla resolved HIVE-24884.
---
Resolution: Won't Fix

> Move top level dump metadata content to be in JSON format
> -
>
> Key: HIVE-24884
> URL: https://issues.apache.org/jira/browse/HIVE-24884
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HIVE-24884) Move top level dump metadata content to be in JSON format

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24884?focusedWorklogId=782430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782430
 ]

ASF GitHub Bot logged work on HIVE-24884:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 15:21
Start Date: 17/Jun/22 15:21
Worklog Time Spent: 10m 
  Work Description: hmangla98 closed pull request #3293: HIVE-24884: Move 
top level dump metadata content to be in JSON format
URL: https://github.com/apache/hive/pull/3293




Issue Time Tracking
---

Worklog Id: (was: 782430)
Time Spent: 1h 50m  (was: 1h 40m)

> Move top level dump metadata content to be in JSON format
> -
>
> Key: HIVE-24884
> URL: https://issues.apache.org/jira/browse/HIVE-24884
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HIVE-25996) Backport HIVE-25098

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25996?focusedWorklogId=782410=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782410
 ]

ASF GitHub Bot logged work on HIVE-25996:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 14:05
Start Date: 17/Jun/22 14:05
Worklog Time Spent: 10m 
  Work Description: wangyum commented on PR #3066:
URL: https://github.com/apache/hive/pull/3066#issuecomment-1158905682

   cc @sunchao 




Issue Time Tracking
---

Worklog Id: (was: 782410)
Time Spent: 40m  (was: 0.5h)

> Backport HIVE-25098
> ---
>
> Key: HIVE-25996
> URL: https://issues.apache.org/jira/browse/HIVE-25996
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.9
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-26336) Hive JDBC Driver should respect JDBC DriverManager#loginTimeout

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26336?focusedWorklogId=782370=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782370
 ]

ASF GitHub Bot logged work on HIVE-26336:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 11:29
Start Date: 17/Jun/22 11:29
Worklog Time Spent: 10m 
  Work Description: pan3793 commented on PR #3379:
URL: https://github.com/apache/hive/pull/3379#issuecomment-1158779054

   @prasanthj @pvary would you please take a look?




Issue Time Tracking
---

Worklog Id: (was: 782370)
Time Spent: 20m  (was: 10m)

> Hive JDBC Driver should respect JDBC DriverManager#loginTimeout
> ---
>
> Key: HIVE-26336
> URL: https://issues.apache.org/jira/browse/HIVE-26336
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 4.0.0-alpha-1
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Before HIVE-12371, the Hive JDBC Driver used DriverManager#loginTimeout as 
> both the connect timeout and the socket timeout. This often caused socket 
> timeout exceptions for users of the Hive JDBC Driver in Spring Boot projects, 
> because Spring Boot sets the login timeout to 30s by default.
> HIVE-12371 introduced a new socketTimeout parameter and stopped honoring 
> DriverManager#loginTimeout altogether; I don't think that is the correct solution.
> For the login timeout, I propose preferring loginTimeout (in milliseconds) 
> from the JDBC connection URL and falling back to DriverManager#getLoginTimeout 
> (in seconds).
> For the socket timeout, use socketTimeout (in milliseconds) from the JDBC 
> connection URL if present.
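The proposed precedence can be sketched as follows. The URL parameter names `loginTimeout` and `socketTimeout` and their units come from the description above; representing the parsed URL parameters as a `Map<String, String>` and the method names are assumptions for illustration, not the Hive JDBC driver's actual code. `DriverManager#getLoginTimeout` is the standard JDBC API and returns seconds, hence the conversion to milliseconds.

```java
import java.sql.DriverManager;
import java.util.Map;
import java.util.OptionalInt;

public class TimeoutResolutionSketch {
    // Login (connect) timeout in ms: the URL's "loginTimeout" (ms) wins; otherwise fall back
    // to DriverManager#getLoginTimeout, which is expressed in seconds. 0 means "no timeout".
    static int resolveLoginTimeoutMs(Map<String, String> urlParams) {
        String fromUrl = urlParams.get("loginTimeout");
        if (fromUrl != null) {
            return Integer.parseInt(fromUrl);
        }
        return DriverManager.getLoginTimeout() * 1000;
    }

    // Socket (read) timeout in ms: taken only from the URL; empty means "keep the default".
    static OptionalInt resolveSocketTimeoutMs(Map<String, String> urlParams) {
        String fromUrl = urlParams.get("socketTimeout");
        return fromUrl == null ? OptionalInt.empty() : OptionalInt.of(Integer.parseInt(fromUrl));
    }

    public static void main(String[] args) {
        DriverManager.setLoginTimeout(30); // e.g. a value set by a framework or connection pool
        System.out.println(resolveLoginTimeoutMs(Map.of()));                        // 30000
        System.out.println(resolveLoginTimeoutMs(Map.of("loginTimeout", "5000"))); // 5000
        System.out.println(resolveSocketTimeoutMs(Map.of()).isPresent());          // false
    }
}
```

Keeping the socket timeout independent of the login timeout is the point of the proposal: a short login timeout then bounds only connection setup, not long-running queries.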




[jira] [Commented] (HIVE-20607) TxnHandler should use PreparedStatement to execute direct SQL queries.

2022-06-17 Thread GuangMing Lu (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-20607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1745#comment-1745
 ] 

GuangMing Lu commented on HIVE-20607:
-

Hi [~sankarh], [~kgyrtkirk], do you know Hive's EOL schedule?

> TxnHandler should use PreparedStatement to execute direct SQL queries.
> --
>
> Key: HIVE-20607
> URL: https://issues.apache.org/jira/browse/HIVE-20607
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore, Transactions
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, pull-request-available
> Fix For: 3.2.0, 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-20607.01-branch-3.patch, HIVE-20607.01.patch
>
>
> TxnHandler uses direct SQL queries to operate on transaction-related 
> databases/tables in the Hive metastore RDBMS.
> Most of these methods are called directly from the Metastore API and append 
> the input string arguments directly to the SQL string.
> They should use a parameterized PreparedStatement to bind these arguments instead.
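The difference between the two patterns can be sketched with plain JDBC. The table and column names (`TXNS`, `TXN_ID`, `TXN_USER`) are illustrative and should be checked against the actual metastore schema, and both methods are hypothetical rather than taken from TxnHandler. The concatenating version lets a crafted argument change the meaning of the query; the prepared version keeps the SQL text fixed and sends the argument separately through `PreparedStatement#setString`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class PreparedTxnQuerySketch {
    // The caller-supplied value is a '?' placeholder and is never spliced into the text.
    static final String TXNS_FOR_USER =
        "SELECT \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_USER\" = ?";

    // Unsafe pattern described in the issue: the argument becomes part of the SQL string.
    static String concatenated(String user) {
        return "SELECT \"TXN_ID\" FROM \"TXNS\" WHERE \"TXN_USER\" = '" + user + "'";
    }

    // Safe pattern: the argument is bound through the driver (JDBC indexes are 1-based).
    static List<Long> txnIdsForUser(Connection conn, String user) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(TXNS_FOR_USER)) {
            ps.setString(1, user);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong(1));
                }
            }
        }
        return ids;
    }
}
```

For example, `concatenated("x' OR '1'='1")` yields a query whose WHERE clause is always true, while passing the same string to `txnIdsForUser` simply looks for a user with that literal name.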




[jira] [Resolved] (HIVE-26340) Vectorized PTF operator fails if query has upper case window function

2022-06-17 Thread Krisztian Kasa (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-26340.
---
Resolution: Fixed

Pushed to master! Thanks [~abstractdog] for the review.


> Vectorized PTF operator fails if query has upper case window function
> -
>
> Key: HIVE-26340
> URL: https://issues.apache.org/jira/browse/HIVE-26340
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> SELECT ROW_NUMBER() OVER(order by age) AS rn FROM studentnull100;
> {code}
> {code}
> 2022-06-16T14:18:57,728 ERROR [pool-4-thread-1] jdbc.TestDriver: Error while 
> compiling statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 
> 7, vertexId=vertex_1655217967697_0062_1_01, diagnostics=[Task failed, 
> taskId=task_1655217967697_0062_1_01_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1655217967697_0062_1_01_00_0:java.lang.RuntimeException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:298)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:252)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.plan.VectorPTFDesc.getEvaluator(VectorPTFDesc.java:165)
>   at 
> org.apache.hadoop.hive.ql.plan.VectorPTFDesc.getEvaluators(VectorPTFDesc.java:381)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.initializeOp(VectorPTFOperator.java:317)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:374)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:571)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:523)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:384)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:268)
>   ... 16 more
> {code}
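The trace only shows a NullPointerException inside `VectorPTFDesc.getEvaluator`. Assuming, as the issue title suggests, that the root cause is a case-sensitive lookup of the window-function name, normalizing the name before the lookup avoids it. The sketch is hypothetical: the registry map, the enum, and the method name are illustrations, not Hive's actual classes or the committed fix.

```java
import java.util.Locale;
import java.util.Map;

public class WindowFunctionLookupSketch {
    enum EvaluatorKind { ROW_NUMBER, RANK, DENSE_RANK }

    // Hypothetical registry keyed by lower-case function names.
    private static final Map<String, EvaluatorKind> EVALUATORS = Map.of(
        "row_number", EvaluatorKind.ROW_NUMBER,
        "rank", EvaluatorKind.RANK,
        "dense_rank", EvaluatorKind.DENSE_RANK);

    // Lower-casing with Locale.ROOT makes "ROW_NUMBER" and "row_number" resolve identically
    // and avoids locale-specific case rules (e.g. the Turkish dotless i).
    static EvaluatorKind evaluatorFor(String functionName) {
        EvaluatorKind kind = EVALUATORS.get(functionName.toLowerCase(Locale.ROOT));
        if (kind == null) {
            // Fail with a descriptive error instead of a later NullPointerException.
            throw new IllegalArgumentException("Unsupported window function: " + functionName);
        }
        return kind;
    }

    public static void main(String[] args) {
        System.out.println(evaluatorFor("ROW_NUMBER")); // ROW_NUMBER
        System.out.println(evaluatorFor("row_number")); // ROW_NUMBER
    }
}
```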




[jira] [Work logged] (HIVE-26340) Vectorized PTF operator fails if query has upper case window function

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26340?focusedWorklogId=782363=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782363
 ]

ASF GitHub Bot logged work on HIVE-26340:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 10:58
Start Date: 17/Jun/22 10:58
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged PR #3382:
URL: https://github.com/apache/hive/pull/3382




Issue Time Tracking
---

Worklog Id: (was: 782363)
Time Spent: 20m  (was: 10m)

> Vectorized PTF operator fails if query has upper case window function
> -
>
> Key: HIVE-26340
> URL: https://issues.apache.org/jira/browse/HIVE-26340
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HIVE-26340) Vectorized PTF operator fails if query has upper case window function

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26340?focusedWorklogId=782362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782362
 ]

ASF GitHub Bot logged work on HIVE-26340:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 10:48
Start Date: 17/Jun/22 10:48
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on PR #3382:
URL: https://github.com/apache/hive/pull/3382#issuecomment-1158749695

   created new jira: https://issues.apache.org/jira/browse/HIVE-26340




Issue Time Tracking
---

Worklog Id: (was: 782362)
Remaining Estimate: 0h
Time Spent: 10m

> Vectorized PTF operator fails if query has upper case window function
> -
>
> Key: HIVE-26340
> URL: https://issues.apache.org/jira/browse/HIVE-26340
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (HIVE-26340) Vectorized PTF operator fails if query has upper case window function

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26340:
--
Labels: pull-request-available  (was: )

> Vectorized PTF operator fails if query has upper case window function
> -
>
> Key: HIVE-26340
> URL: https://issues.apache.org/jira/browse/HIVE-26340
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

[jira] [Assigned] (HIVE-26340) Vectorized PTF operator fails if query has upper case window function

2022-06-17 Thread Krisztian Kasa (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-26340:
-


> Vectorized PTF operator fails if query has upper case window function
> -
>
> Key: HIVE-26340
> URL: https://issues.apache.org/jira/browse/HIVE-26340
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>
> {code}
> SELECT ROW_NUMBER() OVER(order by age) AS rn FROM studentnull100;
> {code}
> {code}
> 2022-06-16T14:18:57,728 ERROR [pool-4-thread-1] jdbc.TestDriver: Error while 
> compiling statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 
> 7, vertexId=vertex_1655217967697_0062_1_01, diagnostics=[Task failed, 
> taskId=task_1655217967697_0062_1_01_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1655217967697_0062_1_01_00_0:java.lang.RuntimeException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:298)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:252)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.plan.VectorPTFDesc.getEvaluator(VectorPTFDesc.java:165)
>   at 
> org.apache.hadoop.hive.ql.plan.VectorPTFDesc.getEvaluators(VectorPTFDesc.java:381)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.ptf.VectorPTFOperator.initializeOp(VectorPTFOperator.java:317)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:374)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:571)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:523)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:384)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:268)
>   ... 16 more
> {code}
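The trace places the NullPointerException in `VectorPTFDesc.getEvaluator` while the vectorized PTF operator initializes, and the failing query spells the function as `ROW_NUMBER`. One plausible mechanism is a lookup keyed by lower-case function names that returns `null` for the upper-case spelling. This is an assumption; the report does not confirm it. The sketch below uses hypothetical names (`EvaluatorLookup`, `EVALUATORS`) and strings in place of Hive's evaluator classes. It normalizes the key with `Locale.ROOT` and fails with a descriptive error instead of an NPE.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not Hive's VectorPTFDesc code.
class EvaluatorLookup {
  // Registry keyed by lower-case window function names (assumed convention).
  static final Map<String, String> EVALUATORS = Map.of(
      "row_number", "RowNumberEvaluator",
      "rank", "RankEvaluator",
      "dense_rank", "DenseRankEvaluator");

  // Naive lookup: returns null for "ROW_NUMBER"; a caller that dereferences
  // the result would then throw a NullPointerException.
  static String naiveLookup(String functionName) {
    return EVALUATORS.get(functionName);
  }

  // Normalized lookup: lower-cases with Locale.ROOT so the result does not
  // depend on the JVM default locale, and reports unknown names clearly.
  static String lookup(String functionName) {
    String evaluator = EVALUATORS.get(functionName.toLowerCase(Locale.ROOT));
    if (evaluator == null) {
      throw new IllegalArgumentException(
          "No vectorized evaluator for window function: " + functionName);
    }
    return evaluator;
  }

  public static void main(String[] args) {
    System.out.println(naiveLookup("ROW_NUMBER")); // null
    System.out.println(lookup("ROW_NUMBER"));      // RowNumberEvaluator
  }
}
```

Using `Locale.ROOT` matters for names containing `I`. Under a Turkish default locale, `"FIRST_VALUE".toLowerCase()` yields `fırst_value` (dotless i), which would miss a `first_value` key.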




[jira] [Work logged] (HIVE-26274) No vectorization if query has upper case window function

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26274?focusedWorklogId=782359=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782359
 ]

ASF GitHub Bot logged work on HIVE-26274:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 10:04
Start Date: 17/Jun/22 10:04
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on PR #3382:
URL: https://github.com/apache/hive/pull/3382#issuecomment-1158716470

   I'm afraid using addendum patches can make the patch contents vague (what to backport later). Can you please file a separate jira for clarity's sake? Otherwise, looks good to me.




Issue Time Tracking
---

Worklog Id: (was: 782359)
Time Spent: 50m  (was: 40m)

> No vectorization if query has upper case window function
> 
>
> Key: HIVE-26274
> URL: https://issues.apache.org/jira/browse/HIVE-26274
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE t1 (a int, b int);
> EXPLAIN VECTORIZATION ONLY SELECT ROW_NUMBER() OVER(order by a) AS rn FROM t1;
> {code}
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       Vertices:
>         Map 1
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vector.serde.deserialize IS true
>                 inputFormatFeatureSupport: [DECIMAL_64]
>                 featureSupportInUse: [DECIMAL_64]
>                 inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
>                 allNative: true
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 2
>             Execution mode: llap
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez] IS true
>                 notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, lag, last_value, lead, max, min, rank, row_number, sum]
>                 vectorized: false
>   Stage: Stage-0
>     Fetch Operator
> {code}
> {code}
> notVectorizedReason: PTF operator: ROW_NUMBER not in supported functions [avg, count, dense_rank, first_value, lag, last_value, lead, max, min, rank, row_number, sum]
> {code}
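The `notVectorizedReason` shows the mismatch directly. `ROW_NUMBER` is rejected even though `row_number` is in the supported list, which points to a case-sensitive membership test. Below is a minimal sketch of a case-insensitive check, assuming the supported names are kept in a plain set. `PtfSupportCheck` and `isSupported` are hypothetical names, not Hive's Vectorizer API.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a case-insensitive "supported window function" check.
class PtfSupportCheck {
  // A TreeSet ordered by String.CASE_INSENSITIVE_ORDER treats "ROW_NUMBER"
  // and "row_number" as the same element.
  static final Set<String> SUPPORTED = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);

  static {
    SUPPORTED.addAll(List.of("avg", "count", "dense_rank", "first_value", "lag",
        "last_value", "lead", "max", "min", "rank", "row_number", "sum"));
  }

  static boolean isSupported(String functionName) {
    return SUPPORTED.contains(functionName);
  }

  public static void main(String[] args) {
    System.out.println(isSupported("ROW_NUMBER")); // true
    System.out.println(isSupported("ntile"));      // false
  }
}
```

A comparator-based set accepts any spelling while keeping the user's original text available for error messages. Normalizing the name once when the plan is built (e.g. with `toLowerCase(Locale.ROOT)`) would work equally well.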




[jira] [Work logged] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=782339=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782339
 ]

ASF GitHub Bot logged work on HIVE-26319:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 09:26
Start Date: 17/Jun/22 09:26
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r899942979


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -858,19 +862,24 @@ private static boolean 
hasParquetListColumnSupport(Properties tableProps, Schema
* @param overwrite If we have to overwrite the existing table or just add 
the new data
* @return The generated JobContext
*/
-  private Optional generateJobContext(Configuration configuration, 
String tableName, boolean overwrite) {
+  private Optional> generateJobContext(

Review Comment:
   rewritten.





Issue Time Tracking
---

Worklog Id: (was: 782339)
Time Spent: 2h 10m  (was: 2h)

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables




[jira] [Work logged] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=782338=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782338
 ]

ASF GitHub Bot logged work on HIVE-26319:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 09:26
Start Date: 17/Jun/22 09:26
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r899942578


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -411,23 +411,27 @@ public boolean commitInMoveTask() {
   public void storageHandlerCommit(Properties commitProperties, boolean 
overwrite) throws HiveException {
 String tableName = commitProperties.getProperty(Catalogs.NAME);
 Configuration configuration = SessionState.getSessionConf();
-Optional jobContext = generateJobContext(configuration, 
tableName, overwrite);
-if (jobContext.isPresent()) {
+Optional> jobContextList = 
generateJobContext(configuration, tableName, overwrite);
+if (!jobContextList.isPresent()) {
+  return;
+}
+
+for (JobContext jobContext : jobContextList.get()) {
   OutputCommitter committer = new HiveIcebergOutputCommitter();
   try {
-committer.commitJob(jobContext.get());
+committer.commitJob(jobContext);
   } catch (Throwable e) {
 // Aborting the job if the commit has failed
 LOG.error("Error while trying to commit job: {}, starting rollback 
changes for table: {}",
-jobContext.get().getJobID(), tableName, e);
+jobContext.getJobID(), tableName, e);
 try {
-  committer.abortJob(jobContext.get(), JobStatus.State.FAILED);
+  committer.abortJob(jobContext, JobStatus.State.FAILED);

Review Comment:
   I think all jobs should be rolled back in case of error when committing any 
of them. To achieve this we are using `org.apache.iceberg.util.Tasks`:
   ```
 Tasks.foreach(outputs)
 .throwFailureWhenFinished()
 .stopOnFailure()
 .run(output -> {
   ...
   ```
   which can revert all tasks in case of error, even if some of them have already succeeded.
   
   The initial implementation committed each job independently: each job launched a separate batch of tasks.
   I refactored this part to collect the outputs of all jobs and launch them in one batch.
   I also found that this is done in parallel, and that we look up the data needed for the commit in the `SessionState`, which is stored thread-locally. This only works when there is a single output: then only one worker thread is used, and that is the main thread, where the `SessionState` is initialized. If a batch contains more than one output, however, threads other than the main thread do not have the necessary commit data in their `SessionState`.
   So I moved the collection of this data before the tasks are launched.
   
   This affects multi-inserts, split updates and merge statements. I haven't found any tests for multi-inserting into an Iceberg table (please share some if any exist), so I assume this issue hasn't come up before.
   
   Please share your thoughts.
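The comment describes two constraints. First, every job must be rolled back if any commit fails. Second, thread-local session data may only be read on the thread that owns it. Both can be sketched with the plain JDK. This is not Iceberg's `Tasks` utility or Hive's committer. `SESSION` is a hypothetical `ThreadLocal` standing in for `SessionState`, and the output names are made up. A commit is modeled as "commit info is present", and a rollback is only reported, not actually performed on files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: plain JDK, not Hive's or Iceberg's commit code.
class CommitAllOrNothing {
  // Stand-in for SessionState: each thread sees its own, initially empty, map.
  static final ThreadLocal<Map<String, String>> SESSION =
      ThreadLocal.withInitial(HashMap::new);

  /** Commits outputs in parallel; returns the outputs rolled back (empty if all succeeded). */
  static List<String> commitAll(List<String> outputs, ExecutorService pool)
      throws InterruptedException {
    // Snapshot the session data on the owning thread before fanning out.
    Map<String, String> commitInfo = new HashMap<>(SESSION.get());
    ConcurrentLinkedQueue<String> committed = new ConcurrentLinkedQueue<>();
    List<Future<?>> futures = new ArrayList<>();
    for (String output : outputs) {
      futures.add(pool.submit(() -> {
        // SESSION.get() here would return the worker thread's empty map.
        if (!commitInfo.containsKey(output)) {
          throw new IllegalStateException("no commit info for " + output);
        }
        committed.add(output);
      }));
    }
    boolean failed = false;
    for (Future<?> future : futures) {
      try {
        future.get(); // wait for every task, even after a failure
      } catch (ExecutionException e) {
        failed = true;
      }
    }
    // All-or-nothing: on any failure, every successful commit must be reverted.
    // Here the outputs are only returned; a real committer would revert their files.
    return failed ? new ArrayList<>(committed) : List.of();
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      SESSION.get().put("insert_branch", "info-1");
      SESSION.get().put("delete_branch", "info-2");
      // A pool thread does not see the main thread's session entries.
      boolean visible = pool.submit(() -> SESSION.get().containsKey("insert_branch")).get();
      System.out.println(visible); // false
      System.out.println(commitAll(List.of("insert_branch", "delete_branch"), pool)); // []
      System.out.println(commitAll(List.of("insert_branch", "missing"), pool));       // [insert_branch]
    } finally {
      pool.shutdown();
    }
  }
}
```

Copying the session map before submitting tasks is what makes the data visible to worker threads. Waiting on every future before deciding to roll back ensures no task is still committing while the rollback runs.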
   





Issue Time Tracking
---

Worklog Id: (was: 782338)
Time Spent: 2h  (was: 1h 50m)

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables




[jira] [Work started] (HIVE-26339) HIVE-26047 Related LIKE pattern issues

2022-06-17 Thread Ryu Kobayashi (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26339 started by Ryu Kobayashi.

> HIVE-26047 Related LIKE pattern issues
> --
>
> Key: HIVE-26339
> URL: https://issues.apache.org/jira/browse/HIVE-26339
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fixed https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
> expressions. It was also confirmed that the current regular-expression-based 
> code cannot handle the following LIKE patterns.
> End pattern
> {code:java}
> %abc\%def {code}
> Start pattern
> {code:java}
> abc\%def% {code}
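In LIKE, `\%` is an escaped literal percent sign. So `%abc\%def` should match values ending in `abc%def`, and `abc\%def%` should match values starting with `abc%def`. Below is a minimal sketch of matching these two shapes without regular expressions. It assumes backslash is the escape character and supports only a single unescaped `%` at the start or end. `SimpleLike` and its methods are hypothetical, not Hive's LIKE implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not Hive's (vectorized) LIKE implementation.
class SimpleLike {
  // Indices of '%' and '_' characters that are not escaped with a backslash.
  static List<Integer> wildcardPositions(String pattern) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < pattern.length(); i++) {
      char c = pattern.charAt(i);
      if (c == '\\') {
        i++; // the next character is a literal, so skip it
      } else if (c == '%' || c == '_') {
        positions.add(i);
      }
    }
    return positions;
  }

  // Drops escape backslashes, e.g. abc\%def -> abc%def.
  static String unescape(String s) {
    StringBuilder out = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '\\' && i + 1 < s.length()) {
        c = s.charAt(++i);
      }
      out.append(c);
    }
    return out.toString();
  }

  // Supports only "%literal" (ends-with) and "literal%" (starts-with).
  static boolean matches(String value, String pattern) {
    List<Integer> wildcards = wildcardPositions(pattern);
    int last = pattern.length() - 1;
    if (wildcards.equals(List.of(0)) && pattern.charAt(0) == '%') {
      return value.endsWith(unescape(pattern.substring(1)));
    }
    if (last > 0 && wildcards.equals(List.of(last)) && pattern.charAt(last) == '%') {
      return value.startsWith(unescape(pattern.substring(0, last)));
    }
    throw new IllegalArgumentException("pattern shape not handled by this sketch: " + pattern);
  }

  public static void main(String[] args) {
    System.out.println(matches("xyzabc%def", "%abc\\%def")); // true
    System.out.println(matches("xyzabcdef", "%abc\\%def"));  // false
    System.out.println(matches("abc%defxyz", "abc\\%def%")); // true
    System.out.println(matches("abcdefxyz", "abc\\%def%"));  // false
  }
}
```

Escapes are resolved while scanning for wildcards, so the `%` in `\%` is never mistaken for the leading or trailing wildcard. Other shapes (such as `%abc%` or patterns with `_`) are rejected rather than guessed at.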




[jira] [Work logged] (HIVE-26339) HIVE-26047 Related LIKE pattern issues

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26339?focusedWorklogId=782329=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782329
 ]

ASF GitHub Bot logged work on HIVE-26339:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 09:08
Start Date: 17/Jun/22 09:08
Worklog Time Spent: 10m 
  Work Description: ryukobayashi opened a new pull request, #3384:
URL: https://github.com/apache/hive/pull/3384

   
   
   ### What changes were proposed in this pull request?
   
   
   The vectorized LIKE UDF takes time proportional to the size of the input string. I also found a problem with some regular expressions.
   
   ### Why are the changes needed?
   
   
   To support filter condition based on input data.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No.
   
   ### How was this patch tested?
   
   
   Added testcase as part of PR.




Issue Time Tracking
---

Worklog Id: (was: 782329)
Remaining Estimate: 0h
Time Spent: 10m

> HIVE-26047 Related LIKE pattern issues
> --
>
> Key: HIVE-26339
> URL: https://issues.apache.org/jira/browse/HIVE-26339
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fixed https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
> expressions. It was also confirmed that the current regular-expression-based 
> code cannot handle the following LIKE patterns.
> End pattern
> {code:java}
> %abc\%def {code}
> Start pattern
> {code:java}
> abc\%def% {code}




[jira] [Work logged] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=782328=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782328
 ]

ASF GitHub Bot logged work on HIVE-26319:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 09:08
Start Date: 17/Jun/22 09:08
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r899928173


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -411,23 +411,27 @@ public boolean commitInMoveTask() {
   public void storageHandlerCommit(Properties commitProperties, boolean 
overwrite) throws HiveException {
 String tableName = commitProperties.getProperty(Catalogs.NAME);
 Configuration configuration = SessionState.getSessionConf();
-Optional jobContext = generateJobContext(configuration, 
tableName, overwrite);
-if (jobContext.isPresent()) {
+Optional> jobContextList = 
generateJobContext(configuration, tableName, overwrite);
+if (!jobContextList.isPresent()) {
+  return;
+}
+
+for (JobContext jobContext : jobContextList.get()) {
   OutputCommitter committer = new HiveIcebergOutputCommitter();
   try {
-committer.commitJob(jobContext.get());
+committer.commitJob(jobContext);
   } catch (Throwable e) {
 // Aborting the job if the commit has failed
 LOG.error("Error while trying to commit job: {}, starting rollback 
changes for table: {}",
-jobContext.get().getJobID(), tableName, e);
+jobContext.getJobID(), tableName, e);
 try {
-  committer.abortJob(jobContext.get(), JobStatus.State.FAILED);
+  committer.abortJob(jobContext, JobStatus.State.FAILED);
 } catch (IOException ioe) {
   LOG.error("Error while trying to abort failed job. There might be 
uncleaned data files.", ioe);
   // no throwing here because the original exception should be 
propagated
 }
 throw new HiveException(
-"Error committing job: " + jobContext.get().getJobID() + " for 
table: " + tableName, e);
+"Error committing job: " + jobContext.getJobID() + " for 
table: " + tableName, e);

Review Comment:
   removed



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -411,23 +411,27 @@ public boolean commitInMoveTask() {
   public void storageHandlerCommit(Properties commitProperties, boolean 
overwrite) throws HiveException {
 String tableName = commitProperties.getProperty(Catalogs.NAME);
 Configuration configuration = SessionState.getSessionConf();
-Optional jobContext = generateJobContext(configuration, 
tableName, overwrite);
-if (jobContext.isPresent()) {
+Optional> jobContextList = 
generateJobContext(configuration, tableName, overwrite);
+if (!jobContextList.isPresent()) {
+  return;
+}
+
+for (JobContext jobContext : jobContextList.get()) {
   OutputCommitter committer = new HiveIcebergOutputCommitter();
   try {
-committer.commitJob(jobContext.get());
+committer.commitJob(jobContext);
   } catch (Throwable e) {
 // Aborting the job if the commit has failed
 LOG.error("Error while trying to commit job: {}, starting rollback 
changes for table: {}",
-jobContext.get().getJobID(), tableName, e);
+jobContext.getJobID(), tableName, e);

Review Comment:
   removed





Issue Time Tracking
---

Worklog Id: (was: 782328)
Time Spent: 1h 50m  (was: 1h 40m)

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables




[jira] [Updated] (HIVE-26339) HIVE-26047 Related LIKE pattern issues

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26339:
--
Labels: pull-request-available  (was: )

> HIVE-26047 Related LIKE pattern issues
> --
>
> Key: HIVE-26339
> URL: https://issues.apache.org/jira/browse/HIVE-26339
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fixed https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
> expressions. It was also confirmed that the current regular-expression-based 
> code cannot handle the following LIKE patterns.
> End pattern
> {code:java}
> %abc\%def {code}
> Start pattern
> {code:java}
> abc\%def% {code}




[jira] [Work logged] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=782322=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782322
 ]

ASF GitHub Bot logged work on HIVE-26319:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 09:04
Start Date: 17/Jun/22 09:04
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r899924819


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -127,14 +130,23 @@ public void commitTask(TaskAttemptContext 
originalContext) throws IOException {
   .run(output -> {
 Table table = 
HiveIcebergStorageHandler.table(context.getJobConf(), output);
 if (table != null) {
-  HiveIcebergWriter writer = writers.get(output);
+  Collection dataFiles = Lists.newArrayList();
+  Collection deleteFiles = Lists.newArrayList();
   String fileForCommitLocation = 
generateFileForCommitLocation(table.location(), jobConf,
-  attemptID.getJobID(), attemptID.getTaskID().getId());
-  if (writer != null) {
-createFileForCommit(writer.files(), fileForCommitLocation, 
table.io());
-  } else {
+  attemptID.getJobID(), attemptID.getTaskID().getId());
+  if (writers.get(output) != null) {
+for (HiveIcebergWriter writer : writers.get(output)) {
+  if (writer != null) {
+dataFiles.addAll(writer.files().dataFiles());

Review Comment:
   I wouldn't change this code until it turns out to be a bottleneck.





Issue Time Tracking
---

Worklog Id: (was: 782322)
Time Spent: 1h 40m  (was: 1.5h)

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables




[jira] [Work logged] (HIVE-26319) Iceberg integration: Perform update split early

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=782321=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782321
 ]

ASF GitHub Bot logged work on HIVE-26319:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 09:02
Start Date: 17/Jun/22 09:02
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r899923444


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##
@@ -325,7 +339,15 @@ private void commitTable(FileIO io, ExecutorService 
executor, JobContext jobCont
 LOG.info("Committing job has started for table: {}, using location: {}",
 table, generateJobLocation(location, conf, jobContext.getJobID()));
 
-int numTasks = SessionStateUtil.getCommitInfo(conf, name).map(info -> 
info.getTaskNum()).orElseGet(() -> {
+Optional commitInfo;
+if (SessionStateUtil.getCommitInfo(conf, name).isPresent()) {
+  commitInfo = SessionStateUtil.getCommitInfo(conf, name).get()
+  .stream().filter(ci -> 
ci.getJobIdStr().equals(jobContext.getJobID().toString())).findFirst();

Review Comment:
   AFAIK only one `CommitInfo` object should be associated with a jobContext.
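The reviewer says at most one `CommitInfo` belongs to a given job. If so, the lookup in the diff can be written as a single `Optional` chain that queries the session once, instead of calling `SessionStateUtil.getCommitInfo` twice. The sketch below assumes that lookup returns an optional list; the generic types are not visible in this diff. `CommitInfoLookup` and its nested `CommitInfo` are hypothetical stand-ins.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; CommitInfo here is a stand-in, not Hive's class.
class CommitInfoLookup {
  static final class CommitInfo {
    final String jobIdStr;
    final int taskNum;

    CommitInfo(String jobIdStr, int taskNum) {
      this.jobIdStr = jobIdStr;
      this.taskNum = taskNum;
    }
  }

  // Resolves the session lookup once and picks the entry for this job, if any.
  static Optional<CommitInfo> forJob(Optional<List<CommitInfo>> sessionInfos, String jobId) {
    return sessionInfos.flatMap(infos -> infos.stream()
        .filter(info -> info.jobIdStr.equals(jobId))
        .findFirst());
  }

  public static void main(String[] args) {
    Optional<List<CommitInfo>> infos = Optional.of(List.of(
        new CommitInfo("job_1", 4), new CommitInfo("job_2", 7)));
    // Falls back to 0 here; the real code computes its fallback in orElseGet,
    // whose body is not shown in the diff.
    int numTasks = forJob(infos, "job_2").map(info -> info.taskNum).orElse(0);
    System.out.println(numTasks); // 7
  }
}
```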





Issue Time Tracking
---

Worklog Id: (was: 782321)
Time Spent: 1.5h  (was: 1h 20m)

> Iceberg integration: Perform update split early
> ---
>
> Key: HIVE-26319
> URL: https://issues.apache.org/jira/browse/HIVE-26319
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native 
> acid tables




[jira] [Work logged] (HIVE-26298) Selecting complex types on migrated iceberg table does not work

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26298?focusedWorklogId=782320=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782320
 ]

ASF GitHub Bot logged work on HIVE-26298:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 09:02
Start Date: 17/Jun/22 09:02
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request, #3383:
URL: https://github.com/apache/hive/pull/3383

   
   
   ### What changes were proposed in this pull request?
   Test fixes
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 782320)
Time Spent: 1h 10m  (was: 1h)

> Selecting complex types on migrated iceberg table does not work
> ---
>
> Key: HIVE-26298
> URL: https://issues.apache.org/jira/browse/HIVE-26298
> Project: Hive
>  Issue Type: Bug
>Reporter: Gergely Fürnstáhl
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
> Attachments: 1-a5d522f4-a065-44e6-983b-ba66596b4332.metadata.json
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I am working on implementing NameMapping in Impala (mainly replicating Hive's 
> functionality) and ran into the following issue:
> {code:java}
> CREATE TABLE array_demo
> (
>   int_primitive INT,
>   int_array ARRAY,
>   int_array_array ARRAY>,
>   int_to_array_array_Map MAP>>
> )
> STORED AS ORC;
> INSERT INTO array_demo values (0, array(1), array(array(2), array(3,4)), 
> map(5,array(array(6),array(7,8;
> select * from array_demo;
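> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | array_demo.int_primitive  | array_demo.int_array  | array_demo.int_array_array  | array_demo.int_to_array_array_map  |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | 0                         | [1]                   | [[2],[3,4]]                 | {5:[[6],[7,8]]}                    |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+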
>  {code}
> Converting to iceberg
>  
>  
> {code:java}
> ALTER TABLE array_demo SET TBLPROPERTIES 
> ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')
> select * from array_demo;
> INFO  : Compiling 
> command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe):
>  select * from array_demo
> INFO  : No Stats for default@array_demo, Columns: int_primitive, int_array, 
> int_to_array_array_map, int_array_array
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:array_demo.int_primitive, type:int, 
> comment:null), FieldSchema(name:array_demo.int_array, type:array, 
> comment:null), FieldSchema(name:array_demo.int_array_array, 
> type:array>, comment:null), 
> FieldSchema(name:array_demo.int_to_array_array_map, 
> type:map>>, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe);
>  Time taken: 0.036 seconds
> INFO  : Executing 
> command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe):
>  select * from array_demo
> INFO  : Completed executing 
> command(queryId=gfurnstahl_20220608102746_54bf3e74-e12b-400b-94a9-4e4c9fe460fe);
>  Time taken: 0.0 seconds
> INFO  : OK
> Error: java.io.IOException: java.lang.IllegalArgumentException: Can not 
> promote MAP type to INTEGER (state=,code=0)
> select int_primitive from array_demo;
> +----------------+
> | int_primitive  |
> +----------------+
> | 0              |
> +----------------+
> 1 row selected (0.088 seconds)
>  {code}
> Removing schema.name-mapping.default solves it
> {code:java}
> ALTER TABLE array_demo UNSET TBLPROPERTIES ('schema.name-mapping.default');
> select * from array_demo;
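> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | array_demo.int_primitive  | array_demo.int_array  | array_demo.int_array_array  | array_demo.int_to_array_array_map  |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+
> | 0                         | [1]                   | [[2],[3,4]]                 | {5:[[6],[7,8]]}                    |
> +---------------------------+-----------------------+-----------------------------+------------------------------------+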
>  {code}
> Possible cause:
>  
> The name mapping

[jira] [Resolved] (HIVE-25929) Let secret config properties to be propagated to Tez

2022-06-17 Thread Jira



 [ 
https://issues.apache.org/jira/browse/HIVE-25929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-25929.
-
Resolution: Invalid

Not fixing this in the current way; refer to 
https://github.com/apache/hive/pull/3019#discussion_r899907375

> Let secret config properties to be propagated to Tez
> 
>
> Key: HIVE-25929
> URL: https://issues.apache.org/jira/browse/HIVE-25929
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> History in chronological order:
> HIVE-10508: removed some passwords from config that's propagated to execution 
> engines
> HIVE-9013: introduced hive.conf.hidden.list, which is used instead of the 
> hardcoded list in HIVE-10508
> The problem with HIVE-9013 is that it introduces a common method for 
> removing sensitive data from the Configuration. This makes sense in most 
> cases (e.g. the set command showing sensitive data), but it can cause issues, 
> e.g. when non-secure cloud connectors are used on a cluster and 
> passwords/secrets appear in the Configuration object (like 
> "fs.azure.account.oauth2.client.secret") instead of going through the Hadoop 
> credential provider API (which is considered the secure and proper way).
> Two possible solutions:
> 1. Introduce a new property like "hive.conf.hidden.list.exec.engines" that 
> defaults to "hive.conf.hidden.list" (configurable, but possibly more confusing 
> to users, since it is one more config property to understand and maintain on 
> a cluster).
> 2. Simply revert DAGUtils to use the old stripHivePasswordDetails introduced 
> by HIVE-10508 (convenient and less confusing for users, but not configurable).
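Option 1 amounts to a lookup with a fallback. The sketch below uses `java.util.Properties` as a stand-in for Hadoop's `Configuration`. The property name `hive.conf.hidden.list.exec.engines` is the one proposed above, not an existing Hive setting. `ExecEngineConfStripper` is hypothetical, and the secret values are made up. The issue was ultimately resolved as Invalid, so this only illustrates the proposal.

```java
import java.util.Properties;

// Hypothetical sketch of option 1; not Hive's DAGUtils or HiveConf code.
class ExecEngineConfStripper {
  static final String HIDDEN_LIST = "hive.conf.hidden.list";
  static final String EXEC_HIDDEN_LIST = "hive.conf.hidden.list.exec.engines"; // proposed name

  // Returns a copy of conf without the properties hidden from execution engines.
  static Properties stripForExecEngine(Properties conf) {
    // Use the engine-specific list if set, otherwise fall back to the general one.
    String list = conf.getProperty(EXEC_HIDDEN_LIST, conf.getProperty(HIDDEN_LIST, ""));
    Properties copy = new Properties();
    copy.putAll(conf);
    for (String name : list.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        copy.remove(trimmed);
      }
    }
    return copy;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(HIDDEN_LIST,
        "javax.jdo.option.ConnectionPassword,fs.azure.account.oauth2.client.secret");
    conf.setProperty("javax.jdo.option.ConnectionPassword", "db-secret");
    conf.setProperty("fs.azure.account.oauth2.client.secret", "azure-secret");

    // Without the engine-specific list, both secrets are stripped.
    System.out.println(stripForExecEngine(conf)
        .containsKey("fs.azure.account.oauth2.client.secret")); // false

    // An engine-specific list can keep the cloud connector secret for Tez.
    conf.setProperty(EXEC_HIDDEN_LIST, "javax.jdo.option.ConnectionPassword");
    Properties forTez = stripForExecEngine(conf);
    System.out.println(forTez.containsKey("fs.azure.account.oauth2.client.secret")); // true
    System.out.println(forTez.containsKey("javax.jdo.option.ConnectionPassword"));   // false
  }
}
```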




[jira] [Work logged] (HIVE-25929) Let secret config properties to be propagated to Tez

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25929?focusedWorklogId=782311=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782311
 ]

ASF GitHub Bot logged work on HIVE-25929:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 08:45
Start Date: 17/Jun/22 08:45
Worklog Time Spent: 10m 
  Work Description: abstractdog closed pull request #3019: HIVE-25929: Let 
secret config properties to be propagated to Tez
URL: https://github.com/apache/hive/pull/3019




Issue Time Tracking
---

Worklog Id: (was: 782311)
Time Spent: 3h 10m  (was: 3h)

> Let secret config properties to be propagated to Tez
> 
>
> Key: HIVE-25929
> URL: https://issues.apache.org/jira/browse/HIVE-25929
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> History in chronological order:
> HIVE-10508: removed some passwords from the config that's propagated to execution 
> engines
> HIVE-9013: introduced hive.conf.hidden.list, which is used instead of the 
> hardcoded list from HIVE-10508
> The problem with HIVE-9013 is that it introduces a common method for 
> removing sensitive data from the Configuration. This makes sense in 
> most cases (e.g. the set command showing sensitive data), but it can cause issues, 
> e.g. when using non-secure cloud connectors on a cluster, where instead of 
> the Hadoop credential provider API (which is considered the secure and proper 
> way), passwords/secrets appear in the Configuration object (like 
> "fs.azure.account.oauth2.client.secret"), so stripping them keeps them from reaching Tez
> 2 possible solutions:
> 1. introduce a new property like "hive.conf.hidden.list.exec.engines", 
> which defaults to "hive.conf.hidden.list" (configurable, but possibly more 
> confusing to users, as it's a new config property which must be understood 
> and maintained on a cluster)
> 2. simply revert DAGUtils to use the old stripHivePasswordDetails introduced 
> by HIVE-10508 (convenient, less confusing for users, but cannot be configured)
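Option 1 above could look roughly like the following minimal sketch. It is not Hive's implementation. A plain java.util.Map stands in for Hadoop's Configuration, and "hive.conf.hidden.list.exec.engines" is only the property name proposed in the description, not an existing Hive setting. The engine-specific list falls back to hive.conf.hidden.list when unset, so a secret such as "fs.azure.account.oauth2.client.secret" can be let through to the execution engine by leaving it out of the engine-specific list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Minimal sketch of option 1; a Map stands in for Hadoop's Configuration (assumption).
final class ExecEngineConfStripper {
  static final String HIDDEN_LIST = "hive.conf.hidden.list";
  // Hypothetical property name, taken from the proposal in HIVE-25929.
  static final String EXEC_ENGINE_HIDDEN_LIST = "hive.conf.hidden.list.exec.engines";

  // Parses a comma-separated list of property names, ignoring blank entries.
  static Set<String> parseList(String value) {
    if (value == null || value.trim().isEmpty()) {
      return Collections.emptySet();
    }
    return Arrays.stream(value.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toSet());
  }

  // Returns a copy of conf without the properties hidden from execution engines.
  static Map<String, String> stripForExecEngine(Map<String, String> conf) {
    // Fall back to the general hidden list when the engine-specific one is unset.
    String list = conf.getOrDefault(EXEC_ENGINE_HIDDEN_LIST, conf.get(HIDDEN_LIST));
    Map<String, String> copy = new HashMap<>(conf);
    copy.keySet().removeAll(parseList(list));
    return copy;
  }
}
```

When only hive.conf.hidden.list is set, this behaves like plain hidden-list stripping. When the engine-specific list is also set, only the keys it names are removed before the config is shipped.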




[jira] [Work logged] (HIVE-25929) Let secret config properties to be propagated to Tez

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25929?focusedWorklogId=782312=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782312
 ]

ASF GitHub Bot logged work on HIVE-25929:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 08:45
Start Date: 17/Jun/22 08:45
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on code in PR #3019:
URL: https://github.com/apache/hive/pull/3019#discussion_r899907375


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -5455,6 +5455,22 @@ public static enum ConfVars {
 + ",hive.zookeeper.ssl.truststore.location"
 + ",hive.zookeeper.ssl.truststore.password",
 "Comma separated list of configuration options which should not be 
read by normal user like passwords"),
+HIVE_CONF_PROPAGATE_EXEC_ENGINES("hive.conf.propagate.exec.engines",
+"fs.s3.awsAccessKeyId"

Review Comment:
   Yes, if configs are dumped in logs, then it's a security risk, but currently 
we have no other way to support the 'less secure' option
   (the 'more secure' option uses the Hadoop credential provider).
   I don't have strong opinions about this one, so I'm closing the jira as 
invalid until we face customer pressure to support it.
   





Issue Time Tracking
---

Worklog Id: (was: 782312)
Time Spent: 3h 20m  (was: 3h 10m)

> Let secret config properties to be propagated to Tez
> 
>
> Key: HIVE-25929
> URL: https://issues.apache.org/jira/browse/HIVE-25929
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HIVE-25929) Let secret config properties to be propagated to Tez

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25929?focusedWorklogId=782310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782310
 ]

ASF GitHub Bot logged work on HIVE-25929:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 08:44
Start Date: 17/Jun/22 08:44
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on code in PR #3019:
URL: https://github.com/apache/hive/pull/3019#discussion_r899907375


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -5455,6 +5455,22 @@ public static enum ConfVars {
 + ",hive.zookeeper.ssl.truststore.location"
 + ",hive.zookeeper.ssl.truststore.password",
 "Comma separated list of configuration options which should not be 
read by normal user like passwords"),
+HIVE_CONF_PROPAGATE_EXEC_ENGINES("hive.conf.propagate.exec.engines",
+"fs.s3.awsAccessKeyId"

Review Comment:
   Yes, if configs are dumped in logs, then it's a security risk, but currently 
we have no other way to support the 'less secure' option
   (the 'more secure' option uses the Hadoop credential provider).
   I don't have strong opinions about this one, so I'm closing this as invalid 
until we face customer pressure to support it.
   





Issue Time Tracking
---

Worklog Id: (was: 782310)
Time Spent: 3h  (was: 2h 50m)

> Let secret config properties to be propagated to Tez
> 
>
> Key: HIVE-25929
> URL: https://issues.apache.org/jira/browse/HIVE-25929
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




[jira] [Updated] (HIVE-26324) Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table

2022-06-17 Thread Sourabh Badhya (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Badhya updated HIVE-26324:
--
Fix Version/s: 4.0.0-alpha-2

> Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table
> --
>
> Key: HIVE-26324
> URL: https://issues.apache.org/jira/browse/HIVE-26324
> Project: Hive
>  Issue Type: Task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The NOTIFICATION_SEQUENCE table must have only one row; however, there have been 
> several reports of it having multiple rows. To prevent this from happening, 
> it's best to enforce "one-row-table"-like constraints on the NOTIFICATION_SEQUENCE 
> table.
> Queries tried on the supported databases:
> NOTIFICATION_SEQUENCE already has NNI_ID as its primary key, which helps 
> us add "one-row-table"-like constraints.
> *MySQL* - 
> {code:sql}
> ALTER TABLE `NOTIFICATION_SEQUENCE` MODIFY COLUMN `NNI_ID` BIGINT(20) 
> GENERATED ALWAYS AS (1) STORED NOT NULL;{code}
> CHECK constraints are not enforced in MySQL 5.7, so we need to use GENERATED 
> columns instead, which MySQL 5.7 does support.
> MariaDB, which uses the same schema script as MySQL, supports generated 
> columns with MySQL-compatible syntax from version 10.2.
> Link - 
> [https://dev.mysql.com/doc/refman/5.7/en/alter-table-generated-columns.html]
> Link - [https://mariadb.com/kb/en/generated-columns/]
> *Postgres* - 
> Either change the table definition like this:
> {code:sql}
> CREATE TABLE "NOTIFICATION_SEQUENCE"
> (
> "NNI_ID" BIGINT NOT NULL CHECK ("NNI_ID" = 1),
> "NEXT_EVENT_ID" BIGINT NOT NULL,
> PRIMARY KEY ("NNI_ID")
> ); {code}
> OR add an explicit constraint like this:
> {code:sql}
> ALTER TABLE "NOTIFICATION_SEQUENCE"
> ADD CONSTRAINT "ONE_ROW_CONSTRAINT" CHECK ("NNI_ID" = 1); {code}
> *Derby* - 
> {code:sql}
> ALTER TABLE "APP"."NOTIFICATION_SEQUENCE" ADD CONSTRAINT "ONE_ROW_CONSTRAINT" 
> CHECK (NNI_ID = 1); {code}
> *Oracle* - 
> Either change the table definition like this:
> {code:sql}
> CREATE TABLE NOTIFICATION_SEQUENCE
> (
> NNI_ID NUMBER NOT NULL CHECK (NNI_ID = 1),
> NEXT_EVENT_ID NUMBER NOT NULL
> ); {code}
> OR add an explicit constraint like this:
> {code:sql}
> ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
> (NNI_ID = 1); {code}
> *Microsoft SQL Server* - 
> {code:sql}
> ALTER TABLE NOTIFICATION_SEQUENCE ADD CONSTRAINT ONE_ROW_CONSTRAINT CHECK 
> (NNI_ID = 1); {code}
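The reasoning behind these constraints works as follows. A primary key on NNI_ID means no two rows share an NNI_ID. A constraint pinning NNI_ID to 1 means every row has the same NNI_ID. Together, at most one row can exist. The toy in-memory model below illustrates this; it is a hypothetical illustration, not Hive or JDBC code. The MySQL GENERATED ALWAYS AS (1) variant reaches the same result by forcing the value to 1, so a second row collides on the primary key.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, not Hive code): PK uniqueness + NNI_ID = 1 => at most one row.
final class OneRowTableModel {
  // NNI_ID -> NEXT_EVENT_ID; unique map keys model the primary key.
  private final Map<Long, Long> rows = new HashMap<>();

  void insert(long nniId, long nextEventId) {
    // Models CHECK ("NNI_ID" = 1): any other value is rejected.
    if (nniId != 1L) {
      throw new IllegalStateException("check constraint violated: NNI_ID=" + nniId);
    }
    // Models the PRIMARY KEY: a second row with the same NNI_ID is rejected.
    if (rows.containsKey(nniId)) {
      throw new IllegalStateException("duplicate primary key: NNI_ID=" + nniId);
    }
    rows.put(nniId, nextEventId);
  }

  int rowCount() {
    return rows.size();
  }
}
```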




[jira] [Resolved] (HIVE-26324) Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table

2022-06-17 Thread Sourabh Badhya (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Badhya resolved HIVE-26324.
---
Target Version/s: 4.0.0-alpha-2
  Resolution: Fixed

> Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table
> --
>
> Key: HIVE-26324
> URL: https://issues.apache.org/jira/browse/HIVE-26324
> Project: Hive
>  Issue Type: Task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




[jira] [Updated] (HIVE-26324) Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table

2022-06-17 Thread Sourabh Badhya (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Badhya updated HIVE-26324:
--
Target Version/s:   (was: 4.0.0-alpha-2)

> Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table
> --
>
> Key: HIVE-26324
> URL: https://issues.apache.org/jira/browse/HIVE-26324
> Project: Hive
>  Issue Type: Task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




[jira] [Commented] (HIVE-26324) Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table

2022-06-17 Thread Sourabh Badhya (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-26324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555466#comment-17555466
 ] 

Sourabh Badhya commented on HIVE-26324:
---

Thanks [~dkuzmenko] for the review.

> Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table
> --
>
> Key: HIVE-26324
> URL: https://issues.apache.org/jira/browse/HIVE-26324
> Project: Hive
>  Issue Type: Task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




[jira] [Updated] (HIVE-26339) HIVE-26047 Related LIKE pattern issues

2022-06-17 Thread Ryu Kobayashi (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated HIVE-26339:
-
Description: 
Fixed https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
expressions. We also confirmed that the regular expression pattern in the current 
code cannot support the following LIKE patterns.

End pattern
{code:java}
%abc\%def {code}
Start pattern
{code:java}
abc\%def% {code}

  was:
Fixed https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
expressions. We also confirmed that the regular expression pattern in the current 
code cannot support the following LIKE patterns.

End pattern
{code:java}
%abc\def {code}
Start pattern
{code:java}
abc\def% {code}


> HIVE-26047 Related LIKE pattern issues
> --
>
> Key: HIVE-26339
> URL: https://issues.apache.org/jira/browse/HIVE-26339
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>
> Fixed https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
> expressions. We also confirmed that the regular expression pattern in the current 
> code cannot support the following LIKE patterns.
> End pattern
> {code:java}
> %abc\%def {code}
> Start pattern
> {code:java}
> abc\%def% {code}
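One way to match such patterns without regular expressions is sketched below. It assumes, as in the patterns above, that a backslash escapes the next character and that % and _ are the only wildcards. A pattern whose body is purely literal, with a leading and/or trailing unescaped %, reduces to endsWith/startsWith/contains. Anything else is reported as COMPLEX and left to a general matcher. The class and its methods are hypothetical and are not Hive's UDFLike API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: classify simple LIKE patterns and match them with String methods.
final class SimpleLikeMatcher {
  enum Kind { EQUALS, STARTS_WITH, ENDS_WITH, CONTAINS, COMPLEX }

  private static final int PERCENT = -1;    // unescaped '%'
  private static final int UNDERSCORE = -2; // unescaped '_'

  final Kind kind;
  final String literal;

  private SimpleLikeMatcher(Kind kind, String literal) {
    this.kind = kind;
    this.literal = literal;
  }

  static SimpleLikeMatcher compile(String pattern) {
    // Tokenise: non-negative values are literal chars, negative values are wildcards.
    List<Integer> tokens = new ArrayList<>();
    for (int i = 0; i < pattern.length(); i++) {
      char c = pattern.charAt(i);
      if (c == '\\' && i + 1 < pattern.length()) {
        tokens.add((int) pattern.charAt(++i)); // an escaped char is always literal
      } else if (c == '%') {
        tokens.add(PERCENT);
      } else if (c == '_') {
        tokens.add(UNDERSCORE);
      } else {
        tokens.add((int) c);
      }
    }
    // Strip leading and trailing '%' wildcards.
    int start = 0;
    int end = tokens.size();
    while (start < end && tokens.get(start) == PERCENT) start++;
    while (end > start && tokens.get(end - 1) == PERCENT) end--;
    boolean leading = start > 0;
    boolean trailing = end < tokens.size();

    StringBuilder body = new StringBuilder();
    for (int t : tokens.subList(start, end)) {
      if (t < 0) {
        // A wildcard inside the body needs the general matcher.
        return new SimpleLikeMatcher(Kind.COMPLEX, pattern);
      }
      body.append((char) t);
    }
    Kind k = leading && trailing ? Kind.CONTAINS
        : leading ? Kind.ENDS_WITH
        : trailing ? Kind.STARTS_WITH
        : Kind.EQUALS;
    return new SimpleLikeMatcher(k, body.toString());
  }

  boolean matches(String s) {
    switch (kind) {
      case EQUALS: return s.equals(literal);
      case STARTS_WITH: return s.startsWith(literal);
      case ENDS_WITH: return s.endsWith(literal);
      case CONTAINS: return s.contains(literal);
      default: throw new UnsupportedOperationException("use the general matcher");
    }
  }
}
```

For example, compile("%abc\\%def") (the Java literal for %abc\%def) yields ENDS_WITH with the literal "abc%def". The escaped % is kept as an ordinary character rather than treated as a wildcard.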




[jira] [Updated] (HIVE-26339) HIVE-26047 Related LIKE pattern issues

2022-06-17 Thread Ryu Kobayashi (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated HIVE-26339:
-
Description: 
Fixed https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
expressions. We also confirmed that the regular expression pattern in the current 
code cannot support the following LIKE patterns.

End pattern
{code:java}
%abc\def {code}
Start pattern
{code:java}
abc\def% {code}

  was:
Fixed https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
expressions. We also confirmed that the regular expression pattern in the current 
code cannot support the following LIKE patterns.

End pattern
```
%abc\def
```

Start pattern
```
abc\def%
```


> HIVE-26047 Related LIKE pattern issues
> --
>
> Key: HIVE-26339
> URL: https://issues.apache.org/jira/browse/HIVE-26339
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>
> Fixed https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
> expressions. We also confirmed that the regular expression pattern in the current 
> code cannot support the following LIKE patterns.
> End pattern
> {code:java}
> %abc\def {code}
> Start pattern
> {code:java}
> abc\def% {code}




[jira] [Updated] (HIVE-26339) HIVE-26047 Related LIKE pattern issues

2022-06-17 Thread Ryu Kobayashi (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated HIVE-26339:
-
Description: 
Fixed https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
expressions. We also confirmed that the regular expression pattern in the current 
code cannot support the following LIKE patterns.

End pattern
```
%abc\def
```

Start pattern
```
abc\def%
```

> HIVE-26047 Related LIKE pattern issues
> --
>
> Key: HIVE-26339
> URL: https://issues.apache.org/jira/browse/HIVE-26339
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>
> Fixed https://issues.apache.org/jira/browse/HIVE-26047 without using regular 
> expressions. We also confirmed that the regular expression pattern in the current 
> code cannot support the following LIKE patterns.
> End pattern
> ```
> %abc\def
> ```
> Start pattern
> ```
> abc\def%
> ```




[jira] [Assigned] (HIVE-26339) HIVE-26047 Related LIKE pattern issues

2022-06-17 Thread Ryu Kobayashi (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi reassigned HIVE-26339:


Assignee: Ryu Kobayashi

> HIVE-26047 Related LIKE pattern issues
> --
>
> Key: HIVE-26339
> URL: https://issues.apache.org/jira/browse/HIVE-26339
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>





[jira] [Work logged] (HIVE-26326) Support enabling background threads when failover is in progress.

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26326?focusedWorklogId=782302=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782302
 ]

ASF GitHub Bot logged work on HIVE-26326:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 08:16
Start Date: 17/Jun/22 08:16
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3376:
URL: https://github.com/apache/hive/pull/3376#discussion_r899882866


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java:
##
@@ -291,9 +291,17 @@ public static boolean isTargetOfReplication(Database db) {
 return dbParameters != null && 
!StringUtils.isEmpty(dbParameters.get(ReplConst.TARGET_OF_REPLICATION));
   }
 
+  public static boolean isBackgroundThreadsEnabledForRepl(Database db) {
+assert (db != null);

Review Comment:
   This assert won't be executed in production code (assertions are disabled unless the JVM runs with -ea); should we throw an exception here instead?





Issue Time Tracking
---

Worklog Id: (was: 782302)
Time Spent: 0.5h  (was: 20m)

> Support enabling background threads when failover is in progress.
> -
>
> Key: HIVE-26326
> URL: https://issues.apache.org/jira/browse/HIVE-26326
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> repl.target.for doesn't allow background threads; expose a 
> {*}repl.backgroundthread{*}=enable property to force-enable the background threads, 
> irrespective of repl.target.for, once B takes over as primary.
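The method under review, with the explicit check suggested in the review comment above, could look like the minimal sketch below. Java assert statements are only evaluated when assertions are enabled (e.g. with -ea), so an explicit check also guards production code. DatabaseStub is a hypothetical stand-in for the metastore Database object. Comparing the value "enable" case-insensitively is an assumption based on the description.

```java
import java.util.Map;

// Hypothetical stand-in for the metastore Database object.
final class DatabaseStub {
  final Map<String, String> parameters;

  DatabaseStub(Map<String, String> parameters) {
    this.parameters = parameters;
  }
}

final class ReplBackgroundThreads {
  // Property name and value taken from the HIVE-26326 description.
  static final String REPL_BACKGROUND_THREAD = "repl.backgroundthread";

  static boolean isBackgroundThreadsEnabledForRepl(DatabaseStub db) {
    // Unlike `assert (db != null)`, this check runs even without -ea.
    if (db == null) {
      throw new IllegalArgumentException("db must not be null");
    }
    Map<String, String> params = db.parameters;
    return params != null && "enable".equalsIgnoreCase(params.get(REPL_BACKGROUND_THREAD));
  }
}
```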




[jira] [Work logged] (HIVE-26324) Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26324?focusedWorklogId=782297=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782297
 ]

ASF GitHub Bot logged work on HIVE-26324:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 07:57
Start Date: 17/Jun/22 07:57
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged PR #3369:
URL: https://github.com/apache/hive/pull/3369




Issue Time Tracking
---

Worklog Id: (was: 782297)
Time Spent: 40m  (was: 0.5h)

> Add "one-row-table" constraints on NOTIFICATION_SEQUENCE table
> --
>
> Key: HIVE-26324
> URL: https://issues.apache.org/jira/browse/HIVE-26324
> Project: Hive
>  Issue Type: Task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HIVE-26177) Create a new connection pool for compaction (DataNucleus)

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26177?focusedWorklogId=782288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782288
 ]

ASF GitHub Bot logged work on HIVE-26177:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 07:43
Start Date: 17/Jun/22 07:43
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged PR #3372:
URL: https://github.com/apache/hive/pull/3372




Issue Time Tracking
---

Worklog Id: (was: 782288)
Time Spent: 2h  (was: 1h 50m)

> Create a new connection pool for compaction (DataNucleus)
> -
>
> Key: HIVE-26177
> URL: https://issues.apache.org/jira/browse/HIVE-26177
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-26274) No vectorization if query has upper case window function

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26274?focusedWorklogId=782257=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782257
 ]

ASF GitHub Bot logged work on HIVE-26274:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 06:50
Start Date: 17/Jun/22 06:50
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request, #3382:
URL: https://github.com/apache/hive/pull/3382

   Addendum to #3332




Issue Time Tracking
---

Worklog Id: (was: 782257)
Time Spent: 40m  (was: 0.5h)

> No vectorization if query has upper case window function
> 
>
> Key: HIVE-26274
> URL: https://issues.apache.org/jira/browse/HIVE-26274
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE t1 (a int, b int);
> EXPLAIN VECTORIZATION ONLY SELECT ROW_NUMBER() OVER(order by a) AS rn FROM t1;
> {code}
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   Vertices:
> Map 1 
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vector.serde.deserialize IS true
> inputFormatFeatureSupport: [DECIMAL_64]
> featureSupportInUse: [DECIMAL_64]
> inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
> allNative: true
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2 
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez] IS true
> notVectorizedReason: PTF operator: ROW_NUMBER not in 
> supported functions [avg, count, dense_rank, first_value, lag, last_value, 
> lead, max, min, rank, row_number, sum]
> vectorized: false
>   Stage: Stage-0
> Fetch Operator
> {code}
> {code}
> notVectorizedReason: PTF operator: ROW_NUMBER not in 
> supported functions [avg, count, dense_rank, first_value, lag, last_value, 
> lead, max, min, rank, row_number, sum]
> {code}
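The plan output shows that the lookup is case-sensitive: ROW_NUMBER is rejected even though row_number is in the supported list. A minimal sketch of the idea is to normalise the name before the lookup. The set below simply mirrors the list printed in notVectorizedReason, and VectorizedPtfSupport is a hypothetical name, not a Hive class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: case-insensitive check of PTF function names.
final class VectorizedPtfSupport {
  // Names as printed in notVectorizedReason above (all lower case).
  static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
      "avg", "count", "dense_rank", "first_value", "lag", "last_value",
      "lead", "max", "min", "rank", "row_number", "sum"));

  static boolean isSupported(String functionName) {
    // Locale.ROOT keeps lower-casing independent of the JVM's default locale.
    return SUPPORTED.contains(functionName.toLowerCase(Locale.ROOT));
  }
}
```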




[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782251=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782251
 ]

ASF GitHub Bot logged work on HIVE-26244:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 06:37
Start Date: 17/Jun/22 06:37
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r899807718


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##
@@ -5297,6 +5303,28 @@ is performed on that db (e.g. show tables, created table, etc).
 return response;
   }
 }
+
+if (checkForConcurrentCtas && isValidTxn(txnId)) {
+  LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar)
+  .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar));
+
+  if (lockType == LockType.EXCL_WRITE && blockedBy.state == LockState.ACQUIRED) {
+
+String deleteBlockedByTxnComp = "DELETE  FROM \"TXN_COMPONENTS\" WHERE" + " \"TC_TXNID\"=" + txnId;

Review Comment:
   Realized that the cleaner will take care of this. I have removed the delete query in the recent commit.
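The condition added to `TxnHandler` decodes a lock character into a `LockType` (failing on unknown encodings) and then checks whether that lock type is `EXCL_WRITE` while the blocking lock is `ACQUIRED`. Per the review comment, the follow-up `DELETE` on `TXN_COMPONENTS` was dropped in favor of the cleaner, so the sketch below models only the check. It is a minimal, self-contained approximation under stated assumptions: the enums are stand-ins for the metastore's `LockType`/`LockState`, the one-character encodings in the hypothetical `fromEncoding` helper are assumed rather than taken from `LockTypeUtil`, and an unchecked `IllegalStateException` replaces the checked `MetaException`.

```java
import java.util.Optional;

class ConcurrentCtasCheck {
  // Stand-ins for the metastore's LockType / LockState enums (subset, for illustration).
  enum LockType { SHARED_READ, SHARED_WRITE, EXCL_WRITE, EXCLUSIVE }
  enum LockState { WAITING, ACQUIRED }

  // Hypothetical decoder: these one-character encodings are assumptions,
  // not LockTypeUtil's actual mapping.
  static Optional<LockType> fromEncoding(char c) {
    switch (c) {
      case 'r': return Optional.of(LockType.SHARED_READ);
      case 'w': return Optional.of(LockType.SHARED_WRITE);
      case 'x': return Optional.of(LockType.EXCL_WRITE);
      case 'e': return Optional.of(LockType.EXCLUSIVE);
      default:  return Optional.empty();
    }
  }

  // True when the decoded lock type is EXCL_WRITE and the blocking lock is already
  // ACQUIRED, mirroring the diff's condition; unknown encodings fail fast via orElseThrow.
  static boolean isConcurrentCtasConflict(char lockChar, LockState blockedByState) {
    LockType lockType = fromEncoding(lockChar)
        .orElseThrow(() -> new IllegalStateException("Unknown lock type: " + lockChar));
    return lockType == LockType.EXCL_WRITE && blockedByState == LockState.ACQUIRED;
  }

  public static void main(String[] args) {
    System.out.println(isConcurrentCtasConflict('x', LockState.ACQUIRED)); // true
    System.out.println(isConcurrentCtasConflict('x', LockState.WAITING));  // false
  }
}
```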





Issue Time Tracking
---

Worklog Id: (was: 782251)
Time Spent: 7h 10m  (was: 7h)

> Implementing locking for concurrent ctas
> 
>
> Key: HIVE-26244
> URL: https://issues.apache.org/jira/browse/HIVE-26244
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas

2022-06-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782252=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782252
 ]

ASF GitHub Bot logged work on HIVE-26244:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 06:37
Start Date: 17/Jun/22 06:37
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r899808001


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##
@@ -5297,6 +5303,28 @@ is performed on that db (e.g. show tables, created table, etc).
 return response;
   }
 }
+
+if (checkForConcurrentCtas && isValidTxn(txnId)) {

Review Comment:
   done





Issue Time Tracking
---

Worklog Id: (was: 782252)
Time Spent: 7h 20m  (was: 7h 10m)

> Implementing locking for concurrent ctas
> 
>
> Key: HIVE-26244
> URL: https://issues.apache.org/jira/browse/HIVE-26244
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-26335) Partition params were not updated after calling Hive.loadPartition

2022-06-17 Thread zhangdonglin (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangdonglin updated HIVE-26335:

Description: 
Hi,

   I found that when partition A already exists, after calling Hive.loadPartition to load data into partition A, the partition params in table PARTITION_PARAMS were not updated, even though I set hasFollowingStatsTask=false.

   The reason is below: newTPart was set to oldPart when the old partition exists:
{code:java}
Partition newTPart = oldPart != null ? oldPart : new Partition(tbl, partSpec, newPartPath);
{code}
   Because of this, when alter_partition was called, the oldPart info was sent to the metastore, and the partition params were not updated.

   

  was:
Hi,

   I found that when partition A already exists, after calling Hive.loadPartition to load data into partition A, the partition params in table PARTITION_PARAMS were not updated, even though I set hasFollowingStatsTask=false.

   The reason is below: newTPart was set to oldPart:
{code:java}
Partition newTPart = oldPart != null ? oldPart : new Partition(tbl, partSpec, newPartPath);
{code}
   Because of this, when alter_partition was called, the oldPart info was sent to the metastore, and the partition params were not updated.


> Partition params were not updated after calling Hive.loadPartition
> -
>
> Key: HIVE-26335
> URL: https://issues.apache.org/jira/browse/HIVE-26335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: All Versions
>Reporter: zhangdonglin
>Priority: Major
>
> Hi,
>    I found that when partition A already exists, after calling Hive.loadPartition to load data into partition A, the partition params in table PARTITION_PARAMS were not updated, even though I set hasFollowingStatsTask=false.
>    The reason is below: newTPart was set to oldPart when the old partition exists:
> {code:java}
> Partition newTPart = oldPart != null ? oldPart : new Partition(tbl, partSpec, newPartPath);
> {code}
>    Because of this, when alter_partition was called, the oldPart info was sent to the metastore, and the partition params were not updated.
>    
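The report hinges on the quoted ternary: when `oldPart` is non-null, the same object is reused as `newTPart`, so what reaches `alter_partition` is the old partition's state. The following sketch illustrates that aliasing with a hypothetical `Part` class that holds only a parameter map (it is not Hive's `Partition`), and contrasts it with one possible alternative that overlays freshly computed params onto a copy. The alternative is an assumption for illustration, not the fix adopted in Hive, and `numFiles` is used only as an example key.

```java
import java.util.HashMap;
import java.util.Map;

class PartitionReuseSketch {
  // Hypothetical stand-in for a partition: only a mutable parameter map (not Hive's Partition).
  static final class Part {
    final Map<String, String> params;

    Part(Map<String, String> params) {
      this.params = new HashMap<>(params); // defensive copy
    }
  }

  // Mirrors the quoted ternary: an existing partition object is reused as-is,
  // so the freshly computed params are ignored and the old ones get submitted.
  static Part chooseAsReported(Part oldPart, Map<String, String> freshParams) {
    return oldPart != null ? oldPart : new Part(freshParams);
  }

  // One possible alternative (an assumption, not Hive's committed fix): copy the old
  // params into a new object and overlay the freshly computed ones.
  static Part chooseWithFreshParams(Part oldPart, Map<String, String> freshParams) {
    Part result = new Part(oldPart != null ? oldPart.params : new HashMap<String, String>());
    result.params.putAll(freshParams);
    return result;
  }

  public static void main(String[] args) {
    Part old = new Part(Map.of("numFiles", "1"));
    Map<String, String> fresh = Map.of("numFiles", "2");
    System.out.println(chooseAsReported(old, fresh).params.get("numFiles"));      // 1
    System.out.println(chooseWithFreshParams(old, fresh).params.get("numFiles")); // 2
  }
}
```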



