[jira] [Work logged] (HIVE-25317) Relocate dependencies in shaded hive-exec module

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25317?focusedWorklogId=641468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641468
 ]

ASF GitHub Bot logged work on HIVE-25317:
-

Author: ASF GitHub Bot
Created on: 25/Aug/21 02:53
Start Date: 25/Aug/21 02:53
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #2459:
URL: https://github.com/apache/hive/pull/2459#discussion_r695352477



##
File path: llap-server/pom.xml
##
@@ -38,6 +38,7 @@
       <groupId>org.apache.hive</groupId>
       <artifactId>hive-exec</artifactId>
       <version>${project.version}</version>
+      <classifier>core</classifier>

Review comment:
   @sunchao do we need to have similar change on master first?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641468)
Time Spent: 2.5h  (was: 2h 20m)

> Relocate dependencies in shaded hive-exec module
> 
>
> Key: HIVE-25317
> URL: https://issues.apache.org/jira/browse/HIVE-25317
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When we want to use the shaded version of hive-exec (i.e., the one without a 
> classifier), more dependencies conflict with Spark. We need to relocate these 
> dependencies too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25474) Concurrency add jars cause hiveserver2 sys cpu to high

2021-08-24 Thread guangbao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guangbao zhao updated HIVE-25474:
-
Description: In a Linux environment, adding multiple jars concurrently 
through HiveCli or JDBC drives system CPU up and can even affect the 
service. We found that when ADD JAR is executed, the FileUtil.chmod method 
is used to set permissions on the downloaded jar file, and its performance 
is very poor. Testing the setPosixFilePermissions method of the 
java.nio.file.Files class instead showed it to be seventy to eighty times 
faster than FileUtil (granting permissions on the same file in a loop of 
1000 iterations), and the gap widens as the iteration count grows. However, 
this API requires JDK 7+ and is not supported on Windows. Therefore, using 
Files.setPosixFilePermissions to grant file permissions on operating 
systems that conform to the POSIX specification (tested on macOS and Linux) 
improves performance.  (was: In the Linux environment, adding multiple jars 
concurrently through HiveCli or JDBC will increase the system cpu and even 
affect the service. Finally, we found that when the add jar is executed, the 
FileUtil chmod method is used to grant permissions to the downloaded jar 
file. The performance of this method is very low. So we use the 
setPosixFilePermissions method of the Files class to test. The performance 
is seventy to eighty times that of FileUtil (the same file is given 
permissions in multiple cycles, when it is cycled 1000 times). But the file 
requires jdk7+, which is not friendly to windows. Therefore, if you use the 
setPosixFilePermissions method of the Files class to grant permissions to 
files in an operating system that conforms to the posix specification(tested 
on Mac and Linux), the performance will be improved.)
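For illustration, the java.nio call recommended above can be exercised on its own. This is a minimal sketch using only the JDK; FileUtil refers to Hadoop's org.apache.hadoop.fs.FileUtil and is not shown here, and the class and file names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class PermissionSketch {

    // Grant rwxr-xr-x with the java.nio API the description recommends.
    // Requires a POSIX-compliant filesystem (Linux, macOS); on Windows
    // this throws UnsupportedOperationException.
    static Set<PosixFilePermission> grant(Path p) throws IOException {
        Files.setPosixFilePermissions(p, PosixFilePermissions.fromString("rwxr-xr-x"));
        return Files.getPosixFilePermissions(p);
    }

    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempFile("added", ".jar"); // stand-in for a downloaded jar
        System.out.println(grant(jar));
        Files.delete(jar);
    }
}
```

The benchmark figures quoted in the issue are the reporter's own measurements and are not reproduced here.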

> Concurrency add jars cause hiveserver2 sys cpu to high
> --
>
> Key: HIVE-25474
> URL: https://issues.apache.org/jira/browse/HIVE-25474
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Affects Versions: 3.1.2
>Reporter: guangbao zhao
>Assignee: guangbao zhao
>Priority: Major
> Attachments: HIVE-25474.jpg, HIVE-25474.patch, PermissionTest.java
>
>
> In a Linux environment, adding multiple jars concurrently through HiveCli 
> or JDBC drives system CPU up and can even affect the service. We found that 
> when ADD JAR is executed, the FileUtil.chmod method is used to set 
> permissions on the downloaded jar file, and its performance is very poor. 
> Testing the setPosixFilePermissions method of the java.nio.file.Files class 
> instead showed it to be seventy to eighty times faster than FileUtil 
> (granting permissions on the same file in a loop of 1000 iterations), and 
> the gap widens as the iteration count grows. However, this API requires 
> JDK 7+ and is not supported on Windows. Therefore, using 
> Files.setPosixFilePermissions to grant file permissions on operating systems 
> that conform to the POSIX specification (tested on macOS and Linux) improves 
> performance.





[jira] [Updated] (HIVE-25474) Concurrency add jars cause hiveserver2 sys cpu to high

2021-08-24 Thread guangbao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guangbao zhao updated HIVE-25474:
-
Summary: Concurrency add jars cause hiveserver2 sys cpu to high  (was: 
concurrency add jars cause hiveserver2 sys cpu to high)

> Concurrency add jars cause hiveserver2 sys cpu to high
> --
>
> Key: HIVE-25474
> URL: https://issues.apache.org/jira/browse/HIVE-25474
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Affects Versions: 3.1.2
>Reporter: guangbao zhao
>Assignee: guangbao zhao
>Priority: Major
> Attachments: HIVE-25474.jpg, HIVE-25474.patch, PermissionTest.java
>
>





[jira] [Updated] (HIVE-25474) concurrency add jars cause hiveserver2 sys cpu to high

2021-08-24 Thread guangbao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guangbao zhao updated HIVE-25474:
-
Attachment: PermissionTest.java

> concurrency add jars cause hiveserver2 sys cpu to high
> --
>
> Key: HIVE-25474
> URL: https://issues.apache.org/jira/browse/HIVE-25474
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Affects Versions: 3.1.2
>Reporter: guangbao zhao
>Assignee: guangbao zhao
>Priority: Major
> Attachments: HIVE-25474.jpg, HIVE-25474.patch, PermissionTest.java
>
>





[jira] [Updated] (HIVE-25474) concurrency add jars cause hiveserver2 sys cpu to high

2021-08-24 Thread guangbao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guangbao zhao updated HIVE-25474:
-
Attachment: HIVE-25474.jpg

> concurrency add jars cause hiveserver2 sys cpu to high
> --
>
> Key: HIVE-25474
> URL: https://issues.apache.org/jira/browse/HIVE-25474
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Affects Versions: 3.1.2
>Reporter: guangbao zhao
>Assignee: guangbao zhao
>Priority: Major
> Attachments: HIVE-25474.jpg, HIVE-25474.patch
>
>





[jira] [Updated] (HIVE-25474) concurrency add jars cause hiveserver2 sys cpu to high

2021-08-24 Thread guangbao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guangbao zhao updated HIVE-25474:
-
Attachment: HIVE-25474.patch
Status: Patch Available  (was: In Progress)

> concurrency add jars cause hiveserver2 sys cpu to high
> --
>
> Key: HIVE-25474
> URL: https://issues.apache.org/jira/browse/HIVE-25474
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Affects Versions: 3.1.2
>Reporter: guangbao zhao
>Assignee: guangbao zhao
>Priority: Major
> Attachments: HIVE-25474.patch
>
>





[jira] [Updated] (HIVE-25474) concurrency add jars cause hiveserver2 sys cpu to high

2021-08-24 Thread guangbao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guangbao zhao updated HIVE-25474:
-
Attachment: (was: HIVE-25474.patch)

> concurrency add jars cause hiveserver2 sys cpu to high
> --
>
> Key: HIVE-25474
> URL: https://issues.apache.org/jira/browse/HIVE-25474
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Affects Versions: 3.1.2
>Reporter: guangbao zhao
>Assignee: guangbao zhao
>Priority: Major
> Attachments: HIVE-25474.patch
>
>





[jira] [Updated] (HIVE-25476) Remove Unused Dependencies for JDBC Driver

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25476:
--
Labels: pull-request-available  (was: )

> Remove Unused Dependencies for JDBC Driver
> --
>
> Key: HIVE-25476
> URL: https://issues.apache.org/jira/browse/HIVE-25476
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I am using the JDBC driver in a project and was very surprised by the number 
> of dependencies it has.  Remove some unnecessary dependencies to make it a 
> little easier to work with.





[jira] [Work logged] (HIVE-25476) Remove Unused Dependencies for JDBC Driver

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25476?focusedWorklogId=641437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641437
 ]

ASF GitHub Bot logged work on HIVE-25476:
-

Author: ASF GitHub Bot
Created on: 25/Aug/21 01:22
Start Date: 25/Aug/21 01:22
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #2599:
URL: https://github.com/apache/hive/pull/2599


   




Issue Time Tracking
---

Worklog Id: (was: 641437)
Remaining Estimate: 0h
Time Spent: 10m

> Remove Unused Dependencies for JDBC Driver
> --
>
> Key: HIVE-25476
> URL: https://issues.apache.org/jira/browse/HIVE-25476
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-25477) Clean Up JDBC Code

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25477:
--
Labels: pull-request-available  (was: )

> Clean Up JDBC Code
> --
>
> Key: HIVE-25477
> URL: https://issues.apache.org/jira/browse/HIVE-25477
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Remove unused imports
>  * Remove unused code





[jira] [Work logged] (HIVE-25477) Clean Up JDBC Code

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25477?focusedWorklogId=641436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641436
 ]

ASF GitHub Bot logged work on HIVE-25477:
-

Author: ASF GitHub Bot
Created on: 25/Aug/21 01:21
Start Date: 25/Aug/21 01:21
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #2600:
URL: https://github.com/apache/hive/pull/2600#issuecomment-905094983


   @nrg4878 Review please? :)




Issue Time Tracking
---

Worklog Id: (was: 641436)
Remaining Estimate: 0h
Time Spent: 10m

> Clean Up JDBC Code
> --
>
> Key: HIVE-25477
> URL: https://issues.apache.org/jira/browse/HIVE-25477
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-23571) [CachedStore] Add ValidWriteIdList to SharedCache.TableWrapper

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23571?focusedWorklogId=641415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641415
 ]

ASF GitHub Bot logged work on HIVE-23571:
-

Author: ASF GitHub Bot
Created on: 25/Aug/21 00:12
Start Date: 25/Aug/21 00:12
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2128:
URL: https://github.com/apache/hive/pull/2128


   




Issue Time Tracking
---

Worklog Id: (was: 641415)
Time Spent: 4.5h  (was: 4h 20m)

> [CachedStore] Add ValidWriteIdList to SharedCache.TableWrapper
> --
>
> Key: HIVE-23571
> URL: https://issues.apache.org/jira/browse/HIVE-23571
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Add ValidWriteIdList to SharedCache.TableWrapper. This would be used in 
> deciding whether a given read request can be served from the cache or we have 
> to reload it from the backing database. 





[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=641392&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641392
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 23:45
Start Date: 24/Aug/21 23:45
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r695289418



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -129,21 +131,16 @@ private boolean 
fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List<Object> elements, 
PrimitiveObjectInspector.PrimitiveCategory category, int index) throws 
IOException {
 lcv.offsets[index] = elements.size();
 
-// Return directly if last value is null
-if (definitionLevel < maxDefLevel) {
-  lcv.isNull[index] = true;
-  lcv.lengths[index] = 0;
-  // fetch the data from parquet data page for next call
-  fetchNextValue(category);
-  return;
-}
-
 do {
   // add all data for an element in ListColumnVector, get out the loop if 
there is no data or the data is for new element
+  if (definitionLevel < maxDefLevel) {
+lcv.lengths[index] = 0;
+lcv.isNull[index] = true;
+lcv.noNulls = false;
+  }
   elements.add(lastValue);
 } while (fetchNextValue(category) && (repetitionLevel != 0));
 
-lcv.isNull[index] = false;
 lcv.lengths[index] = elements.size() - lcv.offsets[index];

Review comment:
   good catch, I'm removing the assignment in the loop, because this outer 
assignment is valid under all circumstances
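For intuition, the offsets/lengths/isNull bookkeeping this diff adjusts can be sketched with plain Java arrays. This is a simplified stand-in, not Hive's actual ListColumnVector or the Parquet reader; NULL lists are flagged the way the patched branch does:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListVectorSketch {
    // Flat child values for all rows; row i occupies
    // child[offsets[i] .. offsets[i] + lengths[i]).
    final List<Integer> child = new ArrayList<>();
    final long[] offsets;
    final long[] lengths;
    final boolean[] isNull;
    boolean noNulls = true;

    ListVectorSketch(List<List<Integer>> rows) {
        int n = rows.size();
        offsets = new long[n];
        lengths = new long[n];
        isNull = new boolean[n];
        for (int i = 0; i < n; i++) {
            offsets[i] = child.size();
            List<Integer> row = rows.get(i);
            if (row == null) {
                // A NULL list: zero length, flagged in isNull — analogous to
                // the definitionLevel < maxDefLevel branch in the patch.
                isNull[i] = true;
                noNulls = false;
                lengths[i] = 0;
            } else {
                child.addAll(row);
                // Length derived from offsets, valid in all cases — the
                // "outer assignment" the review comment refers to.
                lengths[i] = child.size() - offsets[i];
            }
        }
    }

    public static void main(String[] args) {
        ListVectorSketch v = new ListVectorSketch(
                Arrays.asList(Arrays.asList(1, 2), null, Arrays.asList(3)));
        System.out.println(Arrays.toString(v.lengths)); // [2, 0, 1]
        System.out.println(Arrays.toString(v.isNull));  // [false, true, false]
    }
}
```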






Issue Time Tracking
---

Worklog Id: (was: 641392)
Time Spent: 3h 10m  (was: 3h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>) 
> stored as parquet; 
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=641387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641387
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 23:40
Start Date: 24/Aug/21 23:40
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r695287365



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -129,21 +131,16 @@ private boolean 
fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List<Object> elements, 
PrimitiveObjectInspector.PrimitiveCategory category, int index) throws 
IOException {
 lcv.offsets[index] = elements.size();
 
-// Return directly if last value is null
-if (definitionLevel < maxDefLevel) {
-  lcv.isNull[index] = true;
-  lcv.lengths[index] = 0;
-  // fetch the data from parquet data page for next call
-  fetchNextValue(category);
-  return;
-}
-
 do {
   // add all data for an element in ListColumnVector, get out the loop if 
there is no data or the data is for new element
+  if (definitionLevel < maxDefLevel) {

Review comment:
   the original intention here was to signal that there is a NULL value instead 
of a list, which happens when definitionLevel == 0; I'll change this part and 
add some comments






Issue Time Tracking
---

Worklog Id: (was: 641387)
Time Spent: 3h  (was: 2h 50m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>) 
> stored as parquet; 
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> 

[jira] [Updated] (HIVE-25479) Browser SSO auth may fail intermittently on chrome browser in virtual environments

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25479:
--
Labels: pull-request-available  (was: )

> Browser SSO auth may fail intermittently on chrome browser in virtual 
> environments
> --
>
> Key: HIVE-25479
> URL: https://issues.apache.org/jira/browse/HIVE-25479
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When browser-based SSO is enabled, the Hive JDBC driver might miss the POST 
> requests coming from the browser that provide the one-time token issued by 
> HS2 after the SAML flow completes. The issue was observed mostly in virtual 
> environments on Windows.
> The issue seems to be that when the driver binds to a port, even though the 
> port is in LISTEN state, if the browser issues a POST request on the port 
> before the driver begins accepting connections the result is 
> non-deterministic. On native OSes we observed that the connection is 
> buffered and is received by the driver when it begins accepting connections. 
> On VMs, even though the connection is buffered and presented when the driver 
> starts accepting, the payload of the request or the connection itself is 
> lost. This race condition causes the driver to wait for the browser until it 
> times out, while the browser keeps waiting for a response from the driver.
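The LISTEN-before-accept window the description refers to can be demonstrated with plain JDK sockets. This is a hypothetical sketch, not the driver's actual code; on a native OS the queued connection and payload survive until accept() runs, which is exactly the behavior the issue says is unreliable on some VMs:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;

public class ListenBeforeAccept {

    // The client connects while the port is bound (LISTEN) but before
    // accept() has been called, mimicking the browser's early POST.
    static String roundTrip() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 50)) { // bind with a backlog
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                client.getOutputStream().write("POST /token\n".getBytes());
                client.getOutputStream().flush();
                // The OS queues the connection; once accept() finally runs,
                // the buffered payload should be delivered intact.
                try (Socket conn = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(conn.getInputStream()))) {
                    return in.readLine();
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip()); // "POST /token" when nothing is lost
    }
}
```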





[jira] [Work logged] (HIVE-25479) Browser SSO auth may fail intermittently on chrome browser in virtual environments

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25479?focusedWorklogId=641335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641335
 ]

ASF GitHub Bot logged work on HIVE-25479:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 21:07
Start Date: 24/Aug/21 21:07
Worklog Time Spent: 10m 
  Work Description: vihangk1 opened a new pull request #2601:
URL: https://github.com/apache/hive/pull/2601


   ### What changes were proposed in this pull request?
   This patch fixes a race condition on the JDBC driver side when it brings up 
a browser to do SAML-based SSO authentication. The race condition sometimes 
occurs in virtual environments and has to do with the port state when the 
browser sends a POST request to the driver. More details are available in the JIRA.
   
   ### Why are the changes needed?
   To fix the bug.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   The issue is not reproducible on my dev machine. However, I added some new 
tests to cover the new changes and additionally the patch was manually verified 
on Windows 10 VMs where the issue was reproducible.




Issue Time Tracking
---

Worklog Id: (was: 641335)
Remaining Estimate: 0h
Time Spent: 10m

> Browser SSO auth may fail intermittently on chrome browser in virtual 
> environments
> --
>
> Key: HIVE-25479
> URL: https://issues.apache.org/jira/browse/HIVE-25479
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When browser based SSO is enabled the Hive JDBC driver might miss the POST 
> requests coming from the browser which provide the one-time token issued by 
> HS2s after the SAML flow completes. The issue was observed mostly in virtual 
> environments on Windows.
> The issue seems to be that when the driver binds to a port even though the 
> port is in LISTEN state, if the browser issues a POST request on the port 
> before it goes into ACCEPT state, the result is non-deterministic. On native 
> OSes we observed that the connection is buffered and is received by the 
> driver when it begins accepting the connections. In case of VMs it is 
> observed that even though the connection is buffered and presented when the 
> port goes into ACCEPT mode, the payload of the request or the connection 
> itself is lost. This race condition causes the driver to wait for the browser 
> until it times out and the browser keeps waiting for a response from the 
> driver.
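The quoted description can be made concrete with a minimal, self-contained Java sketch (the class name and token payload are hypothetical, not the driver's actual code). On a native OS, a connection made after the server socket is bound and listening, but before accept() is called, sits in the listen backlog with its payload buffered — exactly the behavior the description says is unreliable on some VMs:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ListenBacklogDemo {
    public static String receiveOne() throws Exception {
        // Bind AND listen before the "browser" connects; a backlog > 0 lets
        // the kernel queue completed connections until accept() runs.
        try (ServerSocket server = new ServerSocket(0, 50)) {
            int port = server.getLocalPort();
            Thread browser = new Thread(() -> {
                try (Socket s = new Socket("127.0.0.1", port)) {
                    // Stand-in for the browser's POST carrying the one-time token.
                    s.getOutputStream().write("token=abc123\n".getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            browser.start();
            Thread.sleep(500); // the "driver" is busy; the connection waits in the backlog
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
                browser.join();
                return in.readLine(); // the buffered payload survives on a native OS
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(receiveOne()); // prints "token=abc123"
    }
}
```

The description reports that on some virtualized Windows setups this buffering guarantee does not hold in practice, which is why the fix has to make the driver resilient to a lost first request rather than rely on the backlog.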



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25317) Relocate dependencies in shaded hive-exec module

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25317?focusedWorklogId=641313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641313
 ]

ASF GitHub Bot logged work on HIVE-25317:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 20:38
Start Date: 24/Aug/21 20:38
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #2459:
URL: https://github.com/apache/hive/pull/2459#discussion_r695185072



##
File path: llap-server/pom.xml
##
@@ -38,6 +38,7 @@
   org.apache.hive
   hive-exec
   ${project.version}
+  core

Review comment:
   As more dependencies are relocated here, modules that depend on the non-core 
artifact will hit class-not-found errors...
   
   The motivation is that we want to use the shaded version of hive-exec (i.e., 
w/o classifier) in Spark, so that its Guava version doesn't conflict there. But 
more dependencies conflict with Spark, so we need to relocate those 
dependencies too.
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641313)
Time Spent: 2h 20m  (was: 2h 10m)

> Relocate dependencies in shaded hive-exec module
> 
>
> Key: HIVE-25317
> URL: https://issues.apache.org/jira/browse/HIVE-25317
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> When we want to use shaded version of hive-exec (i.e., w/o classifier), more 
> dependencies conflict with Spark. We need to relocate these dependencies too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25408) AlterTableSetOwnerAnalyzer should send Hive Privilege Objects for Authorization.

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25408?focusedWorklogId=641280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641280
 ]

ASF GitHub Bot logged work on HIVE-25408:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 19:35
Start Date: 24/Aug/21 19:35
Worklog Time Spent: 10m 
  Work Description: yongzhi merged pull request #2560:
URL: https://github.com/apache/hive/pull/2560


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641280)
Time Spent: 40m  (was: 0.5h)

> AlterTableSetOwnerAnalyzer should send Hive Privilege Objects for 
> Authorization. 
> -
>
> Key: HIVE-25408
> URL: https://issues.apache.org/jira/browse/HIVE-25408
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, Hive is sending an empty list in the Hive Privilege Objects for 
> authorization when a user does the following operation: alter table foo set 
> owner user user_name;
> We should be sending the input/objects related to the table in Hive privilege 
> objects for authorization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25479) Browser SSO auth may fail intermittently on chrome browser in virtual environments

2021-08-24 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-25479:
--


> Browser SSO auth may fail intermittently on chrome browser in virtual 
> environments
> --
>
> Key: HIVE-25479
> URL: https://issues.apache.org/jira/browse/HIVE-25479
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> When browser based SSO is enabled the Hive JDBC driver might miss the POST 
> requests coming from the browser which provide the one-time token issued by 
> HS2s after the SAML flow completes. The issue was observed mostly in virtual 
> environments on Windows.
> The issue seems to be that when the driver binds to a port even though the 
> port is in LISTEN state, if the browser issues a POST request on the port 
> before it goes into ACCEPT state, the result is non-deterministic. On native 
> OSes we observed that the connection is buffered and is received by the 
> driver when it begins accepting the connections. In case of VMs it is 
> observed that even though the connection is buffered and presented when the 
> port goes into ACCEPT mode, the payload of the request or the connection 
> itself is lost. This race condition causes the driver to wait for the browser 
> until it times out and the browser keeps waiting for a response from the 
> driver.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25478) Temp file left over after ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS

2021-08-24 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-25478:
---


> Temp file left over after ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS
> -
>
> Key: HIVE-25478
> URL: https://issues.apache.org/jira/browse/HIVE-25478
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
>
> The dot staging (".hive-staging") file is not removed at the end of the 
> ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS operation as it is for, say, an 
> INSERT that does automatic statistics collection. I expected it would be 
> deleted after the Stats Work stage.
> Any ideas where in the code to add automatic deletion (hook)?
> hdfs dfs -ls /hive/warehouse/managed/table_orc
> Found 2 items
> drwxr-xr-x   - hive supergroup  0 2021-08-24 17:19 
> /hive/warehouse/managed/table_orc/.hive-staging_hive_2021-08-24_17-19-17_228_4856027533912221506-7
> drwxr-xr-x   - hive supergroup  0 2021-08-24 07:17 
> /hive/warehouse/managed/table_orc/delta_001_001_



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2021-08-24 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403937#comment-17403937
 ] 

Panagiotis Garefalakis commented on HIVE-24316:
---

Hey [~glapark], thanks for bringing this up -- taking a look at 
MemoryManagerImpl, it looks like checkMemory() is the new method that determines 
whether the scale has changed, and since ORC-361 removed getTotalMemoryPool() 
calls from multiple places, we are losing the ability to control the memory 
pool.

The intention behind LlapAwareMemoryManager was to budget memory per executor 
instead of the entire heap, since multiple writers are involved. One idea would 
be to restore the getTotalMemoryPool() calls where needed.

> Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
> -
>
> Key: HIVE-24316
> URL: https://issues.apache.org/jira/browse/HIVE-24316
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 3.1.3
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.3
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> This will bring eleven bug fixes.
>  * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702]
>  * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25477) Clean Up JDBC Code

2021-08-24 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25477:
-


> Clean Up JDBC Code
> --
>
> Key: HIVE-25477
> URL: https://issues.apache.org/jira/browse/HIVE-25477
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> * Remove unused imports
>  * Remove unused code



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25476) Remove Unused Dependencies for JDBC Driver

2021-08-24 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25476:
-


> Remove Unused Dependencies for JDBC Driver
> --
>
> Key: HIVE-25476
> URL: https://issues.apache.org/jira/browse/HIVE-25476
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> I am using the JDBC driver in a project and was very surprised by the number of 
> dependencies it has.  Remove some unnecessary dependencies to make it a 
> little easier to work with.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2021-08-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403902#comment-17403902
 ] 

Dongjoon Hyun commented on HIVE-24316:
--

cc [~omalley]

> Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
> -
>
> Key: HIVE-24316
> URL: https://issues.apache.org/jira/browse/HIVE-24316
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 3.1.3
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.3
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> This will bring eleven bug fixes.
>  * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702]
>  * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24316) Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2021-08-24 Thread Sungwoo (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403893#comment-17403893
 ] 

Sungwoo commented on HIVE-24316:


Hello,

It seems that with ORC-361, the use of MemoryManagerImpl in 
LlapAwareMemoryManager is inconsistent. 

Before merging ORC-361, LlapAwareMemoryManager sets its own totalMemoryPool and 
MemoryManagerImpl accesses totalMemoryPool via getTotalMemoryPool(), so 
everything is fine.

With ORC-361 merged, we have the following:

1. LlapAwareMemoryManager sets its own totalMemoryPool as a private field.
 2. MemoryManagerImpl sets its own totalMemoryPool as a private field.
 3. LlapAwareMemoryManager overrides getTotalMemoryPool() using its own 
totalMemoryPool.

Now it is unclear whether or not getTotalMemoryPool() should be overridden.

Here are my thoughts on ORC-361:

1. Is MemoryManagerImpl intended to coordinate all threads writing to ORC files 
inside a process (like LLAP Daemon)? Then is it necessary to create 
LlapAwareMemoryManager as a ThreadLocal object? Why not just call 
OrcFile.getStaticMemoryManager() to obtain the shared MemoryManagerImpl?

3. LlapAwareMemoryManager sets its own totalMemoryPool:
{code:java}
  long memPerExecutor = LlapDaemonInfo.INSTANCE.getMemoryPerExecutor();
  totalMemoryPool = (long) (memPerExecutor * maxLoad);
{code}
From my understanding, this has no effect because MemoryManagerImpl sets its 
own totalMemoryPool.

Any comment would be appreciated.
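The inconsistency described above boils down to field shadowing. The following is a simplified, hypothetical sketch (the real ORC and LLAP classes carry far more state; the names merely mirror the discussion) of why overriding getTotalMemoryPool() is not enough once the parent class stops calling the getter:

```java
// Hypothetical, simplified versions of the classes under discussion.
class MemoryManagerImpl {
    private final long totalMemoryPool;          // parent's own copy
    MemoryManagerImpl(long pool) { this.totalMemoryPool = pool; }
    long getTotalMemoryPool() { return totalMemoryPool; }
    // Post-ORC-361 style check: reads the private field directly, so an
    // overridden getTotalMemoryPool() in a subclass has no effect here.
    boolean exceedsPool(long used) { return used > totalMemoryPool; }
}

class LlapAwareMemoryManager extends MemoryManagerImpl {
    private final long totalMemoryPool;          // shadows the parent's field
    LlapAwareMemoryManager(long memPerExecutor, double maxLoad) {
        super(Long.MAX_VALUE);                   // parent pool left untouched
        this.totalMemoryPool = (long) (memPerExecutor * maxLoad);
    }
    @Override
    long getTotalMemoryPool() { return totalMemoryPool; }
}

public class ShadowingDemo {
    public static void main(String[] args) {
        LlapAwareMemoryManager m = new LlapAwareMemoryManager(100, 0.5);
        System.out.println(m.getTotalMemoryPool() + " " + m.exceedsPool(60)); // prints "50 false"
    }
}
```

Here getTotalMemoryPool() reports the per-executor pool (50), yet exceedsPool(60) still checks against the parent's Long.MAX_VALUE pool; restoring getTotalMemoryPool() call sites inside MemoryManagerImpl, as suggested, would remove the mismatch.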

> Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
> -
>
> Key: HIVE-24316
> URL: https://issues.apache.org/jira/browse/HIVE-24316
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 3.1.3
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.3
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> This will bring eleven bug fixes.
>  * ORC 1.5.7: [https://issues.apache.org/jira/projects/ORC/versions/12345702]
>  * ORC 1.5.8: [https://issues.apache.org/jira/projects/ORC/versions/12346462]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25474) concurrency add jars cause hiveserver2 sys cpu to high

2021-08-24 Thread guangbao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guangbao zhao updated HIVE-25474:
-
Summary: concurrency add jars cause hiveserver2 sys cpu to high  (was: 
Improvement concurrency add jars cause hiveserver2 sys cpu to high)

> concurrency add jars cause hiveserver2 sys cpu to high
> --
>
> Key: HIVE-25474
> URL: https://issues.apache.org/jira/browse/HIVE-25474
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Affects Versions: 3.1.2
>Reporter: guangbao zhao
>Assignee: guangbao zhao
>Priority: Major
> Attachments: HIVE-25474.patch
>
>
> In a Linux environment, adding multiple jars concurrently through HiveCli 
> or JDBC drives the system CPU high and can even affect the service. We found 
> that when ADD JAR is executed, the FileUtil chmod method is used to grant 
> permissions on the downloaded jar file, and this method performs very poorly. 
> We tested the setPosixFilePermissions method of the Files class instead: its 
> performance is seventy to eighty times that of FileUtil (granting permissions 
> on the same file in a loop of 1000 iterations). However, it requires JDK 7+ 
> and POSIX file permissions, which Windows does not support. Therefore, using 
> Files.setPosixFilePermissions to grant permissions on operating systems that 
> conform to the POSIX specification (tested on Mac and Linux) improves 
> performance.
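A minimal sketch of the recommended call, using only the JDK NIO API (the class and method names here are illustrative; the real change would live where Hive currently calls Hadoop's FileUtil.chmod):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.HashSet;
import java.util.Set;

public class PosixChmodDemo {
    // Rough equivalent of FileUtil.chmod(dest, "ugo+rx") for one file:
    // add read+execute for user/group/others on top of the existing bits.
    public static Set<PosixFilePermission> makeReadableExecutable(Path p) throws IOException {
        Set<PosixFilePermission> perms = new HashSet<>(Files.getPosixFilePermissions(p));
        perms.addAll(PosixFilePermissions.fromString("r-xr-xr-x"));
        Files.setPosixFilePermissions(p, perms); // the NIO call recommended in the description
        return Files.getPosixFilePermissions(p);
    }

    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempFile("added", ".jar"); // stand-in for a downloaded jar
        // createTempFile starts at rw-------, so the union is rwxr-xr-x.
        System.out.println(PosixFilePermissions.toString(makeReadableExecutable(jar)));
        Files.delete(jar);
    }
}
```

Note that Files.setPosixFilePermissions throws UnsupportedOperationException on file systems without POSIX attribute support (e.g. Windows), so the FileUtil path would still be needed as a fallback there — consistent with the description's caveat.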



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25474) Improvement concurrency add jars cause hiveserver2 sys cpu to high

2021-08-24 Thread guangbao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guangbao zhao updated HIVE-25474:
-
Attachment: HIVE-25474.patch

> Improvement concurrency add jars cause hiveserver2 sys cpu to high
> --
>
> Key: HIVE-25474
> URL: https://issues.apache.org/jira/browse/HIVE-25474
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Affects Versions: 3.1.2
>Reporter: guangbao zhao
>Assignee: guangbao zhao
>Priority: Major
> Attachments: HIVE-25474.patch
>
>
> In a Linux environment, adding multiple jars concurrently through HiveCli 
> or JDBC drives the system CPU high and can even affect the service. We found 
> that when ADD JAR is executed, the FileUtil chmod method is used to grant 
> permissions on the downloaded jar file, and this method performs very poorly. 
> We tested the setPosixFilePermissions method of the Files class instead: its 
> performance is seventy to eighty times that of FileUtil (granting permissions 
> on the same file in a loop of 1000 iterations). However, it requires JDK 7+ 
> and POSIX file permissions, which Windows does not support. Therefore, using 
> Files.setPosixFilePermissions to grant permissions on operating systems that 
> conform to the POSIX specification (tested on Mac and Linux) improves 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25474) Improvement concurrency add jars cause hiveserver2 sys cpu to high

2021-08-24 Thread guangbao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guangbao zhao updated HIVE-25474:
-
Attachment: (was: HIVE-25474.patch)

> Improvement concurrency add jars cause hiveserver2 sys cpu to high
> --
>
> Key: HIVE-25474
> URL: https://issues.apache.org/jira/browse/HIVE-25474
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Affects Versions: 3.1.2
>Reporter: guangbao zhao
>Assignee: guangbao zhao
>Priority: Major
> Attachments: HIVE-25474.patch
>
>
> In a Linux environment, adding multiple jars concurrently through HiveCli 
> or JDBC drives the system CPU high and can even affect the service. We found 
> that when ADD JAR is executed, the FileUtil chmod method is used to grant 
> permissions on the downloaded jar file, and this method performs very poorly. 
> We tested the setPosixFilePermissions method of the Files class instead: its 
> performance is seventy to eighty times that of FileUtil (granting permissions 
> on the same file in a loop of 1000 iterations). However, it requires JDK 7+ 
> and POSIX file permissions, which Windows does not support. Therefore, using 
> Files.setPosixFilePermissions to grant permissions on operating systems that 
> conform to the POSIX specification (tested on Mac and Linux) improves 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25474) Improvement concurrency add jars cause hiveserver2 sys cpu to high

2021-08-24 Thread guangbao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guangbao zhao updated HIVE-25474:
-
   Fix Version/s: (was: 3.1.2)
Target Version/s:   (was: 3.1.2)
 Description: In the Linux environment, adding multiple jars 
concurrently through HiveCli or JDBC will increase the system cpu and even 
affect the service. Finally, we found that when the add jar is executed, the 
FileUtil chmod method is used to grant permissions to the downloaded jar file. 
The performance of this method is very low. So we use the 
setPosixFilePermissions method of the Files class to test. The performance is 
seventy to eighty times that of FileUtil (the same file is given permissions in 
multiple cycles, when it is cycled 1000 times). But the file requires jdk7+, 
which is not friendly to windows. Therefore, if you use the 
setPosixFilePermissions method of the Files class to grant permissions to files 
in an operating system that conforms to the posix specification(tested on Mac 
and Linux), the performance will be improved.  (was: In the Linux environment, 
when there are multiple concurrent add jars through HiveCli or JDBC, the system 
cpu will increase. The currently used FileUtil.chmod(dest, "ugo+rx", true); 
method is used for file authorization, However, in jdk7+, can use 
Files.setPosixFilePermissions(path, perms); for file authorization. The 
performance is seventy to eighty times that of the above. Why not apply this 
method?)
 Summary: Improvement concurrency add jars cause hiveserver2 sys 
cpu to high  (was: concurrency add jars cause hiveserver2 sys cpu to high)

> Improvement concurrency add jars cause hiveserver2 sys cpu to high
> --
>
> Key: HIVE-25474
> URL: https://issues.apache.org/jira/browse/HIVE-25474
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Affects Versions: 3.1.2
>Reporter: guangbao zhao
>Assignee: guangbao zhao
>Priority: Major
> Attachments: HIVE-25474.patch
>
>
> In a Linux environment, adding multiple jars concurrently through HiveCli 
> or JDBC drives the system CPU high and can even affect the service. We found 
> that when ADD JAR is executed, the FileUtil chmod method is used to grant 
> permissions on the downloaded jar file, and this method performs very poorly. 
> We tested the setPosixFilePermissions method of the Files class instead: its 
> performance is seventy to eighty times that of FileUtil (granting permissions 
> on the same file in a loop of 1000 iterations). However, it requires JDK 7+ 
> and POSIX file permissions, which Windows does not support. Therefore, using 
> Files.setPosixFilePermissions to grant permissions on operating systems that 
> conform to the POSIX specification (tested on Mac and Linux) improves 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25331) Create database query doesn't create MANAGEDLOCATION directory

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25331?focusedWorklogId=641095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641095
 ]

ASF GitHub Bot logged work on HIVE-25331:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 13:30
Start Date: 24/Aug/21 13:30
Worklog Time Spent: 10m 
  Work Description: ujc714 commented on a change in pull request #2478:
URL: https://github.com/apache/hive/pull/2478#discussion_r694851572



##
File path: 
ql/src/test/results/clientpositive/llap/alter_change_db_location.q.out
##
@@ -11,7 +11,7 @@ PREHOOK: Input: database:newdb
 POSTHOOK: query: describe database extended newDB
 POSTHOOK: type: DESCDATABASE
 POSTHOOK: Input: database:newdb
-newdb  location/in/testhive_test_user  USER

+ A masked pattern was here 

Review comment:
   Actually the original output is like:
   newdb   location/in/test   file:/home/robbie/hive/itests/qtest/target/localfs/warehouse/newdb.db   hive_test_user   USER
   
   The managedlocation is not empty. Because of the pattern "file:/", 
QOutProcessor masks the whole line.   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641095)
Time Spent: 1.5h  (was: 1h 20m)

> Create database query doesn't create MANAGEDLOCATION directory
> --
>
> Key: HIVE-25331
> URL: https://issues.apache.org/jira/browse/HIVE-25331
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> If we don't assign MANAGEDLOCATION in a "create database" query, the 
> MANAGEDLOCATION will be NULL so HMS doesn't create the directory. In this 
> case, a CTAS query immediately after the CREATE DATABASE query might fail in 
> MOVE task due to "destination's parent does not exist". I can use the 
> following script to reproduce this issue:
> {code:java}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create database testdb location '/tmp/testdb.db';
> create table testdb.test as select 1;
> {code}
> If the staging directory is under the MANAGEDLOCATION directory, the CTAS 
> query is fine as the MANAGEDLOCATION directory is created while creating the 
> staging directory. Since we set LOCATION to a default directory when LOCATION 
> is not assigned in the CREATE DATABASE query, I believe it's worth setting 
> MANAGEDLOCATION to a default directory, too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25329) CTAS creates a managed table as non-ACID table

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25329?focusedWorklogId=641086&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641086
 ]

ASF GitHub Bot logged work on HIVE-25329:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 13:20
Start Date: 24/Aug/21 13:20
Worklog Time Spent: 10m 
  Work Description: ujc714 commented on a change in pull request #2477:
URL: https://github.com/apache/hive/pull/2477#discussion_r694842778



##
File path: ql/src/test/queries/clientpositive/create_table.q
##
@@ -0,0 +1,39 @@
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.create.as.external.legacy=true;
+
+-- When hive.create.as.external.legacy is true, the tables created with
+-- 'managed' or 'transactional' are ACID tables but tables created
+-- without 'managed' and 'transactional' are non-ACID tables.
+-- Note: managed non-ACID tables are allowed because tables are not
+-- transformed when hive.in.test is true.
+
+-- Create tables with 'transactional'. These tables have table property
+-- 'transactional'='true'
+create transactional table test11 as select 1;
+show create table test11;
+describe formatted test11;
+
+create transactional table test12 as select 1;

Review comment:
   I'll change the test cases then rebase and submit again :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641086)
Time Spent: 1h 40m  (was: 1.5h)

> CTAS creates a managed table as non-ACID table
> --
>
> Key: HIVE-25329
> URL: https://issues.apache.org/jira/browse/HIVE-25329
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> According to HIVE-22158, MANAGED tables should be ACID tables only. When we 
> set hive.create.as.external.legacy to true, a query like 'create managed 
> table as select 1' creates a non-ACID table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25329) CTAS creates a managed table as non-ACID table

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25329?focusedWorklogId=641065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641065
 ]

ASF GitHub Bot logged work on HIVE-25329:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 12:54
Start Date: 24/Aug/21 12:54
Worklog Time Spent: 10m 
  Work Description: ujc714 commented on a change in pull request #2477:
URL: https://github.com/apache/hive/pull/2477#discussion_r694821704



##
File path: 
iceberg/iceberg-handler/src/test/results/positive/truncate_force_iceberg_table.q.out
##
@@ -85,7 +85,7 @@ Retention:0
  A masked pattern was here 
 Table Type:EXTERNAL_TABLE   
 Table Parameters:   
-   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
+   COLUMN_STATS_ACCURATE   
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"id\":\"true\",\"value\":\"true\"}}

Review comment:
   The iceberg tests failed after I rebased. I don't think this change is 
related to the code in SemanticAnalyzer.java. HIVE-25276 also changed these 
test files.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641065)
Time Spent: 1.5h  (was: 1h 20m)

> CTAS creates a managed table as non-ACID table
> --
>
> Key: HIVE-25329
> URL: https://issues.apache.org/jira/browse/HIVE-25329
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> According to HIVE-22158, MANAGED tables should be ACID tables only. When we 
> set hive.create.as.external.legacy to true, a query like 'create managed 
> table as select 1' creates a non-ACID table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25429?focusedWorklogId=641044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641044
 ]

ASF GitHub Bot logged work on HIVE-25429:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 12:02
Start Date: 24/Aug/21 12:02
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2563:
URL: https://github.com/apache/hive/pull/2563#discussion_r694770338



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetricsOnTez.java
##
@@ -17,42 +17,59 @@
  */
 package org.apache.hadoop.hive.ql.txn.compactor;
 
-import com.codahale.metrics.Gauge;
 import org.apache.commons.lang3.RandomStringUtils;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hive.common.metrics.common.MetricsFactory;
-import org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics;
 import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
 import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
 import 
org.apache.hadoop.hive.ql.txn.compactor.metrics.DeltaFilesMetricReporter;
+import org.apache.tez.dag.api.TezConfiguration;
+import org.junit.After;
 import org.junit.Assert;
 import org.junit.Test;
 
 import java.text.MessageFormat;
 import java.util.HashMap;
-import java.util.Map;
 import java.util.concurrent.TimeUnit;
 
-import static 
org.apache.hadoop.hive.ql.txn.compactor.TestCompactionMetrics.equivalent;
-import static 
org.apache.hadoop.hive.ql.txn.compactor.TestCompactionMetrics.gaugeToMap;
+import static 
org.apache.hadoop.hive.ql.txn.compactor.TestDeltaFilesMetrics.gaugeToMap;
+import static 
org.apache.hadoop.hive.ql.txn.compactor.TestDeltaFilesMetrics.verifyMetricsMatch;
 import static 
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.executeStatementOnDriver;
 
 public class TestCompactionMetricsOnTez extends CompactorOnTezTest {
 
-  @Test
-  public void testDeltaFilesMetric() throws Exception {
-MetricsFactory.close();
-HiveConf conf = driver.getConf();
+  /**
+   * Use {@link 
CompactorOnTezTest#setupWithConf(org.apache.hadoop.hive.conf.HiveConf)} when 
HiveConf is
+   * configured to your liking.
+   */
+  @Override
+  public void setup() {
+  }
 
-HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_SERVER2_METRICS_ENABLED, 
true);
-MetricsFactory.init(conf);
+  @After
+  public void tearDown() {
+DeltaFilesMetricReporter.close();
+  }
 
+  private void configureMetrics(HiveConf conf) {
 HiveConf.setIntVar(conf, 
HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD, 0);
 HiveConf.setIntVar(conf, 
HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD, 0);
 HiveConf.setTimeVar(conf, 
HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, 1, 
TimeUnit.SECONDS);
 HiveConf.setTimeVar(conf, 
HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_CHECK_THRESHOLD, 0, 
TimeUnit.SECONDS);
 HiveConf.setFloatVar(conf, 
HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_PCT_THRESHOLD, 0.7f);
+  }
+
+  @Test
+  public void testDeltaFilesMetric() throws Exception {
+HiveConf conf = new HiveConf();
+HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_SERVER2_METRICS_ENABLED, 
true);
+configureMetrics(conf);
+setupWithConf(conf);

Review comment:
   We need to be able to set different (conflicting) configs before 
setupWithConf is called. setup() would need to be parametrized. Do you know how 
to do that?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetricsOnTez.java
##
@@ -105,12 +118,100 @@ public void testDeltaFilesMetric() throws Exception {
 executeStatementOnDriver("select avg(b) from " + tableName, driver);
 Thread.sleep(1000);
 
-Assert.assertTrue(
-  equivalent(
-new HashMap() {{
+verifyMetricsMatch(new HashMap() {{
   put(tableName + Path.SEPARATOR + partitionToday, "1");
-}}, gaugeToMap(MetricsConstants.COMPACTION_NUM_SMALL_DELTAS)));
+}}, gaugeToMap(MetricsConstants.COMPACTION_NUM_SMALL_DELTAS));
+  }
 
-DeltaFilesMetricReporter.close();
+  /**
+   * Queries shouldn't fail, but metrics should be 0, if tez.counters.max 
limit is passed.
+   * @throws Exception
+   */
+  @Test
+  public void testDeltaFilesMetricTezMaxCounters() throws Exception {
+HiveConf conf = new HiveConf();
+conf.setInt(TezConfiguration.TEZ_COUNTERS_MAX, 50);
+HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_SERVER2_METRICS_ENABLED, 
true);
+configureMetrics(conf);
+setupWithConf(conf);
+
+MetricsFactory.close();
+MetricsFactory.init(conf);
+DeltaFilesMetricReporter.init(conf);
+
+String tableName = "test_metrics";
+

[jira] [Work started] (HIVE-25474) concurrency add jars cause hiveserver2 sys cpu to high

2021-08-24 Thread guangbao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25474 started by guangbao zhao.

> concurrency add jars cause hiveserver2 sys cpu to high
> --
>
> Key: HIVE-25474
> URL: https://issues.apache.org/jira/browse/HIVE-25474
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Affects Versions: 3.1.2
>Reporter: guangbao zhao
>Assignee: guangbao zhao
>Priority: Major
> Fix For: 3.1.2
>
> Attachments: HIVE-25474.patch
>
>
> In a Linux environment, when multiple clients concurrently add jars through 
> HiveCli or JDBC, system CPU usage climbs. File authorization currently uses 
> FileUtil.chmod(dest, "ugo+rx", true); however, on JDK 7+ we can use 
> Files.setPosixFilePermissions(path, perms) for file authorization instead. 
> Its performance is seventy to eighty times that of the above. Why not apply 
> this method?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25429?focusedWorklogId=641022&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641022
 ]

ASF GitHub Bot logged work on HIVE-25429:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 10:05
Start Date: 24/Aug/21 10:05
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2563:
URL: https://github.com/apache/hive/pull/2563#discussion_r694684449



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetricsOnTez.java
##
@@ -105,12 +118,100 @@ public void testDeltaFilesMetric() throws Exception {
 executeStatementOnDriver("select avg(b) from " + tableName, driver);
 Thread.sleep(1000);
 
-Assert.assertTrue(
-  equivalent(
-new HashMap() {{
+verifyMetricsMatch(new HashMap() {{
   put(tableName + Path.SEPARATOR + partitionToday, "1");
-}}, gaugeToMap(MetricsConstants.COMPACTION_NUM_SMALL_DELTAS)));
+}}, gaugeToMap(MetricsConstants.COMPACTION_NUM_SMALL_DELTAS));
+  }
 
-DeltaFilesMetricReporter.close();
+  /**
+   * Queries shouldn't fail, but metrics should be 0, if tez.counters.max 
limit is passed.
+   * @throws Exception
+   */
+  @Test
+  public void testDeltaFilesMetricTezMaxCounters() throws Exception {
+HiveConf conf = new HiveConf();
+conf.setInt(TezConfiguration.TEZ_COUNTERS_MAX, 50);
+HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_SERVER2_METRICS_ENABLED, 
true);
+configureMetrics(conf);
+setupWithConf(conf);
+
+MetricsFactory.close();
+MetricsFactory.init(conf);
+DeltaFilesMetricReporter.init(conf);
+
+String tableName = "test_metrics";
+CompactorOnTezTest.TestDataProvider testDataProvider = new 
CompactorOnTezTest.TestDataProvider();
+testDataProvider.createFullAcidTable(tableName, true, false);
+// Create 51 partitions
+for (int i = 0; i < 51; i++) {
+  executeStatementOnDriver("insert into " + tableName + " values('1', " + 
i * i + ", '" + i + "')", driver);
+}
+
+// Touch all partitions
+executeStatementOnDriver("select avg(b) from " + tableName, driver);
+Thread.sleep(1000);
+
+Assert.assertEquals(0, 
gaugeToMap(MetricsConstants.COMPACTION_NUM_DELTAS).size());
+Assert.assertEquals(0, 
gaugeToMap(MetricsConstants.COMPACTION_NUM_OBSOLETE_DELTAS).size());
+Assert.assertEquals(0, 
gaugeToMap(MetricsConstants.COMPACTION_NUM_SMALL_DELTAS).size());
+  }
+
+  /**
+   * Queries should succeed if additional acid metrics are disabled.
+   * @throws Exception
+   */
+  @Test
+  public void testDeltaFilesMetricWithMetricsDisabled() throws Exception {
+HiveConf conf = new HiveConf();
+HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_SERVER2_METRICS_ENABLED, 
false);
+MetastoreConf.setBoolVar(conf, 
MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON, true);
+configureMetrics(conf);
+super.setupWithConf(conf);
+
+MetricsFactory.close();
+MetricsFactory.init(conf);
+
+String tableName = "test_metrics";
+CompactorOnTezTest.TestDataProvider testDataProvider = new 
CompactorOnTezTest.TestDataProvider();
+testDataProvider.createFullAcidTable(tableName, true, false);
+testDataProvider.insertTestDataPartitioned(tableName);
+
+executeStatementOnDriver("select avg(b) from " + tableName, driver);
+
+try {
+  Assert.assertEquals(0, 
gaugeToMap(MetricsConstants.COMPACTION_NUM_DELTAS).size());

Review comment:
   Would it be possible to move this assertion to function level?
   `@Test(expected = javax.management.InstanceNotFoundException.class)`

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetricsOnTez.java
##
@@ -17,42 +17,59 @@
  */
 package org.apache.hadoop.hive.ql.txn.compactor;
 
-import com.codahale.metrics.Gauge;
 import org.apache.commons.lang3.RandomStringUtils;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hive.common.metrics.common.MetricsFactory;
-import org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics;
 import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
 import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
 import 
org.apache.hadoop.hive.ql.txn.compactor.metrics.DeltaFilesMetricReporter;
+import org.apache.tez.dag.api.TezConfiguration;
+import org.junit.After;
 import org.junit.Assert;
 import org.junit.Test;
 
 import java.text.MessageFormat;
 import java.util.HashMap;
-import java.util.Map;
 import java.util.concurrent.TimeUnit;
 
-import static 
org.apache.hadoop.hive.ql.txn.compactor.TestCompactionMetrics.equivalent;
-import static 
org.apache.hadoop.hive.ql.txn.compactor.TestCompactionMetrics.gaugeToMap;
+import static 

[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641015
 ]

ASF GitHub Bot logged work on HIVE-25453:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 09:28
Start Date: 24/Aug/21 09:28
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2586:
URL: https://github.com/apache/hive/pull/2586#discussion_r694680303



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
##
@@ -247,6 +243,12 @@ public OrcEncodedDataReader(LowLevelCache lowLevelCache, 
BufferUsageManager buff
 this.jobConf = jobConf;
 // TODO: setFileMetadata could just create schema. Called in two places; 
clean up later.
 this.evolution = sef.createSchemaEvolution(fileMetadata.getSchema());
+
+fileIncludes = includes.generateFileIncludes(fileSchema);

Review comment:
   Ok.. it is covered by the tests.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641015)
Time Spent: 2h  (was: 1h 50m)

> Add LLAP IO support for Iceberg ORC tables
> --
>
> Key: HIVE-25453
> URL: https://issues.apache.org/jira/browse/HIVE-25453
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641014
 ]

ASF GitHub Bot logged work on HIVE-25453:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 09:27
Start Date: 24/Aug/21 09:27
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2586:
URL: https://github.com/apache/hive/pull/2586#discussion_r694679451



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
##
@@ -247,6 +243,12 @@ public OrcEncodedDataReader(LowLevelCache lowLevelCache, 
BufferUsageManager buff
 this.jobConf = jobConf;
 // TODO: setFileMetadata could just create schema. Called in two places; 
clean up later.
 this.evolution = sef.createSchemaEvolution(fileMetadata.getSchema());
+
+fileIncludes = includes.generateFileIncludes(fileSchema);

Review comment:
   What happens if there are multiple files with different schema?
   The test data is created this way:
   - Iceberg table created with 2 columns
   - Data inserted with 2 columns
   - Iceberg table schema modified
   - Data inserted with the modified schema
   ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641014)
Time Spent: 1h 50m  (was: 1h 40m)

> Add LLAP IO support for Iceberg ORC tables
> --
>
> Key: HIVE-25453
> URL: https://issues.apache.org/jira/browse/HIVE-25453
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641011
 ]

ASF GitHub Bot logged work on HIVE-25453:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 09:20
Start Date: 24/Aug/21 09:20
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2586:
URL: https://github.com/apache/hive/pull/2586#discussion_r694674515



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##
@@ -2693,6 +2696,77 @@ public static TypeDescription 
getDesiredRowTypeDescr(Configuration conf,
 return result;
   }
 
+  /**
+   * Based on the file schema and the low level file includes provided in the 
SchemaEvolution instance, this method
+   * calculates which top level columns should be included i.e. if any of the 
nested columns inside complex types is
+   * required, then its relevant top level parent column will be considered as 
required (and thus the full subtree).
+   * Hive and LLAP currently only supports column pruning on the first level, 
thus we need to calculate this ourselves.
+   * @param evolution
+   * @return bool array of include values, where 0th element is root struct, 
and any Nth element is a first level
+   * column within that
+   */
+  public static boolean[] firstLevelFileIncludes(SchemaEvolution evolution) {
+// This is the leaf level type description include bool array
+boolean[] lowLevelIncludes = evolution.getFileIncluded();
+Map idMap = new HashMap<>();
+Map parentIdMap = new HashMap<>();
+idToFieldSchemaMap(evolution.getFileSchema(), idMap, parentIdMap);
+
+// Root + N top level columns...
+boolean[] result = new 
boolean[evolution.getFileSchema().getChildren().size() + 1];
+
+Set requiredTopLevelSchemaIds = new HashSet<>();
+for (int i = 1; i < lowLevelIncludes.length; ++i) {
+  if (lowLevelIncludes[i]) {
+int topLevelParentId = getTopLevelParentId(i, parentIdMap);
+if (!requiredTopLevelSchemaIds.contains(topLevelParentId)) {
+  requiredTopLevelSchemaIds.add(topLevelParentId);
+}
+  }
+}
+
+List topLevelFields = 
evolution.getFileSchema().getChildren();
+
+for (int typeDescriptionId : requiredTopLevelSchemaIds) {
+  result[IntStream.range(0, topLevelFields.size()).filter(
+  i -> typeDescriptionId == 
topLevelFields.get(i).getId()).findFirst().getAsInt() + 1] = true;
+}
+
+return result;
+  }
+
+  /**
+   * Recursively builds 2 maps:
+   *  ID to type description
+   *  child to parent type description

Review comment:
   child to parent id?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##
@@ -2693,6 +2696,77 @@ public static TypeDescription 
getDesiredRowTypeDescr(Configuration conf,
 return result;
   }
 
+  /**
+   * Based on the file schema and the low level file includes provided in the 
SchemaEvolution instance, this method
+   * calculates which top level columns should be included i.e. if any of the 
nested columns inside complex types is
+   * required, then its relevant top level parent column will be considered as 
required (and thus the full subtree).
+   * Hive and LLAP currently only supports column pruning on the first level, 
thus we need to calculate this ourselves.
+   * @param evolution
+   * @return bool array of include values, where 0th element is root struct, 
and any Nth element is a first level
+   * column within that
+   */
+  public static boolean[] firstLevelFileIncludes(SchemaEvolution evolution) {
+// This is the leaf level type description include bool array
+boolean[] lowLevelIncludes = evolution.getFileIncluded();
+Map idMap = new HashMap<>();
+Map parentIdMap = new HashMap<>();
+idToFieldSchemaMap(evolution.getFileSchema(), idMap, parentIdMap);
+
+// Root + N top level columns...
+boolean[] result = new 
boolean[evolution.getFileSchema().getChildren().size() + 1];
+
+Set requiredTopLevelSchemaIds = new HashSet<>();
+for (int i = 1; i < lowLevelIncludes.length; ++i) {
+  if (lowLevelIncludes[i]) {
+int topLevelParentId = getTopLevelParentId(i, parentIdMap);
+if (!requiredTopLevelSchemaIds.contains(topLevelParentId)) {
+  requiredTopLevelSchemaIds.add(topLevelParentId);
+}
+  }
+}
+
+List topLevelFields = 
evolution.getFileSchema().getChildren();
+
+for (int typeDescriptionId : requiredTopLevelSchemaIds) {
+  result[IntStream.range(0, topLevelFields.size()).filter(
+  i -> typeDescriptionId == 
topLevelFields.get(i).getId()).findFirst().getAsInt() + 1] = true;
+}
+
+return result;
+  }
+
+  /**
+   * Recursively builds 2 maps:
+   *  ID to type description
+   *  child to parent type description

Review comment:
   child to 
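The include-propagation that firstLevelFileIncludes performs can be illustrated with a standalone sketch (toy column ids, no ORC dependency; the real method additionally translates TypeDescription ids to child positions of the root struct). Each included leaf id is walked up a child-to-parent map until a direct child of the root (id 0) is reached, and that whole top-level column is marked required:

```java
import java.util.HashMap;
import java.util.Map;

public class TopLevelIncludes {

    /**
     * Climbs each included leaf id up to its top-level ancestor (a direct
     * child of the root) and marks that ancestor as required. Index 0 of the
     * result stands for the root struct, mirroring ORC's include convention.
     * Assumption for this toy: top-level ids coincide with result indices.
     */
    static boolean[] firstLevelIncludes(boolean[] leafIncludes,
                                        Map<Integer, Integer> parentOf,
                                        int topLevelCount) {
        boolean[] result = new boolean[topLevelCount + 1];
        for (int id = 1; id < leafIncludes.length; id++) {
            if (!leafIncludes[id]) {
                continue;
            }
            int cur = id;
            while (parentOf.get(cur) != 0) { // climb until the parent is root
                cur = parentOf.get(cur);
            }
            result[cur] = true; // require the full top-level subtree
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy schema: root(0) -> a(1), s(2){ x(3), y(4) }; only s.y is read,
        // so the whole top-level column s must be included.
        Map<Integer, Integer> parentOf = new HashMap<>();
        parentOf.put(1, 0);
        parentOf.put(2, 0);
        parentOf.put(3, 2);
        parentOf.put(4, 2);
        boolean[] leaf = {false, false, false, false, true}; // include id 4 (s.y)
        boolean[] top = firstLevelIncludes(leaf, parentOf, 2);
        System.out.println(java.util.Arrays.toString(top)); // [false, false, true]
    }
}
```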

[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641010
 ]

ASF GitHub Bot logged work on HIVE-25453:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 09:19
Start Date: 24/Aug/21 09:19
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2586:
URL: https://github.com/apache/hive/pull/2586#discussion_r694673583



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##
@@ -2693,6 +2696,77 @@ public static TypeDescription 
getDesiredRowTypeDescr(Configuration conf,
 return result;
   }
 
+  /**
+   * Based on the file schema and the low level file includes provided in the 
SchemaEvolution instance, this method
+   * calculates which top level columns should be included i.e. if any of the 
nested columns inside complex types is
+   * required, then its relevant top level parent column will be considered as 
required (and thus the full subtree).
+   * Hive and LLAP currently only supports column pruning on the first level, 
thus we need to calculate this ourselves.

Review comment:
   So for ACID tables column pruning is not working, since the data is 
contained in the row struct?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641010)
Time Spent: 1.5h  (was: 1h 20m)

> Add LLAP IO support for Iceberg ORC tables
> --
>
> Key: HIVE-25453
> URL: https://issues.apache.org/jira/browse/HIVE-25453
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641009
 ]

ASF GitHub Bot logged work on HIVE-25453:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 09:18
Start Date: 24/Aug/21 09:18
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2586:
URL: https://github.com/apache/hive/pull/2586#discussion_r694672513



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java
##
@@ -720,13 +750,54 @@ public SchemaEvolution 
createSchemaEvolution(TypeDescription fileSchema) {
   readerSchema, readerLogicalColumnIds);
   Reader.Options options = new Reader.Options(jobConf)
   .include(readerIncludes).includeAcidColumns(includeAcidColumns);
-  return new SchemaEvolution(fileSchema, readerSchema, options);
+  evolution = new SchemaEvolution(fileSchema, readerSchema, options);
+
+  generateLogicalOrderedColumnIds();
+  return evolution;
+}
+
+/**
+ * LLAP IO always returns the column vectors in the order as they are seen 
in the file.
+ * To support logical column reordering, we need to do a matching between 
file and read schemas.
+ * (this only supports one level of schema reordering, not within complex 
types, also not supported for ORC ACID)
+ */
+private void generateLogicalOrderedColumnIds() {

Review comment:
   Maybe some debug level logging?
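The one-level logical reordering that generateLogicalOrderedColumnIds performs can be sketched independently of LLAP (a simplified standalone example matching by column name; the actual patch matches file and reader ORC schemas and excludes ACID tables):

```java
import java.util.Arrays;
import java.util.List;

public class ColumnReorder {

    /**
     * For each column in the logical (reader) order, find its position in the
     * file order. LLAP IO returns column vectors in file order; applying this
     * permutation lets a consumer present them in reader order. One level
     * only: fields nested inside complex types keep their file layout.
     */
    static int[] logicalToFileIndex(List<String> fileOrder, List<String> readerOrder) {
        int[] mapping = new int[readerOrder.size()];
        for (int i = 0; i < readerOrder.size(); i++) {
            // -1 marks a reader column that does not exist in the file
            mapping[i] = fileOrder.indexOf(readerOrder.get(i));
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Hypothetical schemas: file written as (id, name, ts); the reader
        // asks for (ts, id) after a logical reorder/projection.
        int[] m = logicalToFileIndex(Arrays.asList("id", "name", "ts"),
                                     Arrays.asList("ts", "id"));
        System.out.println(Arrays.toString(m)); // [2, 0]
    }
}
```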




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641009)
Time Spent: 1h 20m  (was: 1h 10m)

> Add LLAP IO support for Iceberg ORC tables
> --
>
> Key: HIVE-25453
> URL: https://issues.apache.org/jira/browse/HIVE-25453
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641006
 ]

ASF GitHub Bot logged work on HIVE-25453:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 09:16
Start Date: 24/Aug/21 09:16
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2586:
URL: https://github.com/apache/hive/pull/2586#discussion_r694671199



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/ColumnVectorProducer.java
##
@@ -48,14 +48,17 @@
 boolean[] generateFileIncludes(TypeDescription fileSchema);
 List getPhysicalColumnIds();
 List getReaderLogicalColumnIds();
+

Review comment:
   nit: Why is this change?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641006)
Time Spent: 1h  (was: 50m)

> Add LLAP IO support for Iceberg ORC tables
> --
>
> Key: HIVE-25453
> URL: https://issues.apache.org/jira/browse/HIVE-25453
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641007
 ]

ASF GitHub Bot logged work on HIVE-25453:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 09:16
Start Date: 24/Aug/21 09:16
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2586:
URL: https://github.com/apache/hive/pull/2586#discussion_r694671519



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/ColumnVectorProducer.java
##
@@ -48,14 +48,17 @@
 boolean[] generateFileIncludes(TypeDescription fileSchema);
 List getPhysicalColumnIds();
 List getReaderLogicalColumnIds();
+
 TypeDescription[] getBatchReaderTypes(TypeDescription fileSchema);
+

Review comment:
   nit: Why do we have this change?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641007)
Time Spent: 1h 10m  (was: 1h)

> Add LLAP IO support for Iceberg ORC tables
> --
>
> Key: HIVE-25453
> URL: https://issues.apache.org/jira/browse/HIVE-25453
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641004
 ]

ASF GitHub Bot logged work on HIVE-25453:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 09:14
Start Date: 24/Aug/21 09:14
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2586:
URL: https://github.com/apache/hive/pull/2586#discussion_r694669605



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java
##
@@ -158,8 +167,11 @@ private LlapRecordReader(MapWork mapWork, JobConf job, 
FileSplit split,
 rbCtx = ctx != null ? ctx : LlapInputFormat.createFakeVrbCtx(mapWork);
 
 isAcidScan = AcidUtils.isFullAcidScan(jobConf);
-TypeDescription schema = OrcInputFormat.getDesiredRowTypeDescr(
-job, isAcidScan, Integer.MAX_VALUE);
+
+String icebergOrcSchema = 
job.get(ColumnProjectionUtils.ICEBERG_ORC_SCHEMA_STRING);

Review comment:
   This is a little strange here. I would try to avoid using Iceberg 
specific stuff in LLAP packages.
   Maybe a job config containing the requested schema?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641004)
Time Spent: 50m  (was: 40m)

> Add LLAP IO support for Iceberg ORC tables
> --
>
> Key: HIVE-25453
> URL: https://issues.apache.org/jira/browse/HIVE-25453
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641003
 ]

ASF GitHub Bot logged work on HIVE-25453:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 09:11
Start Date: 24/Aug/21 09:11
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2586:
URL: https://github.com/apache/hive/pull/2586#discussion_r694667660



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
##
@@ -417,15 +419,21 @@ public OrcTail getOrcTailFromCache(Path path, 
Configuration jobConf, CacheTag ta
   }
 
   @Override
-  public RecordReader 
llapVectorizedOrcReaderForPath(Object fileKey, Path path, CacheTag tag, 
List tableIncludedCols,
-  JobConf conf, long offset, long length) throws IOException {
+  public RecordReader 
llapVectorizedOrcReaderForPath(Object fileKey, Path path,
+  CacheTag tag, List tableIncludedCols, JobConf conf, long 
offset, long length, Reporter reporter)
+  throws IOException {
 
-OrcTail tail = getOrcTailFromCache(path, conf, tag, fileKey);
+OrcTail tail = null;
+if (tag != null) {

Review comment:
   Why not put this inside the `getOrcTailFromCache`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641003)
Time Spent: 40m  (was: 0.5h)

> Add LLAP IO support for Iceberg ORC tables
> --
>
> Key: HIVE-25453
> URL: https://issues.apache.org/jira/browse/HIVE-25453
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=641002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641002
 ]

ASF GitHub Bot logged work on HIVE-25453:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 09:06
Start Date: 24/Aug/21 09:06
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2586:
URL: https://github.com/apache/hive/pull/2586#discussion_r694663951



##
File path: itests/src/test/resources/testconfiguration.properties
##
@@ -1251,4 +1251,11 @@ erasurecoding.only.query.files=\
 # tests that requires external database connection
 externalDB.llap.query.files=\
   dataconnector.q,\
-  dataconnector_mysql.q
\ No newline at end of file
+  dataconnector_mysql.q
+
+iceberg.llap.query.files=\

Review comment:
   We might want to use specific directories specified by the driver and 
then we do not have to use `testconfiguration.properties`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641002)
Time Spent: 0.5h  (was: 20m)

> Add LLAP IO support for Iceberg ORC tables
> --
>
> Key: HIVE-25453
> URL: https://issues.apache.org/jira/browse/HIVE-25453
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted

2021-08-24 Thread Marc Demierre (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105218#comment-17105218
 ] 

Marc Demierre edited comment on HIVE-23437 at 8/24/21, 8:54 AM:


We tried a workaround on the client side to ensure the calls are not 
simultaneous by delaying them. It didn't solve the issue, only made it rarer. 
We also observed a second instance of the problem which is slightly different:
 * T1:
 ** R1 creates the directory, then is paused/waiting
 * T2:
 ** R2 arrives, does not create the directory as it exists
 ** R2 creates the partition (wins the race on DB) and completes
 * T3:
 ** R1 resumes, sees that it failed the DB transaction, deletes the folder
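Delaying the calls only narrows the window; the race disappears only if the losing request stops deleting a directory it no longer owns. A hedged sketch of such a guard, using hypothetical in-memory stubs rather than Hive's actual `Warehouse`/`ObjectStore` code:

```python
class DuplicateKeyError(Exception):
    """Stand-in for the DB's unique-constraint violation."""
    pass

class FakeFs:
    """Hypothetical HDFS stub: mkdir reports whether *this* call created the dir."""
    def __init__(self):
        self.dirs = set()
    def mkdir(self, path):
        if path in self.dirs:
            return False        # already there: we did not create it
        self.dirs.add(path)
        return True
    def rmdir(self, path):
        self.dirs.discard(path)

class FakeDb:
    """Hypothetical metastore DB stub with a unique constraint on part_name."""
    def __init__(self):
        self.partitions = set()
    def insert_partition(self, name):
        if name in self.partitions:
            raise DuplicateKeyError(name)
        self.partitions.add(name)

def add_partition(fs, db, path, name):
    made_dir = fs.mkdir(path)
    try:
        db.insert_partition(name)
    except DuplicateKeyError:
        # A concurrent request already registered the partition; the directory
        # now belongs to that partition, so leave it alone even if we made it.
        return False
    except Exception:
        if made_dir:
            fs.rmdir(path)      # roll back only our own side effect
        raise
    return True

fs, db = FakeFs(), FakeDb()
assert add_partition(fs, db, "/warehouse/t/p=1", "p=1") is True
# A duplicate request must not wipe the winning request's directory:
assert add_partition(fs, db, "/warehouse/t/p=1", "p=1") is False
assert "/warehouse/t/p=1" in fs.dirs
```

The key design point is distinguishing "the directory exists" from "this request created the directory", and never treating a duplicate-key failure as a reason to delete shared state.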

Relevant logs (R1=2558, R2=2556):
{code:java}
 2020-05-11 20:00:00,944 INFO  [pool-7-thread-2558]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: 
db=myproj_dev_autodump tbl=myproj_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:00,945 INFO  [pool-7-thread-2558]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydomain.net  
ip=10.222.76.2  cmd=append_partition_by_name: db=myproj_dev_autodump 
tbl=myproj_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO  [pool-7-thread-2556]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2556: append_partition_by_name: 
db=myproj_dev_autodump tbl=myproj_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO  [pool-7-thread-2556]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydomain.net  
ip=10.222.76.2  cmd=append_partition_by_name: db=myproj_dev_autodump 
tbl=myproj_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,481 INFO  [pool-7-thread-2558]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,521 WARN  [pool-7-thread-2556]: hive.log 
(MetaStoreUtils.java:updatePartitionStatsFast(352)) - Updating partition stats 
fast for: myproj_dev_debug_hive_4
2020-05-11 20:00:01,537 WARN  [pool-7-thread-2556]: hive.log 
(MetaStoreUtils.java:updatePartitionStatsFast(355)) - Updated size to 0
2020-05-11 20:00:01,764 INFO  [pool-7-thread-2558]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
deleting  
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO  [pool-7-thread-2558]: fs.TrashPolicyDefault 
(TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
'hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18'
 to trash at: 
hdfs://platform/user/kafka-dump/.Trash/Current/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO  [pool-7-thread-2558]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved 
to trash: 
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,788 ERROR [pool-7-thread-2558]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MPartition@3254e57d" using statement 
"INSERT INTO "PARTITIONS" 
("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") 
VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique 
constraint "UNIQUEPARTITION"
2020-05-11 20:00:03,788 INFO  [pool-7-thread-2558]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: 
db=myproj_dev_autodump tbl=myproj_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,788 INFO  [pool-7-thread-2558]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydomain.net  
ip=10.222.76.2  cmd=append_partition_by_name: db=myproj_dev_autodump 
tbl=myproj_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,869 ERROR [pool-7-thread-2558]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(203)) - 
AlreadyExistsException(message:Partition already 
exists:Partition(values:[ingestion, hourly, 2020-05-11, 18], 
dbName:myproj_dev_autodump, tableName:myproj_dev_debug_hive_4, createTime:0, 
lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, 
type:string, comment:null), FieldSchema(name:age, type:int, 

[jira] [Updated] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted

2021-08-24 Thread Marc Demierre (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Demierre updated HIVE-23437:
-
Description: 
There seems to be a race condition in the Hive Metastore when several 
concurrent partition-creation requests are issued for the same new partition.

In our case, this was triggered by the Kafka Connect Hive integration, which 
fires simultaneous partition-creation requests from all of its tasks when 
syncing to Hive.

We are running HDP 2.6.5, but a quick survey of the upstream code shows the 
same logic in 3.1.2 (the latest Hive release).

Our investigation pointed to the following code (here in Hive 2.1.0, the base 
for HDP 2.6.5):

[https://github.com/apache/hive/blob/rel/release-2.1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2127]

Same code in 3.1.2:

https://github.com/apache/hive/blob/rel/release-3.1.2/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3202

The generic scenario is the following:
 # T1 (time period 1):
 ** R1 (request 1) creates the HDFS dir
 ** R2 also tries to create the HDFS dir
 ** Both succeed, since the call succeeds when the directory already exists 
(R1 and R2 may be swapped)
 # T2:
 ** R1 creates the partition in the metastore DB, all OK
 # T3:
 ** R2 tries to create the partition in the metastore DB and gets an exception 
from the DB because it already exists. The transaction is rolled back.
 ** R2 believes it created the directory (in fact both requests "created" it, 
and there is no telling which one actually did), so it removes it
 # T4: the state is now invalid:
 ## The partition exists
 ## The HDFS folder does not exist
 ## Some Hive/Spark queries fail when trying to use the folder
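The steps above can be replayed deterministically with a toy model (a hedged sketch with in-memory stand-ins for HDFS and the metastore DB, not Hive's actual code):

```python
hdfs = set()   # stand-in for HDFS directories
db = set()     # stand-in for registered partitions (unique constraint)
PATH, PART = "/warehouse/t/hour=11", "hour=11"

def mkdir_if_missing(path):
    hdfs.add(path)   # succeeds whether or not the directory already existed

# T1: both requests "create" the directory
mkdir_if_missing(PATH)   # R1
mkdir_if_missing(PATH)   # R2

# T2: R1 wins the race and registers the partition
db.add(PART)

# T3: R2's insert hits the unique constraint and rolls back; believing it
# owns the directory, it deletes it (the buggy cleanup)
if PART in db:           # R2's insert would raise a duplicate-key error
    hdfs.discard(PATH)

# T4: invalid state: the partition is registered but its directory is gone
assert PART in db
assert PATH not in hdfs
```

The final two assertions hold, which is exactly the inconsistent end state described in T4.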

Here are some logs of the issue happening on our cluster in a standalone 
metastore (R1 = thread 2303, R2 = thread 2302):

{code:none}
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2303: append_partition_by_name: 
db=myproj_autodump tbl=myproj_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2302: append_partition_by_name: 
db=myproj_autodump tbl=myproj_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydoman.net   
   ip=10.222.76.1  cmd=append_partition_by_name
: db=myproj_autodump tbl=myproj_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydoman.net   
   ip=10.222.76.1  cmd=append_partition_by_name
: db=myproj_autodump tbl=myproj_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,953 INFO  [pool-7-thread-2302]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,957 INFO  [pool-7-thread-2303]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,986 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
deleting  
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/dat
e=2020-05-11/hour=11
2020-05-11 13:43:47,992 INFO  [pool-7-thread-2302]: fs.TrashPolicyDefault 
(TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
'hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11'
 to trash at: 
hdfs://platfrom/user/kafka-dump/.Trash/Current/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved 
to trash: 
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 ERROR [pool-7-thread-2302]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MPartition@548a5b6c" using statement 
"INSERT INTO "PARTITIONS" 
("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") 
VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique 
constraint "UNIQUEPARTITION"
  Detail: Key ("PART_NAME", 
"TBL_ID")=(time=ingestion/bucket=hourly/date=2020-05-11/hour=11, 6015512) 
already exists.
at 

[jira] [Work logged] (HIVE-25453) Add LLAP IO support for Iceberg ORC tables

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25453?focusedWorklogId=640982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640982
 ]

ASF GitHub Bot logged work on HIVE-25453:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 08:01
Start Date: 24/Aug/21 08:01
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2586:
URL: https://github.com/apache/hive/pull/2586#discussion_r694599589



##
File path: data/conf/iceberg/llap/hive-site.xml
##
@@ -0,0 +1,394 @@
+<?xml version="1.0"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+
+<configuration>
+
+<property>
+  <name>hive.in.test</name>
+  <value>true</value>
+  <description>Internal marker for test. Used for masking env-dependent values</description>
+</property>
+
+<property>
+  <name>hive.in.iceberg.test</name>
+  <value>true</value>
+</property>
+
+<property>
+  <name>hadoop.tmp.dir</name>
+  <value>${test.tmp.dir}/hadoop-tmp</value>
+  <description>A base for other temporary directories.</description>
+</property>
+
+<property>
+  <name>hive.tez.container.size</name>
+  <value>128</value>
+</property>
+
+<property>
+  <name>hive.merge.tezfiles</name>
+  <value>false</value>
+  <description>Merge small files at the end of a Tez DAG</description>
+</property>
+
+<property>
+  <name>hive.tez.input.format</name>
+  <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
+  <description>The default input format for tez. Tez groups splits in the AM.</description>
+</property>
+
+<property>
+  <name>hive.exec.scratchdir</name>
+  <value>${test.tmp.dir}/scratchdir</value>
+  <description>Scratch space for Hive jobs</description>
+</property>
+
+<property>
+  <name>datanucleus.schema.autoCreateAll</name>
+  <value>true</value>
+</property>
+
+<property>
+  <name>datanucleus.connectionPool.maxPoolSize</name>
+  <value>4</value>
+</property>
+
+<property>
+  <name>hive.exec.local.scratchdir</name>
+  <value>${test.tmp.dir}/localscratchdir/</value>
+  <description>Local scratch space for Hive jobs</description>
+</property>
+
+<property>
+  <name>javax.jdo.option.ConnectionURL</name>
+  <value>jdbc:derby:memory:${test.tmp.dir}/junit_metastore_db;create=true</value>
+</property>
+
+<property>
+  <name>hive.metastore.schema.verification</name>
+  <value>false</value>
+</property>
+
+<property>
+  <name>javax.jdo.option.ConnectionDriverName</name>
+  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
+</property>
+
+<property>
+  <name>javax.jdo.option.ConnectionUserName</name>
+  <value>APP</value>
+</property>
+
+<property>
+  <name>javax.jdo.option.ConnectionPassword</name>
+  <value>mine</value>
+</property>
+
+<property>
+  <name>hive.metastore.warehouse.dir</name>
+  <value>${test.warehouse.dir}</value>
+</property>
+
+<property>
+  <name>hive.metastore.metadb.dir</name>
+  <value>file://${test.tmp.dir}/metadb/</value>
+  <description>
+  Required by metastore server or if the uris argument below is not supplied
+  </description>
+</property>
+
+<property>
+  <name>test.log.dir</name>
+  <value>${test.tmp.dir}/log/</value>
+</property>
+
+<property>
+  <name>test.data.files</name>
+  <value>${hive.root}/data/files</value>
+</property>
+
+<property>
+  <name>test.data.scripts</name>
+  <value>${hive.root}/data/scripts</value>
+</property>
+
+<property>
+  <name>hive.jar.path</name>
+  <value>${maven.local.repository}/org/apache/hive/hive-exec/${hive.version}/hive-exec-${hive.version}.jar</value>
+</property>
+
+<property>
+  <name>hive.metastore.rawstore.impl</name>
+  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
+  <description>Name of the class that implements org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieval of raw metadata objects such as table, database</description>
+</property>
+
+<property>
+  <name>hive.querylog.location</name>
+  <value>${test.tmp.dir}/tmp</value>
+  <description>Location of the structured hive logs</description>
+</property>
+
+<property>
+  <name>hive.exec.pre.hooks</name>
+  <value>org.apache.hadoop.hive.ql.hooks.PreExecutePrinter, org.apache.hadoop.hive.ql.hooks.EnforceReadOnlyTables</value>
+  <description>Pre Execute Hook for Tests</description>
+</property>
+
+<property>
+  <name>hive.exec.post.hooks</name>
+  <value>org.apache.hadoop.hive.ql.hooks.PostExecutePrinter</value>
+  <description>Post Execute Hook for Tests</description>
+</property>
+
+<property>
+  <name>hive.support.concurrency</name>
+  <value>false</value>
+  <description>Whether hive supports concurrency or not. A zookeeper instance must be up and running for the default hive lock manager to support read-write locks.</description>
+</property>
+
+<property>
+  <name>fs.pfile.impl</name>
+  <value>org.apache.hadoop.fs.ProxyLocalFileSystem</value>
+  <description>A proxy for local file system used for cross file system testing</description>
+</property>
+
+<property>
+  <name>hive.exec.mode.local.auto</name>
+  <value>false</value>
+  <description>
+  Let hive determine whether to run in local mode automatically
+  Disabling this for tests so that minimr is not affected
+  </description>
+</property>
+
+<property>
+  <name>hive.auto.convert.join</name>
+  <value>false</value>
+  <description>Whether Hive enable the optimization about converting common join into mapjoin based on the input file size</description>
+</property>
+
+<property>
+  <name>hive.ignore.mapjoin.hint</name>
+  <value>true</value>
+  <description>Whether Hive ignores the mapjoin hint</description>
+</property>
+
+<property>
+  <name>io.sort.mb</name>
+  <value>10</value>
+</property>
+
+<property>
+  <name>hive.input.format</name>
+  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
+  <description>The default input format, if it is not specified, the system assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19, whereas it is set to CombineHiveInputFormat for hadoop 20. The user can always overwrite it - if there is a bug in CombineHiveInputFormat, it can always be

[jira] [Work logged] (HIVE-25404) Inserts inside merge statements are rewritten incorrectly for partitioned tables

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25404?focusedWorklogId=640966=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640966
 ]

ASF GitHub Bot logged work on HIVE-25404:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 07:00
Start Date: 24/Aug/21 07:00
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2568:
URL: https://github.com/apache/hive/pull/2568#discussion_r694551954



##
File path: ql/src/test/queries/clientpositive/merge_partitioned_insert.q
##
@@ -0,0 +1,19 @@
+--! qt:transactional
+
+drop table u;
+drop table t;

Review comment:
   I haven't found where table `t` is created; I only see `t1` and `t2`.

##
File path: ql/src/test/queries/clientpositive/merge_partitioned_insert.q
##
@@ -0,0 +1,19 @@
+--! qt:transactional
+
+drop table u;
+drop table t;
+
+create table u(id integer);
+insert into u values(3);
+
+create table t1(id integer, value string default 'def');
+insert into t1 values(1,'xx');
+insert into t1 (id) values(2);
+
+merge into t1 t using u on t.id=u.id when not matched then insert (id) values 
(u.id);
+

Review comment:
   Do we have tests that check the content of the target table after the 
merge, like `select * from t1`?

##
File path: ql/src/test/queries/clientpositive/merge_partitioned_insert.q
##
@@ -0,0 +1,19 @@
+--! qt:transactional
+
+drop table u;
+drop table t;
+
+create table u(id integer);
+insert into u values(3);
+
+create table t1(id integer, value string default 'def');
+insert into t1 values(1,'xx');
+insert into t1 (id) values(2);
+
+merge into t1 t using u on t.id=u.id when not matched then insert (id) values 
(u.id);

Review comment:
   Adding `explain merge into t1 t using u on t.id=u.id when not matched 
then insert (id) values (u.id);` would help verify that the right plan is 
generated. Thoughts?
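Taken together, the review suggestions would amount to something like this hypothetical addition to `merge_partitioned_insert.q` (a sketch of the proposal, not the committed test):

```sql
-- hypothetical lines sketching the suggestions above
explain
merge into t1 t using u on t.id=u.id
when not matched then insert (id) values (u.id);

-- verify the merged contents, including the 'def' default for `value`
select * from t1 order by id;
```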







Issue Time Tracking
---

Worklog Id: (was: 640966)
Time Spent: 20m  (was: 10m)

> Inserts inside merge statements are rewritten incorrectly for partitioned 
> tables
> 
>
> Key: HIVE-25404
> URL: https://issues.apache.org/jira/browse/HIVE-25404
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> drop table u;drop table t;
> create table t(value string default 'def') partitioned by (id integer);
> create table u(id integer);
> {code}
> #1 id specified
> rewritten
> {code}
> FROM
>   `default`.`t`
>   RIGHT OUTER JOIN
>   `default`.`u`
>   ON `t`.`id`=`u`.`id`
> INSERT INTO `default`.`t` (`id`,`value`) partition (`id`)-- insert clause
>   SELECT `u`.`id`,'x'
>WHERE `t`.`id` IS NULL
> {code}
> #2 when values is not specified
> {code}
> merge into t using u on t.id=u.id when not matched then insert (id) values 
> (u.id);
> {code}
> rewritten query:
> {code}
> FROM
>   `default`.`t`
>   RIGHT OUTER JOIN
>   `default`.`u`
>   ON `t`.`id`=`u`.`id`
> INSERT INTO `default`.`t` (`id`) partition (`id`)-- insert clause
>   SELECT `u`.`id`
>WHERE `t`.`id` IS NULL
> {code}
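The rewritten statements above are malformed because `id` appears both in the insert column list and in the PARTITION clause. For contrast, a well-formed dynamic-partition insert over this table keeps the partition column out of the column list and selects it last (a hedged sketch; the actual rewrite produced by the fix may differ):

```sql
FROM `default`.`t`
  RIGHT OUTER JOIN `default`.`u` ON `t`.`id`=`u`.`id`
INSERT INTO `default`.`t` PARTITION (`id`)  -- `id` named only here
  SELECT 'x', `u`.`id`                      -- partition column selected last
   WHERE `t`.`id` IS NULL
```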


