[jira] [Updated] (HIVE-25084) Incorrect aggregate results on bucketed table

2021-04-30 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-25084:
--
Attachment: test4.q

> Incorrect aggregate results on bucketed table
> -
>
> Key: HIVE-25084
> URL: https://issues.apache.org/jira/browse/HIVE-25084
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Priority: Major
> Attachments: test4.q
>
>
> Steps to repro
> {code:java}
> CREATE TABLE test_table(
>   col1 int,
>   col2 char(32),
>   col3 varchar(3))
> CLUSTERED BY (col2)
> SORTED BY (
>   col2 ASC,
>   col3 ASC,
>   col1 ASC)
> INTO 32 BUCKETS STORED AS ORC;
>
> set hive.query.results.cache.enabled=false;
> insert into test_table values(2, "123456", "15");
> insert into test_table values(1, "123456", "15");
>
> SELECT col2, col3, max(col1) AS max_sequence FROM test_table GROUP BY col2, col3;
> ==> LocalFetch correct result <==
> 123456 15 2 
> ==> Wrong result with Tez/Llap <==
> set hive.fetch.task.conversion=none;
> 123456 15 2 
> 123456 15 1 
> ==> Correct result with Tez/Llap disabling map aggregation <==
> set hive.map.aggr=false;
> 123456 15 2 
> {code}
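For context on the repro above: Hive routes each row to a bucket by hashing the CLUSTERED BY column and taking it modulo the bucket count. The sketch below is a simplified, hypothetical model of that routing, not Hive's actual implementation (the real, type-aware hashing lives in ObjectInspectorUtils); it only illustrates that if the two inserted rows were ever hashed as different keys, e.g. a padded versus unpadded char(32) value, map-side aggregation would see two groups and emit the duplicate row shown above.

```java
// Simplified, illustrative model of Hive's bucket routing (NOT Hive's actual
// code): hash the clustering key, clear the sign bit, mod by the bucket count.
// Both inserted rows share col2 = "123456", so a consistent hash must put them
// in the same bucket and group.
public class BucketSketch {
    static int bucketFor(String clusterKey, int numBuckets) {
        // Masking with Integer.MAX_VALUE keeps the result non-negative.
        return (clusterKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // A char(32) column pads values with trailing spaces; if one code path
        // hashed the padded form and another the trimmed form, the two rows
        // would disagree on their group key.
        System.out.println(bucketFor("123456", 32));
        System.out.println(bucketFor("123456                          ", 32));
    }
}
```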



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25083) Extra reviewer pattern

2021-04-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25083?focusedWorklogId=591632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591632
 ]

ASF GitHub Bot logged work on HIVE-25083:
-

Author: ASF GitHub Bot
Created on: 30/Apr/21 16:34
Start Date: 30/Apr/21 16:34
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #2237:
URL: https://github.com/apache/hive/pull/2237


   Change-Id: I9f507147d8749a0eab4fcf7ea8ea24449a6f6024
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 591632)
Remaining Estimate: 0h
Time Spent: 10m

> Extra reviewer pattern
> --
>
> Key: HIVE-25083
> URL: https://issues.apache.org/jira/browse/HIVE-25083
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-25083) Extra reviewer pattern

2021-04-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25083:
--
Labels: pull-request-available  (was: )

> Extra reviewer pattern
> --
>
> Key: HIVE-25083
> URL: https://issues.apache.org/jira/browse/HIVE-25083
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HIVE-25083) Extra reviewer pattern

2021-04-30 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-25083:
-


> Extra reviewer pattern
> --
>
> Key: HIVE-25083
> URL: https://issues.apache.org/jira/browse/HIVE-25083
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Minor
>






[jira] [Work logged] (HIVE-25082) Make updateTimezone a default method on SettableTreeReader

2021-04-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25082?focusedWorklogId=591622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591622
 ]

ASF GitHub Bot logged work on HIVE-25082:
-

Author: ASF GitHub Bot
Created on: 30/Apr/21 16:21
Start Date: 30/Apr/21 16:21
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #2236:
URL: https://github.com/apache/hive/pull/2236


   Change-Id: I1585469ac7f6ec032fc666d467cb0725bff19633
   
   
   
   ### What changes were proposed in this pull request?
   Avoid useless TimestampStreamReader instanceof checks by making 
updateTimezone() a default method in SettableTreeReader
   
   
   ### Why are the changes needed?
   Cleaner code, fewer instanceof checks
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing tests
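The refactor above can be sketched as follows. The class names are simplified stand-ins for SettableTreeReader and TimestampStreamReader (and TimeZone stands in for whatever timezone type the real API uses): the interface supplies a no-op default, only the timestamp reader overrides it, and callers can drop their instanceof checks.

```java
// Minimal sketch of "default method on an interface" replacing instanceof
// checks. Names are illustrative, not Hive's actual classes.
import java.util.TimeZone;

interface SettableReader {
    // Default no-op: readers without timezone-sensitive state ignore the call.
    default void updateTimezone(TimeZone zone) { }
}

class LongColumnReader implements SettableReader { }

class TimestampColumnReader implements SettableReader {
    TimeZone writerZone = TimeZone.getTimeZone("UTC");

    @Override
    public void updateTimezone(TimeZone zone) {
        this.writerZone = zone; // only this reader reacts
    }
}

public class DefaultMethodSketch {
    public static void main(String[] args) {
        SettableReader[] readers = { new LongColumnReader(), new TimestampColumnReader() };
        // Before: callers needed `if (r instanceof TimestampColumnReader) ...`.
        // After: the call is safe on every reader.
        for (SettableReader r : readers) {
            r.updateTimezone(TimeZone.getTimeZone("Europe/Athens"));
        }
        // prints "Europe/Athens"
        System.out.println(((TimestampColumnReader) readers[1]).writerZone.getID());
    }
}
```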
   




Issue Time Tracking
---

Worklog Id: (was: 591622)
Remaining Estimate: 0h
Time Spent: 10m

> Make updateTimezone a default method on SettableTreeReader
> --
>
> Key: HIVE-25082
> URL: https://issues.apache.org/jira/browse/HIVE-25082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Avoid useless TimestampStreamReader instanceof checks by making 
> updateTimezone() a default method in SettableTreeReader





[jira] [Updated] (HIVE-25082) Make updateTimezone a default method on SettableTreeReader

2021-04-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25082:
--
Labels: pull-request-available  (was: )

> Make updateTimezone a default method on SettableTreeReader
> --
>
> Key: HIVE-25082
> URL: https://issues.apache.org/jira/browse/HIVE-25082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Avoid useless TimestampStreamReader instanceof checks by making 
> updateTimezone() a default method in SettableTreeReader





[jira] [Updated] (HIVE-25082) Make updateTimezone a default method on SettableTreeReader

2021-04-30 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-25082:
--
Summary: Make updateTimezone a default method on SettableTreeReader  (was: 
Make SettableTreeReader updateTimezone a default method)

> Make updateTimezone a default method on SettableTreeReader
> --
>
> Key: HIVE-25082
> URL: https://issues.apache.org/jira/browse/HIVE-25082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Minor
>
> Avoid useless TimestampStreamReader instanceof checks by making 
> updateTimezone() a default method in SettableTreeReader





[jira] [Assigned] (HIVE-25082) Make SettableTreeReader updateTimezone a default method

2021-04-30 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-25082:
-


> Make SettableTreeReader updateTimezone a default method
> ---
>
> Key: HIVE-25082
> URL: https://issues.apache.org/jira/browse/HIVE-25082
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Minor
>
> Avoid useless TimestampStreamReader instanceof checks by making 
> updateTimezone() a default method in SettableTreeReader





[jira] [Work logged] (HIVE-25061) PTF: Improve BoundaryCache / ValueBoundaryScanner

2021-04-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25061?focusedWorklogId=591584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591584
 ]

ASF GitHub Bot logged work on HIVE-25061:
-

Author: ASF GitHub Bot
Created on: 30/Apr/21 15:34
Start Date: 30/Apr/21 15:34
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2225:
URL: https://github.com/apache/hive/pull/2225#discussion_r623972738



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java
##
@@ -414,16 +423,35 @@ protected int computeStartPreceding(int rowIdx, 
PTFPartition p) throws HiveExcep
   return r + 1;
 }
 else { // Use Case 5.
+  Pair<Integer, Object> start = binaryPreSearchBack(r, p, sortKey, rowVal, amt);
+  //start again with linear search from the last point where 
!isDistanceGreater was true
+  r = start.getLeft();
+  rowVal = start.getRight();
   while (r >= 0 && !isDistanceGreater(sortKey, rowVal, amt) ) {
 Pair<Integer, Object> stepResult = skipOrStepBack(r, p);
 r = stepResult.getLeft();
 rowVal = stepResult.getRight();
   }
-
   return r + 1;
 }
   }
 
+  private Pair<Integer, Object> binaryPreSearchBack(int r, PTFPartition p, Object sortKey,

Review comment:
   I guess existing PTF tests should cover this optimization, but it would be 
great if we added specific ones for cases 4 and 5 above.
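For readers following the diff above, the optimization under review is the classic pattern: when a monotone predicate splits a sorted range into a true prefix and a false suffix, a binary "pre-search" finds the boundary that the old code reached by stepping back one row at a time. This sketch is illustrative only; the method and variable names are not Hive's.

```java
// Illustrative sketch of the binary pre-search pattern. Assumes
// distanceGreater(i) is true for a prefix of indices [0..b-1] and false for
// [b..r], which holds when the partition is sorted on the order key.
import java.util.function.IntPredicate;

public class BoundarySketch {
    // Returns the smallest index in [0, r] where the predicate is false,
    // i.e. the first row inside the window boundary.
    static int searchBack(int r, IntPredicate distanceGreater) {
        int lo = 0, hi = r;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (distanceGreater.test(mid)) {
                lo = mid + 1;   // boundary lies to the right of mid
            } else {
                hi = mid - 1;   // mid is inside the window; look further left
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] orderKeys = {1, 2, 3, 5, 8, 9, 10};
        int sortKey = 10, amt = 3; // conceptually: RANGE 3 PRECEDING
        int start = searchBack(orderKeys.length - 1,
                               i -> sortKey - orderKeys[i] > amt);
        System.out.println(start); // 4: rows with keys 8, 9, 10 are in range
    }
}
```

A linear scan afterwards (as in the patch) can still refine the boundary when the predicate is not perfectly monotone, e.g. around equal keys.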






Issue Time Tracking
---

Worklog Id: (was: 591584)
Time Spent: 40m  (was: 0.5h)

> PTF: Improve BoundaryCache / ValueBoundaryScanner
> -
>
> Key: HIVE-25061
> URL: https://issues.apache.org/jira/browse/HIVE-25061
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screen Shot 2021-04-27 at 1.02.37 PM.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> First, I need to check whether TreeMap is really needed for our case.





[jira] [Work logged] (HIVE-25061) PTF: Improve BoundaryCache / ValueBoundaryScanner

2021-04-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25061?focusedWorklogId=591583&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591583
 ]

ASF GitHub Bot logged work on HIVE-25061:
-

Author: ASF GitHub Bot
Created on: 30/Apr/21 15:32
Start Date: 30/Apr/21 15:32
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2225:
URL: https://github.com/apache/hive/pull/2225#discussion_r623971857



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java
##
@@ -406,6 +411,10 @@ protected int computeStartPreceding(int rowIdx, 
PTFPartition p) throws HiveExcep
 
 // Use Case 4.
 if ( expressionDef.getOrder() == Order.DESC ) {
+  Pair<Integer, Object> start = binaryPreSearchBack(r, p, sortKey, rowVal, amt);

Review comment:
   Let's add some context on why binary search is useful here.






Issue Time Tracking
---

Worklog Id: (was: 591583)
Time Spent: 0.5h  (was: 20m)

> PTF: Improve BoundaryCache / ValueBoundaryScanner
> -
>
> Key: HIVE-25061
> URL: https://issues.apache.org/jira/browse/HIVE-25061
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screen Shot 2021-04-27 at 1.02.37 PM.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> First, I need to check whether TreeMap is really needed for our case.





[jira] [Work logged] (HIVE-25061) PTF: Improve BoundaryCache / ValueBoundaryScanner

2021-04-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25061?focusedWorklogId=591574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591574
 ]

ASF GitHub Bot logged work on HIVE-25061:
-

Author: ASF GitHub Bot
Created on: 30/Apr/21 15:29
Start Date: 30/Apr/21 15:29
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2225:
URL: https://github.com/apache/hive/pull/2225#discussion_r623969469



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/BasePartitionEvaluator.java
##
@@ -218,6 +220,9 @@ public BasePartitionEvaluator(
 this.outputOI = outputOI;
 this.nullsLast = nullsLast;
 this.isCountEvaluator = wrappedEvaluator instanceof 
GenericUDAFCount.GenericUDAFCountEvaluator;
+// use a periodic logger which ignores very small partitions
+this.stopwatch = new PeriodicLoggerWithStopwatch(

Review comment:
   We should probably remove or comment out the logging here.






Issue Time Tracking
---

Worklog Id: (was: 591574)
Time Spent: 20m  (was: 10m)

> PTF: Improve BoundaryCache / ValueBoundaryScanner
> -
>
> Key: HIVE-25061
> URL: https://issues.apache.org/jira/browse/HIVE-25061
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screen Shot 2021-04-27 at 1.02.37 PM.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> First, I need to check whether TreeMap is really needed for our case.





[jira] [Work logged] (HIVE-23458) Introduce unified thread pool for scheduled jobs

2021-04-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23458?focusedWorklogId=591454&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591454
 ]

ASF GitHub Bot logged work on HIVE-23458:
-

Author: ASF GitHub Bot
Created on: 30/Apr/21 11:37
Start Date: 30/Apr/21 11:37
Worklog Time Spent: 10m 
  Work Description: EugeneChung edited a comment on pull request #1919:
URL: https://github.com/apache/hive/pull/1919#issuecomment-808207986


   If hive.query.timeout.seconds is set to a value greater than 0, a new thread 
is always created (and then just destroyed) for every SQL operation by calling 
Executors.newSingleThreadScheduledExecutor(). Most of the scheduled tasks for 
cancelling the operation are never invoked, either. The unified scheduler pool 
removes those inefficiencies.
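The inefficiency described above can be sketched as follows. The pool size, thread name, and timeout value are illustrative, not Hive's actual configuration: one process-wide daemon-thread scheduler replaces a per-operation single-thread executor, and the common case (query finishes before the timeout) becomes a cheap cancellation instead of thread churn.

```java
// Sketch: a shared ScheduledThreadPoolExecutor with daemon threads serving
// every query-timeout task, instead of one executor per SQL operation.
import java.util.concurrent.*;

public class SharedSchedulerSketch {
    // One process-wide scheduler; size 4 is an arbitrary illustrative choice.
    static final ScheduledExecutorService SCHEDULER =
        Executors.newScheduledThreadPool(4, r -> {
            Thread t = new Thread(r, "unified-scheduler");
            t.setDaemon(true); // daemon: never blocks JVM shutdown
            return t;
        });

    public static void main(String[] args) {
        // Schedule a "cancel query" task; in the common case the query finishes
        // first and the task is simply cancelled, with no thread created or torn
        // down per operation.
        ScheduledFuture<?> timeout =
            SCHEDULER.schedule(() -> System.out.println("query cancelled"),
                               30, TimeUnit.SECONDS);
        boolean cancelled = timeout.cancel(false);
        System.out.println(cancelled); // prints "true"
    }
}
```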




Issue Time Tracking
---

Worklog Id: (was: 591454)
Time Spent: 2h 10m  (was: 2h)

> Introduce unified thread pool for scheduled jobs
> 
>
> Key: HIVE-23458
> URL: https://issues.apache.org/jira/browse/HIVE-23458
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Eugene Chung
>Assignee: Eugene Chung
>Priority: Major
>  Labels: pull-request-available, todoc4.0
> Fix For: 4.0.0
>
> Attachments: HIVE-23458.01.patch, HIVE-23458.02.patch, 
> HIVE-23458.03.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> As I mentioned in [the comment of 
> HIVE-23164|https://issues.apache.org/jira/browse/HIVE-23164?focusedCommentId=17089506&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17089506],
>  I've made the unified scheduled executor service like 
> org.apache.hadoop.hive.metastore.ThreadPool.
> I think it could help
> 1. minimize the possibility of creating non-daemon threads when developers 
> need a ScheduledExecutorService
> 2. improve the utilization of server resources; currently every module makes 
> its own ScheduledExecutorService and each of those threads is used for only 
> one job
> 3. administrators of Hive servers, by providing a 
> hive.exec.scheduler.num.threads configuration so that they can predict and 
> set how many threads are used and needed.





[jira] [Resolved] (HIVE-25076) Get number of write tasks from jobConf for Iceberg commits

2021-04-30 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25076.
---
Resolution: Fixed

> Get number of write tasks from jobConf for Iceberg commits
> --
>
> Key: HIVE-25076
> URL: https://issues.apache.org/jira/browse/HIVE-25076
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When writing empty data into Iceberg tables, we can end up with a succeeded 
> task count of 0. With the current logic, we might then erroneously end up 
> taking the number of mapper tasks in the commit logic, which would result in 
> failures. We should instead save the succeeded task count into the JobConf 
> under a specified key and retrieve it from there.
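A minimal sketch of the flow described above, using java.util.Properties as a stand-in for Hadoop's JobConf; the key name is hypothetical (the actual patch defines its own constant). The point is the safe default: an empty write reads back 0 instead of falling back to the mapper count.

```java
// Sketch: record the succeeded write-task count under an explicit key and
// read it back with a default of 0, rather than inferring it from mappers.
import java.util.Properties;

public class CommitTaskCountSketch {
    // Hypothetical key; not the constant used by the real patch.
    static final String KEY = "iceberg.mr.commit.task.count";

    static void recordSucceededTasks(Properties jobConf, int count) {
        jobConf.setProperty(KEY, Integer.toString(count));
    }

    static int succeededTasks(Properties jobConf) {
        // Default to 0 so an empty write commits no data files instead of
        // erroneously taking the mapper count and failing.
        return Integer.parseInt(jobConf.getProperty(KEY, "0"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties(); // stand-in for Hadoop's JobConf
        System.out.println(succeededTasks(conf)); // prints 0: nothing recorded
        recordSucceededTasks(conf, 3);
        System.out.println(succeededTasks(conf)); // prints 3
    }
}
```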





[jira] [Work logged] (HIVE-25076) Get number of write tasks from jobConf for Iceberg commits

2021-04-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25076?focusedWorklogId=591449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591449
 ]

ASF GitHub Bot logged work on HIVE-25076:
-

Author: ASF GitHub Bot
Created on: 30/Apr/21 11:13
Start Date: 30/Apr/21 11:13
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #2233:
URL: https://github.com/apache/hive/pull/2233


   




Issue Time Tracking
---

Worklog Id: (was: 591449)
Time Spent: 20m  (was: 10m)

> Get number of write tasks from jobConf for Iceberg commits
> --
>
> Key: HIVE-25076
> URL: https://issues.apache.org/jira/browse/HIVE-25076
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When writing empty data into Iceberg tables, we can end up with a succeeded 
> task count of 0. With the current logic, we might then erroneously end up 
> taking the number of mapper tasks in the commit logic, which would result in 
> failures. We should instead save the succeeded task count into the JobConf 
> under a specified key and retrieve it from there.





[jira] [Resolved] (HIVE-25033) HPL/SQL thrift call fails when returning null

2021-04-30 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar resolved HIVE-25033.
--
Resolution: Fixed

> HPL/SQL thrift call fails when returning null
> -
>
> Key: HIVE-25033
> URL: https://issues.apache.org/jira/browse/HIVE-25033
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25033) HPL/SQL thrift call fails when returning null

2021-04-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25033?focusedWorklogId=591395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591395
 ]

ASF GitHub Bot logged work on HIVE-25033:
-

Author: ASF GitHub Bot
Created on: 30/Apr/21 08:18
Start Date: 30/Apr/21 08:18
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #2194:
URL: https://github.com/apache/hive/pull/2194


   




Issue Time Tracking
---

Worklog Id: (was: 591395)
Time Spent: 40m  (was: 0.5h)

> HPL/SQL thrift call fails when returning null
> -
>
> Key: HIVE-25033
> URL: https://issues.apache.org/jira/browse/HIVE-25033
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work started] (HIVE-25079) Create new metric about number of writes to tables with manually disabled compaction

2021-04-30 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25079 started by Antal Sinkovits.
--
> Create new metric about number of writes to tables with manually disabled 
> compaction
> 
>
> Key: HIVE-25079
> URL: https://issues.apache.org/jira/browse/HIVE-25079
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>
> Create a new metric that measures the number of writes to tables that have 
> compaction turned off manually. It does not matter if the write is committed 
> or aborted (both are bad...)





[jira] [Assigned] (HIVE-25081) Put metrics collection behind a feature flag

2021-04-30 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits reassigned HIVE-25081:
--


> Put metrics collection behind a feature flag
> 
>
> Key: HIVE-25081
> URL: https://issues.apache.org/jira/browse/HIVE-25081
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>
> Most metrics we're creating are collected in AcidMetricsService, which is 
> behind a feature flag. However, there are some metrics that are collected 
> outside of the service. These should be behind a feature flag in addition to 
> hive.metastore.metrics.enabled.





[jira] [Assigned] (HIVE-25080) Create metric about oldest entry in "ready for cleaning" state

2021-04-30 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits reassigned HIVE-25080:
--


> Create metric about oldest entry in "ready for cleaning" state
> --
>
> Key: HIVE-25080
> URL: https://issues.apache.org/jira/browse/HIVE-25080
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>
> When a compaction txn commits, COMPACTION_QUEUE.CQ_COMMIT_TIME is updated 
> with the current time. Then the compaction state is set to "ready for 
> cleaning". (... and then the Cleaner runs and the state is set to "succeeded" 
> hopefully)
> Based on this we know (roughly) how long a compaction has been in state 
> "ready for cleaning".
> We should create a metric similar to compaction_oldest_enqueue_age_in_sec 
> that would show whether the cleaner is blocked by something, i.e. find the 
> compaction in "ready for cleaning" that has the oldest commit time.





[jira] [Assigned] (HIVE-25079) Create new metric about number of writes to tables with manually disabled compaction

2021-04-30 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits reassigned HIVE-25079:
--


> Create new metric about number of writes to tables with manually disabled 
> compaction
> 
>
> Key: HIVE-25079
> URL: https://issues.apache.org/jira/browse/HIVE-25079
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>
> Create a new metric that measures the number of writes to tables that have 
> compaction turned off manually. It does not matter if the write is committed 
> or aborted (both are bad...)





[jira] [Resolved] (HIVE-24722) LLAP cache hydration

2021-04-30 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits resolved HIVE-24722.

Fix Version/s: 4.0.0
   Resolution: Fixed

All subtasks are committed, closing this.

> LLAP cache hydration
> 
>
> Key: HIVE-24722
> URL: https://issues.apache.org/jira/browse/HIVE-24722
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: llap
> Fix For: 4.0.0
>
>
> Provide a way to save and reload the contents of the cache in the llap 
> daemons.





[jira] [Work logged] (HIVE-25071) Number of reducers limited to fixed 1 when updating/deleting

2021-04-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25071?focusedWorklogId=591383&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591383
 ]

ASF GitHub Bot logged work on HIVE-25071:
-

Author: ASF GitHub Bot
Created on: 30/Apr/21 07:28
Start Date: 30/Apr/21 07:28
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on pull request #2231:
URL: https://github.com/apache/hive/pull/2231#issuecomment-829900602


   Hi Marta,
   Thanks for reviewing this patch.
   
   This is what I found about distributing rows to reducers while I was 
debugging:
   Let's say we have the following statements:
   ```
   create table acidtbl(a int, b int) clustered by (a) into 2 buckets stored as 
orc TBLPROPERTIES ('transactional'='true');
   insert ...
   delete from acidtbl where a = 1 or a = 3;
   ```
   In this case the plan of the delete statement after ReduceSinkDeDuplication 
looks like:
   ```
   TS[0]-FIL[8]-SEL[2]-RS[5]-SEL[6]-FS[7]
   ```
   So with Tez we have a mapper: TS[0]-FIL[8]-SEL[2]-RS[5]
   and have two reducers each of them has: SEL[6]-FS[7]
   
   RS[5] has
   Partition keys: GenericUDFBridge ==> UDFToInteger (Column[_col0])
   Sort keys: Column[_col0]
   And maxReducers: 2
   
   where _col0 is the row_id coming from SEL[2].
   
   UDFToInteger() extracts the bucket_id field, which is used to 
generate a `reducesink.key` in the RS operator. This key is passed to the 
wrapped `OutputCollector` along with the row. In this case that is an 
`org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput`. This class 
is part of Tez, which I'm not familiar with, but I found that this is where rows 
are distributed to reducers by the key coming from RS.
   
   Hive/Hadoop also has a setting 
`hive.exec.reducers.max`/`mapreduce.job.reduces`. This limits the maxReducers 
in the RS operator. If the table has more buckets than the max reducers, then 
the FileSink operator also distributes the rows into different files. If I 
understand correctly this is done by the `multiFileSpray` functionality.
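The distribution described above can be sketched as a simplified model (not Hive's code): rows carry a bucket id in their ReduceSink key, reducers are chosen from that key, and when buckets outnumber reducers each reducer receives several buckets and must "spray" them into separate output files.

```java
// Illustrative model of routing bucketed rows to reducers, with more buckets
// than reducers. Names and the modulo scheme are assumptions for illustration.
import java.util.*;

public class BucketRoutingSketch {
    static int reducerFor(int bucketId, int numReducers) {
        return bucketId % numReducers; // key-based distribution
    }

    public static void main(String[] args) {
        int numBuckets = 4, maxReducers = 2;
        // reducer -> list of bucket ids it must write (one file per bucket)
        Map<Integer, List<Integer>> files = new TreeMap<>();
        for (int bucket = 0; bucket < numBuckets; bucket++) {
            files.computeIfAbsent(reducerFor(bucket, maxReducers),
                                  r -> new ArrayList<>()).add(bucket);
        }
        // Each reducer handles two buckets, hence writes two delta files.
        System.out.println(files); // prints {0=[0, 2], 1=[1, 3]}
    }
}
```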
   




Issue Time Tracking
---

Worklog Id: (was: 591383)
Time Spent: 0.5h  (was: 20m)

> Number of reducers limited to fixed 1 when updating/deleting
> 
>
> Key: HIVE-25071
> URL: https://issues.apache.org/jira/browse/HIVE-25071
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When updating/deleting bucketed tables an extra ReduceSink operator is 
> created to enforce bucketing. After HIVE-22538 the number of reducers is 
> limited to a fixed 1 in these RS operators.
> This can lead to performance degradation.
> Prior to HIVE-22538 multiple reducers were available in such cases. The reason 
> for limiting the number of reducers is to ensure ascending RowId order in the 
> delete delta files produced by the update/delete statements.
> This is the plan of delete statement like:
> {code}
> DELETE FROM t1 WHERE a = 1;
> {code}
> {code}
> TS[0]-FIL[8]-SEL[2]-RS[3]-SEL[4]-RS[5]-SEL[6]-FS[7]
> {code}
> RowId order is ensured by RS[3] and bucketing is enforced by RS[5]: the number 
> of reducers was limited to the number of buckets in the table or 
> hive.exec.reducers.max. However, RS[5] does not provide any ordering, so the 
> above plan may generate unsorted delete deltas, which leads to corrupted data 
> reads.
> Prior to HIVE-22538 these RS operators were merged by ReduceSinkDeduplication 
> and the resulting RS kept the ordering and enabled multiple reducers. It 
> could do so because ReduceSinkDeduplication was prepared for ACID writes. This 
> was removed by HIVE-22538 to get a more generic ReduceSinkDeduplication.





[jira] [Assigned] (HIVE-25078) [cachedstore]

2021-04-30 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma reassigned HIVE-25078:



> [cachedstore]
> -
>
> Key: HIVE-25078
> URL: https://issues.apache.org/jira/browse/HIVE-25078
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>
> Description
> Add a table id check for the following while extracting (i.e. on the get call) 
> a cached table from the cached store:
> 1. Table
> 2. Partitions
> 3. Constraints


