[jira] [Work logged] (HIVE-23731) Review of AvroInstance Cache

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23731?focusedWorklogId=479287&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479287
 ]

ASF GitHub Bot logged work on HIVE-23731:
-

Author: ASF GitHub Bot
Created on: 05/Sep/20 01:21
Start Date: 05/Sep/20 01:21
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1153:
URL: https://github.com/apache/hive/pull/1153#issuecomment-687508001


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479287)
Time Spent: 1h  (was: 50m)

> Review of AvroInstance Cache
> 
>
> Key: HIVE-23731
> URL: https://issues.apache.org/jira/browse/HIVE-23731
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23802) “merge files” job was submitted to default queue when set hive.merge.tezfiles to true

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23802?focusedWorklogId=479286&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479286
 ]

ASF GitHub Bot logged work on HIVE-23802:
-

Author: ASF GitHub Bot
Created on: 05/Sep/20 01:21
Start Date: 05/Sep/20 01:21
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1206:
URL: https://github.com/apache/hive/pull/1206#issuecomment-687507985


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479286)
Time Spent: 20m  (was: 10m)

> “merge files” job was submitted to default queue when set hive.merge.tezfiles 
> to true
> 
>
> Key: HIVE-23802
> URL: https://issues.apache.org/jira/browse/HIVE-23802
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0
>Reporter: gaozhan ding
>Assignee: gaozhan ding
>Priority: Major
>  Labels: pull-request-available
> Attachments: 15940042679272.png, HIVE-23802.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We use Tez as the query engine. When hive.merge.tezfiles is set to true, the 
> merge-files task, which follows the original task, is submitted to the default 
> queue rather than to the same queue as the original task.
> I studied this issue for days and found that every time a container is started, 
> "tez.queue.name" is unset in the current session. The code is below:
> {code:java}
> // TezSessionState.startSessionAndContainers()
> // sessionState.getQueueName() comes from cluster wide configured queue names.
> // sessionState.getConf().get("tez.queue.name") is explicitly set by user in a session.
> // TezSessionPoolManager sets tez.queue.name if user has specified one or use the one from
> // cluster wide queue names.
> // There is no way to differentiate how this was set (user vs system).
> // Unset this after opening the session so that reopening of session uses the correct queue
> // names i.e, if client has not died and if the user has explicitly set a queue name
> // then reopened session will use user specified queue name else default cluster queue names.
> conf.unset(TezConfiguration.TEZ_QUEUE_NAME);
> {code}
> So after the original task is submitted to YARN, "tez.queue.name" is unset. 
> When the merge-files task starts, it tries to reuse the session of the original 
> job, but the check returns false because tez.queue.name was unset. It seems we 
> should not unset this property.
> {code:java}
> // TezSessionPoolManager.canWorkWithSameSession()
> if (!session.isDefault()) {
>   String queueName = session.getQueueName();
>   String confQueueName = conf.get(TezConfiguration.TEZ_QUEUE_NAME);
>   LOG.info("Current queue name is " + queueName + " incoming queue name is " + confQueueName);
>   return (queueName == null) ? confQueueName == null : queueName.equals(confQueueName);
> } else {
>   // this session should never be a default session unless something has messed up.
>   throw new HiveException("The pool session " + session + " should have been returned to the pool");
> }
> {code}
>    !15940042679272.png!
>  
>  
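
One possible direction, sketched below purely as an illustration (the helper class and
method names are hypothetical, and this is not the actual HIVE-23802 patch): remember the
queue name the user explicitly set before the session unsets tez.queue.name, and reapply
it to the configuration used for the follow-up merge-files DAG so that
canWorkWithSameSession() compares equal queue names.

{code:java}
// Hypothetical sketch only; not the HIVE-23802 patch.
import org.apache.hadoop.conf.Configuration;

public class QueuePreservingHelper {

  // Hypothetical field: the queue name the user explicitly requested, if any.
  private String userSpecifiedQueue;

  // Would be called where TezSessionState currently unsets tez.queue.name.
  public void captureAndUnset(Configuration conf) {
    userSpecifiedQueue = conf.get("tez.queue.name");
    conf.unset("tez.queue.name");
  }

  // Would be called before submitting the follow-up merge-files DAG.
  public void restoreForMergeTask(Configuration conf) {
    if (userSpecifiedQueue != null) {
      conf.set("tez.queue.name", userSpecifiedQueue);
    }
  }
}
{code}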



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24039) Update jquery version to mitigate CVE-2020-11023

2020-09-04 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das reassigned HIVE-24039:
-

Assignee: Rajkumar Singh  (was: Kishen Das)

> Update jquery version to mitigate CVE-2020-11023
> 
>
> Key: HIVE-24039
> URL: https://issues.apache.org/jira/browse/HIVE-24039
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>
> There is a known vulnerability in the jQuery version used by Hive. The plan 
> with this Jira is to upgrade to jQuery 3.5.0, where it has been fixed. More 
> details about the vulnerability can be found here:
> https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-11023



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-09-04 Thread Ashutosh Chauhan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190899#comment-17190899
 ] 

Ashutosh Chauhan commented on HIVE-23408:
-

+1

> Hive on Tez :  Kafka storage handler broken in secure environment
> -
>
> Key: HIVE-23408
> URL: https://issues.apache.org/jira/browse/HIVE-23408
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> hive.server2.authentication.kerberos.principal is set in the form of 
> hive/_HOST@REALM.
> A Tez task can start on a random NM host and expands _HOST to the FQDN of the 
> host where it is running; this leads to an authentication issue.
> For LLAP there is a fallback to the LLAP daemon keytab/principal. Kafka 1.1 
> onwards supports delegation tokens, and we should take advantage of them for 
> Hive on Tez.
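
Kafka's delegation-token support mentioned above can be exercised from the client side
roughly as follows. This is a minimal sketch, assuming the Kafka AdminClient
delegation-token API available in recent Kafka clients and an existing Kerberos (GSSAPI)
login; the class name is illustrative and the SASL client settings are omitted.

{code:java}
// Illustrative sketch only; security (SASL/Kerberos) client settings are omitted.
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.security.token.delegation.DelegationToken;

public class KafkaDelegationTokenSketch {

  public static DelegationToken fetchToken(String bootstrapServers) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    // The caller is assumed to already hold a Kerberos login so the broker
    // can issue a delegation token on its behalf.
    try (AdminClient admin = AdminClient.create(props)) {
      // Request a delegation token from the brokers and wait for the result.
      return admin.createDelegationToken().delegationToken().get();
    }
  }
}
{code}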



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23454) Querying hive table which has Materialized view fails with HiveAccessControlException

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23454:
--
Labels: pull-request-available  (was: )

> Querying hive table which has Materialized view fails with 
> HiveAccessControlException
> -
>
> Key: HIVE-23454
> URL: https://issues.apache.org/jira/browse/HIVE-23454
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, HiveServer2
>Affects Versions: 3.0.0, 3.2.0
>Reporter: Chiran Ravani
>Assignee: Vineet Garg
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A query against a table fails with HiveAccessControlException when there is a 
> materialized view pointing to that table which the end user does not have 
> access to, even though the user has all privileges on the table itself.
> From the HiveServer2 logs it looks like, as part of optimization, Hive uses the 
> materialized view to answer the query instead of the table, and since the end 
> user does not have access to the MV we receive HiveAccessControlException.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveVolcanoPlanner.java#L99
> The simplest reproducer for this issue is as follows.
> 1. Create a table as the hive user and insert some data:
> {code:java}
> create table db1.testmvtable(id int, name string) partitioned by(year int);
> insert into db1.testmvtable partition(year=2020) values(1,'Name1');
> insert into db1.testmvtable partition(year=2020) values(1,'Name2');
> insert into db1.testmvtable partition(year=2016) values(1,'Name1');
> insert into db1.testmvtable partition(year=2016) values(1,'Name2');
> {code}
> 2. As the hive user, create a materialized view on top of the above table, 
> partitioned and with a where clause:
> {code:java}
> CREATE MATERIALIZED VIEW db2.testmv PARTITIONED ON(year) as select * from 
> db1.testmvtable tmv where year >= 2018;
> {code}
> 3. Grant all (at minimum Select) access to user 'chiran' via Ranger on 
> database db1.
> 4. Run a select on the base table db1.testmvtable as 'chiran' with a where 
> clause whose partition value is >= 2018; it runs into 
> HiveAccessControlException on db2.testmv:
> {code:java}
> eg:- (select * from db1.testmvtable where year=2020;)
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2020;
> Error: Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: user [chiran] does not have [SELECT] privilege on 
> [db2/testmv/*] (state=42000,code=4)
> {code}
> 5. The query works when the partition value is not covered by the MV:
> {code:java}
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2016;
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): 
> select * from db1.testmvtable where year=2016
> DEBUG : Encoding valid txns info 897:9223372036854775807::893,895,896 
> txnid:897
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:testmvtable.id, type:int, 
> comment:null), FieldSchema(name:testmvtable.name, type:string, comment:null), 
> FieldSchema(name:testmvtable.year, type:int, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); 
> Time taken: 0.222 seconds
> DEBUG : Encoding valid txn write ids info 
> 897$db1.testmvtable:4:9223372036854775807:: txnid:897
> INFO  : Executing 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): 
> select * from db1.testmvtable where year=2016
> INFO  : Completed executing 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); 
> Time taken: 0.008 seconds
> INFO  : OK
> DEBUG : Shutting down query select * from db1.testmvtable where year=2016
> +-+---+---+
> | testmvtable.id  | testmvtable.name  | testmvtable.year  |
> +-+---+---+
> | 1   | Name1 | 2016  |
> | 1   | Name2 | 2016  |
> +-+---+---+
> 2 rows selected (0.302 seconds)
> 0: jdbc:hive2://node2>
> {code}
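
A quick way to confirm that the MV rewrite is what triggers the failure is to re-run the
failing query with materialized-view based rewriting disabled for the session. Below is a
minimal sketch, assuming HiveServer2 JDBC access; the URL, user, and password are
placeholders, and hive.materializedview.rewriting is the standard rewrite toggle. This is
a diagnostic aid, not a fix.

{code:java}
// Illustrative sketch only; the JDBC URL, user, and password are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MvRewriteCheck {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:hive2://node2:10000/default", "chiran", "");
         Statement stmt = conn.createStatement()) {
      // Disable materialized-view based query rewriting for this session only.
      stmt.execute("set hive.materializedview.rewriting=false");
      // The same query that previously failed with HiveAccessControlException.
      try (ResultSet rs = stmt.executeQuery(
              "select * from db1.testmvtable where year=2020")) {
        while (rs.next()) {
          System.out.println(rs.getInt(1) + "\t" + rs.getString(2) + "\t" + rs.getInt(3));
        }
      }
    }
  }
}
{code}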



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23454) Querying hive table which has Materialized view fails with HiveAccessControlException

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23454?focusedWorklogId=479213&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479213
 ]

ASF GitHub Bot logged work on HIVE-23454:
-

Author: ASF GitHub Bot
Created on: 04/Sep/20 18:22
Start Date: 04/Sep/20 18:22
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 opened a new pull request #1471:
URL: https://github.com/apache/hive/pull/1471


   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479213)
Remaining Estimate: 0h
Time Spent: 10m

> Querying hive table which has Materialized view fails with 
> HiveAccessControlException
> -
>
> Key: HIVE-23454
> URL: https://issues.apache.org/jira/browse/HIVE-23454
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, HiveServer2
>Affects Versions: 3.0.0, 3.2.0
>Reporter: Chiran Ravani
>Assignee: Vineet Garg
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A query against a table fails with HiveAccessControlException when there is a 
> materialized view pointing to that table which the end user does not have 
> access to, even though the user has all privileges on the table itself.
> From the HiveServer2 logs it looks like, as part of optimization, Hive uses the 
> materialized view to answer the query instead of the table, and since the end 
> user does not have access to the MV we receive HiveAccessControlException.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveVolcanoPlanner.java#L99
> The simplest reproducer for this issue is as follows.
> 1. Create a table as the hive user and insert some data:
> {code:java}
> create table db1.testmvtable(id int, name string) partitioned by(year int);
> insert into db1.testmvtable partition(year=2020) values(1,'Name1');
> insert into db1.testmvtable partition(year=2020) values(1,'Name2');
> insert into db1.testmvtable partition(year=2016) values(1,'Name1');
> insert into db1.testmvtable partition(year=2016) values(1,'Name2');
> {code}
> 2. As the hive user, create a materialized view on top of the above table, 
> partitioned and with a where clause:
> {code:java}
> CREATE MATERIALIZED VIEW db2.testmv PARTITIONED ON(year) as select * from 
> db1.testmvtable tmv where year >= 2018;
> {code}
> 3. Grant all (at minimum Select) access to user 'chiran' via Ranger on 
> database db1.
> 4. Run a select on the base table db1.testmvtable as 'chiran' with a where 
> clause whose partition value is >= 2018; it runs into 
> HiveAccessControlException on db2.testmv:
> {code:java}
> eg:- (select * from db1.testmvtable where year=2020;)
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2020;
> Error: Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: user [chiran] does not have [SELECT] privilege on 
> [db2/testmv/*] (state=42000,code=4)
> {code}
> 5. The query works when the partition value is not covered by the MV:
> {code:java}
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2016;
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): 
> select * from db1.testmvtable where year=2016
> DEBUG : Encoding valid txns info 897:9223372036854775807::893,895,896 
> txnid:897
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:testmvtable.id, type:int, 
> comment:null), FieldSchema(name:testmvtable.name, type:string, comment:null), 
> FieldSchema(name:testmvtable.year, type:int, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); 
> Time taken: 0.222 seconds
> DEBUG : Encoding valid txn write ids info 
> 897$db1.testmvtable:4:9223372036854775807:: txnid:897
> INFO  : Executing 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): 
> select * from db1.testmvtable where year=2016
> INFO  : Completed executing 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); 
> Time taken: 0.008 seconds
> INFO  : OK
> DEBUG : Shutting down query 

[jira] [Assigned] (HIVE-23454) Querying hive table which has Materialized view fails with HiveAccessControlException

2020-09-04 Thread Vineet Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-23454:
--

Assignee: Vineet Garg  (was: Nishant Goel)

> Querying hive table which has Materialized view fails with 
> HiveAccessControlException
> -
>
> Key: HIVE-23454
> URL: https://issues.apache.org/jira/browse/HIVE-23454
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, HiveServer2
>Affects Versions: 3.0.0, 3.2.0
>Reporter: Chiran Ravani
>Assignee: Vineet Garg
>Priority: Critical
>
> A query against a table fails with HiveAccessControlException when there is a 
> materialized view pointing to that table which the end user does not have 
> access to, even though the user has all privileges on the table itself.
> From the HiveServer2 logs it looks like, as part of optimization, Hive uses the 
> materialized view to answer the query instead of the table, and since the end 
> user does not have access to the MV we receive HiveAccessControlException.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveVolcanoPlanner.java#L99
> The simplest reproducer for this issue is as follows.
> 1. Create a table as the hive user and insert some data:
> {code:java}
> create table db1.testmvtable(id int, name string) partitioned by(year int);
> insert into db1.testmvtable partition(year=2020) values(1,'Name1');
> insert into db1.testmvtable partition(year=2020) values(1,'Name2');
> insert into db1.testmvtable partition(year=2016) values(1,'Name1');
> insert into db1.testmvtable partition(year=2016) values(1,'Name2');
> {code}
> 2. As the hive user, create a materialized view on top of the above table, 
> partitioned and with a where clause:
> {code:java}
> CREATE MATERIALIZED VIEW db2.testmv PARTITIONED ON(year) as select * from 
> db1.testmvtable tmv where year >= 2018;
> {code}
> 3. Grant all (at minimum Select) access to user 'chiran' via Ranger on 
> database db1.
> 4. Run a select on the base table db1.testmvtable as 'chiran' with a where 
> clause whose partition value is >= 2018; it runs into 
> HiveAccessControlException on db2.testmv:
> {code:java}
> eg:- (select * from db1.testmvtable where year=2020;)
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2020;
> Error: Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: user [chiran] does not have [SELECT] privilege on 
> [db2/testmv/*] (state=42000,code=4)
> {code}
> 5. The query works when the partition value is not covered by the MV:
> {code:java}
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2016;
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): 
> select * from db1.testmvtable where year=2016
> DEBUG : Encoding valid txns info 897:9223372036854775807::893,895,896 
> txnid:897
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:testmvtable.id, type:int, 
> comment:null), FieldSchema(name:testmvtable.name, type:string, comment:null), 
> FieldSchema(name:testmvtable.year, type:int, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); 
> Time taken: 0.222 seconds
> DEBUG : Encoding valid txn write ids info 
> 897$db1.testmvtable:4:9223372036854775807:: txnid:897
> INFO  : Executing 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): 
> select * from db1.testmvtable where year=2016
> INFO  : Completed executing 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); 
> Time taken: 0.008 seconds
> INFO  : OK
> DEBUG : Shutting down query select * from db1.testmvtable where year=2016
> +-+---+---+
> | testmvtable.id  | testmvtable.name  | testmvtable.year  |
> +-+---+---+
> | 1   | Name1 | 2016  |
> | 1   | Name2 | 2016  |
> +-+---+---+
> 2 rows selected (0.302 seconds)
> 0: jdbc:hive2://node2>
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact

2020-09-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190804#comment-17190804
 ] 

László Bodor edited comment on HIVE-24111 at 9/4/20, 4:45 PM:
--

For reference:
logs for a good run:  
[^org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log] 
logs for a hanging run:  [^TestCrudCompactorTez.log] 

What is strange at first sight is that I cannot see MergeManager-related log 
messages where they would be expected, so this could be a shuffle issue.

good run:
{code}
2020-09-03T15:13:19,604  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-03T15:13:19,606  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-03T15:13:19,606  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-03T15:13:19,606  INFO [ShuffleAndMergeRunner {Map_1}] 
orderedgrouped.MergeManager: Setting merger's parent thread to 
ShuffleAndMergeRunner {Map_1}
2020-09-03T15:13:19,606  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599171197926_0001_1_01_00_0
2020-09-03T15:13:19,607  INFO 
[TezTaskEventRouter{attempt_1599171197926_0001_1_01_00_0}] 
orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: Map 1: 
numDmeEventsSeen=1, numDmeEventsSeenWithNoData=0, numObsoletionEventsSeen=0
2020-09-03T15:13:19,607  INFO [TezChild] exec.SerializationUtilities: 
Deserializing ReduceWork using kryo
2020-09-03T15:13:19,607  INFO [TezChild] exec.Utilities: Deserialized plan (via 
RPC) - name: Reducer 2 size: 1.87KB
2020-09-03T15:13:19,607  INFO [TezChild] tez.ObjectCache: Caching key: 
lbodor_20200903151317_7f539b53-07fb-4bb1-97db-c37d72aba99d_Reducer 
2__REDUCE_PLAN__
2020-09-03T15:13:19,607  INFO [TezChild] tez.RecordProcessor: conf class path = 
[]
2020-09-03T15:13:19,608  INFO [TezChild] tez.RecordProcessor: thread class path 
= []
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.MergeManager: close onDiskFile. State: NumOnDiskFiles=1. 
Current: 
path=/Users/lbodor/apache/hive/itests/hive-unit/target/tmp/scratchdir/lbodor/_tez_session_dir/e01fa9d5-36d9-4449-bfa4-d12b5fa290f8/.tez/application_1599171197926_0001_wd/localmode-local-dir/output/attempt_1599171197926_0001_1_00_00_0_10098/file.out,
 len=26
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] ShuffleScheduler.fetch: 
Completed fetch for attempt: {0, 0, 
attempt_1599171197926_0001_1_00_00_0_10098} to DISK_DIRECT, csize=26, 
dsize=22, EndTime=1599171199608, TimeTaken=1, Rate=0.02 MB/s
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.ShuffleScheduler: All inputs fetched for input vertex : Map 1
{code}

hanging run:
{code}
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-04T02:12:16,394  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-04T02:12:16,398  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-04T02:12:16,398  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-04T02:12:16,398  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599210735282_0001_1_01_00_0
2020-09-04T02:12:16,416  INFO [TezChild] 

[jira] [Comment Edited] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact

2020-09-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190804#comment-17190804
 ] 

László Bodor edited comment on HIVE-24111 at 9/4/20, 4:39 PM:
--

For reference:
logs for a good run:  
[^org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log] 
logs for a hanging run:  [^TestCrudCompactorTez.log] 

What is strange at first sight is that I cannot see MergeManager-related log 
messages where they would be expected, so this could be a shuffle issue.

good run:
{code}
2020-09-03T15:13:19,604  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-03T15:13:19,606  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-03T15:13:19,606  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-03T15:13:19,606  INFO [ShuffleAndMergeRunner {Map_1}] 
orderedgrouped.MergeManager: Setting merger's parent thread to 
ShuffleAndMergeRunner {Map_1}
2020-09-03T15:13:19,606  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599171197926_0001_1_01_00_0
2020-09-03T15:13:19,607  INFO 
[TezTaskEventRouter{attempt_1599171197926_0001_1_01_00_0}] 
orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: Map 1: 
numDmeEventsSeen=1, numDmeEventsSeenWithNoData=0, numObsoletionEventsSeen=0
2020-09-03T15:13:19,607  INFO [TezChild] exec.SerializationUtilities: 
Deserializing ReduceWork using kryo
2020-09-03T15:13:19,607  INFO [TezChild] exec.Utilities: Deserialized plan (via 
RPC) - name: Reducer 2 size: 1.87KB
2020-09-03T15:13:19,607  INFO [TezChild] tez.ObjectCache: Caching key: 
lbodor_20200903151317_7f539b53-07fb-4bb1-97db-c37d72aba99d_Reducer 
2__REDUCE_PLAN__
2020-09-03T15:13:19,607  INFO [TezChild] tez.RecordProcessor: conf class path = 
[]
2020-09-03T15:13:19,608  INFO [TezChild] tez.RecordProcessor: thread class path 
= []
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.MergeManager: close onDiskFile. State: NumOnDiskFiles=1. 
Current: 
path=/Users/lbodor/apache/hive/itests/hive-unit/target/tmp/scratchdir/lbodor/_tez_session_dir/e01fa9d5-36d9-4449-bfa4-d12b5fa290f8/.tez/application_1599171197926_0001_wd/localmode-local-dir/output/attempt_1599171197926_0001_1_00_00_0_10098/file.out,
 len=26
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] ShuffleScheduler.fetch: 
Completed fetch for attempt: {0, 0, 
attempt_1599171197926_0001_1_00_00_0_10098} to DISK_DIRECT, csize=26, 
dsize=22, EndTime=1599171199608, TimeTaken=1, Rate=0.02 MB/s
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.ShuffleScheduler: All inputs fetched for input vertex : Map 1
{code}

hanging run:
{code}
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-04T02:12:16,394  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-04T02:12:16,398  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-04T02:12:16,398  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-04T02:12:16,398  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599210735282_0001_1_01_00_0
2020-09-04T02:12:16,416  INFO [TezChild] 

[jira] [Comment Edited] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact

2020-09-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190804#comment-17190804
 ] 

László Bodor edited comment on HIVE-24111 at 9/4/20, 4:17 PM:
--

For reference:
logs for a good run:  
[^org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log] 
logs for a hanging run:  [^TestCrudCompactorTez.log] 

What is strange at first sight is that I cannot see MergeManager-related log 
messages where they would be expected, so this could be a shuffle issue.

good run:
{code}
2020-09-03T15:13:19,604  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-03T15:13:19,606  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-03T15:13:19,606  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-03T15:13:19,606  INFO [ShuffleAndMergeRunner {Map_1}] 
orderedgrouped.MergeManager: Setting merger's parent thread to 
ShuffleAndMergeRunner {Map_1}
2020-09-03T15:13:19,606  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599171197926_0001_1_01_00_0
2020-09-03T15:13:19,607  INFO 
[TezTaskEventRouter{attempt_1599171197926_0001_1_01_00_0}] 
orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: Map 1: 
numDmeEventsSeen=1, numDmeEventsSeenWithNoData=0, numObsoletionEventsSeen=0
2020-09-03T15:13:19,607  INFO [TezChild] exec.SerializationUtilities: 
Deserializing ReduceWork using kryo
2020-09-03T15:13:19,607  INFO [TezChild] exec.Utilities: Deserialized plan (via 
RPC) - name: Reducer 2 size: 1.87KB
2020-09-03T15:13:19,607  INFO [TezChild] tez.ObjectCache: Caching key: 
lbodor_20200903151317_7f539b53-07fb-4bb1-97db-c37d72aba99d_Reducer 
2__REDUCE_PLAN__
2020-09-03T15:13:19,607  INFO [TezChild] tez.RecordProcessor: conf class path = 
[]
2020-09-03T15:13:19,608  INFO [TezChild] tez.RecordProcessor: thread class path 
= []
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.MergeManager: close onDiskFile. State: NumOnDiskFiles=1. 
Current: 
path=/Users/lbodor/apache/hive/itests/hive-unit/target/tmp/scratchdir/lbodor/_tez_session_dir/e01fa9d5-36d9-4449-bfa4-d12b5fa290f8/.tez/application_1599171197926_0001_wd/localmode-local-dir/output/attempt_1599171197926_0001_1_00_00_0_10098/file.out,
 len=26
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] ShuffleScheduler.fetch: 
Completed fetch for attempt: {0, 0, 
attempt_1599171197926_0001_1_00_00_0_10098} to DISK_DIRECT, csize=26, 
dsize=22, EndTime=1599171199608, TimeTaken=1, Rate=0.02 MB/s
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.ShuffleScheduler: All inputs fetched for input vertex : Map 1
{code}

hanging run:
{code}
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-04T02:12:16,394  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-04T02:12:16,398  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-04T02:12:16,398  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-04T02:12:16,398  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599210735282_0001_1_01_00_0
2020-09-04T02:12:16,416  INFO [TezChild] 

[jira] [Comment Edited] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact

2020-09-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190804#comment-17190804
 ] 

László Bodor edited comment on HIVE-24111 at 9/4/20, 4:13 PM:
--

For reference:
logs for a good run:  
[^org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log] 
logs for a hanging run:  [^TestCrudCompactorTez.log] 

What is strange at first sight is that I cannot see MergeManager-related log 
messages where they would be expected, so this could be a shuffle issue.

good run:
{code}
2020-09-03T15:13:19,604  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-03T15:13:19,606  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-03T15:13:19,606  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-03T15:13:19,606  INFO [ShuffleAndMergeRunner {Map_1}] 
orderedgrouped.MergeManager: Setting merger's parent thread to 
ShuffleAndMergeRunner {Map_1}
2020-09-03T15:13:19,606  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599171197926_0001_1_01_00_0
2020-09-03T15:13:19,607  INFO 
[TezTaskEventRouter{attempt_1599171197926_0001_1_01_00_0}] 
orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: Map 1: 
numDmeEventsSeen=1, numDmeEventsSeenWithNoData=0, numObsoletionEventsSeen=0
2020-09-03T15:13:19,607  INFO [TezChild] exec.SerializationUtilities: 
Deserializing ReduceWork using kryo
2020-09-03T15:13:19,607  INFO [TezChild] exec.Utilities: Deserialized plan (via 
RPC) - name: Reducer 2 size: 1.87KB
2020-09-03T15:13:19,607  INFO [TezChild] tez.ObjectCache: Caching key: 
lbodor_20200903151317_7f539b53-07fb-4bb1-97db-c37d72aba99d_Reducer 
2__REDUCE_PLAN__
2020-09-03T15:13:19,607  INFO [TezChild] tez.RecordProcessor: conf class path = 
[]
2020-09-03T15:13:19,608  INFO [TezChild] tez.RecordProcessor: thread class path 
= []
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.MergeManager: close onDiskFile. State: NumOnDiskFiles=1. 
Current: 
path=/Users/lbodor/apache/hive/itests/hive-unit/target/tmp/scratchdir/lbodor/_tez_session_dir/e01fa9d5-36d9-4449-bfa4-d12b5fa290f8/.tez/application_1599171197926_0001_wd/localmode-local-dir/output/attempt_1599171197926_0001_1_00_00_0_10098/file.out,
 len=26
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] ShuffleScheduler.fetch: 
Completed fetch for attempt: {0, 0, 
attempt_1599171197926_0001_1_00_00_0_10098} to DISK_DIRECT, csize=26, 
dsize=22, EndTime=1599171199608, TimeTaken=1, Rate=0.02 MB/s
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.ShuffleScheduler: All inputs fetched for input vertex : Map 1
{code}

hanging run:
{code}
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-04T02:12:16,394  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-04T02:12:16,398  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-04T02:12:16,398  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-04T02:12:16,398  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599210735282_0001_1_01_00_0
2020-09-04T02:12:16,416  INFO [TezChild] 

[jira] [Commented] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact

2020-09-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190804#comment-17190804
 ] 

László Bodor commented on HIVE-24111:
-

For reference:
logs for a good run:  
[^org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log] 
logs for a hanging run:  [^TestCrudCompactorTez.log] 

What is strange at first sight is that I cannot see MergeManager-related log 
messages where they would be expected, so this could be a shuffle issue.

good run:
{code}
2020-09-03T15:13:19,604  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-03T15:13:19,605  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-03T15:13:19,606  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-03T15:13:19,606  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-03T15:13:19,606  INFO [ShuffleAndMergeRunner {Map_1}] 
orderedgrouped.MergeManager: Setting merger's parent thread to 
ShuffleAndMergeRunner {Map_1}
2020-09-03T15:13:19,606  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599171197926_0001_1_01_00_0
2020-09-03T15:13:19,607  INFO 
[TezTaskEventRouter{attempt_1599171197926_0001_1_01_00_0}] 
orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: Map 1: 
numDmeEventsSeen=1, numDmeEventsSeenWithNoData=0, numObsoletionEventsSeen=0
2020-09-03T15:13:19,607  INFO [TezChild] exec.SerializationUtilities: 
Deserializing ReduceWork using kryo
2020-09-03T15:13:19,607  INFO [TezChild] exec.Utilities: Deserialized plan (via 
RPC) - name: Reducer 2 size: 1.87KB
2020-09-03T15:13:19,607  INFO [TezChild] tez.ObjectCache: Caching key: 
lbodor_20200903151317_7f539b53-07fb-4bb1-97db-c37d72aba99d_Reducer 
2__REDUCE_PLAN__
2020-09-03T15:13:19,607  INFO [TezChild] tez.RecordProcessor: conf class path = 
[]
2020-09-03T15:13:19,608  INFO [TezChild] tez.RecordProcessor: thread class path 
= []
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.MergeManager: close onDiskFile. State: NumOnDiskFiles=1. 
Current: 
path=/Users/lbodor/apache/hive/itests/hive-unit/target/tmp/scratchdir/lbodor/_tez_session_dir/e01fa9d5-36d9-4449-bfa4-d12b5fa290f8/.tez/application_1599171197926_0001_wd/localmode-local-dir/output/attempt_1599171197926_0001_1_00_00_0_10098/file.out,
 len=26
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] ShuffleScheduler.fetch: 
Completed fetch for attempt: {0, 0, 
attempt_1599171197926_0001_1_00_00_0_10098} to DISK_DIRECT, csize=26, 
dsize=22, EndTime=1599171199608, TimeTaken=1, Rate=0.02 MB/s
2020-09-03T15:13:19,608  INFO [Fetcher_O {Map_1} #0] 
orderedgrouped.ShuffleScheduler: All inputs fetched for input vertex : Map 1
{code}

hanging run:
{code}
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, 
ifileReadAhead: true
2020-09-04T02:12:16,392  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, 
maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, 
postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-04T02:12:16,394  INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 
1 with configuration: maxFetchFailuresBeforeReporting=5, 
reportReadErrorImmediately=true, maxFailedUniqueFetches=1, 
abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, 
hostFailureFraction=0.2, minFailurePerHost=4, 
maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, 
minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-04T02:12:16,398  INFO [I/O Setup 0 Start: {Map 1}] 
runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-04T02:12:16,398  INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: 
AutoStartComplete
2020-09-04T02:12:16,398  INFO [TezChild] task.TaskRunner2Callable: Running 
task, taskAttemptId=attempt_1599210735282_0001_1_01_00_0
2020-09-04T02:12:16,416  INFO [TezChild] tez.ReduceRecordProcessor: Waiting for 
ShuffleInputs to become 

[jira] [Updated] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact

2020-09-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24111:

Attachment: 
org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log

> TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact
> ---
>
> Key: HIVE-24111
> URL: https://issues.apache.org/jira/browse/HIVE-24111
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: TestCrudCompactorTez.log, jstack.log, 
> org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log
>
>
> Reproduced issue in ptest run which I made to run against tez staging 
> artifacts 
> (https://repository.apache.org/content/repositories/orgapachetez-1068/)
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1311/14/pipeline/417
> I'm about to investigate this. I think Tez 0.10.0 cannot be released until we 
> confirm whether it's a Hive or Tez bug.
> {code}
> mvn test -Pitests,hadoop-2 -Dtest=TestMmCompactorOnTez -pl ./itests/hive-unit
> {code}
> tez setup:
> https://github.com/apache/hive/commit/92516631ab39f39df5d0692f98ac32c2cd320997#diff-a22bcc9ba13b310c7abfee4a57c4b130R83-R97



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact

2020-09-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24111:

Attachment: (was: stacktrace.log)

> TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact
> ---
>
> Key: HIVE-24111
> URL: https://issues.apache.org/jira/browse/HIVE-24111
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: TestCrudCompactorTez.log, jstack.log
>
>
> Reproduced issue in ptest run which I made to run against tez staging 
> artifacts 
> (https://repository.apache.org/content/repositories/orgapachetez-1068/)
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1311/14/pipeline/417
> I'm about to investigate this. I think Tez 0.10.0 cannot be released until we 
> confirm whether it's a Hive or Tez bug.
> {code}
> mvn test -Pitests,hadoop-2 -Dtest=TestMmCompactorOnTez -pl ./itests/hive-unit
> {code}
> tez setup:
> https://github.com/apache/hive/commit/92516631ab39f39df5d0692f98ac32c2cd320997#diff-a22bcc9ba13b310c7abfee4a57c4b130R83-R97



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact

2020-09-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24111:

Attachment: jstack.log

> TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact
> ---
>
> Key: HIVE-24111
> URL: https://issues.apache.org/jira/browse/HIVE-24111
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: TestCrudCompactorTez.log, jstack.log
>
>
> Reproduced issue in ptest run which I made to run against tez staging 
> artifacts 
> (https://repository.apache.org/content/repositories/orgapachetez-1068/)
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1311/14/pipeline/417
> I'm about to investigate this. I think Tez 0.10.0 cannot be released until we 
> confirm whether it's a Hive or Tez bug.
> {code}
> mvn test -Pitests,hadoop-2 -Dtest=TestMmCompactorOnTez -pl ./itests/hive-unit
> {code}
> tez setup:
> https://github.com/apache/hive/commit/92516631ab39f39df5d0692f98ac32c2cd320997#diff-a22bcc9ba13b310c7abfee4a57c4b130R83-R97



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact

2020-09-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24111:

Attachment: stacktrace.log
TestCrudCompactorTez.log

> TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact
> ---
>
> Key: HIVE-24111
> URL: https://issues.apache.org/jira/browse/HIVE-24111
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: TestCrudCompactorTez.log, stacktrace.log
>
>
> Reproduced issue in ptest run which I made to run against tez staging 
> artifacts 
> (https://repository.apache.org/content/repositories/orgapachetez-1068/)
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1311/14/pipeline/417
> I'm about to investigate this. I think Tez 0.10.0 cannot be released until we 
> confirm whether it's a Hive or Tez bug.
> {code}
> mvn test -Pitests,hadoop-2 -Dtest=TestMmCompactorOnTez -pl ./itests/hive-unit
> {code}
> tez setup:
> https://github.com/apache/hive/commit/92516631ab39f39df5d0692f98ac32c2cd320997#diff-a22bcc9ba13b310c7abfee4a57c4b130R83-R97



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24121) dynamic_semijoin_reduction_on_aggcol.q is flaky

2020-09-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24121:

Attachment: ci.hive.apache.org.txt

> dynamic_semijoin_reduction_on_aggcol.q is flaky
> ---
>
> Key: HIVE-24121
> URL: https://issues.apache.org/jira/browse/HIVE-24121
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Major
> Attachments: ci.hive.apache.org.txt
>
>
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1379/3/tests
> {code}
> java.lang.AssertionError: 
> Client Execution succeeded but contained differences (error code = 1) after 
> executing dynamic_semijoin_reduction_on_aggcol.q 
> 186c186,188
> < 0   NULL
> ---
> > 0   val_0
> > 0   val_0
> > 0   val_0
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24121) dynamic_semijoin_reduction_on_aggcol.q is flaky

2020-09-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24121:

Description: 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1379/3/tests

{code}
java.lang.AssertionError: 
Client Execution succeeded but contained differences (error code = 1) after 
executing dynamic_semijoin_reduction_on_aggcol.q 
186c186,188
< 0 NULL
---
> 0 val_0
> 0 val_0
> 0 val_0
{code}



> dynamic_semijoin_reduction_on_aggcol.q is flaky
> ---
>
> Key: HIVE-24121
> URL: https://issues.apache.org/jira/browse/HIVE-24121
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Major
>
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1379/3/tests
> {code}
> java.lang.AssertionError: 
> Client Execution succeeded but contained differences (error code = 1) after 
> executing dynamic_semijoin_reduction_on_aggcol.q 
> 186c186,188
> < 0   NULL
> ---
> > 0   val_0
> > 0   val_0
> > 0   val_0
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24120) Plugin for external DatabaseProduct in standalone HMS

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24120?focusedWorklogId=479105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479105
 ]

ASF GitHub Bot logged work on HIVE-24120:
-

Author: ASF GitHub Bot
Created on: 04/Sep/20 13:35
Start Date: 04/Sep/20 13:35
Worklog Time Spent: 10m 
  Work Description: gatorblue opened a new pull request #1470:
URL: https://github.com/apache/hive/pull/1470


   This PR implements the design outlined in this doc:
   
https://docs.google.com/document/d/1lR2IDiqjMiG1zb7o_4sEa6wULzH1CvtRodsghCtxYWY/edit#heading=h.imzjzzvayyi1
   
   Quoting from the design doc:
   **Goals**
   Introduce a way to load a custom jar in the metastore that can override the 
database-specific DatabaseProduct, which provides the database-specific SQL code 
execution in the Hive Metastore.
   
   **Non-goals**
   We do not want to evolve the metastore into a DataNucleus (ORM)-like piece of 
software that transparently handles all the different nuances of supporting 
multiple database types. The pluggable custom instance of DatabaseProduct must be 
ANSI SQL compliant, since MetastoreDirectSQL and SqlGenerator assume that. 
We would like to keep the changes to MetastoreDirectSQL to a minimum and assume 
that all the supported databases are ANSI SQL compliant.
   Upgrade and performance testing of custom implementations. 
   Schematool to be able to load this custom jar to execute schema 
initialization and upgrade scripts. This is not currently in scope of this 
document.
   
   **About this PR**
   As per the design, DatabaseProduct has been refactored as a class.
   It's a singleton class, which gets instantiated the first time the method 
determineDatabaseProduct is invoked.
   Existing SQL logic that was scattered around other classes has been moved to 
methods in this class. This makes it possible for an external plugin to override 
the existing SQL logic.
   @vihangk1 @thejasmn @nrg4878
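
   A minimal sketch of the pattern described above, purely illustrative (the
configuration handling and the addLimitClause method shown here are assumptions, not
necessarily what the PR implements): a lazily created singleton whose database-specific
SQL lives in overridable methods, with an optional externally supplied subclass loaded
by reflection.

{code:java}
// Illustrative sketch only; names and behavior are assumptions, not the PR's code.
public class DatabaseProduct {

  private static volatile DatabaseProduct instance;

  protected DatabaseProduct() { }

  /** Lazily creates the singleton, optionally from an externally supplied subclass. */
  public static DatabaseProduct determineDatabaseProduct(String customClassName) {
    if (instance == null) {
      synchronized (DatabaseProduct.class) {
        if (instance == null) {
          if (customClassName != null && !customClassName.isEmpty()) {
            try {
              // The custom jar is expected to already be on the metastore classpath.
              instance = (DatabaseProduct) Class.forName(customClassName)
                  .getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
              throw new IllegalStateException("Cannot load " + customClassName, e);
            }
          } else {
            instance = new DatabaseProduct();
          }
        }
      }
    }
    return instance;
  }

  /** Database-specific SQL lives in overridable methods; an ANSI-ish default is shown. */
  public String addLimitClause(int limit, String baseQuery) {
    return baseQuery + " LIMIT " + limit;
  }
}
{code}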
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479105)
Remaining Estimate: 0h
Time Spent: 10m

> Plugin for external DatabaseProduct in standalone HMS
> -
>
> Key: HIVE-24120
> URL: https://issues.apache.org/jira/browse/HIVE-24120
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: Gustavo Arocena
>Assignee: Gustavo Arocena
>Priority: Minor
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add a pluggable way to support ANSI compliant databases as backends for 
> standalone HMS



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24120) Plugin for external DatabaseProduct in standalone HMS

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24120:
--
Labels: pull-request-available  (was: )

> Plugin for external DatabaseProduct in standalone HMS
> -
>
> Key: HIVE-24120
> URL: https://issues.apache.org/jira/browse/HIVE-24120
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: Gustavo Arocena
>Assignee: Gustavo Arocena
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add a pluggable way to support ANSI compliant databases as backends for 
> standalone HMS



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24120) Plugin for external DatabaseProduct in standalone HMS

2020-09-04 Thread Gustavo Arocena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gustavo Arocena reassigned HIVE-24120:
--


> Plugin for external DatabaseProduct in standalone HMS
> -
>
> Key: HIVE-24120
> URL: https://issues.apache.org/jira/browse/HIVE-24120
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.1
>Reporter: Gustavo Arocena
>Assignee: Gustavo Arocena
>Priority: Minor
> Fix For: 4.0.0
>
>
> Add a pluggable way to support ANSI compliant databases as backends for 
> standalone HMS



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=479100=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479100
 ]

ASF GitHub Bot logged work on HIVE-23408:
-

Author: ASF GitHub Bot
Created on: 04/Sep/20 13:13
Start Date: 04/Sep/20 13:13
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1379:
URL: https://github.com/apache/hive/pull/1379#discussion_r483607071



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
##
@@ -265,11 +279,70 @@ public URI apply(Path path) {
 }
 dag.addURIsForCredentials(uris);
   }
+  getKafkaCredentials((MapWork)work, dag, conf);
 }
-
 getCredentialsForFileSinks(work, dag);
   }
 
+  private void getKafkaCredentials(MapWork work, DAG dag, JobConf conf) {
+Token tokenCheck = 
dag.getCredentials().getToken(KAFKA_DELEGATION_TOKEN_KEY);
+if (tokenCheck != null) {
+  LOG.debug("Kafka credentials already added, skipping...");
+  return;
+}
+LOG.info("Getting kafka credentials for mapwork: " + work.getName());
+
+String kafkaBrokers = null;
+Map partitions = work.getAliasToPartnInfo();

Review comment:
   @ashutoshc : what do you think about this? 
https://github.com/apache/hive/pull/1379/commits/edc4ad440af4e234b731104bbcb9837cbdf43a19#diff-d7b5b051769b68ed5dd602cc30744439R303-R306
   (tested on cluster)
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479100)
Time Spent: 1h 10m  (was: 1h)

> Hive on Tez :  Kafka storage handler broken in secure environment
> -
>
> Key: HIVE-23408
> URL: https://issues.apache.org/jira/browse/HIVE-23408
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> hive.server2.authentication.kerberos.principal is set in the form 
> hive/_HOST@REALM.
> A Tez task can start on an arbitrary NM host and expands _HOST to the FQDN of 
> the host it is running on, which leads to an authentication failure.
> For LLAP there is a fallback to the LLAP daemon keytab/principal. Kafka 1.1 
> onwards supports delegation tokens, and we should take advantage of them for 
> Hive on Tez.
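As a conceptual sketch only (not the patch's code), the add-token-once pattern 
discussed in the review above can be expressed with the Hadoop Credentials API; 
the token alias and the way the delegation token is obtained are assumptions of 
this sketch.
{code:java}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

public final class KafkaCredentialSketch {

  // Hypothetical alias; the patch defines its own constant (KAFKA_DELEGATION_TOKEN_KEY).
  private static final Text KAFKA_TOKEN_ALIAS = new Text("kafka_delegation_token");

  /**
   * Registers a Kafka delegation token in the DAG credentials exactly once,
   * mirroring the early-return check in getKafkaCredentials above. How the
   * token itself is obtained (e.g. via the Kafka admin client) is out of the
   * scope of this sketch.
   */
  public static void addKafkaTokenIfMissing(Credentials dagCredentials,
      Token<? extends TokenIdentifier> kafkaToken) {
    if (dagCredentials.getToken(KAFKA_TOKEN_ALIAS) != null) {
      // Token already added for this DAG; skip, as in the reviewed code.
      return;
    }
    dagCredentials.addToken(KAFKA_TOKEN_ALIAS, kafkaToken);
  }
}
{code}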



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23976) Enable vectorization for multi-col semi join reducers

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23976?focusedWorklogId=479072=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479072
 ]

ASF GitHub Bot logged work on HIVE-23976:
-

Author: ASF GitHub Bot
Created on: 04/Sep/20 11:49
Start Date: 04/Sep/20 11:49
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1458:
URL: https://github.com/apache/hive/pull/1458#issuecomment-687095101


   @t3rmin4t0r , @zabetak , @ramesh0201: could you please take a look? good-old 
vectorization coverage patch, murmurhash with 2 column params



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479072)
Time Spent: 20m  (was: 10m)

> Enable vectorization for multi-col semi join reducers
> -
>
> Key: HIVE-23976
> URL: https://issues.apache.org/jira/browse/HIVE-23976
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-21196 introduces multi-column semi-join reducers in the query engine. 
> However, the implementation relies on GenericUDFMurmurHash, which is not 
> vectorized, so the respective operators cannot be executed in vectorized 
> mode. 
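For intuition only, hashing a two-column key into a single value, as the 
multi-column semi-join reducer needs, can be sketched as below; Guava's Murmur3 
is used purely for illustration and this is not GenericUDFMurmurHash's 
implementation or byte-level contract.
{code:java}
import com.google.common.hash.Hashing;

public final class TwoColumnHashSketch {

  // Hashes a (long, string) key into one int; a multi-column semi-join reducer
  // feeds a per-row value like this into the bloom filter it builds.
  public static int hashTwoColumns(long col1, String col2) {
    return Hashing.murmur3_32()
        .newHasher()
        .putLong(col1)
        .putUnencodedChars(col2 == null ? "" : col2)
        .hash()
        .asInt();
  }

  public static void main(String[] args) {
    // The same (col1, col2) pair always produces the same hash, so the filter
    // built on the build side can prune probe-side rows.
    System.out.println(hashTwoColumns(42L, "val_42"));
  }
}
{code}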



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23995) Don't set location for managed tables in case of replication

2020-09-04 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190685#comment-17190685
 ] 

Aasha Medhi commented on HIVE-23995:


This was removed because the entire migration and upgrade workflow needs to be 
thought through and won't work as is for the latest version of Hive.

> Don't set location for managed tables in case of replication
> 
>
> Key: HIVE-23995
> URL: https://issues.apache.org/jira/browse/HIVE-23995
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23995.01.patch, HIVE-23995.02.patch, 
> HIVE-23995.03.patch, HIVE-23995.04.patch, HIVE-23995.05.patch, 
> HIVE-23995.06.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Managed table location should not be set
> Migration code of replication should be removed
> add logging to all ack files
> set hive.repl.data.copy.lazy to true



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23995) Don't set location for managed tables in case of replication

2020-09-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190684#comment-17190684
 ] 

Ádám Szita commented on HIVE-23995:
---

This commit has removed HiveStrictManagedMigration.java (and the corresponding 
test), but I don't see its functionality being ported anywhere. AFAIK it was 
not really related to replication; we use this class to prepare the Hive 
warehouse between major upgrades. Was there a particular reason this had to be 
removed?

cc:

[~thejas] [~jdere]

> Don't set location for managed tables in case of replication
> 
>
> Key: HIVE-23995
> URL: https://issues.apache.org/jira/browse/HIVE-23995
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23995.01.patch, HIVE-23995.02.patch, 
> HIVE-23995.03.patch, HIVE-23995.04.patch, HIVE-23995.05.patch, 
> HIVE-23995.06.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Managed table location should not be set
> Migration code of replication should be removed
> add logging to all ack files
> set hive.repl.data.copy.lazy to true



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=479034=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479034
 ]

ASF GitHub Bot logged work on HIVE-23408:
-

Author: ASF GitHub Bot
Created on: 04/Sep/20 09:28
Start Date: 04/Sep/20 09:28
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1379:
URL: https://github.com/apache/hive/pull/1379#discussion_r483501753



##
File path: pom.xml
##
@@ -169,6 +169,7 @@
 4.13
 5.6.2
 5.6.2
+2.5.0

Review comment:
   sure!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479034)
Time Spent: 1h  (was: 50m)

> Hive on Tez :  Kafka storage handler broken in secure environment
> -
>
> Key: HIVE-23408
> URL: https://issues.apache.org/jira/browse/HIVE-23408
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> hive.server2.authentication.kerberos.principal is set in the form 
> hive/_HOST@REALM.
> A Tez task can start on an arbitrary NM host and expands _HOST to the FQDN of 
> the host it is running on, which leads to an authentication failure.
> For LLAP there is a fallback to the LLAP daemon keytab/principal. Kafka 1.1 
> onwards supports delegation tokens, and we should take advantage of them for 
> Hive on Tez.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24117) Fix for not setting managed table location in incremental load

2020-09-04 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190635#comment-17190635
 ] 

Aasha Medhi commented on HIVE-24117:


[~thejas] [~ngangam] Commit for this Jira is blocked on 
https://issues.apache.org/jira/browse/HIVE-23387
Tests are failing because getDefaultTablePath for managed tables returns the 
external table path instead of the managed table path.

> Fix for not setting managed table location in incremental load
> --
>
> Key: HIVE-24117
> URL: https://issues.apache.org/jira/browse/HIVE-24117
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24117.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24116) LLAP: Provide an opportunity for preempted tasks to get better locality in next iteration

2020-09-04 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan resolved HIVE-24116.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> LLAP: Provide an opportunity for preempted tasks to get better locality in 
> next iteration
> -
>
> Key: HIVE-24116
> URL: https://issues.apache.org/jira/browse/HIVE-24116
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In certain DAGs, tasks get preempted as higher priority tasks need to be 
> executed. These preempted tasks are scheduled to run later, but they end up 
> missing locality information. Ref: HIVE-24061
> Remote storage reads can be avoided if an opportunity is provided for these 
> preempted tasks to get better locality in the next iteration.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24116) LLAP: Provide an opportunity for preempted tasks to get better locality in next iteration

2020-09-04 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190634#comment-17190634
 ] 

Rajesh Balamohan commented on HIVE-24116:
-

Thanks [~gopalv] . Committed to master.

> LLAP: Provide an opportunity for preempted tasks to get better locality in 
> next iteration
> -
>
> Key: HIVE-24116
> URL: https://issues.apache.org/jira/browse/HIVE-24116
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In certain DAGs, tasks get preempted as higher priority tasks need to be 
> executed. These preempted tasks are scheduled to run later, but they end up 
> missing locality information. Ref: HIVE-24061
> Remote storage reads can be avoided if an opportunity is provided for these 
> preempted tasks to get better locality in the next iteration.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24116) LLAP: Provide an opportunity for preempted tasks to get better locality in next iteration

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24116?focusedWorklogId=479021=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479021
 ]

ASF GitHub Bot logged work on HIVE-24116:
-

Author: ASF GitHub Bot
Created on: 04/Sep/20 08:59
Start Date: 04/Sep/20 08:59
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on pull request #1466:
URL: https://github.com/apache/hive/pull/1466#issuecomment-687020811


   Thanks @prasanthj . Merged to master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479021)
Time Spent: 40m  (was: 0.5h)

> LLAP: Provide an opportunity for preempted tasks to get better locality in 
> next iteration
> -
>
> Key: HIVE-24116
> URL: https://issues.apache.org/jira/browse/HIVE-24116
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In certain DAGs, tasks get preempted as higher priority tasks need to be 
> executed. These preempted tasks are scheduled to run later, but they end up 
> missing locality information. Ref: HIVE-24061
> Remote storage reads can be avoided if an opportunity is provided for these 
> preempted tasks to get better locality in the next iteration.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24117) Fix for not setting managed table location in incremental load

2020-09-04 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190635#comment-17190635
 ] 

Aasha Medhi edited comment on HIVE-24117 at 9/4/20, 8:59 AM:
-

[~thejas] [~ngangam] Commit for this Jira is blocked on 
https://issues.apache.org/jira/browse/HIVE-23387
Tests are failing because getDefaultTablePath for managed tables returns the 
external table path instead of the managed table path.

cc [~anishek]


was (Author: aasha):
[~thejas] [~ngangam] Commit for this Jira is blocked on 
https://issues.apache.org/jira/browse/HIVE-23387
Tests are failing as getDefaultTablePath for managed tables returns external 
table path instead of managed table.

> Fix for not setting managed table location in incremental load
> --
>
> Key: HIVE-24117
> URL: https://issues.apache.org/jira/browse/HIVE-24117
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24117.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24116) LLAP: Provide an opportunity for preempted tasks to get better locality in next iteration

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24116?focusedWorklogId=479020=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479020
 ]

ASF GitHub Bot logged work on HIVE-24116:
-

Author: ASF GitHub Bot
Created on: 04/Sep/20 08:57
Start Date: 04/Sep/20 08:57
Worklog Time Spent: 10m 
  Work Description: rbalamohan merged pull request #1466:
URL: https://github.com/apache/hive/pull/1466


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479020)
Time Spent: 0.5h  (was: 20m)

> LLAP: Provide an opportunity for preempted tasks to get better locality in 
> next iteration
> -
>
> Key: HIVE-24116
> URL: https://issues.apache.org/jira/browse/HIVE-24116
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In certain DAGs, tasks get preempted as higher priority tasks need to be 
> executed. These preempted tasks are scheduled to run later, but they end up 
> missing locality information. Ref: HIVE-24061
> Remote storage reads can be avoided if an opportunity is provided for these 
> preempted tasks to get better locality in the next iteration.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24119) fix the issue that the client got wrong jobTrackerUrl when resourcemanager has ha instances

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24119?focusedWorklogId=479013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479013
 ]

ASF GitHub Bot logged work on HIVE-24119:
-

Author: ASF GitHub Bot
Created on: 04/Sep/20 08:46
Start Date: 04/Sep/20 08:46
Worklog Time Spent: 10m 
  Work Description: Neilxzn opened a new pull request #1469:
URL: https://github.com/apache/hive/pull/1469


   
   
   ### What changes were proposed in this pull request?
   
   Change the Hadoop23Shims.getJobLauncherRpcAddress method so that the client no 
longer gets a wrong jobTrackerUrl when the ResourceManager has HA instances.
   
   ### Why are the changes needed?
   
   When a cluster is configured with HA ResourceManagers, the per-instance 
properties `yarn.resourcemanager.address.rm1` and `yarn.resourcemanager.address.rm2` 
replace the plain `yarn.resourcemanager.address`. 
   
   Because `yarn.resourcemanager.address` may not be set at all, the method 
Hadoop23Shims.getJobLauncherRpcAddress can return a wrong value. 
   It should probably return the value of `yarn.resourcemanager.address.rm1` or 
`yarn.resourcemanager.address.rm2` instead (see the sketch below).
   https://issues.apache.org/jira/browse/HIVE-24119
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   ### How was this patch tested?
   
   No
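   For illustration, a minimal sketch of the HA-aware fallback described under 
"Why are the changes needed?" above, using only documented YARN property names; 
the helper below is an assumption of this sketch and not the actual patch to 
Hadoop23Shims.getJobLauncherRpcAddress.
{code:java}
import org.apache.hadoop.conf.Configuration;

public final class RmAddressSketch {

  // Resolves the ResourceManager RPC address, falling back to the per-instance
  // HA properties when the plain yarn.resourcemanager.address is not set.
  public static String getJobLauncherRpcAddress(Configuration conf) {
    String address = conf.get("yarn.resourcemanager.address");
    if (address != null && !address.isEmpty()) {
      return address;
    }
    // With HA enabled, addresses are published per RM id, e.g.
    // yarn.resourcemanager.address.rm1 / yarn.resourcemanager.address.rm2.
    for (String rmId : conf.get("yarn.resourcemanager.ha.rm-ids", "").split(",")) {
      String haAddress = conf.get("yarn.resourcemanager.address." + rmId.trim());
      if (haAddress != null && !haAddress.isEmpty()) {
        // First configured instance; the real fix may instead pick the active RM.
        return haAddress;
      }
    }
    return address; // may be null when neither form is configured
  }
}
{code}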



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479013)
Remaining Estimate: 0h
Time Spent: 10m

> fix the issue that the client got wrong jobTrackerUrl when resourcemanager 
> has ha instances
> ---
>
> Key: HIVE-24119
> URL: https://issues.apache.org/jira/browse/HIVE-24119
> Project: Hive
>  Issue Type: Improvement
>  Components: Shims
>Affects Versions: 2.1.0
> Environment: ha resourcemanager
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Minor
> Attachments: image-2020-09-04-16-34-28-341.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a cluster is configured with HA ResourceManagers, the per-instance 
> properties `yarn.resourcemanager.address.rm1` and 
> `yarn.resourcemanager.address.rm2` replace the plain 
> `yarn.resourcemanager.address`. 
> Because `yarn.resourcemanager.address` may not be set at all, the method 
> Hadoop23Shims.getJobLauncherRpcAddress can return a wrong value. 
> !image-2020-09-04-16-34-28-341.png!
> It should probably return the value of 
> `yarn.resourcemanager.address.rm1` or `yarn.resourcemanager.address.rm2` instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24119) fix the issue that the client got wrong jobTrackerUrl when resourcemanager has ha instances

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24119:
--
Labels: pull-request-available  (was: )

> fix the issue that the client got wrong jobTrackerUrl when resourcemanager 
> has ha instances
> ---
>
> Key: HIVE-24119
> URL: https://issues.apache.org/jira/browse/HIVE-24119
> Project: Hive
>  Issue Type: Improvement
>  Components: Shims
>Affects Versions: 2.1.0
> Environment: ha resourcemanager
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2020-09-04-16-34-28-341.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a cluster is configured with HA ResourceManagers, the per-instance 
> properties `yarn.resourcemanager.address.rm1` and 
> `yarn.resourcemanager.address.rm2` replace the plain 
> `yarn.resourcemanager.address`. 
> Because `yarn.resourcemanager.address` may not be set at all, the method 
> Hadoop23Shims.getJobLauncherRpcAddress can return a wrong value. 
> !image-2020-09-04-16-34-28-341.png!
> It should probably return the value of 
> `yarn.resourcemanager.address.rm1` or `yarn.resourcemanager.address.rm2` instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24119) fix the issue that the client got wrong jobTrackerUrl when resourcemanager has ha instances

2020-09-04 Thread Max Xie (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max  Xie reassigned HIVE-24119:
---


> fix the issue that the client got wrong jobTrackerUrl when resourcemanager 
> has ha instances
> ---
>
> Key: HIVE-24119
> URL: https://issues.apache.org/jira/browse/HIVE-24119
> Project: Hive
>  Issue Type: Improvement
>  Components: Shims
>Affects Versions: 2.1.0
> Environment: ha resourcemanager
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Minor
> Attachments: image-2020-09-04-16-34-28-341.png
>
>
> When a cluster is configured with HA ResourceManagers, the per-instance 
> properties `yarn.resourcemanager.address.rm1` and 
> `yarn.resourcemanager.address.rm2` replace the plain 
> `yarn.resourcemanager.address`. 
> Because `yarn.resourcemanager.address` may not be set at all, the method 
> Hadoop23Shims.getJobLauncherRpcAddress can return a wrong value. 
> !image-2020-09-04-16-34-28-341.png!
> It should probably return the value of 
> `yarn.resourcemanager.address.rm1` or `yarn.resourcemanager.address.rm2` instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23406) SharedWorkOptimizer should check nullSortOrders when comparing ReduceSink operators

2020-09-04 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-23406.
---
Resolution: Fixed

Pushed to master, thanks [~jcamachorodriguez] for review.

> SharedWorkOptimizer should check nullSortOrders when comparing ReduceSink 
> operators
> ---
>
> Key: HIVE-23406
> URL: https://issues.apache.org/jira/browse/HIVE-23406
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> SharedWorkOptimizer does not check the null sort order in ReduceSinkDesc when 
> it compares ReduceSink operators:
>  
> [https://github.com/apache/hive/blob/ca9aba606c4d09b91ee28bf9ee1ae918db8cdfb9/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1444]
> {code:java}
>   ReduceSinkDesc op1Conf = ((ReduceSinkOperator) op1).getConf();
>   ReduceSinkDesc op2Conf = ((ReduceSinkOperator) op2).getConf();
>   if (StringUtils.equals(op1Conf.getKeyColString(), 
> op2Conf.getKeyColString()) &&
> StringUtils.equals(op1Conf.getValueColsString(), 
> op2Conf.getValueColsString()) &&
> StringUtils.equals(op1Conf.getParitionColsString(), 
> op2Conf.getParitionColsString()) &&
> op1Conf.getTag() == op2Conf.getTag() &&
> StringUtils.equals(op1Conf.getOrder(), op2Conf.getOrder()) &&
> op1Conf.getTopN() == op2Conf.getTopN() &&
> canDeduplicateReduceTraits(op1Conf, op2Conf)) {
> return true;
>   } else {
> return false;
>   }
> {code}
> An expression like
> {code:java}
> StringUtils.equals(op1Conf.getNullOrder(), op2Conf.getNullOrder()) &&
> {code}
> should be added.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23888) Simplify special_character_in_tabnames_1.q

2020-09-04 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190618#comment-17190618
 ] 

Krisztian Kasa commented on HIVE-23888:
---

Pushed to master, thanks [~jcamachorodriguez] for review.

> Simplify special_character_in_tabnames_1.q
> --
>
> Key: HIVE-23888
> URL: https://issues.apache.org/jira/browse/HIVE-23888
> Project: Hive
>  Issue Type: Task
>  Components: Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> * Move similar queries into unit tests in the parser module and keep only one 
> in the q test.
> * Use *explain* instead of executing the queries where possible, since we are 
> focusing on parser testing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23406) SharedWorkOptimizer should check nullSortOrders when comparing ReduceSink operators

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23406?focusedWorklogId=478989=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-478989
 ]

ASF GitHub Bot logged work on HIVE-23406:
-

Author: ASF GitHub Bot
Created on: 04/Sep/20 08:04
Start Date: 04/Sep/20 08:04
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #1464:
URL: https://github.com/apache/hive/pull/1464


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 478989)
Time Spent: 20m  (was: 10m)

> SharedWorkOptimizer should check nullSortOrders when comparing ReduceSink 
> operators
> ---
>
> Key: HIVE-23406
> URL: https://issues.apache.org/jira/browse/HIVE-23406
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> SharedWorkOptimizer does not check the null sort order in ReduceSinkDesc when 
> it compares ReduceSink operators:
>  
> [https://github.com/apache/hive/blob/ca9aba606c4d09b91ee28bf9ee1ae918db8cdfb9/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1444]
> {code:java}
>   ReduceSinkDesc op1Conf = ((ReduceSinkOperator) op1).getConf();
>   ReduceSinkDesc op2Conf = ((ReduceSinkOperator) op2).getConf();
>   if (StringUtils.equals(op1Conf.getKeyColString(), 
> op2Conf.getKeyColString()) &&
> StringUtils.equals(op1Conf.getValueColsString(), 
> op2Conf.getValueColsString()) &&
> StringUtils.equals(op1Conf.getParitionColsString(), 
> op2Conf.getParitionColsString()) &&
> op1Conf.getTag() == op2Conf.getTag() &&
> StringUtils.equals(op1Conf.getOrder(), op2Conf.getOrder()) &&
> op1Conf.getTopN() == op2Conf.getTopN() &&
> canDeduplicateReduceTraits(op1Conf, op2Conf)) {
> return true;
>   } else {
> return false;
>   }
> {code}
> An expression like
> {code:java}
> StringUtils.equals(op1Conf.getNullOrder(), op2Conf.getNullOrder()) &&
> {code}
> should be added.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)