[jira] [Work logged] (HIVE-23731) Review of AvroInstance Cache
[ https://issues.apache.org/jira/browse/HIVE-23731?focusedWorklogId=479287&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479287 ] ASF GitHub Bot logged work on HIVE-23731: - Author: ASF GitHub Bot Created on: 05/Sep/20 01:21 Start Date: 05/Sep/20 01:21 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1153: URL: https://github.com/apache/hive/pull/1153#issuecomment-687508001 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 479287) Time Spent: 1h (was: 50m) > Review of AvroInstance Cache > > > Key: HIVE-23731 > URL: https://issues.apache.org/jira/browse/HIVE-23731 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23802) “merge files” job was submitted to default queue when hive.merge.tezfiles is set to true
[ https://issues.apache.org/jira/browse/HIVE-23802?focusedWorklogId=479286&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479286 ] ASF GitHub Bot logged work on HIVE-23802: - Author: ASF GitHub Bot Created on: 05/Sep/20 01:21 Start Date: 05/Sep/20 01:21 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1206: URL: https://github.com/apache/hive/pull/1206#issuecomment-687507985 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 479286) Time Spent: 20m (was: 10m) > “merge files” job was submitted to default queue when hive.merge.tezfiles > is set to true > > > Key: HIVE-23802 > URL: https://issues.apache.org/jira/browse/HIVE-23802 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.1.0 >Reporter: gaozhan ding >Assignee: gaozhan ding >Priority: Major > Labels: pull-request-available > Attachments: 15940042679272.png, HIVE-23802.patch > > Time Spent: 20m > Remaining Estimate: 0h > > We use tez as the query engine. When hive.merge.tezfiles is set to true, the merge > files task, which follows the original task, will be submitted to the default queue > rather than the same queue as the original task. > I studied this issue for days and found that, every time a container starts, > "tez.queue.name" will be unset in the current session. The code is as below: > {code:java} > // TezSessionState.startSessionAndContainers() > // sessionState.getQueueName() comes from cluster wide configured queue names.
> // sessionState.getConf().get("tez.queue.name") is explicitly set by user in > a session. > // TezSessionPoolManager sets tez.queue.name if user has specified one or > use the one from > // cluster wide queue names. > // There is no way to differentiate how this was set (user vs system). > // Unset this after opening the session so that reopening of session uses > the correct queue > // names i.e, if client has not died and if the user has explicitly set a > queue name > // then reopened session will use user specified queue name else default > cluster queue names. > conf.unset(TezConfiguration.TEZ_QUEUE_NAME); > {code} > So after the original task is submitted to yarn, "tez.queue.name" will be unset. > When the merge file task starts, it will try to use the same session as the original > job, but gets false because tez.queue.name was unset. It seems we should not > unset this property. > {code:java} > // TezSessionPoolManager.canWorkWithSameSession() > if (!session.isDefault()) { > String queueName = session.getQueueName(); > String confQueueName = conf.get(TezConfiguration.TEZ_QUEUE_NAME); > LOG.info("Current queue name is " + queueName + " incoming queue name is " > + confQueueName); > return (queueName == null) ? confQueueName == null : > queueName.equals(confQueueName); > } else { > // this session should never be a default session unless something has > messed up. > throw new HiveException("The pool session " + session + " should have been > returned to the pool"); > } > {code} > !15940042679272.png! > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
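The queue comparison quoted above can be illustrated outside of Hive with a minimal, self-contained sketch. This is an assumption-laden stand-in: a plain `Map` replaces the Hive/Tez `Configuration` object, and the class and method names (`QueueMatchSketch`, `sameQueue`) are hypothetical, not Hive's.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the queue check in
// TezSessionPoolManager.canWorkWithSameSession(): a session is only reusable
// when its queue name matches the incoming conf's tez.queue.name.
public class QueueMatchSketch {
    public static boolean sameQueue(String sessionQueue, Map<String, String> conf) {
        String confQueue = conf.get("tez.queue.name");
        return Objects.equals(sessionQueue, confQueue);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("tez.queue.name", "etl"); // queue explicitly chosen by the user

        // Original job: session queue and incoming conf queue are both "etl".
        System.out.println(sameQueue("etl", conf)); // true -> session reused

        // startSessionAndContainers() unsets tez.queue.name, so the follow-up
        // "merge files" job no longer matches and lands on the default queue.
        conf.remove("tez.queue.name");
        System.out.println(sameQueue("etl", conf)); // false -> new session, default queue
    }
}
```

This matches the symptom in the report: once `tez.queue.name` is unset, the null-vs-"etl" comparison returns false and the merge task cannot reuse the original session.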
[jira] [Assigned] (HIVE-24039) Update jquery version to mitigate CVE-2020-11023
[ https://issues.apache.org/jira/browse/HIVE-24039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das reassigned HIVE-24039: - Assignee: Rajkumar Singh (was: Kishen Das) > Update jquery version to mitigate CVE-2020-11023 > > > Key: HIVE-24039 > URL: https://issues.apache.org/jira/browse/HIVE-24039 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Rajkumar Singh >Assignee: Rajkumar Singh >Priority: Major > > There is a known vulnerability in the jquery version used by hive; with this jira > the plan is to upgrade to jquery version 3.5.0, where it has been fixed. More > details about the vulnerability can be found here: > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-11023 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment
[ https://issues.apache.org/jira/browse/HIVE-23408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190899#comment-17190899 ] Ashutosh Chauhan commented on HIVE-23408: - +1 > Hive on Tez : Kafka storage handler broken in secure environment > - > > Key: HIVE-23408 > URL: https://issues.apache.org/jira/browse/HIVE-23408 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0 >Reporter: Rajkumar Singh >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > hive.server2.authentication.kerberos.principal is set in the form of > hive/_HOST@REALM. > A Tez task can start on a random NM host and expands the value of _HOST with > the fqdn of the host where it is running; this leads to an authentication issue. > For LLAP there is a fallback to the LLAP daemon keytab/principal. Kafka 1.1 > onwards supports delegation tokens, and we should take advantage of them for hive > on tez. -- This message was sent by Atlassian Jira (v8.3.4#803005)
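The `_HOST` expansion at the root of the issue above can be sketched in isolation. This is a simplified model (an assumption, loosely mirroring what Hadoop's `SecurityUtil.getServerPrincipal` does); the class and method names are hypothetical, and the hostnames are made up for illustration.

```java
// Hypothetical sketch of Kerberos principal _HOST expansion; not Hive's
// actual code. Each host substitutes its own fqdn into the principal.
public class PrincipalSketch {
    public static String resolvePrincipal(String principal, String localFqdn) {
        // hive/_HOST@REALM -> hive/<fqdn-of-the-local-host>@REALM
        return principal.replace("_HOST", localFqdn.toLowerCase());
    }

    public static void main(String[] args) {
        String configured = "hive/_HOST@EXAMPLE.COM";
        // On the HiveServer2 host the principal resolves as intended...
        System.out.println(resolvePrincipal(configured, "hs2.example.com"));
        // ...but a Tez task launched on an arbitrary NodeManager resolves a
        // different principal with no matching keytab entry, hence the failure.
        System.out.println(resolvePrincipal(configured, "nm17.example.com"));
    }
}
```

Because every NodeManager resolves a different principal, no fixed keytab can cover them all, which is why the issue proposes Kafka delegation tokens instead.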
[jira] [Updated] (HIVE-23454) Querying hive table which has Materialized view fails with HiveAccessControlException
[ https://issues.apache.org/jira/browse/HIVE-23454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23454: -- Labels: pull-request-available (was: ) > Querying hive table which has Materialized view fails with > HiveAccessControlException > - > > Key: HIVE-23454 > URL: https://issues.apache.org/jira/browse/HIVE-23454 > Project: Hive > Issue Type: Bug > Components: Authorization, HiveServer2 >Affects Versions: 3.0.0, 3.2.0 >Reporter: Chiran Ravani >Assignee: Vineet Garg >Priority: Critical > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > A query fails with HiveAccessControlException against a table when there is a > materialized view pointing to that table which the end user does not have access > to, even though the user has all the privileges on the actual table. > From the HiveServer2 logs it looks like, as part of optimization, Hive uses the > materialized view to query the data instead of the table, and since the end user does > not have access to the MV we receive a HiveAccessControlException. > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveVolcanoPlanner.java#L99 > The simplest reproducer for this issue is as below. > 1. Create a table using the hive user and insert some data > {code:java} > create table db1.testmvtable(id int, name string) partitioned by(year int); > insert into db1.testmvtable partition(year=2020) values(1,'Name1'); > insert into db1.testmvtable partition(year=2020) values(1,'Name2'); > insert into db1.testmvtable partition(year=2016) values(1,'Name1'); > insert into db1.testmvtable partition(year=2016) values(1,'Name2'); > {code} > 2. Create a materialized view on top of the above table, with partitioning and a where > clause, as the hive user. > {code:java} > CREATE MATERIALIZED VIEW db2.testmv PARTITIONED ON(year) as select * from > db1.testmvtable tmv where year >= 2018; > {code} > 3.
Grant all (Select to be minimum) access to user 'chiran' via Ranger on > database db1. > 4. Run select on base table db1.testmvtable as 'chiran' with where clause > having partition value >=2018, it runs into HiveAccessControlException on > db2.testmv > {code:java} > eg:- (select * from db1.testmvtable where year=2020;) > 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2020; > Error: Error while compiling statement: FAILED: HiveAccessControlException > Permission denied: user [chiran] does not have [SELECT] privilege on > [db2/testmv/*] (state=42000,code=4) > {code} > 5. This works when partition column is not in MV > {code:java} > 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2016; > DEBUG : Acquired the compile lock. > INFO : Compiling > command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): > select * from db1.testmvtable where year=2016 > DEBUG : Encoding valid txns info 897:9223372036854775807::893,895,896 > txnid:897 > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: > Schema(fieldSchemas:[FieldSchema(name:testmvtable.id, type:int, > comment:null), FieldSchema(name:testmvtable.name, type:string, comment:null), > FieldSchema(name:testmvtable.year, type:int, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); > Time taken: 0.222 seconds > DEBUG : Encoding valid txn write ids info > 897$db1.testmvtable:4:9223372036854775807:: txnid:897 > INFO : Executing > command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): > select * from db1.testmvtable where year=2016 > INFO : Completed executing > command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); > Time taken: 0.008 seconds > INFO : OK > DEBUG : Shutting down query select * from db1.testmvtable where year=2016 > +-+---+---+ > | testmvtable.id | testmvtable.name | testmvtable.year | > +-+---+---+ > | 1 | 
Name1 | 2016 | > | 1 | Name2 | 2016 | > +-+---+---+ > 2 rows selected (0.302 seconds) > 0: jdbc:hive2://node2> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
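The contrast between steps 4 and 5 above can be modeled with a minimal sketch. This is an assumption, not Calcite's actual rewrite logic: `db2.testmv` was defined with `where year >= 2018`, so only queries whose partition predicate falls inside that range can be answered from the view, and only those are authorized against the MV instead of the base table. All names and the helper method here are hypothetical.

```java
// Simplified model of why the MV is substituted only for some predicates:
// a query is answerable from db2.testmv only if its year predicate lies
// within the view's defining range (year >= 2018).
public class MvRewriteSketch {
    static final int MV_MIN_YEAR = 2018; // from: where year >= 2018

    public static boolean answerableFromView(int queryYear) {
        return queryYear >= MV_MIN_YEAR;
    }

    public static void main(String[] args) {
        System.out.println(answerableFromView(2020)); // true  -> rewrite targets db2.testmv, ACLs checked on the MV
        System.out.println(answerableFromView(2016)); // false -> base table scan, succeeds for user 'chiran'
    }
}
```

This is consistent with the reproducer: `year=2020` trips the MV's ACL check while `year=2016` bypasses the view entirely.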
[jira] [Work logged] (HIVE-23454) Querying hive table which has Materialized view fails with HiveAccessControlException
[ https://issues.apache.org/jira/browse/HIVE-23454?focusedWorklogId=479213&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479213 ] ASF GitHub Bot logged work on HIVE-23454: - Author: ASF GitHub Bot Created on: 04/Sep/20 18:22 Start Date: 04/Sep/20 18:22 Worklog Time Spent: 10m Work Description: vineetgarg02 opened a new pull request #1471: URL: https://github.com/apache/hive/pull/1471 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 479213) Remaining Estimate: 0h Time Spent: 10m > Querying hive table which has Materialized view fails with > HiveAccessControlException > - > > Key: HIVE-23454 > URL: https://issues.apache.org/jira/browse/HIVE-23454 > Project: Hive > Issue Type: Bug > Components: Authorization, HiveServer2 >Affects Versions: 3.0.0, 3.2.0 >Reporter: Chiran Ravani >Assignee: Vineet Garg >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Assigned] (HIVE-23454) Querying hive table which has Materialized view fails with HiveAccessControlException
[ https://issues.apache.org/jira/browse/HIVE-23454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg reassigned HIVE-23454: -- Assignee: Vineet Garg (was: Nishant Goel) > Querying hive table which has Materialized view fails with > HiveAccessControlException > - > > Key: HIVE-23454 > URL: https://issues.apache.org/jira/browse/HIVE-23454 > Project: Hive > Issue Type: Bug > Components: Authorization, HiveServer2 >Affects Versions: 3.0.0, 3.2.0 >Reporter: Chiran Ravani >Assignee: Vineet Garg >Priority: Critical > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact
[ https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190804#comment-17190804 ] László Bodor edited comment on HIVE-24111 at 9/4/20, 4:45 PM: -- For reference: logs for a good run: [^org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log] logs for a hanging run: [^TestCrudCompactorTez.log] What is strange at first sight: I cannot see the MergeManager-related log messages where they are expected, so this could be a shuffle issue. good run: {code} 2020-09-03T15:13:19,604 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, ifileReadAhead: true 2020-09-03T15:13:19,605 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, postMergeMem=0, memToMemMergeOutputsThreshold=10 2020-09-03T15:13:19,605 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 1 with configuration: maxFetchFailuresBeforeReporting=5, reportReadErrorImmediately=true, maxFailedUniqueFetches=1, abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, hostFailureFraction=0.2, minFailurePerHost=4, maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true 2020-09-03T15:13:19,606 INFO [I/O Setup 0 Start: {Map 1}] runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1 2020-09-03T15:13:19,606 INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: AutoStartComplete 2020-09-03T15:13:19,606 INFO [ShuffleAndMergeRunner {Map_1}] orderedgrouped.MergeManager: Setting merger's parent thread to ShuffleAndMergeRunner {Map_1} 2020-09-03T15:13:19,606 INFO [TezChild] task.TaskRunner2Callable: Running task, taskAttemptId=attempt_1599171197926_0001_1_01_00_0 2020-09-03T15:13:19,607 INFO
[TezTaskEventRouter{attempt_1599171197926_0001_1_01_00_0}] orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: Map 1: numDmeEventsSeen=1, numDmeEventsSeenWithNoData=0, numObsoletionEventsSeen=0 2020-09-03T15:13:19,607 INFO [TezChild] exec.SerializationUtilities: Deserializing ReduceWork using kryo 2020-09-03T15:13:19,607 INFO [TezChild] exec.Utilities: Deserialized plan (via RPC) - name: Reducer 2 size: 1.87KB 2020-09-03T15:13:19,607 INFO [TezChild] tez.ObjectCache: Caching key: lbodor_20200903151317_7f539b53-07fb-4bb1-97db-c37d72aba99d_Reducer 2__REDUCE_PLAN__ 2020-09-03T15:13:19,607 INFO [TezChild] tez.RecordProcessor: conf class path = [] 2020-09-03T15:13:19,608 INFO [TezChild] tez.RecordProcessor: thread class path = [] 2020-09-03T15:13:19,608 INFO [Fetcher_O {Map_1} #0] orderedgrouped.MergeManager: close onDiskFile. State: NumOnDiskFiles=1. Current: path=/Users/lbodor/apache/hive/itests/hive-unit/target/tmp/scratchdir/lbodor/_tez_session_dir/e01fa9d5-36d9-4449-bfa4-d12b5fa290f8/.tez/application_1599171197926_0001_wd/localmode-local-dir/output/attempt_1599171197926_0001_1_00_00_0_10098/file.out, len=26 2020-09-03T15:13:19,608 INFO [Fetcher_O {Map_1} #0] ShuffleScheduler.fetch: Completed fetch for attempt: {0, 0, attempt_1599171197926_0001_1_00_00_0_10098} to DISK_DIRECT, csize=26, dsize=22, EndTime=1599171199608, TimeTaken=1, Rate=0.02 MB/s 2020-09-03T15:13:19,608 INFO [Fetcher_O {Map_1} #0] orderedgrouped.ShuffleScheduler: All inputs fetched for input vertex : Map 1 {code} hanging run: {code} 2020-09-04T02:12:16,392 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, ifileReadAhead: true 2020-09-04T02:12:16,392 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, postMergeMem=0, memToMemMergeOutputsThreshold=10 2020-09-04T02:12:16,394 INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 1 with configuration: maxFetchFailuresBeforeReporting=5, reportReadErrorImmediately=true, maxFailedUniqueFetches=1, abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, hostFailureFraction=0.2, minFailurePerHost=4, maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true 2020-09-04T02:12:16,398 INFO [I/O Setup 0 Start: {Map 1}] runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1 2020-09-04T02:12:16,398 INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: AutoStartComplete 2020-09-04T02:12:16,398 INFO [TezChild] task.TaskRunner2Callable: Running task, taskAttemptId=attempt_1599210735282_0001_1_01_00_0 2020-09-04T02:12:16,416 INFO [TezChild]
[jira] [Comment Edited] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact
[ https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190804#comment-17190804 ] László Bodor edited comment on HIVE-24111 at 9/4/20, 4:39 PM: -- For reference: logs for a good run: [^org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log] logs for a hanging run: [^TestCrudCompactorTez.log] what is strange for the first sight, I cannot see MergeManager related log messages when it's expected, so this could be a shuffle issue good run: {code} 2020-09-03T15:13:19,604 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, ifileReadAhead: true 2020-09-03T15:13:19,605 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, postMergeMem=0, memToMemMergeOutputsThreshold=10 2020-09-03T15:13:19,605 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 1 with configuration: maxFetchFailuresBeforeReporting=5, reportReadErrorImmediately=true, maxFailedUniqueFetches=1, abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, hostFailureFraction=0.2, minFailurePerHost=4, maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true 2020-09-03T15:13:19,606 INFO [I/O Setup 0 Start: {Map 1}] runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1 2020-09-03T15:13:19,606 INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: AutoStartComplete 2020-09-03T15:13:19,606 INFO [ShuffleAndMergeRunner {Map_1}] orderedgrouped.MergeManager: Setting merger's parent thread to ShuffleAndMergeRunner {Map_1} 2020-09-03T15:13:19,606 INFO [TezChild] task.TaskRunner2Callable: Running task, taskAttemptId=attempt_1599171197926_0001_1_01_00_0 2020-09-03T15:13:19,607 INFO 
[TezTaskEventRouter{attempt_1599171197926_0001_1_01_00_0}] orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: Map 1: numDmeEventsSeen=1, numDmeEventsSeenWithNoData=0, numObsoletionEventsSeen=0 2020-09-03T15:13:19,607 INFO [TezChild] exec.SerializationUtilities: Deserializing ReduceWork using kryo 2020-09-03T15:13:19,607 INFO [TezChild] exec.Utilities: Deserialized plan (via RPC) - name: Reducer 2 size: 1.87KB 2020-09-03T15:13:19,607 INFO [TezChild] tez.ObjectCache: Caching key: lbodor_20200903151317_7f539b53-07fb-4bb1-97db-c37d72aba99d_Reducer 2__REDUCE_PLAN__ 2020-09-03T15:13:19,607 INFO [TezChild] tez.RecordProcessor: conf class path = [] 2020-09-03T15:13:19,608 INFO [TezChild] tez.RecordProcessor: thread class path = [] 2020-09-03T15:13:19,608 INFO [Fetcher_O {Map_1} #0] orderedgrouped.MergeManager: close onDiskFile. State: NumOnDiskFiles=1. Current: path=/Users/lbodor/apache/hive/itests/hive-unit/target/tmp/scratchdir/lbodor/_tez_session_dir/e01fa9d5-36d9-4449-bfa4-d12b5fa290f8/.tez/application_1599171197926_0001_wd/localmode-local-dir/output/attempt_1599171197926_0001_1_00_00_0_10098/file.out, len=26 2020-09-03T15:13:19,608 INFO [Fetcher_O {Map_1} #0] ShuffleScheduler.fetch: Completed fetch for attempt: {0, 0, attempt_1599171197926_0001_1_00_00_0_10098} to DISK_DIRECT, csize=26, dsize=22, EndTime=1599171199608, TimeTaken=1, Rate=0.02 MB/s 2020-09-03T15:13:19,608 INFO [Fetcher_O {Map_1} #0] orderedgrouped.ShuffleScheduler: All inputs fetched for input vertex : Map 1 {code} hanging run: {code} 2020-09-04T02:12:16,392 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, ifileReadAhead: true 2020-09-04T02:12:16,392 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, postMergeMem=0, memToMemMergeOutputsThreshold=10 2020-09-04T02:12:16,394 INFO [I/O Setup 0 Start: {Map 1}] 
orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 1 with configuration: maxFetchFailuresBeforeReporting=5, reportReadErrorImmediately=true, maxFailedUniqueFetches=1, abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, hostFailureFraction=0.2, minFailurePerHost=4, maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true 2020-09-04T02:12:16,398 INFO [I/O Setup 0 Start: {Map 1}] runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1 2020-09-04T02:12:16,398 INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: AutoStartComplete 2020-09-04T02:12:16,398 INFO [TezChild] task.TaskRunner2Callable: Running task, taskAttemptId=attempt_1599210735282_0001_1_01_00_0 2020-09-04T02:12:16,416 INFO [TezChild]
[jira] [Comment Edited] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact
[ https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190804#comment-17190804 ] László Bodor edited comment on HIVE-24111 at 9/4/20, 4:17 PM:
--
For reference:
logs for a good run: [^org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log]
logs for a hanging run: [^TestCrudCompactorTez.log]

What is strange at first sight is that I cannot see the MergeManager-related log messages where they are expected, so this could be a shuffle issue.

good run:
{code}
2020-09-03T15:13:19,604 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, ifileReadAhead: true
2020-09-03T15:13:19,605 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-03T15:13:19,605 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 1 with configuration: maxFetchFailuresBeforeReporting=5, reportReadErrorImmediately=true, maxFailedUniqueFetches=1, abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, hostFailureFraction=0.2, minFailurePerHost=4, maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-03T15:13:19,606 INFO [I/O Setup 0 Start: {Map 1}] runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-03T15:13:19,606 INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: AutoStartComplete
2020-09-03T15:13:19,606 INFO [ShuffleAndMergeRunner {Map_1}] orderedgrouped.MergeManager: Setting merger's parent thread to ShuffleAndMergeRunner {Map_1}
2020-09-03T15:13:19,606 INFO [TezChild] task.TaskRunner2Callable: Running task, taskAttemptId=attempt_1599171197926_0001_1_01_00_0
2020-09-03T15:13:19,607 INFO [TezTaskEventRouter{attempt_1599171197926_0001_1_01_00_0}] orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: Map 1: numDmeEventsSeen=1, numDmeEventsSeenWithNoData=0, numObsoletionEventsSeen=0
2020-09-03T15:13:19,607 INFO [TezChild] exec.SerializationUtilities: Deserializing ReduceWork using kryo
2020-09-03T15:13:19,607 INFO [TezChild] exec.Utilities: Deserialized plan (via RPC) - name: Reducer 2 size: 1.87KB
2020-09-03T15:13:19,607 INFO [TezChild] tez.ObjectCache: Caching key: lbodor_20200903151317_7f539b53-07fb-4bb1-97db-c37d72aba99d_Reducer 2__REDUCE_PLAN__
2020-09-03T15:13:19,607 INFO [TezChild] tez.RecordProcessor: conf class path = []
2020-09-03T15:13:19,608 INFO [TezChild] tez.RecordProcessor: thread class path = []
2020-09-03T15:13:19,608 INFO [Fetcher_O {Map_1} #0] orderedgrouped.MergeManager: close onDiskFile. State: NumOnDiskFiles=1. Current: path=/Users/lbodor/apache/hive/itests/hive-unit/target/tmp/scratchdir/lbodor/_tez_session_dir/e01fa9d5-36d9-4449-bfa4-d12b5fa290f8/.tez/application_1599171197926_0001_wd/localmode-local-dir/output/attempt_1599171197926_0001_1_00_00_0_10098/file.out, len=26
2020-09-03T15:13:19,608 INFO [Fetcher_O {Map_1} #0] ShuffleScheduler.fetch: Completed fetch for attempt: {0, 0, attempt_1599171197926_0001_1_00_00_0_10098} to DISK_DIRECT, csize=26, dsize=22, EndTime=1599171199608, TimeTaken=1, Rate=0.02 MB/s
2020-09-03T15:13:19,608 INFO [Fetcher_O {Map_1} #0] orderedgrouped.ShuffleScheduler: All inputs fetched for input vertex : Map 1
{code}

hanging run:
{code}
2020-09-04T02:12:16,392 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.Shuffle: Map_1: Shuffle assigned with 1 inputs, codec: None, ifileReadAhead: true
2020-09-04T02:12:16,392 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.MergeManager: Map 1: MergerManager: memoryLimit=1278984847, maxSingleShuffleLimit=319746208, mergeThreshold=844130048, ioSortFactor=10, postMergeMem=0, memToMemMergeOutputsThreshold=10
2020-09-04T02:12:16,394 INFO [I/O Setup 0 Start: {Map 1}] orderedgrouped.ShuffleScheduler: ShuffleScheduler running for sourceVertex: Map 1 with configuration: maxFetchFailuresBeforeReporting=5, reportReadErrorImmediately=true, maxFailedUniqueFetches=1, abortFailureLimit=15, maxTaskOutputAtOnce=20, numFetchers=1, hostFailureFraction=0.2, minFailurePerHost=4, maxAllowedFailedFetchFraction=0.5, maxStallTimeFraction=0.5, minReqProgressFraction=0.5, checkFailedFetchSinceLastCompletion=true
2020-09-04T02:12:16,398 INFO [I/O Setup 0 Start: {Map 1}] runtime.LogicalIOProcessorRuntimeTask: Started Input with src edge: Map 1
2020-09-04T02:12:16,398 INFO [TezChild] runtime.LogicalIOProcessorRuntimeTask: AutoStartComplete
2020-09-04T02:12:16,398 INFO [TezChild] task.TaskRunner2Callable: Running task, taskAttemptId=attempt_1599210735282_0001_1_01_00_0
2020-09-04T02:12:16,416 INFO [TezChild] tez.ReduceRecordProcessor: Waiting for ShuffleInputs to become
{code}
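One quick way to confirm the divergence described above is to scan both attached logs for the shuffle-phase markers that only show up in the good run. A minimal sketch — the marker strings are taken from the log excerpts above; the embedded sample texts are abbreviated stand-ins for the real attachments:

```python
# Sketch: report which shuffle-phase markers are missing from a Tez task log.
# Marker strings come from the log excerpts above; everything else is illustrative.
MARKERS = [
    "orderedgrouped.MergeManager: Setting merger's parent thread",
    "ShuffleScheduler: All inputs fetched for input vertex",
]

def missing_markers(log_text):
    """Return the shuffle-phase markers that never appear in the log text."""
    return [m for m in MARKERS if m not in log_text]

# Abbreviated stand-ins for the good-run and hanging-run logs:
good_run = (
    "orderedgrouped.MergeManager: Setting merger's parent thread to ShuffleAndMergeRunner {Map_1}\n"
    "orderedgrouped.ShuffleScheduler: All inputs fetched for input vertex : Map 1\n"
)
hanging_run = "tez.ReduceRecordProcessor: Waiting for ShuffleInputs to become\n"

print(missing_markers(good_run))     # []
print(missing_markers(hanging_run))  # both markers missing
```

Running this over the full attachments would show that the hanging run never reaches either marker, which is what points at the shuffle phase.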
[jira] [Updated] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact
[ https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24111:
Attachment: org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log
> TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact
> ---
>
> Key: HIVE-24111
> URL: https://issues.apache.org/jira/browse/HIVE-24111
> Project: Hive
> Issue Type: Bug
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Attachments: TestCrudCompactorTez.log, jstack.log, org.apache.hadoop.hive.ql.txn.compactor.TestMmCompactorOnTez-output.txt.log
>
> Reproduced the issue in a ptest run which I made run against the Tez staging artifacts (https://repository.apache.org/content/repositories/orgapachetez-1068/):
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1311/14/pipeline/417
> I'm about to investigate this. I think Tez 0.10.0 cannot be released until we confirm whether it's a Hive or Tez bug.
> {code}
> mvn test -Pitests,hadoop-2 -Dtest=TestMmCompactorOnTez -pl ./itests/hive-unit
> {code}
> tez setup: https://github.com/apache/hive/commit/92516631ab39f39df5d0692f98ac32c2cd320997#diff-a22bcc9ba13b310c7abfee4a57c4b130R83-R97
--
This message was sent by Atlassian Jira (v8.3.4#803005)
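When a test hangs like this, the attached jstack.log is the first thing to inspect. A minimal sketch of tallying thread states from a jstack-style dump follows; the miniature dump text below is invented for illustration, not taken from the real jstack.log:

```python
import re
from collections import Counter

def thread_states(jstack_text):
    """Count occurrences of each java.lang.Thread.State in a jstack dump."""
    return Counter(re.findall(r"java\.lang\.Thread\.State: (\w+)", jstack_text))

# Invented miniature dump for illustration:
sample = '''
"TezChild" #42 prio=5 tid=0x1 nid=0x2 waiting on condition
   java.lang.Thread.State: WAITING (parking)
"Fetcher_O {Map_1} #0" #43 prio=5 tid=0x3 nid=0x4 runnable
   java.lang.Thread.State: RUNNABLE
'''
print(thread_states(sample))  # Counter({'WAITING': 1, 'RUNNABLE': 1})
```

A dump dominated by WAITING/TIMED_WAITING threads parked in shuffle code is the usual signature of a stuck shuffle rather than a busy loop.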
[jira] [Updated] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact
[ https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24111: Attachment: (was: stacktrace.log) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact
[ https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24111: Attachment: jstack.log -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24111) TestMmCompactorOnTez hangs when running against Tez 0.10.0 staging artifact
[ https://issues.apache.org/jira/browse/HIVE-24111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24111: Attachment: stacktrace.log TestCrudCompactorTez.log -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24121) dynamic_semijoin_reduction_on_aggcol.q is flaky
[ https://issues.apache.org/jira/browse/HIVE-24121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24121:
Attachment: ci.hive.apache.org.txt
> dynamic_semijoin_reduction_on_aggcol.q is flaky
> ---
>
> Key: HIVE-24121
> URL: https://issues.apache.org/jira/browse/HIVE-24121
> Project: Hive
> Issue Type: Bug
> Reporter: László Bodor
> Priority: Major
> Attachments: ci.hive.apache.org.txt
>
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1379/3/tests
> {code}
> java.lang.AssertionError:
> Client Execution succeeded but contained differences (error code = 1) after executing dynamic_semijoin_reduction_on_aggcol.q
> 186c186,188
> < 0 NULL
> ---
> > 0 val_0
> > 0 val_0
> > 0 val_0
> {code}
--
This message was sent by Atlassian Jira (v8.3.4#803005)
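For context, dynamic semi-join reduction prunes the probe side of a join using the set of keys actually produced on the build side; if the runtime filter misfires, rows vanish exactly as in the diff above. A deliberately simplified model of the technique (not Hive's implementation):

```python
def semijoin_reduce(probe_rows, build_keys):
    """Keep only probe rows whose join key appears on the build side."""
    build_set = set(build_keys)
    return [row for row in probe_rows if row[0] in build_set]

# Illustrative rows modelled on the q.out diff above:
probe = [(0, "val_0"), (0, "val_0"), (0, "val_0"), (2, "val_2")]
build = [0]  # keys surviving the build-side aggregation
print(semijoin_reduce(probe, build))  # [(0, 'val_0'), (0, 'val_0'), (0, 'val_0')]
```

In the flaky run the three `0 val_0` rows are replaced by a single `0 NULL`, i.e. the runtime filter appears to have dropped rows it should have kept.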
[jira] [Updated] (HIVE-24121) dynamic_semijoin_reduction_on_aggcol.q is flaky
[ https://issues.apache.org/jira/browse/HIVE-24121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-24121:
Description:
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1379/3/tests
{code}
java.lang.AssertionError:
Client Execution succeeded but contained differences (error code = 1) after executing dynamic_semijoin_reduction_on_aggcol.q
186c186,188
< 0 NULL
---
> 0 val_0
> 0 val_0
> 0 val_0
{code}
--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24120) Plugin for external DatabaseProduct in standalone HMS
[ https://issues.apache.org/jira/browse/HIVE-24120?focusedWorklogId=479105=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479105 ] ASF GitHub Bot logged work on HIVE-24120: - Author: ASF GitHub Bot Created on: 04/Sep/20 13:35 Start Date: 04/Sep/20 13:35 Worklog Time Spent: 10m Work Description: gatorblue opened a new pull request #1470: URL: https://github.com/apache/hive/pull/1470 This PR implements the design outlined in this doc: https://docs.google.com/document/d/1lR2IDiqjMiG1zb7o_4sEa6wULzH1CvtRodsghCtxYWY/edit#heading=h.imzjzzvayyi1 Quoting from the design doc:
**Goals** Introduce a way to load a custom jar in the metastore which can override the database-specific DatabaseProduct that provides database-specific SQL code execution in Hive Metastore.
**Non-goals** We do not want to evolve the metastore into Datanucleus-like (ORM) software which transparently handles all the different nuances of supporting multiple database types. The pluggable custom instance of DatabaseProduct must be ANSI SQL compliant, since MetastoreDirectSQL and SqlGenerator assume that. We would like to keep the changes to MetastoreDirectSQL to a minimum and assume that all the supported databases are ANSI SQL compliant. Upgrade and performance testing of custom implementations, and teaching schematool to load this custom jar to execute schema initialization and upgrade scripts, are not currently in scope of this document.
**About this PR** As per the design, DatabaseProduct has been refactored as a class. It's a singleton class, which gets instantiated the first time the method determineDatabaseProduct is invoked. Existing SQL logic that was scattered around other classes has been moved to methods in this class. This makes it possible for an external plugin to override the existing SQL logic. @vihangk1 @thejasmn @nrg4878 This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 479105) Remaining Estimate: 0h Time Spent: 10m > Plugin for external DatabaseProduct in standalone HMS > - > > Key: HIVE-24120 > URL: https://issues.apache.org/jira/browse/HIVE-24120 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: 3.1.1 >Reporter: Gustavo Arocena >Assignee: Gustavo Arocena >Priority: Minor > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Add a pluggable way to support ANSI compliant databases as backends for > standalone HMS -- This message was sent by Atlassian Jira (v8.3.4#803005)
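The design described above — a lazily instantiated singleton whose database-specific SQL generation an externally loaded subclass can override — can be modelled roughly like this. Class and method names are illustrative, not the actual HMS API:

```python
class DatabaseProduct:
    """Base class holding ANSI-compliant SQL generation; a plugin may subclass it."""
    _instance = None

    @classmethod
    def determine_database_product(cls, product_cls=None):
        # Instantiated once, on first call; a custom subclass may be supplied then.
        if cls._instance is None:
            cls._instance = (product_cls or cls)()
        return cls._instance

    def limit_clause(self, n):
        return f"FETCH FIRST {n} ROWS ONLY"  # ANSI default

class MyCustomProduct(DatabaseProduct):
    """Stand-in for a vendor plugin loaded from a custom jar."""
    def limit_clause(self, n):
        return f"LIMIT {n}"  # vendor-specific override

db = DatabaseProduct.determine_database_product(MyCustomProduct)
print(db.limit_clause(10))  # LIMIT 10
print(DatabaseProduct.determine_database_product() is db)  # True
```

The point of the singleton is that every metastore call site asks `determine_database_product` for the current instance, so swapping in the plugin class changes the generated SQL everywhere at once.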
[jira] [Updated] (HIVE-24120) Plugin for external DatabaseProduct in standalone HMS
[ https://issues.apache.org/jira/browse/HIVE-24120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24120: -- Labels: pull-request-available (was: ) > Plugin for external DatabaseProduct in standalone HMS > - > > Key: HIVE-24120 > URL: https://issues.apache.org/jira/browse/HIVE-24120 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: 3.1.1 >Reporter: Gustavo Arocena >Assignee: Gustavo Arocena >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Add a pluggable way to support ANSI compliant databases as backends for > standalone HMS -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24120) Plugin for external DatabaseProduct in standalone HMS
[ https://issues.apache.org/jira/browse/HIVE-24120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gustavo Arocena reassigned HIVE-24120: -- > Plugin for external DatabaseProduct in standalone HMS > - > > Key: HIVE-24120 > URL: https://issues.apache.org/jira/browse/HIVE-24120 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: 3.1.1 >Reporter: Gustavo Arocena >Assignee: Gustavo Arocena >Priority: Minor > Fix For: 4.0.0 > > > Add a pluggable way to support ANSI compliant databases as backends for > standalone HMS -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment
[ https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=479100=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479100 ] ASF GitHub Bot logged work on HIVE-23408: - Author: ASF GitHub Bot Created on: 04/Sep/20 13:13 Start Date: 04/Sep/20 13:13 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1379: URL: https://github.com/apache/hive/pull/1379#discussion_r483607071
## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
## @@ -265,11 +279,70 @@ public URI apply(Path path) {
{code}
      }
      dag.addURIsForCredentials(uris);
    }
+   getKafkaCredentials((MapWork)work, dag, conf);
  }
-   getCredentialsForFileSinks(work, dag);
  }

  private void getKafkaCredentials(MapWork work, DAG dag, JobConf conf) {
    Token tokenCheck = dag.getCredentials().getToken(KAFKA_DELEGATION_TOKEN_KEY);
    if (tokenCheck != null) {
      LOG.debug("Kafka credentials already added, skipping...");
      return;
    }
    LOG.info("Getting kafka credentials for mapwork: " + work.getName());

    String kafkaBrokers = null;
    Map partitions = work.getAliasToPartnInfo();
{code}
Review comment: @ashutoshc : what do you think about this? https://github.com/apache/hive/pull/1379/commits/edc4ad440af4e234b731104bbcb9837cbdf43a19#diff-d7b5b051769b68ed5dd602cc30744439R303-R306 (tested on cluster) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 479100) Time Spent: 1h 10m (was: 1h) > Hive on Tez : Kafka storage handler broken in secure environment > - > > Key: HIVE-23408 > URL: https://issues.apache.org/jira/browse/HIVE-23408 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0 >Reporter: Rajkumar Singh >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > hive.server2.authentication.kerberos.principal is set in the form hive/_HOST@REALM; a Tez task can start on a random NM host and expands _HOST to the FQDN where it is running, which leads to an authentication issue. > For LLAP there is a fallback to the LLAP daemon keytab/principal. Kafka 1.1 onwards supports delegation tokens, and we should take advantage of that for Hive on Tez. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23976) Enable vectorization for multi-col semi join reducers
[ https://issues.apache.org/jira/browse/HIVE-23976?focusedWorklogId=479072=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479072 ] ASF GitHub Bot logged work on HIVE-23976: - Author: ASF GitHub Bot Created on: 04/Sep/20 11:49 Start Date: 04/Sep/20 11:49 Worklog Time Spent: 10m Work Description: abstractdog commented on pull request #1458: URL: https://github.com/apache/hive/pull/1458#issuecomment-687095101 @t3rmin4t0r , @zabetak , @ramesh0201: could you please take a look? good-old vectorization coverage patch, murmurhash with 2 column params This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 479072) Time Spent: 20m (was: 10m) > Enable vectorization for multi-col semi join reducers > - > > Key: HIVE-23976 > URL: https://issues.apache.org/jira/browse/HIVE-23976 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > HIVE-21196 introduces multi-column semi-join reducers in the query engine. > However, the implementation relies on GenericUDFMurmurHash which is not > vectorized thus the respective operators cannot be executed in vectorized > mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
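To illustrate what vectorizing the multi-column hash buys: instead of invoking the UDF once per row, the hash is computed in one pass over whole column batches, producing identical results. A toy sketch with a stand-in combiner — this is not Murmur and not Hive's vectorized expression:

```python
def row_hash(a, b):
    """Row-wise combined hash of two column values (toy combiner, not Murmur)."""
    return (hash(a) * 31 + hash(b)) & 0x7FFFFFFF

def vectorized_hash(col_a, col_b):
    """Batch version: one pass over both column arrays, same results as row_hash."""
    return [(hash(a) * 31 + hash(b)) & 0x7FFFFFFF for a, b in zip(col_a, col_b)]

col_a = [1, 2, 3]
col_b = ["x", "y", "z"]
# The batch result must match the row-at-a-time result element for element:
assert vectorized_hash(col_a, col_b) == [row_hash(a, b) for a, b in zip(col_a, col_b)]
```

The equality check above is the essential correctness condition for a vectorized UDF: same output as the scalar UDF, just computed batch-wise so the semi-join reducer's operators can stay in vectorized mode.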
[jira] [Commented] (HIVE-23995) Don't set location for managed tables in case of replication
[ https://issues.apache.org/jira/browse/HIVE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190685#comment-17190685 ] Aasha Medhi commented on HIVE-23995: This was removed because the entire migration and upgrades workflow needs to be thought through and won't work as-is for the latest version of Hive. > Don't set location for managed tables in case of replication > > > Key: HIVE-23995 > URL: https://issues.apache.org/jira/browse/HIVE-23995 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23995.01.patch, HIVE-23995.02.patch, > HIVE-23995.03.patch, HIVE-23995.04.patch, HIVE-23995.05.patch, > HIVE-23995.06.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Managed table location should not be set > Migration code of replication should be removed > add logging to all ack files > set hive.repl.data.copy.lazy to true -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23995) Don't set location for managed tables in case of replication
[ https://issues.apache.org/jira/browse/HIVE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190684#comment-17190684 ] Ádám Szita commented on HIVE-23995: --- This commit has removed HiveStrictManagedMigration.java (and the corresponding test) but I don't see its functionality being ported to anywhere. AFAIK it was not really related to replication but we use this class to prepare Hive warehouse between major upgrades. Was there a particular reason this had to be removed? cc: [~thejas] [~jdere] > Don't set location for managed tables in case of replication > > > Key: HIVE-23995 > URL: https://issues.apache.org/jira/browse/HIVE-23995 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23995.01.patch, HIVE-23995.02.patch, > HIVE-23995.03.patch, HIVE-23995.04.patch, HIVE-23995.05.patch, > HIVE-23995.06.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Managed table location should not be set > Migration code of replication should be removed > add logging to all ack files > set hive.repl.data.copy.lazy to true -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment
[ https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=479034=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479034 ] ASF GitHub Bot logged work on HIVE-23408: - Author: ASF GitHub Bot Created on: 04/Sep/20 09:28 Start Date: 04/Sep/20 09:28 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1379: URL: https://github.com/apache/hive/pull/1379#discussion_r483501753 ## File path: pom.xml ## @@ -169,6 +169,7 @@ 4.13 5.6.2 5.6.2 +2.5.0 Review comment: sure! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 479034) Time Spent: 1h (was: 50m) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24117) Fix for not setting managed table location in incremental load
[ https://issues.apache.org/jira/browse/HIVE-24117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190635#comment-17190635 ] Aasha Medhi commented on HIVE-24117: [~thejas] [~ngangam] Commit for this Jira is blocked on https://issues.apache.org/jira/browse/HIVE-23387 Tests are failing as getDefaultTablePath for managed tables returns external table path instead of managed table. > Fix for not setting managed table location in incremental load > -- > > Key: HIVE-24117 > URL: https://issues.apache.org/jira/browse/HIVE-24117 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24117.01.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24116) LLAP: Provide an opportunity for preempted tasks to get better locality in next iteration
[ https://issues.apache.org/jira/browse/HIVE-24116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan resolved HIVE-24116. - Fix Version/s: 4.0.0 Resolution: Fixed > LLAP: Provide an opportunity for preempted tasks to get better locality in > next iteration > - > > Key: HIVE-24116 > URL: https://issues.apache.org/jira/browse/HIVE-24116 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > In certain DAGs, tasks get preempted because higher-priority tasks need to be > executed. These preempted tasks are scheduled to run later, but they end up > missing locality information. Ref: HIVE-24061 > Remote storage reads can be avoided if an opportunity is provided for these > preempted tasks to get better locality in the next iteration. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
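The improvement above is in spirit delay scheduling: a preempted task remembers its preferred hosts and waits a bounded number of scheduling rounds for a slot there before accepting any host. A rough sketch of that policy — the function and its inputs are invented for illustration; LLAP's actual scheduler differs:

```python
def assign(task_preferred, offered_hosts_per_round, max_wait_rounds):
    """Accept a preferred host if one is offered within max_wait_rounds;
    otherwise fall back to the first host offered once the wait expires."""
    preferred = set(task_preferred)
    for round_no, hosts in enumerate(offered_hosts_per_round):
        match = next((h for h in hosts if h in preferred), None)
        if match is not None:
            return match, round_no       # got locality
        if round_no >= max_wait_rounds:
            return hosts[0], round_no    # give up on locality
    return None, len(offered_hosts_per_round)

# A preempted task preferring node2 skips node9 in round 0 and lands on node2 in round 1.
print(assign(["node2"], [["node9"], ["node2", "node5"]], max_wait_rounds=2))  # ('node2', 1)
```

Waiting a round or two trades a small scheduling delay for avoiding the remote storage reads the issue describes.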
[jira] [Commented] (HIVE-24116) LLAP: Provide an opportunity for preempted tasks to get better locality in next iteration
[ https://issues.apache.org/jira/browse/HIVE-24116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190634#comment-17190634 ] Rajesh Balamohan commented on HIVE-24116: - Thanks [~gopalv]. Committed to master. > LLAP: Provide an opportunity for preempted tasks to get better locality in > next iteration > - > > Key: HIVE-24116 > URL: https://issues.apache.org/jira/browse/HIVE-24116 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > In certain DAGs, tasks get preempted as higher-priority tasks need to be > executed. These preempted tasks are scheduled to run later, but they end up > missing locality information. Ref: HIVE-24061 > Remote storage reads can be avoided if an opportunity is provided for these > preempted tasks to get better locality in the next iteration. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24116) LLAP: Provide an opportunity for preempted tasks to get better locality in next iteration
[ https://issues.apache.org/jira/browse/HIVE-24116?focusedWorklogId=479021=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479021 ] ASF GitHub Bot logged work on HIVE-24116: - Author: ASF GitHub Bot Created on: 04/Sep/20 08:59 Start Date: 04/Sep/20 08:59 Worklog Time Spent: 10m Work Description: rbalamohan commented on pull request #1466: URL: https://github.com/apache/hive/pull/1466#issuecomment-687020811 Thanks @prasanthj. Merged to master. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 479021) Time Spent: 40m (was: 0.5h) > LLAP: Provide an opportunity for preempted tasks to get better locality in > next iteration > - > > Key: HIVE-24116 > URL: https://issues.apache.org/jira/browse/HIVE-24116 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > In certain DAGs, tasks get preempted as higher-priority tasks need to be > executed. These preempted tasks are scheduled to run later, but they end up > missing locality information. Ref: HIVE-24061 > Remote storage reads can be avoided if an opportunity is provided for these > preempted tasks to get better locality in the next iteration. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24117) Fix for not setting managed table location in incremental load
[ https://issues.apache.org/jira/browse/HIVE-24117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190635#comment-17190635 ] Aasha Medhi edited comment on HIVE-24117 at 9/4/20, 8:59 AM: - [~thejas] [~ngangam] Commit for this Jira is blocked on https://issues.apache.org/jira/browse/HIVE-23387 Tests are failing as getDefaultTablePath for managed tables returns external table path instead of managed table. cc [~anishek] was (Author: aasha): [~thejas] [~ngangam] Commit for this Jira is blocked on https://issues.apache.org/jira/browse/HIVE-23387 Tests are failing as getDefaultTablePath for managed tables returns external table path instead of managed table. > Fix for not setting managed table location in incremental load > -- > > Key: HIVE-24117 > URL: https://issues.apache.org/jira/browse/HIVE-24117 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24117.01.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24116) LLAP: Provide an opportunity for preempted tasks to get better locality in next iteration
[ https://issues.apache.org/jira/browse/HIVE-24116?focusedWorklogId=479020=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479020 ] ASF GitHub Bot logged work on HIVE-24116: - Author: ASF GitHub Bot Created on: 04/Sep/20 08:57 Start Date: 04/Sep/20 08:57 Worklog Time Spent: 10m Work Description: rbalamohan merged pull request #1466: URL: https://github.com/apache/hive/pull/1466 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 479020) Time Spent: 0.5h (was: 20m) > LLAP: Provide an opportunity for preempted tasks to get better locality in > next iteration > - > > Key: HIVE-24116 > URL: https://issues.apache.org/jira/browse/HIVE-24116 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > In certain DAGs, tasks get preempted as higher-priority tasks need to be > executed. These preempted tasks are scheduled to run later, but they end up > missing locality information. Ref: HIVE-24061 > Remote storage reads can be avoided if an opportunity is provided for these > preempted tasks to get better locality in the next iteration. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24119) fix the issue that the client got wrong jobTrackerUrl when resourcemanager has ha instances
[ https://issues.apache.org/jira/browse/HIVE-24119?focusedWorklogId=479013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479013 ] ASF GitHub Bot logged work on HIVE-24119: - Author: ASF GitHub Bot Created on: 04/Sep/20 08:46 Start Date: 04/Sep/20 08:46 Worklog Time Spent: 10m Work Description: Neilxzn opened a new pull request #1469: URL: https://github.com/apache/hive/pull/1469 ### What changes were proposed in this pull request? Change the method Hadoop23Shims.getJobLauncherRpcAddress to fix the issue that the client gets a wrong jobTrackerUrl when the ResourceManager has HA instances. ### Why are the changes needed? When a cluster has HA ResourceManagers, the conf `yarn.resourcemanager.address.rm1` or `yarn.resourcemanager.address.rm2` will replace the conf `yarn.resourcemanager.address`. The conf `yarn.resourcemanager.address` may not be set; in that case the method Hadoop23Shims.getJobLauncherRpcAddress will return a wrong value. Maybe it should return the value of the conf `yarn.resourcemanager.address.rm1` or `yarn.resourcemanager.address.rm2` instead. https://issues.apache.org/jira/browse/HIVE-24119 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 479013) Remaining Estimate: 0h Time Spent: 10m > fix the issue that the client got wrong jobTrackerUrl when resourcemanager > has ha instances > --- > > Key: HIVE-24119 > URL: https://issues.apache.org/jira/browse/HIVE-24119 > Project: Hive > Issue Type: Improvement > Components: Shims >Affects Versions: 2.1.0 > Environment: ha resourcemanager >Reporter: Max Xie >Assignee: Max Xie >Priority: Minor > Attachments: image-2020-09-04-16-34-28-341.png > > Time Spent: 10m > Remaining Estimate: 0h > > When a cluster has HA ResourceManagers, the conf > `yarn.resourcemanager.address.rm1` or `yarn.resourcemanager.address.rm2` > will replace the conf `yarn.resourcemanager.address`. > The conf `yarn.resourcemanager.address` may not be set; in that case the method > Hadoop23Shims.getJobLauncherRpcAddress will return a wrong value. > !image-2020-09-04-16-34-28-341.png! > Maybe it should return the value of the conf > `yarn.resourcemanager.address.rm1` or `yarn.resourcemanager.address.rm2` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24119) fix the issue that the client got wrong jobTrackerUrl when resourcemanager has ha instances
[ https://issues.apache.org/jira/browse/HIVE-24119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24119: -- Labels: pull-request-available (was: ) > fix the issue that the client got wrong jobTrackerUrl when resourcemanager > has ha instances > --- > > Key: HIVE-24119 > URL: https://issues.apache.org/jira/browse/HIVE-24119 > Project: Hive > Issue Type: Improvement > Components: Shims >Affects Versions: 2.1.0 > Environment: ha resourcemanager >Reporter: Max Xie >Assignee: Max Xie >Priority: Minor > Labels: pull-request-available > Attachments: image-2020-09-04-16-34-28-341.png > > Time Spent: 10m > Remaining Estimate: 0h > > When a cluster has HA ResourceManagers, the conf > `yarn.resourcemanager.address.rm1` or `yarn.resourcemanager.address.rm2` > will replace the conf `yarn.resourcemanager.address`. > The conf `yarn.resourcemanager.address` may not be set; in that case the method > Hadoop23Shims.getJobLauncherRpcAddress will return a wrong value. > !image-2020-09-04-16-34-28-341.png! > Maybe it should return the value of the conf > `yarn.resourcemanager.address.rm1` or `yarn.resourcemanager.address.rm2` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24119) fix the issue that the client got wrong jobTrackerUrl when resourcemanager has ha instances
[ https://issues.apache.org/jira/browse/HIVE-24119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Xie reassigned HIVE-24119: --- > fix the issue that the client got wrong jobTrackerUrl when resourcemanager > has ha instances > --- > > Key: HIVE-24119 > URL: https://issues.apache.org/jira/browse/HIVE-24119 > Project: Hive > Issue Type: Improvement > Components: Shims >Affects Versions: 2.1.0 > Environment: ha resourcemanager >Reporter: Max Xie >Assignee: Max Xie >Priority: Minor > Attachments: image-2020-09-04-16-34-28-341.png > > > When a cluster has HA ResourceManagers, the conf > `yarn.resourcemanager.address.rm1` or `yarn.resourcemanager.address.rm2` > will replace the conf `yarn.resourcemanager.address`. > The conf `yarn.resourcemanager.address` may not be set; in that case the method > Hadoop23Shims.getJobLauncherRpcAddress will return a wrong value. > !image-2020-09-04-16-34-28-341.png! > Maybe it should return the value of the conf > `yarn.resourcemanager.address.rm1` or `yarn.resourcemanager.address.rm2` -- This message was sent by Atlassian Jira (v8.3.4#803005)
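The fallback logic HIVE-24119 proposes can be sketched in isolation. This is a hypothetical, self-contained sketch (not the actual Hadoop23Shims code): a plain Map stands in for Hadoop's Configuration, and the key names follow YARN's documented `yarn.resourcemanager.*` properties. When HA is enabled, the real addresses live under per-instance keys such as `yarn.resourcemanager.address.rm1`, and the bare `yarn.resourcemanager.address` key may be unset.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an HA-aware ResourceManager address lookup.
// A plain Map stands in for Hadoop's Configuration object.
public class RmAddressSketch {

    public static String getJobLauncherRpcAddress(Map<String, String> conf) {
        if ("true".equals(conf.get("yarn.resourcemanager.ha.enabled"))) {
            // With HA, walk the configured RM ids and return the first
            // per-instance address that is actually set.
            String rmIds = conf.getOrDefault("yarn.resourcemanager.ha.rm-ids", "");
            for (String id : rmIds.split(",")) {
                String addr = conf.get("yarn.resourcemanager.address." + id.trim());
                if (addr != null) {
                    return addr;
                }
            }
        }
        // Non-HA cluster (or no per-instance key found): fall back to the
        // plain address key, as the pre-fix code did unconditionally.
        return conf.get("yarn.resourcemanager.address");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("yarn.resourcemanager.ha.enabled", "true");
        conf.put("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
        conf.put("yarn.resourcemanager.address.rm2", "rm2-host:8032");
        // The bare yarn.resourcemanager.address is deliberately left unset:
        // the pre-fix behaviour would return null here.
        System.out.println(getJobLauncherRpcAddress(conf)); // rm2-host:8032
    }
}
```

A production version would of course consult the active RM rather than the first configured one; the sketch only shows why reading the bare key alone is wrong under HA.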
[jira] [Resolved] (HIVE-23406) SharedWorkOptimizer should check nullSortOrders when comparing ReduceSink operators
[ https://issues.apache.org/jira/browse/HIVE-23406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-23406. --- Resolution: Fixed Pushed to master, thanks [~jcamachorodriguez] for review. > SharedWorkOptimizer should check nullSortOrders when comparing ReduceSink > operators > --- > > Key: HIVE-23406 > URL: https://issues.apache.org/jira/browse/HIVE-23406 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > SharedWorkOptimizer does not check the null sort order in ReduceSinkDesc when > comparing ReduceSink operators: > > [https://github.com/apache/hive/blob/ca9aba606c4d09b91ee28bf9ee1ae918db8cdfb9/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1444] > {code:java} > ReduceSinkDesc op1Conf = ((ReduceSinkOperator) op1).getConf(); > ReduceSinkDesc op2Conf = ((ReduceSinkOperator) op2).getConf(); > if (StringUtils.equals(op1Conf.getKeyColString(), > op2Conf.getKeyColString()) && > StringUtils.equals(op1Conf.getValueColsString(), > op2Conf.getValueColsString()) && > StringUtils.equals(op1Conf.getParitionColsString(), > op2Conf.getParitionColsString()) && > op1Conf.getTag() == op2Conf.getTag() && > StringUtils.equals(op1Conf.getOrder(), op2Conf.getOrder()) && > op1Conf.getTopN() == op2Conf.getTopN() && > canDeduplicateReduceTraits(op1Conf, op2Conf)) { > return true; > } else { > return false; > } > {code} > An expression like > {code:java} > StringUtils.equals(op1Conf.getNullOrder(), op2Conf.getNullOrder()) && > {code} > should be added. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
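The fix quoted in the issue boils down to adding one more field to the equality check. The following self-contained sketch uses a hypothetical RsDesc stand-in for Hive's ReduceSinkDesc (field encodings are assumptions loosely mirroring Hive's order strings: one +/- per key column for direction, one a/z per key column for NULLs-first/NULLs-last) to show why the missing comparison made merges unsafe:

```java
import java.util.Objects;

// Hypothetical stand-in for ReduceSinkDesc, not Hive code: 'order' encodes
// sort direction per key column (+/-), 'nullOrder' encodes NULL placement
// per key column (a = NULLs first, z = NULLs last).
public class NullOrderCheck {

    public static final class RsDesc {
        final String keyCols;
        final String order;
        final String nullOrder;
        public RsDesc(String keyCols, String order, String nullOrder) {
            this.keyCols = keyCols;
            this.order = order;
            this.nullOrder = nullOrder;
        }
    }

    // Pre-fix style comparison: nullOrder is ignored entirely.
    public static boolean mergeableBefore(RsDesc a, RsDesc b) {
        return Objects.equals(a.keyCols, b.keyCols)
            && Objects.equals(a.order, b.order);
    }

    // Post-fix style comparison: the null sort order must match as well,
    // analogous to adding the StringUtils.equals(getNullOrder(), ...) clause.
    public static boolean mergeableAfter(RsDesc a, RsDesc b) {
        return mergeableBefore(a, b)
            && Objects.equals(a.nullOrder, b.nullOrder);
    }

    public static void main(String[] args) {
        RsDesc nullsFirst = new RsDesc("_col0", "+", "a");
        RsDesc nullsLast = new RsDesc("_col0", "+", "z");
        // Same key and direction, different NULL placement: merging these two
        // ReduceSinks would silently change where NULLs sort in the output.
        System.out.println(mergeableBefore(nullsFirst, nullsLast)); // true (unsafe)
        System.out.println(mergeableAfter(nullsFirst, nullsLast));  // false (correct)
    }
}
```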
[jira] [Commented] (HIVE-23888) Simplify special_character_in_tabnames_1.q
[ https://issues.apache.org/jira/browse/HIVE-23888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190618#comment-17190618 ] Krisztian Kasa commented on HIVE-23888: --- Pushed to master, thanks [~jcamachorodriguez] for review. > Simplify special_character_in_tabnames_1.q > -- > > Key: HIVE-23888 > URL: https://issues.apache.org/jira/browse/HIVE-23888 > Project: Hive > Issue Type: Task > Components: Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > * move similar queries into unit tests in the parser module and keep only one > in the q test. > * use *explain* instead of executing the queries if possible, since we are > focusing on parser testing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23406) SharedWorkOptimizer should check nullSortOrders when comparing ReduceSink operators
[ https://issues.apache.org/jira/browse/HIVE-23406?focusedWorklogId=478989=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-478989 ] ASF GitHub Bot logged work on HIVE-23406: - Author: ASF GitHub Bot Created on: 04/Sep/20 08:04 Start Date: 04/Sep/20 08:04 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #1464: URL: https://github.com/apache/hive/pull/1464 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 478989) Time Spent: 20m (was: 10m) > SharedWorkOptimizer should check nullSortOrders when comparing ReduceSink > operators > --- > > Key: HIVE-23406 > URL: https://issues.apache.org/jira/browse/HIVE-23406 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > SharedWorkOptimizer does not check the null sort order in ReduceSinkDesc when > comparing ReduceSink operators: > > [https://github.com/apache/hive/blob/ca9aba606c4d09b91ee28bf9ee1ae918db8cdfb9/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1444] > {code:java} > ReduceSinkDesc op1Conf = ((ReduceSinkOperator) op1).getConf(); > ReduceSinkDesc op2Conf = ((ReduceSinkOperator) op2).getConf(); > if (StringUtils.equals(op1Conf.getKeyColString(), > op2Conf.getKeyColString()) && > StringUtils.equals(op1Conf.getValueColsString(), > op2Conf.getValueColsString()) && > StringUtils.equals(op1Conf.getParitionColsString(), > op2Conf.getParitionColsString()) && > op1Conf.getTag() == op2Conf.getTag() && > StringUtils.equals(op1Conf.getOrder(), op2Conf.getOrder()) && > op1Conf.getTopN() == op2Conf.getTopN() && > canDeduplicateReduceTraits(op1Conf, op2Conf)) { > 
return true; > } else { > return false; > } > {code} > An expression like > {code:java} > StringUtils.equals(op1Conf.getNullOrder(), op2Conf.getNullOrder()) && > {code} > should be added. > -- This message was sent by Atlassian Jira (v8.3.4#803005)