[jira] [Resolved] (HIVE-26522) Test for HIVE-22033 and backport to 3.1 and 2.3
[ https://issues.apache.org/jira/browse/HIVE-26522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-26522.
------------------------------------------
Fix Version/s: 2.3.9, 3.2.0, 4.0.0
Resolution: Fixed

Thanks for the contribution [~planka]! Patch merged to all branches.

> Test for HIVE-22033 and backport to 3.1 and 2.3
> ------------------------------------------------
>
> Key: HIVE-26522
> URL: https://issues.apache.org/jira/browse/HIVE-26522
> Project: Hive
> Issue Type: Bug
> Components: Standalone Metastore
> Affects Versions: 2.3.8, 3.1.3
> Reporter: Pavan Lanka
> Assignee: Pavan Lanka
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.3.9, 3.2.0, 4.0.0
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> HIVE-22033 fixes the issue with Hive delegation tokens so that the renewal time is effective.
> This looks at adding a test for HIVE-22033 and backporting the fix to the 3.1 and 2.3 branches in Hive.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-25646) Thrift metastore URI reverse resolution could fail in some environments
[ https://issues.apache.org/jira/browse/HIVE-25646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-25646.
------------------------------------------
Fix Version/s: 3.2.0, 4.0.0
Resolution: Fixed

> Thrift metastore URI reverse resolution could fail in some environments
> ------------------------------------------------------------------------
>
> Key: HIVE-25646
> URL: https://issues.apache.org/jira/browse/HIVE-25646
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 3.1.2, 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.2.0, 4.0.0
> Time Spent: 40m
> Remaining Estimate: 0h
>
> When a custom URI resolver is not specified, the default thrift metastore URI goes through DNS reverse resolution (getCanonicalHostname), which is unlikely to resolve correctly when the HMS sits behind load balancers and proxies. This is a change in behaviour from the Hive 2.x branch and is not required; if reverse resolution is needed, a custom URI resolver can be implemented.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
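For readers who want to see the failure mode concretely, here is a minimal, self-contained Java sketch (not Hive code; the load-balancer hostname is a made-up example) of how getCanonicalHostName-based reverse resolution can rewrite the host a client actually configured:

{code:java}
import java.net.InetAddress;

// Sketch only: demonstrates why reverse DNS resolution of a configured
// metastore host can surprise you behind a load balancer or proxy.
public class ReverseResolveDemo {
  public static void main(String[] args) throws Exception {
    String configured = "hms-lb.example.com"; // hypothetical LB hostname
    InetAddress addr = InetAddress.getByName(configured);
    // getCanonicalHostName does a reverse lookup on the resolved IP; behind
    // an LB it often returns a backend node's name (or the bare IP) rather
    // than the name the client configured.
    System.out.println("configured: " + configured);
    System.out.println("canonical:  " + addr.getCanonicalHostName());
  }
}
{code}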
[jira] [Assigned] (HIVE-25646) Thrift metastore URI reverse resolution could fail in some environments
[ https://issues.apache.org/jira/browse/HIVE-25646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran reassigned HIVE-25646:
--------------------------------------------
Assignee: Prasanth Jayachandran

> Thrift metastore URI reverse resolution could fail in some environments
> ------------------------------------------------------------------------
>
> Key: HIVE-25646
> URL: https://issues.apache.org/jira/browse/HIVE-25646
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 3.1.2, 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When a custom URI resolver is not specified, the default thrift metastore URI goes through DNS reverse resolution (getCanonicalHostname), which is unlikely to resolve correctly when the HMS sits behind load balancers and proxies. This is a change in behaviour from the Hive 2.x branch and is not required; if reverse resolution is needed, a custom URI resolver can be implemented.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-24866) FileNotFoundException during alter table concat
[ https://issues.apache.org/jira/browse/HIVE-24866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran reassigned HIVE-24866:
--------------------------------------------

> FileNotFoundException during alter table concat
> ------------------------------------------------
>
> Key: HIVE-24866
> URL: https://issues.apache.org/jira/browse/HIVE-24866
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.4.0, 3.2.0, 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
>
> Because of the way the CombineFile InputFormat groups files based on node and rack locality, there are cases where a single big ORC file gets spread across two or more combine Hive splits. When the first task completes, the source ORC file of the concatenation is moved/renamed as part of jobCloseOp, which can lead to a FileNotFoundException in subsequent mappers that hold a partial split of that file.
> A simple fix would be for the mapper holding the start of the split to own the entire ORC file for concatenation. A mapper that gets a partial split that is not the start can then skip the entire file.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
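The ownership rule proposed above is easy to state in code. Below is an illustrative sketch; the FileChunk type and its fields are assumptions made for this example, not Hive's actual split classes:

{code:java}
// Sketch of the proposed rule: only the mapper whose chunk starts at byte 0
// of the file performs the concatenation; all other partial chunks skip it.
final class FileChunk {
  final String path;
  final long offset; // start offset of this chunk within the file
  final long length;

  FileChunk(String path, long offset, long length) {
    this.path = path;
    this.offset = offset;
    this.length = length;
  }
}

final class ConcatOwnership {
  /** True only for the chunk that begins the file. */
  static boolean ownsWholeFile(FileChunk chunk) {
    return chunk.offset == 0L;
  }
}
{code}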
[jira] [Updated] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods
[ https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-24786:
-----------------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed
Status: Resolved (was: Patch Available)

> JDBC HttpClient should retry for idempotent and unsent http methods
> --------------------------------------------------------------------
>
> Key: HIVE-24786
> URL: https://issues.apache.org/jira/browse/HIVE-24786
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 40m
> Remaining Estimate: 0h
>
> When hiveserver2 is behind multiple proxies, there is a possibility of "broken pipe", "connect timeout" and "read timeout" exceptions if one of the intermediate proxies or load balancers decides to reset the underlying TCP socket after an idle timeout. When the connection is broken and a query is submitted after the idle timeout, from the beeline (or client) perspective the connection is open, but the HTTP methods (POST/GET) fail with socket-related exceptions. Since these methods were never sent to the server, they are safe for client-side retries.
>
> Also, HIVE-12371 seems to apply the socket timeout only to the binary transport. The same can be passed on to the HTTP client as well to avoid retry hangs with infinite timeouts.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
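Apache HttpClient 4.x already provides a building block for exactly this policy. A minimal sketch of wiring it up (illustrative, not the actual Hive patch):

{code:java}
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;
import org.apache.http.impl.client.HttpClients;

public class RetryingClientSketch {
  public static CloseableHttpClient build() {
    return HttpClients.custom()
        // Up to 3 retries. requestSentRetryEnabled=false means a request
        // that has already been transmitted is NOT retried, so only unsent
        // (and idempotent) requests are replayed, which are the safe cases
        // named in the issue description.
        .setRetryHandler(new DefaultHttpRequestRetryHandler(3, false))
        .build();
  }
}
{code}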
[jira] [Updated] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods
[ https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-24786:
-----------------------------------------
Description:
When hiveserver2 is behind multiple proxies, there is a possibility of "broken pipe", "connect timeout" and "read timeout" exceptions if one of the intermediate proxies or load balancers decides to reset the underlying TCP socket after an idle timeout. When the connection is broken and a query is submitted after the idle timeout, from the beeline (or client) perspective the connection is open, but the HTTP methods (POST/GET) fail with socket-related exceptions. Since these methods were never sent to the server, they are safe for client-side retries.

Also, HIVE-12371 seems to apply the socket timeout only to the binary transport. The same can be passed on to the HTTP client as well to avoid retry hangs with infinite timeouts.

(was: When hiveserver2 is behind multiple proxies there is possibility of "broken pipe", "connect timeout" and "read timeout" exceptions if one of the intermediate proxies or load balancers decided to reset the underlying tcp socket after idle timeout. When the connection is broken and when the query is submitted after idle timeout from beeline (or client) perspective the connection is open but http methods (POST/GET) fails with socket related exceptions. Since these methods are not sent to the server these are safe for client side retries.)

> JDBC HttpClient should retry for idempotent and unsent http methods
> --------------------------------------------------------------------
>
> Key: HIVE-24786
> URL: https://issues.apache.org/jira/browse/HIVE-24786
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When hiveserver2 is behind multiple proxies, there is a possibility of "broken pipe", "connect timeout" and "read timeout" exceptions if one of the intermediate proxies or load balancers decides to reset the underlying TCP socket after an idle timeout. When the connection is broken and a query is submitted after the idle timeout, from the beeline (or client) perspective the connection is open, but the HTTP methods (POST/GET) fail with socket-related exceptions. Since these methods were never sent to the server, they are safe for client-side retries.
>
> Also, HIVE-12371 seems to apply the socket timeout only to the binary transport. The same can be passed on to the HTTP client as well to avoid retry hangs with infinite timeouts.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-22196) Socket timeouts happen when other drivers set DriverManager.loginTimeout
[ https://issues.apache.org/jira/browse/HIVE-22196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-22196.
------------------------------------------
Resolution: Fixed

Fixed by HIVE-12371

> Socket timeouts happen when other drivers set DriverManager.loginTimeout
> -------------------------------------------------------------------------
>
> Key: HIVE-22196
> URL: https://issues.apache.org/jira/browse/HIVE-22196
> Project: Hive
> Issue Type: Bug
> Components: JDBC, Thrift API
> Affects Versions: 1.2.1, 2.0.0, 3.1.2
> Environment: Any Hive JDBC client that uses other SQL clients besides Hive, or any other kind of JDBC driver (e.g. connection pooling). This can only happen if the other driver writes values to {{DriverManager.setLoginTimeout()}}. HikariCP is one suspect; there are probably others as well.
> Reporter: Nathan Clark
> Priority: Major
>
> There are a few somewhat sketchy things happening in Hive/Thrift code in the JDBC client that result in intermittent "read timed out" (and subsequently "out of sequence") errors when other JDBC drivers that set {{DriverManager.loginTimeout}} are active in the same client JVM.
> # The login timeout used to initialize a {{HiveConnection}} is populated from {{DriverManager.loginTimeout}} in the core Java JDBC library. This sounds like a nice, orthodox place to get a login timeout from, but it's fundamentally problematic and really shouldn't be used. The reason is that it's a *global* singleton value, and any JDBC driver (or any other piece of code for that matter) can write to it at will (and is implicitly invited to). The Hive JDBC stack _itself_ writes values to this global setting in a couple of places seemingly unrelated to the client connection setup.
> # The _read_ timeout for Thrift _socket-level_ reads is actually populated from this _login_ timeout (a.k.a. "connect timeout") setting. (See Thrift's {{TSocket(String host, int port, int timeout)}} and its callers in {{HiveAuthFactory}}. Also note the numerous code comments that speak of setting {{SO_TIMEOUT}} (the socket read timeout) while the actual code references a variable called {{loginTimeout}}.) Socket reads can occur thousands of times in an application that does lots of Hive queries, and their individual workloads are each less predictable than simply getting a connection, which typically happens at most a few times. So there is a huge probability that a login timeout setting, which seems to usually receive a reasonable value of 30 seconds if constrained at all, will occasionally (way too often) be inadequate for a socket read.
> # There seems to be no option to set this login timeout (or the actual read timeout) explicitly as an externalized override setting (but see HIVE-12371).
> *Summary:* {{DriverManager.loginTimeout}} can be innocently set by any JDBC driver present in the JVM, you can't override it, and it's misused by Hive as a socket read timeout. There's no way to prevent intermittent read timeouts in this scenario unless you're lucky enough to find the JDBC driver and reconfigure its timeout setting to something workable for Hive socket reads.
> An easy, crude patch: modify the first line of {{HiveConnection.setupLoginTimeout()}} from:
> {{long timeOut = TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());}}
> to:
> {{long timeOut = TimeUnit.SECONDS.toMillis(0);}}
> This is of course not a robust fix, as server issues during socket reads can result in a hung client thread. Some other hardcoded value might be more advisable, as long as it's long enough to prevent spurious read timeouts.
> The right approach is to prioritize HIVE-12371 (the proposed socket timeout override setting that doesn't depend on {{DriverManager.loginTimeout}}) and implement it in all possible versions.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
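The global nature of the setting is the crux of the report. A small self-contained sketch (not Hive code) of how another library's write to the JVM-wide value leaks into an unrelated driver:

{code:java}
import java.sql.DriverManager;

// Sketch only: DriverManager.loginTimeout is a JVM-global setting, so any
// driver or connection pool in the same process can change it under you.
public class GlobalTimeoutDemo {
  public static void main(String[] args) {
    DriverManager.setLoginTimeout(30); // e.g. set by a pooling library
    // Elsewhere, the Hive JDBC client reads the same global value and,
    // per this report, applies it as a Thrift socket *read* timeout:
    int seconds = DriverManager.getLoginTimeout();
    System.out.println("socket read timeout would become " + seconds + "s");
  }
}
{code}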
[jira] [Resolved] (HIVE-2357) Support connection timeout in hive JDBC
[ https://issues.apache.org/jira/browse/HIVE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-2357.
-----------------------------------------
Resolution: Fixed

Fixed by HIVE-12371

> Support connection timeout in hive JDBC
> ----------------------------------------
>
> Key: HIVE-2357
> URL: https://issues.apache.org/jira/browse/HIVE-2357
> Project: Hive
> Issue Type: New Feature
> Reporter: Vaibhav Aggarwal
> Assignee: Vaibhav Aggarwal
> Priority: Major

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-14517) Hive JDBC driver login timeout used as socket timeout
[ https://issues.apache.org/jira/browse/HIVE-14517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-14517.
------------------------------------------
Resolution: Fixed

Fixed by HIVE-12371

> Hive JDBC driver login timeout used as socket timeout
> ------------------------------------------------------
>
> Key: HIVE-14517
> URL: https://issues.apache.org/jira/browse/HIVE-14517
> Project: Hive
> Issue Type: Bug
> Reporter: Mark Kidwell
> Priority: Major
>
> HIVE-5351 added client timeout support by setting the transport socket read timeout to the JDBC DriverManager login timeout. While useful as a global network IO timeout, it isn't the expected behavior for this timeout setting. It also makes it impossible to, for example, require logins to complete quickly but allow queries to run for longer periods.
> Ideally multiple timeouts (connect, login and socket read) would be supported, as in other JDBC drivers.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-12371) Adding a timeout connection parameter for JDBC
[ https://issues.apache.org/jira/browse/HIVE-12371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-12371.
------------------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

PR merged to master. Thanks for the contribution!

> Adding a timeout connection parameter for JDBC
> -----------------------------------------------
>
> Key: HIVE-12371
> URL: https://issues.apache.org/jira/browse/HIVE-12371
> Project: Hive
> Issue Type: Improvement
> Components: JDBC
> Reporter: Nemon Lou
> Assignee: Xi Chen
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> There are some timeout settings on the server side: HIVE-4766, HIVE-6679.
> Adding a timeout connection parameter for JDBC is useful in some scenarios:
> # beeline (which cannot set the timeout manually)
> # customizing the timeout for different connections (among Hive or RDBs, which cannot be done via DriverManager.setLoginTimeout())
> Just like postgresql:
> {noformat}
> jdbc:postgresql://localhost/test?user=fred&password=secret&ssl=true&connectTimeout=0
> {noformat}
> or mysql:
> {noformat}
> jdbc:mysql://xxx.xx.xxx.xxx:3306/database?connectTimeout=6&socketTimeout=6
> {noformat}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
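For Hive, usage would take the same shape as the postgresql/mysql examples above. Note that the parameter name in the sketch below ("socketTimeout", in seconds) is an assumption made for illustration; check the merged PR for the exact name Hive adopted:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;

public class HiveJdbcTimeoutSketch {
  public static Connection connect() throws Exception {
    // "socketTimeout=60" is a hypothetical session parameter here; the
    // authoritative parameter name is defined by the HIVE-12371 patch.
    String url = "jdbc:hive2://hs2.example.com:10000/default;socketTimeout=60";
    return DriverManager.getConnection(url, "user", "");
  }
}
{code}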
[jira] [Commented] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods
[ https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285169#comment-17285169 ]

Prasanth Jayachandran commented on HIVE-24786:
----------------------------------------------
[~thejas] [~ngangam] Can you please help with reviewing this PR?

> JDBC HttpClient should retry for idempotent and unsent http methods
> --------------------------------------------------------------------
>
> Key: HIVE-24786
> URL: https://issues.apache.org/jira/browse/HIVE-24786
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When hiveserver2 is behind multiple proxies, there is a possibility of "broken pipe", "connect timeout" and "read timeout" exceptions if one of the intermediate proxies or load balancers decides to reset the underlying TCP socket after an idle timeout. When the connection is broken and a query is submitted after the idle timeout, from the beeline (or client) perspective the connection is open, but the HTTP methods (POST/GET) fail with socket-related exceptions. Since these methods were never sent to the server, they are safe for client-side retries.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods
[ https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-24786:
-----------------------------------------
Description:
When hiveserver2 is behind multiple proxies there is possibility of "broken pipe", "connect timeout" and "read timeout" exceptions if one of the intermediate proxies or load balancers decided to reset the underlying tcp socket after idle timeout. When the connection is broken and when the query is submitted after idle timeout from beeline (or client) perspective the connection is open but http methods (POST/GET) fails with socket related exceptions. Since these methods are not sent to the server these are safe for client side retries.

(was: When hiveserver2 is behind multiple proxies there is possibility of "broken pipe", "connect timeout" and "read timeout" exceptions if one of the intermediate proxies or load balancers decided to reset the underlying tcp socket after idle timeout. When the connection is broken and when the a query is submitted after idle timeout from beeline (or client) perspective the connection is open but http methods (POST/GET) fails with socket related exceptions. Since these methods are not sent to the server these are safe for client side retries.)

> JDBC HttpClient should retry for idempotent and unsent http methods
> --------------------------------------------------------------------
>
> Key: HIVE-24786
> URL: https://issues.apache.org/jira/browse/HIVE-24786
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When hiveserver2 is behind multiple proxies there is possibility of "broken pipe", "connect timeout" and "read timeout" exceptions if one of the intermediate proxies or load balancers decided to reset the underlying tcp socket after idle timeout. When the connection is broken and when the query is submitted after idle timeout from beeline (or client) perspective the connection is open but http methods (POST/GET) fails with socket related exceptions. Since these methods are not sent to the server these are safe for client side retries.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods
[ https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-24786:
-----------------------------------------
Status: Patch Available (was: Open)

> JDBC HttpClient should retry for idempotent and unsent http methods
> --------------------------------------------------------------------
>
> Key: HIVE-24786
> URL: https://issues.apache.org/jira/browse/HIVE-24786
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When hiveserver2 is behind multiple proxies, there is a possibility of "broken pipe", "connect timeout" and "read timeout" exceptions if one of the intermediate proxies or load balancers decides to reset the underlying TCP socket after an idle timeout. When the connection is broken and a query is submitted after the idle timeout, from the beeline (or client) perspective the connection is open, but the HTTP methods (POST/GET) fail with socket-related exceptions. Since these methods were never sent to the server, they are safe for client-side retries.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods
[ https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran reassigned HIVE-24786:
--------------------------------------------

> JDBC HttpClient should retry for idempotent and unsent http methods
> --------------------------------------------------------------------
>
> Key: HIVE-24786
> URL: https://issues.apache.org/jira/browse/HIVE-24786
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
>
> When hiveserver2 is behind multiple proxies, there is a possibility of "broken pipe", "connect timeout" and "read timeout" exceptions if one of the intermediate proxies or load balancers decides to reset the underlying TCP socket after an idle timeout. When the connection is broken and a query is submitted after the idle timeout, from the beeline (or client) perspective the connection is open, but the HTTP methods (POST/GET) fail with socket-related exceptions. Since these methods were never sent to the server, they are safe for client-side retries.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24501) UpdateInputAccessTimeHook should not update stats
[ https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-24501:
-----------------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed
Status: Resolved (was: Patch Available)

> UpdateInputAccessTimeHook should not update stats
> --------------------------------------------------
>
> Key: HIVE-24501
> URL: https://issues.apache.org/jira/browse/HIVE-24501
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> UpdateInputAccessTimeHook can fail for transactional tables with the following exception.
> The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null)
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821)
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769)
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756)
>   at org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>   at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
>   at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273)
>   at org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>   at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>   at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: MetaException(message:Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at org.apache.had
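A minimal sketch of the idea behind the fix (not the merged patch itself): the metastore can be told that an alter only touches the access time and must not validate or rewrite stats state. The constants used below are real Hive/metastore symbols; how the context gets attached to the alter call is the part the patch implements.

{code:java}
import org.apache.hadoop.hive.common.StatsSetupConst;
import org.apache.hadoop.hive.metastore.api.EnvironmentContext;

public class AccessTimeOnlyAlter {
  /** Context marking an alter as stats-neutral, so stats validation is skipped. */
  public static EnvironmentContext accessTimeOnlyContext() {
    EnvironmentContext ctx = new EnvironmentContext();
    // StatsSetupConst.DO_NOT_UPDATE_STATS resolves to "DO_NOT_UPDATE_STATS";
    // StatsSetupConst.TRUE resolves to "true".
    ctx.putToProperties(StatsSetupConst.DO_NOT_UPDATE_STATS, StatsSetupConst.TRUE);
    return ctx;
  }
}
{code}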
[jira] [Commented] (HIVE-24569) LLAP daemon leaks file descriptors/log4j appenders
[ https://issues.apache.org/jira/browse/HIVE-24569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266271#comment-17266271 ]

Prasanth Jayachandran commented on HIVE-24569:
----------------------------------------------
Left a question about how the idle purging is triggered (we wanted to avoid files being closed too frequently). Looks good otherwise, +1. Thanks for adding tests for it!

> LLAP daemon leaks file descriptors/log4j appenders
> ---------------------------------------------------
>
> Key: HIVE-24569
> URL: https://issues.apache.org/jira/browse/HIVE-24569
> Project: Hive
> Issue Type: Bug
> Components: llap
> Affects Versions: 2.2.0
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Attachments: llap-appender-gc-roots.png
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> With HIVE-9756, query logs in LLAP are directed to different files (one file per query) using a Log4j2 routing appender. Without a purge policy in place, appenders are created dynamically by the routing appender, one for each query, and remain in memory forever. The dynamic appenders write to files, so each appender holds on to a file descriptor.
> Further work in HIVE-14224 mitigated the issue by introducing a custom purging policy (LlapRoutingAppenderPurgePolicy) which deletes the dynamic appenders (and closes the respective files) when the query is completed (org.apache.hadoop.hive.llap.daemon.impl.QueryTracker#handleLogOnQueryCompletion).
> However, in the presence of multiple threads appending to the logs there are race conditions. In an internal Hive cluster the number of file descriptors kept going up, approximately one descriptor leaking per query. After some debugging it turns out that one thread (running QueryTracker#handleLogOnQueryCompletion) signals that the query has finished and thus the purge policy should get rid of the respective appender (and close the file), while another thread (Task-Executor-0) attempts to append another log message for the same query. The initial appender is closed after the request from the query tracker, but a new one is created to accommodate the message from the task executor, and the latter is never removed, thus creating a leak.
> Similar leaks have been identified and fixed for HS2, with the most similar one being that described [here|https://issues.apache.org/jira/browse/HIVE-22753?focusedCommentId=17021041&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17021041].
> The problem depends on the timing of threads, so it may not manifest in all versions between 2.2.0 and 4.0.0. Usually the leak can be seen via lsof (or another similar command) with the following output:
> {noformat}
> # 1494391 is the PID of the LLAP daemon process
> ls -ltr /proc/1494391/fd
> ...
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 978 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121724_66ce273d-54a9-4dcd-a9fb-20cb5691cef7-dag_1608659125567_0008_194.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 977 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121804_ce53eeb5-c73f-4999-b7a4-b4dd04d4e4de-dag_1608659125567_0008_197.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 974 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224122002_1693bd7d-2f0e-4673-a8d1-b7cb14a02204-dag_1608659125567_0008_204.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 989 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121909_6a56218f-06c7-4906-9907-4b6dd824b100-dag_1608659125567_0008_201.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 984 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121754_78ef49a0-bc23-478f-9a16-87fa25e7a287-dag_1608659125567_0008_196.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 983 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121855_e65b9ebf-b2ec-4159-9570-1904442b7048-dag_1608659125567_0008_200.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 981 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121818_e9051ae3-1316-46af-aabb-22c53ed2fda7-dag_1608659125567_0008_198.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 980 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121744_fcf37921-4351-4368-95ee-b5be2592d89a-dag_1608659125567_0008_195.log
> lrwx-- 1 hive hadoop 64 Dec 24 12
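The race described above is generic enough to reproduce outside of Log4j2. An illustrative plain-Java sketch of the pattern (not Hive or Log4j2 code): a per-query writer map where "close on completion" can interleave with a late append that silently re-creates the entry.

{code:java}
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;

public class PerQueryWriterLeak {
  private final ConcurrentHashMap<String, FileWriter> writers = new ConcurrentHashMap<>();

  void append(String queryId, String msg) throws IOException {
    // A late append arriving after onQueryComplete() re-creates the writer;
    // nothing ever closes that second writer -> leaked file descriptor.
    writers.computeIfAbsent(queryId, this::newWriter).write(msg);
  }

  void onQueryComplete(String queryId) throws IOException {
    FileWriter w = writers.remove(queryId);
    if (w != null) {
      w.close();
    }
  }

  private FileWriter newWriter(String queryId) {
    try {
      return new FileWriter("/tmp/" + queryId + ".log", true);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
{code}

An idle-based purge (the question raised in the comment) closes such stragglers after a quiet period instead of racing the last append.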
[jira] [Resolved] (HIVE-24514) UpdateMDatabaseURI does not update managed location URI
[ https://issues.apache.org/jira/browse/HIVE-24514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-24514.
------------------------------------------
Resolution: Fixed

Thanks for the review! Merged to master.

> UpdateMDatabaseURI does not update managed location URI
> --------------------------------------------------------
>
> Key: HIVE-24514
> URL: https://issues.apache.org/jira/browse/HIVE-24514
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When the FS root is updated using metatool, and the DB has a managed location defined, the updateMDatabaseURI API should update the managed location as well. Currently it only updates the location URI.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
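A minimal sketch of the fix's idea (the type and field names below are illustrative stand-ins, not Hive's actual MDatabase/ObjectStore API): when rewriting the FS root, rewrite both the location URI and the managed-location URI.

{code:java}
final class DatabaseUris {
  String locationUri;
  String managedLocationUri; // may be null if the DB never defined one
}

final class FsRootUpdater {
  static void updateUris(DatabaseUris db, String oldRoot, String newRoot) {
    db.locationUri = swapRoot(db.locationUri, oldRoot, newRoot);
    // The missing half of the original behavior: managed location too.
    if (db.managedLocationUri != null) {
      db.managedLocationUri = swapRoot(db.managedLocationUri, oldRoot, newRoot);
    }
  }

  private static String swapRoot(String uri, String oldRoot, String newRoot) {
    return uri.startsWith(oldRoot) ? newRoot + uri.substring(oldRoot.length()) : uri;
  }
}
{code}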
[jira] [Commented] (HIVE-24514) UpdateMDatabaseURI does not update managed location URI
[ https://issues.apache.org/jira/browse/HIVE-24514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262611#comment-17262611 ]

Prasanth Jayachandran commented on HIVE-24514:
----------------------------------------------
[~ngangam] can you please take another look? Addressed your review comment.

> UpdateMDatabaseURI does not update managed location URI
> --------------------------------------------------------
>
> Key: HIVE-24514
> URL: https://issues.apache.org/jira/browse/HIVE-24514
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When the FS root is updated using metatool, and the DB has a managed location defined, the updateMDatabaseURI API should update the managed location as well. Currently it only updates the location URI.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24514) UpdateMDatabaseURI does not update managed location URI
[ https://issues.apache.org/jira/browse/HIVE-24514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247111#comment-17247111 ]

Prasanth Jayachandran commented on HIVE-24514:
----------------------------------------------
[~ngangam] can you please review this change? [https://github.com/apache/hive/pull/1761/files]

> UpdateMDatabaseURI does not update managed location URI
> --------------------------------------------------------
>
> Key: HIVE-24514
> URL: https://issues.apache.org/jira/browse/HIVE-24514
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When the FS root is updated using metatool, and the DB has a managed location defined, the updateMDatabaseURI API should update the managed location as well. Currently it only updates the location URI.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24514) UpdateMDatabaseURI does not update managed location URI
[ https://issues.apache.org/jira/browse/HIVE-24514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran reassigned HIVE-24514:
--------------------------------------------

> UpdateMDatabaseURI does not update managed location URI
> --------------------------------------------------------
>
> Key: HIVE-24514
> URL: https://issues.apache.org/jira/browse/HIVE-24514
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
>
> When the FS root is updated using metatool, and the DB has a managed location defined, the updateMDatabaseURI API should update the managed location as well. Currently it only updates the location URI.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-24497.
------------------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Merged to master! Thanks for your contribution!

> Node heartbeats from LLAP Daemon to the client are not matching leading to timeout in cloud environment
> --------------------------------------------------------------------------------------------------------
>
> Key: HIVE-24497
> URL: https://issues.apache.org/jira/browse/HIVE-24497
> Project: Hive
> Issue Type: Sub-task
> Reporter: Simhadri G
> Assignee: Simhadri G
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
> Attachments: hive-24497.01.patch
> Time Spent: 50m
> Remaining Estimate: 0h
>
> A node heartbeat contains info about all the tasks that were submitted to that LLAP daemon. In cloud deployments, the client is not able to match these heartbeats due to differences in hostname and port.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24501) UpdateInputAccessTimeHook should not update stats
[ https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245807#comment-17245807 ]

Prasanth Jayachandran commented on HIVE-24501:
----------------------------------------------
[~ashutoshc] [~jcamachorodriguez] could someone please help with reviewing this small change?

> UpdateInputAccessTimeHook should not update stats
> --------------------------------------------------
>
> Key: HIVE-24501
> URL: https://issues.apache.org/jira/browse/HIVE-24501
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> UpdateInputAccessTimeHook can fail for transactional tables with the following exception.
> The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null)
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821)
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769)
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756)
>   at org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>   at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
>   at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273)
>   at org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>   at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>   at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: MetaException(message:Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>
[jira] [Updated] (HIVE-24501) UpdateInputAccessTimeHook should not update stats
[ https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-24501:
-----------------------------------------
Status: Patch Available (was: Open)

> UpdateInputAccessTimeHook should not update stats
> --------------------------------------------------
>
> Key: HIVE-24501
> URL: https://issues.apache.org/jira/browse/HIVE-24501
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> UpdateInputAccessTimeHook can fail for transactional tables with the following exception.
> The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null)
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821)
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769)
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756)
>   at org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>   at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
>   at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273)
>   at org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>   at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>   at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: MetaException(message:Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result.read(ThriftHiveMetastore.ja
[jira] [Assigned] (HIVE-24501) UpdateInputAccessTimeHook should not update stats
[ https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran reassigned HIVE-24501:
--------------------------------------------

> UpdateInputAccessTimeHook should not update stats
> --------------------------------------------------
>
> Key: HIVE-24501
> URL: https://issues.apache.org/jira/browse/HIVE-24501
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
>
> UpdateInputAccessTimeHook can fail for transactional tables with the following exception.
> The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null)
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821)
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769)
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756)
>   at org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>   at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
>   at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273)
>   at org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>   at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>   at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: MetaException(message:Cannot change stats state for a transactional table default.test without providing the transactional write state for verification (new write ID 0, valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state null)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result.read(ThriftHiveMetastore.java)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveM
[jira] [Resolved] (HIVE-24426) Spark job fails with fixed LlapTaskUmbilicalServer port
[ https://issues.apache.org/jira/browse/HIVE-24426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-24426.
------------------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Merged the PR. Thanks [~ayushtkn] for the contribution!

> Spark job fails with fixed LlapTaskUmbilicalServer port
> --------------------------------------------------------
>
> Key: HIVE-24426
> URL: https://issues.apache.org/jira/browse/HIVE-24426
> Project: Hive
> Issue Type: Sub-task
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Critical
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 50m
> Remaining Estimate: 0h
>
> In cloud deployments, multiple executors are launched on the same node, and in case a fixed umbilical port is specified using {{spark.hadoop.hive.llap.daemon.umbilical.port=30006}}, the job fails with a BindException:
> {noformat}
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:30006] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:840)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:741)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:605)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1169)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:3032)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1039)
>   at org.apache.hadoop.ipc.WritableRpcEngine$Server.<init>(WritableRpcEngine.java:438)
>   at org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:332)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:848)
>   at org.apache.hadoop.hive.llap.tezplugins.helpers.LlapTaskUmbilicalServer.<init>(LlapTaskUmbilicalServer.java:67)
>   at org.apache.hadoop.hive.llap.ext.LlapTaskUmbilicalExternalClient$SharedUmbilicalServer.<init>(LlapTaskUmbilicalExternalClient.java:122)
>   ... 26 more
> Caused by: java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:220)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:588)
>   ... 34 more
> {noformat}
> To counter this, it is better to provide a range of ports.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
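The port-range approach looks roughly like the following self-contained sketch (plain java.net, not the merged patch, which wires the equivalent logic into Hadoop RPC server setup):

{code:java}
import java.io.IOException;
import java.net.ServerSocket;

public class PortRangeBinder {
  /** Bind to the first free port in [lo, hi] instead of failing on one fixed port. */
  public static ServerSocket bindInRange(int lo, int hi) throws IOException {
    for (int port = lo; port <= hi; port++) {
      try {
        return new ServerSocket(port);
      } catch (IOException busy) {
        // Port taken, e.g. by another executor on the same host; try the next one.
      }
    }
    throw new IOException("No free port in range " + lo + "-" + hi);
  }
}
{code}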
[jira] [Resolved] (HIVE-24188) CTLT from MM to External or External to MM are failing with hive.strict.managed.tables & hive.create.as.acid
[ https://issues.apache.org/jira/browse/HIVE-24188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-24188.
------------------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Merged to master. Thanks [~nareshpr] for the contribution!

> CTLT from MM to External or External to MM are failing with hive.strict.managed.tables & hive.create.as.acid
> --------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-24188
> URL: https://issues.apache.org/jira/browse/HIVE-24188
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 2h
> Remaining Estimate: 0h
>
> Repro steps:
> {code:java}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create table test_mm(age int, name string) partitioned by(dept string) stored as orc tblproperties('transactional'='true', 'transactional_properties'='default');
> create external table test_external like test_mm LOCATION '${system:test.tmp.dir}/create_like_mm_to_external';
> {code}
> Fails with below exception:
> {code:java}
> Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:default.test_external cannot be declared transactional because it's an external table) (state=08S01,code=1)
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-22290) ObjectStore.cleanWriteNotificationEvents and ObjectStore.cleanupEvents OutOfMemory on large number of pending events
[ https://issues.apache.org/jira/browse/HIVE-22290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-22290.
------------------------------------------
Resolution: Fixed

Merged to master. Thanks for the contribution [~nareshpr]!

> ObjectStore.cleanWriteNotificationEvents and ObjectStore.cleanupEvents OutOfMemory on large number of pending events
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-22290
> URL: https://issues.apache.org/jira/browse/HIVE-22290
> Project: Hive
> Issue Type: Bug
> Components: HCatalog, repl
> Affects Versions: 4.0.0
> Reporter: Thomas Prelle
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> As in [https://jira.apache.org/jira/browse/HIVE-19430], if there is a large number of events that haven't been cleaned up for some reason, then ObjectStore.cleanWriteNotificationEvents() and ObjectStore.cleanupEvents() can run out of memory while loading all the events to be deleted.
> They should fetch events in batches.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
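A minimal sketch of the batching idea in plain JDBC (not Hive's DataNucleus-based code; the table/column names and the MySQL-style DELETE ... LIMIT syntax are assumptions for illustration):

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchedEventCleaner {
  /** Delete expired events in fixed-size chunks instead of loading them all. */
  static void clean(Connection conn, long olderThan, int batchSize) throws SQLException {
    String sql = "DELETE FROM NOTIFICATION_LOG WHERE EVENT_TIME < ? LIMIT ?";
    int deleted;
    do {
      try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setLong(1, olderThan);
        ps.setInt(2, batchSize);
        deleted = ps.executeUpdate(); // at most batchSize rows per round trip
      }
    } while (deleted == batchSize); // stop once a partial batch is deleted
  }
}
{code}

The key property is that memory use is bounded by the batch size rather than by the backlog of pending events.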
[jira] [Commented] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition
[ https://issues.apache.org/jira/browse/HIVE-24020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185392#comment-17185392 ]

Prasanth Jayachandran commented on HIVE-24020:
----------------------------------------------
Merged to master. Thanks [~vpnvishv]!

> Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition
> -----------------------------------------------------------------------------------------------------
>
> Key: HIVE-24020
> URL: https://issues.apache.org/jira/browse/HIVE-24020
> Project: Hive
> Issue Type: Bug
> Components: Streaming, Transactions
> Affects Versions: 4.0.0, 3.1.2
> Reporter: Vipin Vishvkarma
> Assignee: Vipin Vishvkarma
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 1h
> Remaining Estimate: 0h
>
> This issue happens when we try to do streaming ingest with dynamic partitions on already existing partitions. I checked in the code; we have the following check in the AbstractRecordWriter:
> {code:java}
> PartitionInfo partitionInfo = conn.createPartitionIfNotExists(partitionValues);
> // collect the newly added partitions. connection.commitTransaction() will report the dynamically added
> // partitions to TxnHandler
> if (!partitionInfo.isExists()) {
>   addedPartitions.add(partitionInfo.getName());
> } else {
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Partition {} already exists for table {}", partitionInfo.getName(), fullyQualifiedTableName);
>   }
> }
> {code}
> The *addedPartitions* set above is passed to *addDynamicPartitions* during TransactionBatch commit. So in the case of already existing partitions, *addedPartitions* will be empty and *addDynamicPartitions* will not move entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in the Initiator not being able to trigger auto compaction.
> Another issue which has been observed is that we are not clearing *addedPartitions* on writer close, which results in information flowing across transactions.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24061) Improve llap task scheduling for better cache hit rate
[ https://issues.apache.org/jira/browse/HIVE-24061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-24061. -- Fix Version/s: 4.0.0 Resolution: Fixed Merged to master. Thanks [~rajesh.balamohan] ! > Improve llap task scheduling for better cache hit rate > --- > > Key: HIVE-24061 > URL: https://issues.apache.org/jira/browse/HIVE-24061 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Major > Labels: perfomance, pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > TaskInfo is initialized with the "requestTime and locality delay". When lots > of vertices are at the same level, "taskInfo" details would be available > upfront. By the time it gets to scheduling, "requestTime + localityDelay" > won't be higher than the current time. Due to this, it misses the scheduling > delay details and ends up choosing a random node. This ends up missing cache > hits and reading data from remote storage. > E.g., this pattern was observed in Q75 of TPC-DS. > Related lines of interest in scheduler: > [https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java > |https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java] > {code:java} >boolean shouldDelayForLocality = > request.shouldDelayForLocality(schedulerAttemptTime); > .. > .. > boolean shouldDelayForLocality(long schedulerAttemptTime) { > return localityDelayTimeout > schedulerAttemptTime; > } > {code} > > Ideally, "localityDelayTimeout" should be adjusted based on its first > scheduling opportunity. -- This message was sent by Atlassian Jira (v8.3.4#803005)
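The proposed adjustment can be sketched as follows. This is a hedged illustration with simplified fields, not the actual TaskInfo/LlapTaskSchedulerService code: the locality delay window is re-based on the first real scheduling opportunity instead of the request creation time.
{code:java}
// Hedged sketch; field names are simplified stand-ins for TaskInfo.
class TaskLocalityState {
  private final long localityDelayMs;
  private long localityDelayTimeout = -1;

  TaskLocalityState(long localityDelayMs) {
    this.localityDelayMs = localityDelayMs;
  }

  boolean shouldDelayForLocality(long schedulerAttemptTime) {
    if (localityDelayTimeout < 0) {
      // first scheduling opportunity: start the delay window now, so the
      // task actually gets a locality window even if it was created long
      // before the scheduler first considered it
      localityDelayTimeout = schedulerAttemptTime + localityDelayMs;
    }
    return localityDelayTimeout > schedulerAttemptTime;
  }
}
{code}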
[jira] [Resolved] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition
[ https://issues.apache.org/jira/browse/HIVE-24020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-24020. -- Fix Version/s: 4.0.0 Resolution: Fixed > Automatic Compaction not working in existing partitions for Streaming Ingest > with Dynamic Partition > --- > > Key: HIVE-24020 > URL: https://issues.apache.org/jira/browse/HIVE-24020 > Project: Hive > Issue Type: Bug > Components: Streaming, Transactions >Affects Versions: 4.0.0, 3.1.2 >Reporter: Vipin Vishvkarma >Assignee: Vipin Vishvkarma >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > This issue happens when we try to do streaming ingest with dynamic partitioning > on already existing partitions. I checked the code; we have the following > check in the AbstractRecordWriter. > > {code:java} > PartitionInfo partitionInfo = > conn.createPartitionIfNotExists(partitionValues); > // collect the newly added partitions. connection.commitTransaction() will > report the dynamically added > // partitions to TxnHandler > if (!partitionInfo.isExists()) { > addedPartitions.add(partitionInfo.getName()); > } else { > if (LOG.isDebugEnabled()) { > LOG.debug("Partition {} already exists for table {}", > partitionInfo.getName(), fullyQualifiedTableName); > } > } > {code} > The above *addedPartitions* is passed to *addDynamicPartitions* during > TransactionBatch commit. So in the case of already existing partitions, > *addedPartitions* will be empty and *addDynamicPartitions* will not move > entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in the > Initiator not being able to trigger auto compaction. > Another issue which has been observed is that we are not clearing > *addedPartitions* on writer close, which results in information flowing > across transactions. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission and unmanaged AM failures
[ https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-24068: - Fix Version/s: 4.0.0 > Add re-execution plugin for handling DAG submission and unmanaged AM failures > - > > Key: HIVE-24068 > URL: https://issues.apache.org/jira/browse/HIVE-24068 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > DAG submission failures can also happen in environments where the AM container > has died, causing DNS issues. DAG submissions are safe to retry because the DAG > hasn't started execution yet. There are retries at the getSession and submitDAG > levels individually, but some submitDAG failures also have to retry getSession > since the AM could be unreachable; this can be handled in the re-execution plugin. > There is already an AM-loss retry execution plugin, but it only handles managed > AMs. It can be extended to handle unmanaged AMs as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24068) Add re-execution plugin for handling DAG submission and unmanaged AM failures
[ https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-24068. -- Resolution: Fixed PR merged to master. Thanks [~kgyrtkirk] for the review! > Add re-execution plugin for handling DAG submission and unmanaged AM failures > - > > Key: HIVE-24068 > URL: https://issues.apache.org/jira/browse/HIVE-24068 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > DAG submission failures can also happen in environments where the AM container > has died, causing DNS issues. DAG submissions are safe to retry because the DAG > hasn't started execution yet. There are retries at the getSession and submitDAG > levels individually, but some submitDAG failures also have to retry getSession > since the AM could be unreachable; this can be handled in the re-execution plugin. > There is already an AM-loss retry execution plugin, but it only handles managed > AMs. It can be extended to handle unmanaged AMs as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
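The retry decision can be sketched as below. This is a hedged illustration only: Hive's actual re-execution plugin interface differs from this simplified shape, and DagSubmissionFailedException is purely illustrative.
{code:java}
// Hedged sketch of the retry decision; not the actual Hive
// re-execution plugin API.
class DagSubmissionRetrySketch {
  private final int maxRetries;
  private int executions = 0;

  DagSubmissionRetrySketch(int maxRetries) {
    this.maxRetries = maxRetries;
  }

  boolean shouldReExecute(Throwable failure, boolean dagStartedRunning) {
    executions++;
    // Safe to retry only if the DAG never began executing; the retry
    // path should also re-acquire the Tez session, since the AM
    // (managed or unmanaged) may be unreachable.
    return executions <= maxRetries
        && !dagStartedRunning
        && failure instanceof DagSubmissionFailedException;
  }
}

// Illustrative exception type, standing in for whatever submission
// failure the plugin actually inspects.
class DagSubmissionFailedException extends Exception { }
{code}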
[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission and unmanaged AM failures
[ https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-24068: - Description: DAG submission failures can also happen in environments where the AM container has died, causing DNS issues. DAG submissions are safe to retry because the DAG hasn't started execution yet. There are retries at the getSession and submitDAG levels individually, but some submitDAG failures also have to retry getSession since the AM could be unreachable; this can be handled in the re-execution plugin. There is already an AM-loss retry execution plugin, but it only handles managed AMs. It can be extended to handle unmanaged AMs as well. was:DAG submission failures can also happen in environments where the AM container has died, causing DNS issues. DAG submissions are safe to retry because the DAG hasn't started execution yet. There are retries at the getSession and submitDAG levels individually, but some submitDAG failures also have to retry getSession since the AM could be unreachable; this can be handled in the re-execution plugin. > Add re-execution plugin for handling DAG submission and unmanaged AM failures > - > > Key: HIVE-24068 > URL: https://issues.apache.org/jira/browse/HIVE-24068 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > DAG submission failures can also happen in environments where the AM container > has died, causing DNS issues. DAG submissions are safe to retry because the DAG > hasn't started execution yet. There are retries at the getSession and submitDAG > levels individually, but some submitDAG failures also have to retry getSession > since the AM could be unreachable; this can be handled in the re-execution plugin. > There is already an AM-loss retry execution plugin, but it only handles managed > AMs. It can be extended to handle unmanaged AMs as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission and unmanaged AM failures
[ https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-24068: - Summary: Add re-execution plugin for handling DAG submission and unmanaged AM failures (was: Add re-execution plugin for handling DAG submission failures) > Add re-execution plugin for handling DAG submission and unmanaged AM failures > - > > Key: HIVE-24068 > URL: https://issues.apache.org/jira/browse/HIVE-24068 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > DAG submission failures can also happen in environments where the AM container > has died, causing DNS issues. DAG submissions are safe to retry because the DAG > hasn't started execution yet. There are retries at the getSession and submitDAG > levels individually, but some submitDAG failures also have to retry getSession > since the AM could be unreachable; this can be handled in the re-execution plugin. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission failures
[ https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-24068: - Description: DAG submission failures can also happen in environments where the AM container has died, causing DNS issues. DAG submissions are safe to retry because the DAG hasn't started execution yet. There are retries at the getSession and submitDAG levels individually, but some submitDAG failures also have to retry getSession since the AM could be unreachable; this can be handled in the re-execution plugin. (was: ReExecutionOverlayPlugin handles cases where there is a vertex failure. DAG submission failures can also happen in environments where the AM container has died, causing DNS issues. DAG submissions are safe to retry because the DAG hasn't started execution yet.) > Add re-execution plugin for handling DAG submission failures > > > Key: HIVE-24068 > URL: https://issues.apache.org/jira/browse/HIVE-24068 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > > DAG submission failures can also happen in environments where the AM container > has died, causing DNS issues. DAG submissions are safe to retry because the DAG > hasn't started execution yet. There are retries at the getSession and submitDAG > levels individually, but some submitDAG failures also have to retry getSession > since the AM could be unreachable; this can be handled in the re-execution plugin. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission failures
[ https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-24068: - Summary: Add re-execution plugin for handling DAG submission failures (was: ReExecutionOverlayPlugin can handle DAG submission failures as well) > Add re-execution plugin for handling DAG submission failures > > > Key: HIVE-24068 > URL: https://issues.apache.org/jira/browse/HIVE-24068 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > > ReExecutionOverlayPlugin handles cases where there is a vertex failure. DAG > submission failures can also happen in environments where the AM container has > died, causing DNS issues. DAG submissions are safe to retry because the DAG hasn't > started execution yet. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24068) ReExecutionOverlayPlugin can handle DAG submission failures as well
[ https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-24068: > ReExecutionOverlayPlugin can handle DAG submission failures as well > --- > > Key: HIVE-24068 > URL: https://issues.apache.org/jira/browse/HIVE-24068 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > > ReExecutionOverlayPlugin handles cases where there is a vertex failure. DAG > submission failures can also happen in environments where the AM container has > died, causing DNS issues. DAG submissions are safe to retry because the DAG hasn't > started execution yet. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23776) Retire quickstats autocollection
[ https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148114#comment-17148114 ] Prasanth Jayachandran commented on HIVE-23776: -- [~pvary] I understand the performance concerns that basicstats bring, especially in cloud environments. But I would like to discuss the alternatives instead of just removing it, as there are certainly dependencies on file sizes and number of files which cannot be removed. The rawDataSize is good but only represents the in-memory representation, which is certainly good for most optimizations but not for all. The totalFileSize vs rawDataSize gives approximately the compression ratio, which is still beneficial for some optimizations (totalFileSize can be used for estimating splits, estimating the number of containers/nodes required without running the scans, etc.). It is better to pay the cost once upfront during ETL than every time we run a query or desc formatted. If the basicstats are published as counters from the tasks, then the Tez AM can aggregate them at the DAG level (https://github.com/apache/hive/blob/6440d93981e6d6aab59ecf2e77ffa45cd84d47de/ql/src/test/results/clientpositive/llap/tez_compile_counters.q.out#L1524-L1530), which HS2 can use to store them into the metastore without ever doing a file listing. This is one such approach, and it can be abstracted out if required for other engines. We could explore alternative approaches as well. I do not think it is a good idea to remove it just because it is slow on one cloud filesystem. > Retire quickstats autocollection > > > Key: HIVE-23776 > URL: https://issues.apache.org/jira/browse/HIVE-23776 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > this is about: > * num files > * datasize (sum of filesizes) > * num erasure coded files > right now these are scanned during every BasicStatsTask execution - which > means some filesystem reads/etc - for small inserts these are visible in case > the fs is a bit slower (s3 and friends) > I don't think they are really in use...we rely more on columnstats, which are > more accurate; and the datasize in this case is the "offline" > (on-disk) size - while we should instead calculate with "online" sizes... > proposal: > * remove collection and storage of this data > * collect it on the fly during "desc formatted" statements to provide them > for informational purposes -- This message was sent by Atlassian Jira (v8.3.4#803005)
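The counter-based idea in the comment above can be sketched as follows. This is hedged and illustrative: the counter group and names below are not existing Hive counters, only stand-ins. Tez aggregates task counters to the DAG level automatically, so HS2 could persist the aggregate without any file listing.
{code:java}
import org.apache.tez.runtime.api.ProcessorContext;

// Hedged sketch; "HIVE_BASIC_STATS" and the counter names are
// illustrative, not existing Hive counters.
final class BasicStatsCounters {
  static final String GROUP = "HIVE_BASIC_STATS";

  static void publish(ProcessorContext context, long bytesWritten, long filesCreated) {
    // published per task; the Tez AM aggregates these to DAG level
    context.getCounters().findCounter(GROUP, "TOTAL_FILE_SIZE").increment(bytesWritten);
    context.getCounters().findCounter(GROUP, "NUM_FILES").increment(filesCreated);
  }
}
{code}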
[jira] [Commented] (HIVE-23776) Retire quickstats autocollection
[ https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148054#comment-17148054 ] Prasanth Jayachandran commented on HIVE-23776: -- Yes. I know the quickstats part. The workload management triggers can reference *any* Hive counters, which includes the following newly added counters. [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CompileTimeCounters.java] If text files land in some staging table and there are workload management triggers/guardrails that say "if query scans > 10TB, kill query", then removing these quick stats will break that functionality. In some cases these staging tables are never going to get analyzed, so statistics will not be collected for them. Just searching the Hive code base or unit testing alone will not be sufficient to know whether customers are using it. If there is a specific need to remove this, put it behind a config and deprecate it in iterations before removing it in one go. > Retire quickstats autocollection > > > Key: HIVE-23776 > URL: https://issues.apache.org/jira/browse/HIVE-23776 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > this is about: > * num files > * datasize (sum of filesizes) > * num erasure coded files > right now these are scanned during every BasicStatsTask execution - which > means some filesystem reads/etc - for small inserts these are visible in case > the fs is a bit slower (s3 and friends) > I don't think they are really in use...we rely more on columnstats, which are > more accurate; and the datasize in this case is the "offline" > (on-disk) size - while we should instead calculate with "online" sizes... > proposal: > * remove collection and storage of this data > * collect it on the fly during "desc formatted" statements to provide them > for informational purposes -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23776) Retire quickstats autocollection
[ https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147939#comment-17147939 ] Prasanth Jayachandran commented on HIVE-23776: -- {quote}I don't think they are really in use... {quote} It is used in many places. There is a stats annotation fallback which relies on this. There are compile-time counters added for this which can be used for workload management guardrails. There are some existing pre-hooks which rely on this or could be relying on this. I am -1 on removing this without substantial evidence that it is not used. > Retire quickstats autocollection > > > Key: HIVE-23776 > URL: https://issues.apache.org/jira/browse/HIVE-23776 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > this is about: > * num files > * datasize (sum of filesizes) > * num erasure coded files > right now these are scanned during every BasicStatsTask execution - which > means some filesystem reads/etc - for small inserts these are visible in case > the fs is a bit slower (s3 and friends) > I don't think they are really in use...we rely more on columnstats, which are > more accurate; and the datasize in this case is the "offline" > (on-disk) size - while we should instead calculate with "online" sizes... > proposal: > * remove collection and storage of this data > * collect it on the fly during "desc formatted" statements to provide them > for informational purposes -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142241#comment-17142241 ] Prasanth Jayachandran commented on HIVE-23737: -- cc/ [~rajesh.balamohan] > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > > LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez > has added support for dagDelete in the custom shuffle handler (TEZ-3362) we > could re-use that feature in LLAP. > There are some added advantages to using Tez's dagDelete feature rather than > LLAP's current dagDelete feature. > 1) We can easily extend this feature to accommodate upcoming features > such as vertex-level and failed-task-attempt shuffle data cleanup. Refer to TEZ-3363 > and TEZ-4129. > 2) It will be easier to maintain this feature by separating it out from > Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22687) Query hangs indefinitely if LLAP daemon registers after the query is submitted
[ https://issues.apache.org/jira/browse/HIVE-22687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134956#comment-17134956 ] Prasanth Jayachandran commented on HIVE-22687: -- Since it is not a very common scenario, I guess it is ok to commit the patch. We can revisit in a follow-up if we observe it under different scenarios. > Query hangs indefinitely if LLAP daemon registers after the query is submitted > -- > > Key: HIVE-22687 > URL: https://issues.apache.org/jira/browse/HIVE-22687 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.1.0 >Reporter: Himanshu Mishra >Assignee: Himanshu Mishra >Priority: Major > Attachments: HIVE-22687.01.patch, HIVE-22687.02.patch > > > If a query is submitted and no LLAP daemon is running, it waits for 1 minute > and times out with error {{SERVICE_UNAVAILABLE}}. > While waiting, if a new LLAP Daemon starts, the timeout is cancelled, > but the tasks do not get scheduled either. As a result, the query hangs > indefinitely. > This is due to the race condition where the LLAP Daemon first registers the LLAP > instance at {{.../workers/worker-}}, and afterwards registers > {{.../workers/slot-}}. In the gap between the two, the Tez AM gets notified of the > worker zk node and, while processing it, checks whether the slot zk node is present; > if not, it rejects the LLAP Daemon. The error in the Tez AM is: > {code:java} > [INFO] [LlapScheduler] |impl.LlapZookeeperRegistryImpl|: Unknown slot for > 8ebfdc45-0382-4757-9416-52898885af90{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
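One way to picture tolerating the worker-/slot- registration gap is a brief re-check before rejecting the daemon. This is a hedged sketch: lookupSlot is a hypothetical callback, not the LlapZookeeperRegistryImpl API, and the retry counts are arbitrary.
{code:java}
import java.util.function.Function;

// Hedged sketch; lookupSlot is a hypothetical helper, not actual
// registry API.
final class SlotResolver {
  static Integer resolveSlotWithRetry(Function<String, Integer> lookupSlot,
                                      String workerIdentity,
                                      int attempts, long waitMs)
      throws InterruptedException {
    for (int i = 0; i < attempts; i++) {
      Integer slot = lookupSlot.apply(workerIdentity);
      if (slot != null) {
        return slot; // slot- znode appeared; accept the daemon
      }
      // the slot- znode may show up shortly after the worker- znode
      Thread.sleep(waitMs);
    }
    return null; // still missing: caller may now legitimately reject
  }
}
{code}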
[jira] [Commented] (HIVE-22687) Query hangs indefinitely if LLAP daemon registers after the query is submitted
[ https://issues.apache.org/jira/browse/HIVE-22687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134947#comment-17134947 ] Prasanth Jayachandran commented on HIVE-22687: -- The issue I saw was a corner case, with just one node. With >1 node I didn’t see this issue. > Query hangs indefinitely if LLAP daemon registers after the query is submitted > -- > > Key: HIVE-22687 > URL: https://issues.apache.org/jira/browse/HIVE-22687 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.1.0 >Reporter: Himanshu Mishra >Assignee: Himanshu Mishra >Priority: Major > Attachments: HIVE-22687.01.patch, HIVE-22687.02.patch > > > If a query is submitted and no LLAP daemon is running, it waits for 1 minute > and times out with error {{SERVICE_UNAVAILABLE}}. > While waiting, if a new LLAP Daemon starts, the timeout is cancelled, > but the tasks do not get scheduled either. As a result, the query hangs > indefinitely. > This is due to the race condition where the LLAP Daemon first registers the LLAP > instance at {{.../workers/worker-}}, and afterwards registers > {{.../workers/slot-}}. In the gap between the two, the Tez AM gets notified of the > worker zk node and, while processing it, checks whether the slot zk node is present; > if not, it rejects the LLAP Daemon. The error in the Tez AM is: > {code:java} > [INFO] [LlapScheduler] |impl.LlapZookeeperRegistryImpl|: Unknown slot for > 8ebfdc45-0382-4757-9416-52898885af90{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23582) LLAP: Make SplitLocationProvider impl pluggable
[ https://issues.apache.org/jira/browse/HIVE-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23582: - Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) Merged to master. Thanks Gopal for the review! > LLAP: Make SplitLocationProvider impl pluggable > --- > > Key: HIVE-23582 > URL: https://issues.apache.org/jira/browse/HIVE-23582 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23582.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > LLAP uses the HostAffinitySplitLocationProvider implementation by default. For > non-zookeeper-based environments, a different split location provider may be > used. To facilitate that, make the SplitLocationProvider implementation class > pluggable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
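A common pattern for this kind of pluggability is a reflection-based factory driven by configuration. The sketch below is hedged: the config key and the LocationProvider/DefaultLocationProvider types are illustrative stand-ins, not the interface or key the patch actually uses.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical provider interface, standing in for the real
// SplitLocationProvider.
interface LocationProvider {
  String[] getLocations(String splitPath);
}

// Stand-in for the existing host-affinity default implementation.
class DefaultLocationProvider implements LocationProvider {
  public String[] getLocations(String splitPath) {
    return new String[0];
  }
}

final class LocationProviderFactory {
  // "hive.llap.split.location.provider.class" is an illustrative key,
  // not necessarily the one the patch introduced
  static LocationProvider create(Configuration conf) {
    Class<? extends LocationProvider> clazz = conf.getClass(
        "hive.llap.split.location.provider.class",
        DefaultLocationProvider.class,
        LocationProvider.class);
    // instantiate the configured class, falling back to the default
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}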
[jira] [Resolved] (HIVE-19926) Remove deprecated hcatalog streaming
[ https://issues.apache.org/jira/browse/HIVE-19926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-19926. -- Fix Version/s: 4.0.0 Resolution: Fixed Committed to master. Thanks Ashutosh for reviving this patch and uploading it for tests. Thanks Zoltan for ptest run and review. > Remove deprecated hcatalog streaming > > > Key: HIVE-19926 > URL: https://issues.apache.org/jira/browse/HIVE-19926 > Project: Hive > Issue Type: Improvement > Components: Streaming >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-19926.1.patch, HIVE-19926.2.patch, > HIVE-19926.3.patch, HIVE-19926.4.patch, HIVE-19926.5.patch, HIVE-19926.6.patch > > Time Spent: 20m > Remaining Estimate: 0h > > hcatalog streaming is deprecated in 3.0.0. We should remove it in 4.0.0. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-21624) LLAP: Cpu metrics at thread level is broken
[ https://issues.apache.org/jira/browse/HIVE-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-21624: - Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) Committed to master. Thanks Ashutosh for the review! > LLAP: Cpu metrics at thread level is broken > --- > > Key: HIVE-21624 > URL: https://issues.apache.org/jira/browse/HIVE-21624 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 4.0.0, 3.2.0 >Reporter: Nita Dembla >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21624.1.patch, HIVE-21624.2.patch, > HIVE-21624.3.patch, HIVE-21624.4.patch > > Time Spent: 10m > Remaining Estimate: 0h > > ExecutorThreadCPUTime and ExecutorThreadUserTime rely on thread MX bean CPU > metrics when available. At some point, the thread name which the metrics > publisher looks for changed, causing no metrics to be published for these > counters. > The above counters look for threads with names starting with > "ContainerExecutor", but the LLAP task executor thread name changed to > "Task-Executor". -- This message was sent by Atlassian Jira (v8.3.4#803005)
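Prefix-based thread CPU accounting with the standard java.lang.management API can be sketched as below. This is a hedged simplification of what such a metrics publisher does, not the actual LLAP code; it requires JVM support for thread CPU time, which the comment further down also notes.
{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hedged sketch; the real metrics publisher is more involved.
final class ExecutorCpuProbe {
  static long totalCpuNanos(String threadNamePrefix) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
      return -1; // thread CPU metrics unavailable on this JVM
    }
    long total = 0;
    for (long id : mx.getAllThreadIds()) {
      ThreadInfo info = mx.getThreadInfo(id);
      // match the current executor thread name, e.g. "Task-Executor"
      if (info != null && info.getThreadName().startsWith(threadNamePrefix)) {
        long cpu = mx.getThreadCpuTime(id);
        if (cpu > 0) {
          total += cpu;
        }
      }
    }
    return total;
  }
}
{code}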
[jira] [Updated] (HIVE-21624) LLAP: Cpu metrics at thread level is broken
[ https://issues.apache.org/jira/browse/HIVE-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-21624: - Attachment: HIVE-21624.4.patch > LLAP: Cpu metrics at thread level is broken > --- > > Key: HIVE-21624 > URL: https://issues.apache.org/jira/browse/HIVE-21624 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 4.0.0, 3.2.0 >Reporter: Nita Dembla >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21624.1.patch, HIVE-21624.2.patch, > HIVE-21624.3.patch, HIVE-21624.4.patch > > Time Spent: 10m > Remaining Estimate: 0h > > ExecutorThreadCPUTime and ExecutorThreadUserTime rely on thread MX bean CPU > metrics when available. At some point, the thread name which the metrics > publisher looks for changed, causing no metrics to be published for these > counters. > The above counters look for threads with names starting with > "ContainerExecutor", but the LLAP task executor thread name changed to > "Task-Executor". -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23582) LLAP: Make SplitLocationProvider impl pluggable
[ https://issues.apache.org/jira/browse/HIVE-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120032#comment-17120032 ] Prasanth Jayachandran commented on HIVE-23582: -- [~hashutosh] [~gopalv] could you please help review this change? > LLAP: Make SplitLocationProvider impl pluggable > --- > > Key: HIVE-23582 > URL: https://issues.apache.org/jira/browse/HIVE-23582 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23582.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > LLAP uses the HostAffinitySplitLocationProvider implementation by default. For > non-zookeeper-based environments, a different split location provider may be > used. To facilitate that, make the SplitLocationProvider implementation class > pluggable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23582) LLAP: Make SplitLocationProvider impl pluggable
[ https://issues.apache.org/jira/browse/HIVE-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23582: - Status: Patch Available (was: Open) > LLAP: Make SplitLocationProvider impl pluggable > --- > > Key: HIVE-23582 > URL: https://issues.apache.org/jira/browse/HIVE-23582 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23582.1.patch > > > LLAP uses the HostAffinitySplitLocationProvider implementation by default. For > non-zookeeper-based environments, a different split location provider may be > used. To facilitate that, make the SplitLocationProvider implementation class > pluggable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23582) LLAP: Make SplitLocationProvider impl pluggable
[ https://issues.apache.org/jira/browse/HIVE-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23582: - Attachment: HIVE-23582.1.patch > LLAP: Make SplitLocationProvider impl pluggable > --- > > Key: HIVE-23582 > URL: https://issues.apache.org/jira/browse/HIVE-23582 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23582.1.patch > > > LLAP uses the HostAffinitySplitLocationProvider implementation by default. For > non-zookeeper-based environments, a different split location provider may be > used. To facilitate that, make the SplitLocationProvider implementation class > pluggable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23582) LLAP: Make SplitLocationProvider impl pluggable
[ https://issues.apache.org/jira/browse/HIVE-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-23582: > LLAP: Make SplitLocationProvider impl pluggable > --- > > Key: HIVE-23582 > URL: https://issues.apache.org/jira/browse/HIVE-23582 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > > LLAP uses the HostAffinitySplitLocationProvider implementation by default. For > non-zookeeper-based environments, a different split location provider may be > used. To facilitate that, make the SplitLocationProvider implementation class > pluggable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23068) Error when submitting fragment to LLAP via external client: IllegalStateException: Only a single registration allowed per entity
[ https://issues.apache.org/jira/browse/HIVE-23068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119742#comment-17119742 ] Prasanth Jayachandran commented on HIVE-23068: -- lgtm, +1. {quote}(for example speculative execution of a query fragment). {quote} Can external clients with speculative execution generate the same fragment id? Not sure how external clients generate the full id, but I would expect it to have different attempt numbers at least, just so that different attempts do not step on each other during speculative execution. > Error when submitting fragment to LLAP via external client: > IllegalStateException: Only a single registration allowed per entity > > > Key: HIVE-23068 > URL: https://issues.apache.org/jira/browse/HIVE-23068 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-23068.1.patch > > > LLAP external client (via hive-warehouse-connector) somehow seems to be > sending duplicate submissions for the same fragment/attempt. When the 2nd > request is sent, this results in the following error: > {noformat} > 2020-03-17T06:49:11,239 WARN [IPC Server handler 2 on 15001 ()] > org.apache.hadoop.ipc.Server: IPC Server handler 2 on 15001, call Call#75 > Retry#0 > org.apache.hadoop.hive.llap.protocol.LlapProtocolBlockingPB.submitWork from > 19.40.252.114:33906 > java.lang.IllegalStateException: Only a single registration allowed per > entity. Duplicate for > TaskWrapper{task=attempt_1854104024183112753_6052_0_00_000128_1, > inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true, > canFinish=true, canFinish(in queue)=true, isGuaranteed=false, > firstAttemptStartTime=1584442003327, dagStartTime=1584442003327, > withinDagPriority=0, vertexParallelism= 2132, selfAndUpstreamParallelism= > 2132, selfAndUpstreamComplete= 0} > at > org.apache.hadoop.hive.llap.daemon.impl.QueryInfo$FinishableStateTracker.registerForUpdates(QueryInfo.java:233) > ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3] > at > org.apache.hadoop.hive.llap.daemon.impl.QueryInfo.registerForFinishableStateUpdates(QueryInfo.java:205) > ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3] > at > org.apache.hadoop.hive.llap.daemon.impl.QueryFragmentInfo.registerForFinishableStateUpdates(QueryFragmentInfo.java:160) > ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3] > at > org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.maybeRegisterForFinishedStateNotifications(TaskExecutorService.java:1167) > ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3] > at > org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.schedule(TaskExecutorService.java:564) > ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3] > at > org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.schedule(TaskExecutorService.java:93) > ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3] > at > org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:292) > ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3] > at > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:610) > ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3] > at > org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:122) > ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3] > at > 
org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:22695) > ~[hive-exec-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.32-1] > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > ~[hadoop-common-3.1.1.3.1.4.26-3.jar:?] > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > ~[hadoop-common-3.1.1.3.1.4.26-3.jar:?] > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > ~[hadoop-common-3.1.1.3.1.4.26-3.jar:?] > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > ~[hadoop-common-3.1.1.3.1.4.26-3.jar:?] > at java.security.AccessController.doPrivileged(Native Method) > ~[?:1.8.0_191] > at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_191] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > ~[hadoop-common-3.1.1.3.1.4.26-3.jar:?] > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > ~[hadoop-common-3.1.1.3.1.4.26-3.jar:?] > {noformat} > I think the issue here is that this error o
[jira] [Commented] (HIVE-21624) LLAP: Cpu metrics at thread level is broken
[ https://issues.apache.org/jira/browse/HIVE-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113529#comment-17113529 ] Prasanth Jayachandran commented on HIVE-21624: -- This feature still needs JDK support for thread CPU metrics. > LLAP: Cpu metrics at thread level is broken > --- > > Key: HIVE-21624 > URL: https://issues.apache.org/jira/browse/HIVE-21624 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 4.0.0, 3.2.0 >Reporter: Nita Dembla >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21624.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > ExecutorThreadCPUTime and ExecutorThreadUserTime rely on thread MX bean CPU > metrics when available. At some point, the thread name which the metrics > publisher looks for changed, causing no metrics to be published for these > counters. > The above counters look for threads with names starting with > "ContainerExecutor", but the LLAP task executor thread name changed to > "Task-Executor". -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-21624) LLAP: Cpu metrics at thread level is broken
[ https://issues.apache.org/jira/browse/HIVE-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-21624: - Attachment: HIVE-21624.1.patch > LLAP: Cpu metrics at thread level is broken > --- > > Key: HIVE-21624 > URL: https://issues.apache.org/jira/browse/HIVE-21624 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 4.0.0, 3.2.0 >Reporter: Nita Dembla >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21624.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > ExecutorThreadCPUTime and ExecutorThreadUserTime rely on thread MX bean CPU > metrics when available. At some point, the thread name which the metrics > publisher looks for changed, causing no metrics to be published for these > counters. > The above counters look for threads with names starting with > "ContainerExecutor", but the LLAP task executor thread name changed to > "Task-Executor". -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-21624) LLAP: Cpu metrics at thread level is broken
[ https://issues.apache.org/jira/browse/HIVE-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-21624: - Status: Patch Available (was: Open) > LLAP: Cpu metrics at thread level is broken > --- > > Key: HIVE-21624 > URL: https://issues.apache.org/jira/browse/HIVE-21624 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 4.0.0, 3.2.0 >Reporter: Nita Dembla >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21624.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > ExecutorThreadCPUTime and ExecutorThreadUserTime rely on thread MX bean CPU > metrics when available. At some point, the thread name which the metrics > publisher looks for changed, causing no metrics to be published for these > counters. > The above counters look for threads with names starting with > "ContainerExecutor", but the LLAP task executor thread name changed to > "Task-Executor". -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23477) LLAP : mmap allocation interruptions fails to notify other threads
[ https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23477: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to master. Thanks Gopal for the review! > LLAP : mmap allocation interruptions fails to notify other threads > -- > > Key: HIVE-23477 > URL: https://issues.apache.org/jira/browse/HIVE-23477 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23477.1.patch, HIVE-23477.2.patch, > HIVE-23477.3.patch > > Time Spent: 10m > Remaining Estimate: 0h > > BuddyAllocator always uses lazy allocation if mmap is enabled. If a query > fragment is interrupted at the time of arena allocation, > ClosedByInterruptException is thrown. This exception artificially triggers > an allocator OutOfMemoryError and fails to notify other threads waiting to > allocate arenas. > {code:java} > 2020-05-15 00:03:23.254 WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed > trying to allocate memory mapped arena > java.nio.channels.ClosedByInterruptException > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160) > at > org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50) > at > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111) > at > 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82) > at > org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703) > at > org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662) > at > org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150) > at > org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.r
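The failure mode can be pictured with a simplified sketch: an interruption must not masquerade as memory exhaustion, and threads waiting on arena allocation must still be woken. This is hedged and illustrative, with a simplified lock and a hypothetical allocateArena(); the real BuddyAllocator is far more involved.
{code:java}
import java.nio.channels.ClosedByInterruptException;

// Hedged sketch of the notification concern only; not BuddyAllocator code.
final class ArenaAllocatorSketch {
  private final Object arenaLock = new Object();

  void allocateArenaSafely() throws ClosedByInterruptException {
    synchronized (arenaLock) {
      try {
        allocateArena(); // hypothetical mmap-backed arena allocation
      } finally {
        // wake waiters whether allocation succeeded, failed, or was
        // interrupted; otherwise they block indefinitely
        arenaLock.notifyAll();
      }
    }
  }

  private void allocateArena() throws ClosedByInterruptException {
    // stand-in for FileChannel.map(...) based arena preallocation
  }
}
{code}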
[jira] [Commented] (HIVE-23500) [Kubernetes] Use Extend NodeId for LLAP registration
[ https://issues.apache.org/jira/browse/HIVE-23500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110955#comment-17110955 ] Prasanth Jayachandran commented on HIVE-23500: -- HIVE-23466 is the same? > [Kubernetes] Use Extend NodeId for LLAP registration > > > Key: HIVE-23500 > URL: https://issues.apache.org/jira/browse/HIVE-23500 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > > In Kubernetes environments, where pods can have the same host name and port, there > can be situations where node trackers retain an old instance of > the pod in their cache. In the case of Hive LLAP, where the LLAP Tez task scheduler > maintains node membership based on zookeeper registry events, there > can be cases where a NODE_ADDED followed by a NODE_REMOVED event could end up > removing the node/host from the node trackers because of the stable hostname and > service port. The NODE_REMOVED event in this case is a stale event for the > already dead pod, but ZK will only send it after the session timeout (in case of > non-graceful shutdown). If this sequence of events happens, a node/host is > completely lost from the scheduler's perspective. > To support this scenario, Tez can extend YARN's NodeId to include a > uniqueIdentifier. The LLAP task scheduler can construct the container object with > this new NodeId, which includes the uniqueIdentifier as well, so that stale events > like the above will only remove the host/node that matches the old > uniqueIdentifier. -- This message was sent by Atlassian Jira (v8.3.4#803005)
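The extended-identity idea amounts to including a per-pod unique id in node equality, so a stale NODE_REMOVED for a dead pod cannot evict a new pod that reuses the same hostname and port. The sketch below is hedged and illustrative; the actual Tez/YARN NodeId extension differs.
{code:java}
import java.util.Objects;

// Hedged sketch; not the actual Tez/YARN NodeId extension.
final class ExtendedNodeId {
  final String host;
  final int port;
  final String uniqueIdentifier; // e.g. a per-pod registration UUID

  ExtendedNodeId(String host, int port, String uniqueIdentifier) {
    this.host = host;
    this.port = port;
    this.uniqueIdentifier = uniqueIdentifier;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ExtendedNodeId)) {
      return false;
    }
    ExtendedNodeId other = (ExtendedNodeId) o;
    // a stale removal event carries the old uniqueIdentifier and thus
    // never matches the new pod, even on the same host:port
    return port == other.port && host.equals(other.host)
        && Objects.equals(uniqueIdentifier, other.uniqueIdentifier);
  }

  @Override
  public int hashCode() {
    return Objects.hash(host, port, uniqueIdentifier);
  }
}
{code}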
[jira] [Updated] (HIVE-23466) ZK registry base should remove only specific instance instead of host
[ https://issues.apache.org/jira/browse/HIVE-23466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23466: - Attachment: HIVE-23466.1.patch > ZK registry base should remove only specific instance instead of host > - > > Key: HIVE-23466 > URL: https://issues.apache.org/jira/browse/HIVE-23466 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23466.1.patch > > > When ZKRegistryBase detects new ZK nodes, it maintains a path-based cache and > a host-based cache. The host-based cache already handles multiple instances > running on the same host, but even if a single instance is removed, all instances > belonging to the host are removed. > Another issue is that, if a single host has multiple instances, it returns a Set > with no ordering. Ideally, we want the newest instance at the top of the set > (use a TreeSet, maybe?). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23466) ZK registry base should remove only specific instance instead of host
[ https://issues.apache.org/jira/browse/HIVE-23466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110708#comment-17110708 ] Prasanth Jayachandran commented on HIVE-23466: -- This patch requires TEZ-4179 and a new Tez release to make use of the ExtendedNodeId API. > ZK registry base should remove only specific instance instead of host > - > > Key: HIVE-23466 > URL: https://issues.apache.org/jira/browse/HIVE-23466 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23466.1.patch > > > When ZKRegistryBase detects new ZK nodes, it maintains a path-based cache and > a host-based cache. The host-based cache already handles multiple instances > running on the same host, but even if a single instance is removed, all instances > belonging to the host are removed. > Another issue is that, if a single host has multiple instances, it returns a Set > with no ordering. Ideally, we want the newest instance at the top of the set > (use a TreeSet, maybe?). -- This message was sent by Atlassian Jira (v8.3.4#803005)
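The ordering and per-instance removal from the description can be sketched as below. This is hedged: the Instance record and field names are simplified stand-ins, not the actual ZKRegistryBase types.
{code:java}
import java.util.Comparator;
import java.util.TreeSet;

// Hedged sketch; types are simplified stand-ins for the registry's.
final class HostInstances {
  static final class Instance {
    final String workerIdentity;
    final long registrationTime;

    Instance(String workerIdentity, long registrationTime) {
      this.workerIdentity = workerIdentity;
      this.registrationTime = registrationTime;
    }
  }

  // newest registration first, so iteration naturally prefers the most
  // recently started instance on a host
  final TreeSet<Instance> instances = new TreeSet<>(
      Comparator.comparingLong((Instance i) -> i.registrationTime).reversed()
          .thenComparing(i -> i.workerIdentity));

  // remove only the instance whose identity matches, never the whole host
  void onNodeRemoved(String workerIdentity) {
    instances.removeIf(i -> i.workerIdentity.equals(workerIdentity));
  }
}
{code}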
[jira] [Updated] (HIVE-23477) LLAP : mmap allocation interruptions fails to notify other threads
[ https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23477: - Attachment: HIVE-23477.3.patch > LLAP : mmap allocation interruptions fails to notify other threads > -- > > Key: HIVE-23477 > URL: https://issues.apache.org/jira/browse/HIVE-23477 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23477.1.patch, HIVE-23477.2.patch, > HIVE-23477.3.patch > > Time Spent: 10m > Remaining Estimate: 0h > > BuddyAllocator always uses lazy allocation if mmap is enabled. If a query > fragment is interrupted at the time of arena allocation, > ClosedByInterruptException is thrown. This exception artificially triggers > an allocator OutOfMemoryError and fails to notify other threads waiting to > allocate arenas. > {code:java} > 2020-05-15 00:03:23.254 WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed > trying to allocate memory mapped arena > java.nio.channels.ClosedByInterruptException > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160) > at > org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50) > at > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156) > at > 
org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82) > at > org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703) > at > org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662) > at > org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150) > at > org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.
[jira] [Updated] (HIVE-23443) LLAP speculative task pre-emption seems to be not working
[ https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23443: - Attachment: HIVE-23443.3.patch > LLAP speculative task pre-emption seems to be not working > - > > Key: HIVE-23443 > URL: https://issues.apache.org/jira/browse/HIVE-23443 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23443.1.patch, HIVE-23443.2.patch, > HIVE-23443.3.patch > > Time Spent: 40m > Remaining Estimate: 0h > > I think after HIVE-23210 we are getting a stable sort order and it is causing > pre-emption to not work in certain cases. > {code:java} > "attempt_1589167813851__119_01_08_0 > (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started > at 2020-05-11 05:59:22, in preemption queue, can finish)", > "attempt_1589167813851_0008_84_01_08_1 > (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started > at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} > The scheduler only peeks at the pre-emption queue and checks whether the head is > non-finishable. > [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] > In the above case, all tasks are speculative, but a state change does not > trigger pre-emption queue re-ordering, so peek() always returns a canFinish > task even though non-finishable tasks are in the queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23443) LLAP speculative task pre-emption seems to be not working
[ https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108798#comment-17108798 ] Prasanth Jayachandran commented on HIVE-23443: -- [~pgaref] non-finishable to finishable is not a problem. But there is a concern, in the line that you pinged in the PR, that double/multiple addition could be possible with the pre-emption queue, and I was able to unit test it. Could you look at the diff in the PR again? > LLAP speculative task pre-emption seems to be not working > - > > Key: HIVE-23443 > URL: https://issues.apache.org/jira/browse/HIVE-23443 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23443.1.patch, HIVE-23443.2.patch > > Time Spent: 40m > Remaining Estimate: 0h > > I think after HIVE-23210 we are getting a stable sort order and it is causing > pre-emption to not work in certain cases. > {code:java} > "attempt_1589167813851__119_01_08_0 > (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started > at 2020-05-11 05:59:22, in preemption queue, can finish)", > "attempt_1589167813851_0008_84_01_08_1 > (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started > at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} > The scheduler only peeks at the pre-emption queue and checks whether the head is > non-finishable. > [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] > In the above case, all tasks are speculative, but a state change does not > trigger pre-emption queue re-ordering, so peek() always returns a canFinish > task even though non-finishable tasks are in the queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
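The re-ordering concern follows from how binary heaps work: a priority queue orders elements only at insertion time, so a finishability change must remove and re-add the task. The sketch below is hedged and illustrative, with simplified stand-ins for TaskWrapper and the TaskExecutorService locking.
{code:java}
import java.util.PriorityQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hedged sketch of the re-insertion pattern; not TaskExecutorService code.
final class PreemptionQueueSketch<T> {
  private final PriorityQueue<T> preemptionQueue;
  private final ReentrantLock lock = new ReentrantLock();

  PreemptionQueueSketch(PriorityQueue<T> queue) {
    this.preemptionQueue = queue;
  }

  void onFinishableStateChanged(T task) {
    lock.lock();
    try {
      // re-insert so the heap re-evaluates the task's priority; removing
      // first also guards against the same task being queued twice
      if (preemptionQueue.remove(task)) {
        preemptionQueue.offer(task);
      }
    } finally {
      lock.unlock();
    }
  }
}
{code}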
[jira] [Updated] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads
[ https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23477: - Description: BuddyAllocator always uses lazy allocation if mmap is enabled. If a query fragment is interrupted at the time of arena allocation, ClosedByInterruptException is thrown. This exception artificially triggers an allocator OutOfMemoryError and fails to notify other threads waiting to allocate arenas. {code:java} 2020-05-15 00:03:23.254 WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed trying to allocate memory mapped arena java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970) at org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867) at org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69) at org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900) at org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458) at org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884) at org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740) at org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330) at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257) at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216) at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.<init>(VectorizedParquetRecordReader.java:160) at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111) at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156) at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82) at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703) at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662) at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150) at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532) at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingT
[jira] [Updated] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads
[ https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23477: - Description: BuddyAllocator always uses lazy allocation if mmap is enabled. If a query fragment is interrupted at the time of arena allocation, ClosedByInterruptException is thrown. This exception artificially triggers an allocator OutOfMemoryError and fails to notify other threads waiting to allocate arenas. {code:java} 2020-05-15 00:03:23.254 WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed trying to allocate memory mapped arena java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970) at org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867) at org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69) at org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900) at org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458) at org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884) at org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740) at org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330) at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257) at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216) at org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.<init>(VectorizedParquetRecordReader.java:160) at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111) at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156) at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82) at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703) at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662) at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150) at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532) at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecording
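A minimal sketch of the failure mode described above, with hypothetical names rather than the actual BuddyAllocator code: if the mapping thread treats the interruption as an ordinary allocation failure and exits without waking the threads blocked on the arena, they wait indefinitely. Catching ClosedByInterruptException separately and notifying waiters on every exit path avoids both problems.
{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;

// Minimal sketch, assuming a hypothetical arena with threads blocked on
// arenaLock; this is not the actual BuddyAllocator implementation.
class ArenaMapper {
  private final Object arenaLock = new Object();
  private MappedByteBuffer arena;

  MappedByteBuffer mapArena(RandomAccessFile raf, int arenaSize) throws IOException {
    synchronized (arenaLock) {
      try {
        arena = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, arenaSize);
        return arena;
      } catch (ClosedByInterruptException e) {
        // The fragment was interrupted: this is not an out-of-memory
        // condition, so restore the interrupt flag and propagate the
        // exception instead of raising an artificial allocator OOM.
        Thread.currentThread().interrupt();
        throw e;
      } finally {
        // Wake up any thread waiting on this arena, whether mapping
        // succeeded or failed; otherwise they would block indefinitely.
        arenaLock.notifyAll();
      }
    }
  }
}
{code}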
[jira] [Commented] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads
[ https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108776#comment-17108776 ] Prasanth Jayachandran commented on HIVE-23477: -- [~ashutoshc] / [~gopalv] can you please help review this change? > [LLAP] mmap allocation interruptions fails to notify other threads > -- > > Key: HIVE-23477 > URL: https://issues.apache.org/jira/browse/HIVE-23477 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23477.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > BuddyAllocator always uses lazy allocation is mmap is enabled. If query > fragment is interrupted at the time of arena allocation > ClosedByInterruptionException is thrown. This exception artificially triggers > allocator OutOfMemoryError and fails to notify other threads waiting to > allocate arenas. > {code:java} > 2020-05-15 00:03:23.254 WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed > trying to allocate memory mapped arena > java.nio.channels.ClosedByInterruptException > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160) > at > org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50) > at > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156) > at > 
org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82) > at > org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703) > at > org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662) > at > org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150) > at > org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
[jira] [Updated] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads
[ https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23477: - Attachment: (was: HIVE-23476.1.patch) > [LLAP] mmap allocation interruptions fails to notify other threads > -- > > Key: HIVE-23477 > URL: https://issues.apache.org/jira/browse/HIVE-23477 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23477.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > BuddyAllocator always uses lazy allocation is mmap is enabled. If query > fragment is interrupted at the time of arena allocation > ClosedByInterruptionException is thrown. This exception artificially triggers > allocator OutOfMemoryError and fails to notify other threads waiting to > allocate arenas. > {code:java} > 2020-05-15 00:03:23.254 WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed > trying to allocate memory mapped arena > java.nio.channels.ClosedByInterruptException > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160) > at > org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50) > at > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82) > at > 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703) > at > org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662) > at > org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150) > at > org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRun
[jira] [Updated] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads
[ https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23477: - Attachment: HIVE-23477.1.patch > [LLAP] mmap allocation interruptions fails to notify other threads > -- > > Key: HIVE-23477 > URL: https://issues.apache.org/jira/browse/HIVE-23477 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23477.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > BuddyAllocator always uses lazy allocation is mmap is enabled. If query > fragment is interrupted at the time of arena allocation > ClosedByInterruptionException is thrown. This exception artificially triggers > allocator OutOfMemoryError and fails to notify other threads waiting to > allocate arenas. > {code:java} > 2020-05-15 00:03:23.254 WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed > trying to allocate memory mapped arena > java.nio.channels.ClosedByInterruptException > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160) > at > org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50) > at > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82) > at > 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703) > at > org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662) > at > org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150) > at > org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callabl
[jira] [Updated] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads
[ https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23477: - Status: Patch Available (was: Open) > [LLAP] mmap allocation interruptions fails to notify other threads > -- > > Key: HIVE-23477 > URL: https://issues.apache.org/jira/browse/HIVE-23477 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23476.1.patch > > > BuddyAllocator always uses lazy allocation is mmap is enabled. If query > fragment is interrupted at the time of arena allocation > ClosedByInterruptionException is thrown. This exception artificially triggers > allocator OutOfMemoryError and fails to notify other threads waiting to > allocate arenas. > {code:java} > 2020-05-15 00:03:23.254 WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed > trying to allocate memory mapped arena > java.nio.channels.ClosedByInterruptException > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160) > at > org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50) > at > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82) > at > org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703) > at > 
org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662) > at > org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150) > at > org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2C
[jira] [Updated] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads
[ https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23477: - Attachment: HIVE-23476.1.patch > [LLAP] mmap allocation interruptions fails to notify other threads > -- > > Key: HIVE-23477 > URL: https://issues.apache.org/jira/browse/HIVE-23477 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23476.1.patch > > > BuddyAllocator always uses lazy allocation is mmap is enabled. If query > fragment is interrupted at the time of arena allocation > ClosedByInterruptionException is thrown. This exception artificially triggers > allocator OutOfMemoryError and fails to notify other threads waiting to > allocate arenas. > {code:java} > 2020-05-15 00:03:23.254 WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed > trying to allocate memory mapped arena > java.nio.channels.ClosedByInterruptException > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160) > at > org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50) > at > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82) > at > org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703) > at > 
org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662) > at > org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150) > at > org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callabl
[jira] [Commented] (HIVE-23476) [LLAP] Preallocate arenas for mmap case as well
[ https://issues.apache.org/jira/browse/HIVE-23476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108739#comment-17108739 ] Prasanth Jayachandran commented on HIVE-23476: -- [~hashutosh]/[~gopalv] can you please review the change? > [LLAP] Preallocate arenas for mmap case as well > --- > > Key: HIVE-23476 > URL: https://issues.apache.org/jira/browse/HIVE-23476 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23476.1.patch > > > BuddyAllocator pre-allocation of arenas does not happen for the mmap cache case. > Since we are not filling up the mmap'ed buffers, the upfront allocations in > the constructor are cheap. This can avoid the lock-free allocation of arenas later in > the code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
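As a rough illustration of the proposal (the structure and names below are illustrative, not the actual BuddyAllocator constructor): mmap only reserves virtual address space and pages are faulted in on first touch, so mapping every arena up front is cheap and removes the need for lazy, concurrent arena allocation later.
{code:java}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Minimal sketch of eager arena pre-mapping; names are illustrative.
class MmapArenaPool {
  private final MappedByteBuffer[] arenas;

  MmapArenaPool(File backingFile, int arenaCount, int arenaSize) throws IOException {
    arenas = new MappedByteBuffer[arenaCount];
    try (RandomAccessFile raf = new RandomAccessFile(backingFile, "rw")) {
      raf.setLength((long) arenaCount * arenaSize);
      FileChannel ch = raf.getChannel();
      // mmap reserves address space without touching the pages, so mapping
      // all arenas in the constructor is cheap and avoids lazy, contended
      // arena allocation on the query path later.
      for (int i = 0; i < arenaCount; i++) {
        arenas[i] = ch.map(FileChannel.MapMode.READ_WRITE, (long) i * arenaSize, arenaSize);
      }
    }
  }

  MappedByteBuffer arena(int i) {
    return arenas[i];
  }
}
{code}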
[jira] [Updated] (HIVE-23476) [LLAP] Preallocate arenas for mmap case as well
[ https://issues.apache.org/jira/browse/HIVE-23476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23476: - Status: Patch Available (was: Open) > [LLAP] Preallocate arenas for mmap case as well > --- > > Key: HIVE-23476 > URL: https://issues.apache.org/jira/browse/HIVE-23476 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23476.1.patch > > > BuddyAllocator pre-allocation of arenas does not happen for the mmap cache case. > Since we are not filling up the mmap'ed buffers, the upfront allocations in > the constructor are cheap. This can avoid the lock-free allocation of arenas later in > the code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23476) [LLAP] Preallocate arenas for mmap case as well
[ https://issues.apache.org/jira/browse/HIVE-23476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23476: - Attachment: HIVE-23476.1.patch > [LLAP] Preallocate arenas for mmap case as well > --- > > Key: HIVE-23476 > URL: https://issues.apache.org/jira/browse/HIVE-23476 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23476.1.patch > > > BuddyAllocator pre-allocation of arenas does not happen for the mmap cache case. > Since we are not filling up the mmap'ed buffers, the upfront allocations in > the constructor are cheap. This can avoid the lock-free allocation of arenas later in > the code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads
[ https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-23477: > [LLAP] mmap allocation interruptions fails to notify other threads > -- > > Key: HIVE-23477 > URL: https://issues.apache.org/jira/browse/HIVE-23477 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > > BuddyAllocator always uses lazy allocation is mmap is enabled. If query > fragment is interrupted at the time of arena allocation > ClosedByInterruptionException is thrown. This exception artificially triggers > allocator OutOfMemoryError and fails to notify other threads waiting to > allocate arenas. > {code:java} > 2020-05-15 00:03:23.254 WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed > trying to allocate memory mapped arena > java.nio.channels.ClosedByInterruptException > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740) > at > org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216) > at > org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160) > at > org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50) > at > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82) > at > org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703) > at > org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662) > 
at > org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150) > at > org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62) > at java.security.AccessController.doPrivileged(Nati
[jira] [Assigned] (HIVE-23476) [LLAP] Preallocate arenas for mmap case as well
[ https://issues.apache.org/jira/browse/HIVE-23476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-23476: > [LLAP] Preallocate arenas for mmap case as well > --- > > Key: HIVE-23476 > URL: https://issues.apache.org/jira/browse/HIVE-23476 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > > BuddyAllocator pre-allocation of arenas does not happen for the mmap cache case. > Since we are not filling up the mmap'ed buffers, the upfront allocations in > the constructor are cheap. This can avoid the lock-free allocation of arenas later in > the code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23443) LLAP speculative task pre-emption seems to be not working
[ https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107565#comment-17107565 ] Prasanth Jayachandran commented on HIVE-23443: -- I was able to repro the issue with unit test. Included that in .2 patch. [~pgaref] The guaranteed updates is hairy piece to touch for now, so not doing it in this ticket. .2 patch is same as .1 with added junit tests. Could you please take a look? > LLAP speculative task pre-emption seems to be not working > - > > Key: HIVE-23443 > URL: https://issues.apache.org/jira/browse/HIVE-23443 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23443.1.patch, HIVE-23443.2.patch > > > I think after HIVE-23210 we are getting a stable sort order and it is causing > pre-emption to not work in certain cases. > {code:java} > "attempt_1589167813851__119_01_08_0 > (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started > at 2020-05-11 05:59:22, in preemption queue, can finish)", > "attempt_1589167813851_0008_84_01_08_1 > (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started > at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} > Scheduler only peek's at the pre-emption queue and looks at whether it is > non-finishable. > [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] > In the above case, all tasks are speculative but state change is not > triggering pre-emption queue re-ordering so peek() always returns canFinish > task even though non-finishable tasks are in the queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23443) LLAP speculative task pre-emption seems to be not working
[ https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23443: - Attachment: HIVE-23443.2.patch > LLAP speculative task pre-emption seems to be not working > - > > Key: HIVE-23443 > URL: https://issues.apache.org/jira/browse/HIVE-23443 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23443.1.patch, HIVE-23443.2.patch > > > I think after HIVE-23210 we are getting a stable sort order and it is causing > pre-emption to not work in certain cases. > {code:java} > "attempt_1589167813851__119_01_08_0 > (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started > at 2020-05-11 05:59:22, in preemption queue, can finish)", > "attempt_1589167813851_0008_84_01_08_1 > (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started > at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} > Scheduler only peek's at the pre-emption queue and looks at whether it is > non-finishable. > [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] > In the above case, all tasks are speculative but state change is not > triggering pre-emption queue re-ordering so peek() always returns canFinish > task even though non-finishable tasks are in the queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23443) LLAP speculative task pre-emption seems to be not working
[ https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107490#comment-17107490 ] Prasanth Jayachandran commented on HIVE-23443: -- Created HIVE-23472 to handle the guaranteed state update which is tied to WLM. For now keeping the WLM issue separate and will be handled in HIVE-23472. In this ticket I will specifically handle the finishable state updates. Will add more unit tests to the .1 patch. > LLAP speculative task pre-emption seems to be not working > - > > Key: HIVE-23443 > URL: https://issues.apache.org/jira/browse/HIVE-23443 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23443.1.patch > > > I think after HIVE-23210 we are getting a stable sort order and it is causing > pre-emption to not work in certain cases. > {code:java} > "attempt_1589167813851__119_01_08_0 > (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started > at 2020-05-11 05:59:22, in preemption queue, can finish)", > "attempt_1589167813851_0008_84_01_08_1 > (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started > at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} > Scheduler only peek's at the pre-emption queue and looks at whether it is > non-finishable. > [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] > In the above case, all tasks are speculative but state change is not > triggering pre-emption queue re-ordering so peek() always returns canFinish > task even though non-finishable tasks are in the queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23466) ZK registry base should remove only specific instance instead of host
[ https://issues.apache.org/jira/browse/HIVE-23466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-23466: > ZK registry base should remove only specific instance instead of host > - > > Key: HIVE-23466 > URL: https://issues.apache.org/jira/browse/HIVE-23466 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > > When ZKRegistryBase detects new ZK nodes, it maintains a path-based cache and > a host-based cache. The host-based cache already handles multiple instances > running on the same host. But even if a single instance is removed, all instances > belonging to that host are removed. > Another issue is that, if a single host has multiple instances, it returns a Set > with no ordering. Ideally, we want the newest instance to be at the top of the set > (use a TreeSet, maybe?). -- This message was sent by Atlassian Jira (v8.3.4#803005)
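To illustrate the two problems called out above with a minimal sketch (all names are illustrative stand-ins for the real registry types, not the ZKRegistryBase code): key the cache by host, remove only the departed instance by its unique worker identity, and keep the remaining instances in a TreeSet ordered newest-first.
{code:java}
import java.util.Comparator;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a host-based instance cache; ServiceInstance and its
// fields are illustrative. Synchronization of the per-host sets is elided.
class HostInstanceCache {
  static class ServiceInstance {
    final String workerIdentity;  // unique per instance, even on a shared host
    final long registrationTime;
    ServiceInstance(String id, long time) { workerIdentity = id; registrationTime = time; }
  }

  // Newest instance first, with worker identity as a tiebreaker so that
  // distinct instances never compare as equal.
  private static final Comparator<ServiceInstance> NEWEST_FIRST =
      Comparator.comparingLong((ServiceInstance s) -> s.registrationTime).reversed()
          .thenComparing(s -> s.workerIdentity);

  private final Map<String, TreeSet<ServiceInstance>> byHost = new ConcurrentHashMap<>();

  void add(String host, ServiceInstance instance) {
    byHost.computeIfAbsent(host, h -> new TreeSet<>(NEWEST_FIRST)).add(instance);
  }

  // Remove only the departed instance instead of discarding every
  // instance registered on the same host.
  void remove(String host, String workerIdentity) {
    TreeSet<ServiceInstance> set = byHost.get(host);
    if (set != null) {
      set.removeIf(s -> s.workerIdentity.equals(workerIdentity));
      if (set.isEmpty()) {
        byHost.remove(host);
      }
    }
  }
}
{code}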
[jira] [Commented] (HIVE-23443) LLAP speculative task pre-emption seems to be not working
[ https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105031#comment-17105031 ] Prasanth Jayachandran commented on HIVE-23443: -- [~gopalv]/[~pgaref] I simplified logic of pre-emption queue handling to the following 2 conditions 1) If guaranteed or finishable, the task should not be in pre-emption queue 2) if speculative or non-finishable, the task should be in pre-emption queue I hope I am not missing any other conditions. Could you please take another look? [~pgaref] i changed the test cases based on the above conditions. Let me know if I missed any case. > LLAP speculative task pre-emption seems to be not working > - > > Key: HIVE-23443 > URL: https://issues.apache.org/jira/browse/HIVE-23443 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23443.1.patch > > > I think after HIVE-23210 we are getting a stable sort order and it is causing > pre-emption to not work in certain cases. > {code:java} > "attempt_1589167813851__119_01_08_0 > (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started > at 2020-05-11 05:59:22, in preemption queue, can finish)", > "attempt_1589167813851_0008_84_01_08_1 > (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started > at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} > Scheduler only peek's at the pre-emption queue and looks at whether it is > non-finishable. > [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] > In the above case, all tasks are speculative but state change is not > triggering pre-emption queue re-ordering so peek() always returns canFinish > task even though non-finishable tasks are in the queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
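Reading condition (1) as the complement of condition (2), i.e. guaranteed and finishable, the two rules reduce to a single predicate; a minimal sketch with illustrative names, not the actual scheduler API:
{code:java}
// Minimal sketch of the queue-membership rule described above; the flag
// names are illustrative stand-ins for the real TaskWrapper state.
final class PreemptionPolicy {
  private PreemptionPolicy() {}

  // (1) a guaranteed AND finishable task must NOT sit in the pre-emption queue;
  // (2) a speculative (non-guaranteed) OR non-finishable task MUST be in it.
  static boolean belongsInPreemptionQueue(boolean guaranteed, boolean canFinish) {
    return !guaranteed || !canFinish;
  }
}
{code}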
[jira] [Commented] (HIVE-23443) LLAP speculative task pre-emption seems to be not working
[ https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104926#comment-17104926 ] Prasanth Jayachandran commented on HIVE-23443: -- Good catch. I will update the PR and pull in the test case. Thanks! > LLAP speculative task pre-emption seems to be not working > - > > Key: HIVE-23443 > URL: https://issues.apache.org/jira/browse/HIVE-23443 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23443.1.patch > > > I think after HIVE-23210 we are getting a stable sort order and it is causing > pre-emption to not work in certain cases. > {code:java} > "attempt_1589167813851__119_01_08_0 > (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started > at 2020-05-11 05:59:22, in preemption queue, can finish)", > "attempt_1589167813851_0008_84_01_08_1 > (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started > at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} > Scheduler only peek's at the pre-emption queue and looks at whether it is > non-finishable. > [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] > In the above case, all tasks are speculative but state change is not > triggering pre-emption queue re-ordering so peek() always returns canFinish > task even though non-finishable tasks are in the queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23443) LLAP speculative task pre-emption seems to be not working
[ https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104868#comment-17104868 ] Prasanth Jayachandran commented on HIVE-23443: -- The patch is still pending testing with some workloads where the issue is reproducible. I will update here once it is done. The patch is ready for review though. cc/ [~gopalv] [~rbalamohan] [~pgaref] > LLAP speculative task pre-emption seems to be not working > - > > Key: HIVE-23443 > URL: https://issues.apache.org/jira/browse/HIVE-23443 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23443.1.patch > > > I think after HIVE-23210 we are getting a stable sort order and it is causing > pre-emption to not work in certain cases. > {code:java} > "attempt_1589167813851__119_01_08_0 > (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started > at 2020-05-11 05:59:22, in preemption queue, can finish)", > "attempt_1589167813851_0008_84_01_08_1 > (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started > at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} > Scheduler only peek's at the pre-emption queue and looks at whether it is > non-finishable. > [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] > In the above case, all tasks are speculative but state change is not > triggering pre-emption queue re-ordering so peek() always returns canFinish > task even though non-finishable tasks are in the queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23443) LLAP speculative task pre-emption seems to be not working
[ https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23443: - Status: Patch Available (was: Open) > LLAP speculative task pre-emption seems to be not working > - > > Key: HIVE-23443 > URL: https://issues.apache.org/jira/browse/HIVE-23443 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23443.1.patch > > > I think after HIVE-23210 we are getting a stable sort order and it is causing > pre-emption to not work in certain cases. > {code:java} > "attempt_1589167813851__119_01_08_0 > (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started > at 2020-05-11 05:59:22, in preemption queue, can finish)", > "attempt_1589167813851_0008_84_01_08_1 > (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started > at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} > Scheduler only peek's at the pre-emption queue and looks at whether it is > non-finishable. > [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] > In the above case, all tasks are speculative but state change is not > triggering pre-emption queue re-ordering so peek() always returns canFinish > task even though non-finishable tasks are in the queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23443) LLAP speculative task pre-emption seems to be not working
[ https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23443: - Attachment: HIVE-23443.1.patch > LLAP speculative task pre-emption seems to be not working > - > > Key: HIVE-23443 > URL: https://issues.apache.org/jira/browse/HIVE-23443 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Attachments: HIVE-23443.1.patch > > > I think after HIVE-23210 we are getting a stable sort order and it is causing > pre-emption to not work in certain cases. > {code:java} > "attempt_1589167813851__119_01_08_0 > (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started > at 2020-05-11 05:59:22, in preemption queue, can finish)", > "attempt_1589167813851_0008_84_01_08_1 > (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started > at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} > Scheduler only peek's at the pre-emption queue and looks at whether it is > non-finishable. > [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] > In the above case, all tasks are speculative but state change is not > triggering pre-emption queue re-ordering so peek() always returns canFinish > task even though non-finishable tasks are in the queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23443) LLAP speculative task pre-emption seems to be not working
[ https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23443: - Description: I think after HIVE-23210 we are getting a stable sort order and it is causing pre-emption to not work in certain cases. {code:java} "attempt_1589167813851__119_01_08_0 (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started at 2020-05-11 05:59:22, in preemption queue, can finish)", "attempt_1589167813851_0008_84_01_08_1 (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} Scheduler only peek's at the pre-emption queue and looks at whether it is non-finishable. [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] In the above case, all tasks are speculative but state change is not triggering pre-emption queue re-ordering so peek() always returns canFinish task even though non-finishable tasks are in the queue. was: I think after HIVE-23210 we are getting a stable sort order in pre-emption queue and it is causing pre-emption to not work in certain cases. {code:java} "attempt_1589167813851__119_01_08_0 (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started at 2020-05-11 05:59:22, in preemption queue, can finish)", "attempt_1589167813851_0008_84_01_08_1 (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} Scheduler only peek's at the pre-emption queue and looks at whether it is non-finishable. [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] In the above case, all tasks are speculative but state change is not triggering pre-emption queue re-ordering so peek() always returns canFinish task even though non-finishable tasks are in the queue. > LLAP speculative task pre-emption seems to be not working > - > > Key: HIVE-23443 > URL: https://issues.apache.org/jira/browse/HIVE-23443 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > > I think after HIVE-23210 we are getting a stable sort order and it is causing > pre-emption to not work in certain cases. > {code:java} > "attempt_1589167813851__119_01_08_0 > (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started > at 2020-05-11 05:59:22, in preemption queue, can finish)", > "attempt_1589167813851_0008_84_01_08_1 > (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started > at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} > Scheduler only peek's at the pre-emption queue and looks at whether it is > non-finishable. > [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] > In the above case, all tasks are speculative but state change is not > triggering pre-emption queue re-ordering so peek() always returns canFinish > task even though non-finishable tasks are in the queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23443) LLAP speculative task pre-emption seems to be not working
[ https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-23443: Assignee: Prasanth Jayachandran > LLAP speculative task pre-emption seems to be not working > - > > Key: HIVE-23443 > URL: https://issues.apache.org/jira/browse/HIVE-23443 > Project: Hive > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > > I think after HIVE-23210 we are getting a stable sort order in the pre-emption > queue and it is causing pre-emption to not work in certain cases. > {code:java} > "attempt_1589167813851__119_01_08_0 > (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started > at 2020-05-11 05:59:22, in preemption queue, can finish)", > "attempt_1589167813851_0008_84_01_08_1 > (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started > at 2020-05-11 06:00:23, in preemption queue, can finish)" {code} > The scheduler only peeks at the pre-emption queue and checks whether the head is > non-finishable. > [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420] > In the above case, all tasks are speculative, but a state change does not > trigger pre-emption queue re-ordering, so peek() always returns a canFinish > task even though non-finishable tasks are in the queue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
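A note on the mechanics behind HIVE-23443: a Java priority queue orders its elements only at insertion time, so flipping a queued task's canFinish flag in place leaves the task in its stale position and peek() keeps returning it. A minimal sketch of the re-ordering idea, with a hypothetical TaskWrapper and comparator rather than the actual TaskExecutorService code:
{code:java}
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class PreemptionQueueSketch {
    // Hypothetical stand-in for the LLAP task wrapper.
    static class TaskWrapper {
        volatile boolean canFinish;
        final long startTime;
        TaskWrapper(boolean canFinish, long startTime) {
            this.canFinish = canFinish;
            this.startTime = startTime;
        }
    }

    // Non-finishable tasks sort first (false < true), so peek() sees a
    // pre-emptible candidate before any canFinish task.
    private final PriorityBlockingQueue<TaskWrapper> preemptionQueue =
        new PriorityBlockingQueue<>(16,
            Comparator.comparing((TaskWrapper t) -> t.canFinish)
                      .thenComparingLong(t -> t.startTime));

    // A PriorityBlockingQueue never re-sorts a mutated element, so a
    // finishable-state change has to remove and re-insert the task to
    // restore the heap invariant.
    void onFinishableStateChange(TaskWrapper task, boolean nowCanFinish) {
        boolean wasQueued = preemptionQueue.remove(task);
        task.canFinish = nowCanFinish;
        if (wasQueued) {
            preemptionQueue.offer(task);
        }
    }
}
{code}
The remove/offer pair is what forces the comparator to be re-evaluated; mutating the flag alone reproduces the stuck peek() behavior described above.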
[jira] [Assigned] (HIVE-23441) Support foreground option for running llap scripts
[ https://issues.apache.org/jira/browse/HIVE-23441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-23441: > Support foreground option for running llap scripts > -- > > Key: HIVE-23441 > URL: https://issues.apache.org/jira/browse/HIVE-23441 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > > LLAP scripts always run in the background. To make them container friendly, > support foreground execution of the scripts as an option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23151) LLAP: default hive.llap.file.cleanup.delay.seconds=0s
[ https://issues.apache.org/jira/browse/HIVE-23151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077453#comment-17077453 ] Prasanth Jayachandran commented on HIVE-23151: -- +1, pending test. > LLAP: default hive.llap.file.cleanup.delay.seconds=0s > - > > Key: HIVE-23151 > URL: https://issues.apache.org/jira/browse/HIVE-23151 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23151.01.patch > > > The current default value (300s) is better suited to a debugging scenario; let's > set this to 0s so that shuffle local files are cleaned up immediately > after the DAG completes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
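For illustration, hive.llap.file.cleanup.delay.seconds is a time-valued property, so overrides carry a unit suffix. A hedged sketch of reading it through the plain Hadoop Configuration API (only the property name comes from the issue; everything else is illustrative):
{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class CleanupDelaySketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // A deployment that still wants the old debugging-friendly delay
        // would override the new 0s default explicitly, e.g. in hive-site.xml.
        conf.set("hive.llap.file.cleanup.delay.seconds", "300s");
        // getTimeDuration parses the "300s" suffix and converts units.
        long delayMs = conf.getTimeDuration(
            "hive.llap.file.cleanup.delay.seconds", 0L, TimeUnit.MILLISECONDS);
        System.out.println("cleanup delay (ms): " + delayMs);
    }
}
{code}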
[jira] [Commented] (HIVE-23111) MsckPartitionExpressionProxy should filter partitions
[ https://issues.apache.org/jira/browse/HIVE-23111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076788#comment-17076788 ] Prasanth Jayachandran commented on HIVE-23111: -- +1, pending tests > MsckPartitionExpressionProxy should filter partitions > - > > Key: HIVE-23111 > URL: https://issues.apache.org/jira/browse/HIVE-23111 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sam An >Assignee: Sam An >Priority: Major > Attachments: Hive-23111.1.patch, Hive-23111.2.patch, > Hive-23111.3.patch, Hive-23111.4.patch > > > Currently MsckPartitionExpressionProxy does not filter partition names, which > causes problems for partition auto discovery. -- This message was sent by Atlassian Jira (v8.3.4#803005)
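The gist of the requested change, sketched generically: given the partition names msck discovers, keep only those that satisfy the filter instead of returning everything. The regex below is a stand-in; the real proxy evaluates a Hive partition filter expression, and the method name here is hypothetical:
{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PartitionFilterSketch {
    // Hypothetical stand-in for expression evaluation: retain only the
    // partition names that match, mirroring what a filtering
    // PartitionExpressionProxy is expected to do.
    static List<String> filterPartitionNames(List<String> names, Pattern keep) {
        return names.stream()
                    .filter(n -> keep.matcher(n).matches())
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> discovered = Arrays.asList(
            "dt=2020-01-01", "dt=2020-01-02", "dt=2019-12-31");
        // Without filtering, auto discovery would act on all three names;
        // with it, only the requested subset survives.
        System.out.println(
            filterPartitionNames(discovered, Pattern.compile("dt=2020-.*")));
    }
}
{code}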
[jira] [Commented] (HIVE-23144) LLAP: Let QueryTracker cleanup on serviceStop
[ https://issues.apache.org/jira/browse/HIVE-23144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076528#comment-17076528 ] Prasanth Jayachandran commented on HIVE-23144: -- +1, pending tests > LLAP: Let QueryTracker cleanup on serviceStop > - > > Key: HIVE-23144 > URL: https://issues.apache.org/jira/browse/HIVE-23144 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Attachments: HIVE-23144.01.patch > > > QueryTracker's executor service basically runs cleanup tasks: > ExternalQueryCleanerCallable, DagMapCleanerCallable, FileCleanerCallable. > Changing the shutdown behavior to > [.shutdown()|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html#shutdown--] > from > [.shutdownNow()|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html#shutdownNow--] > would let QueryTracker clean up its garbage, for example shuffle local > files: > https://github.com/apache/hive/blob/c3ec20dd4f5b5fbde4007041844f6aed8c262ca1/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java#L440 -- This message was sent by Atlassian Jira (v8.3.4#803005)
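The distinction the description leans on, as a self-contained sketch (executor and task are illustrative): shutdown() lets already-queued cleanup callables run to completion, while shutdownNow() interrupts workers and discards anything still queued:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class QueryTrackerShutdownSketch {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(() -> System.out.println("cleaning up shuffle local files"));

        // shutdown() stops accepting new tasks but drains the queue, so
        // pending cleanup work still executes. shutdownNow() would interrupt
        // the worker and return the queued tasks unrun.
        executor.shutdown();
        if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
            // Fall back to a hard stop only if cleanup hangs.
            executor.shutdownNow();
        }
    }
}
{code}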
[jira] [Commented] (HIVE-23144) LLAP: Let QueryTracker cleanup on serviceStop
[ https://issues.apache.org/jira/browse/HIVE-23144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076499#comment-17076499 ] Prasanth Jayachandran commented on HIVE-23144: -- The interrupted exception e is being dropped. Can you log the interrupted exception as well, so we know the stack trace that caused it? > LLAP: Let QueryTracker cleanup on serviceStop > - > > Key: HIVE-23144 > URL: https://issues.apache.org/jira/browse/HIVE-23144 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Attachments: HIVE-23144.01.patch > > > QueryTracker's executor service basically runs cleanup tasks: > ExternalQueryCleanerCallable, DagMapCleanerCallable, > FileCleanerCallable...changing the shutdown behavior to > [.shutdown()|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html#shutdown--] > from > [.shutdownNow()|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html#shutdownNow--] > would let QueryTracker clean up its garbage, for example shuffle local > files: > https://github.com/apache/hive/blob/c3ec20dd4f5b5fbde4007041844f6aed8c262ca1/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java#L440 -- This message was sent by Atlassian Jira (v8.3.4#803005)
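What the comment asks for, sketched with an assumed slf4j logger: hand the caught exception object to the logger so its stack trace is recorded, and restore the interrupt flag for callers:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class InterruptLoggingSketch {
    private static final Logger LOG = LoggerFactory.getLogger(InterruptLoggingSketch.class);

    static void awaitQuietly(ExecutorService executor) {
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            // Pass e to the logger instead of dropping it, so the stack
            // trace that caused the interrupt shows up in the logs.
            LOG.warn("Interrupted while waiting for executor shutdown", e);
            // Restore the interrupt status for code further up the stack.
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown();
        awaitQuietly(executor);
    }
}
{code}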
[jira] [Commented] (HIVE-23095) NDV might be overestimated for a table with ~70 value
[ https://issues.apache.org/jira/browse/HIVE-23095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076480#comment-17076480 ] Prasanth Jayachandran commented on HIVE-23095: -- I think fastutil was deliberately not included in the past, as it is an 18+MB jar. Can you use Map instead? Also, can you include merge() performance in the JMH benchmark, as it is also on the critical hot path? > NDV might be overestimated for a table with ~70 value > - > > Key: HIVE-23095 > URL: https://issues.apache.org/jira/browse/HIVE-23095 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23095.01.patch, HIVE-23095.02.patch, > HIVE-23095.03.patch, HIVE-23095.04.patch, HIVE-23095.04.patch, > HIVE-23095.04.patch, HIVE-23095.05.patch, hll-bench.md > > Time Spent: 0.5h > Remaining Estimate: 0h > > uncovered while looking into HIVE-23082 > https://issues.apache.org/jira/browse/HIVE-23082?focusedCommentId=17067773&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17067773 -- This message was sent by Atlassian Jira (v8.3.4#803005)
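On adding merge() to the JMH run: a dense HLL merge is essentially a pairwise max over the register arrays, so a harness could look like the sketch below. This is a generic stand-in, not the actual Hive HyperLogLog class:
{code:java}
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MergeBench {
    private byte[] a;
    private byte[] b;

    @Setup
    public void setup() {
        a = new byte[1 << 14];
        b = new byte[1 << 14];
        for (int i = 0; i < a.length; i++) {
            a[i] = (byte) (i % 31);
            b[i] = (byte) ((i * 7) % 31);
        }
    }

    @Benchmark
    public byte[] merge() {
        // Pairwise max of register values, the core of a dense HLL merge.
        byte[] merged = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            merged[i] = (byte) Math.max(a[i], b[i]);
        }
        return merged;
    }
}
{code}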
[jira] [Updated] (HIVE-23118) Option for exposing compile time counters as tez counters
[ https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23118: - Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) Committed to master. Thanks for the review! > Option for exposing compile time counters as tez counters > - > > Key: HIVE-23118 > URL: https://issues.apache.org/jira/browse/HIVE-23118 > Project: Hive > Issue Type: Improvement >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23118.1.patch, HIVE-23118.2.patch, > HIVE-23118.3.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > TezCounters currently are runtime only. Some compile time information from > optimizer can be exposed as counters which can then be used by workload > management to make runtime decisions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23082) PK/FK stat rescale doesn't work in some cases
[ https://issues.apache.org/jira/browse/HIVE-23082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074992#comment-17074992 ] Prasanth Jayachandran commented on HIVE-23082: -- Left a comment in the github PR. I think we should remove tempList from the sparse register and insert directly into the sparse map, so that getSize() becomes a constant-time operation with less branching/fewer branch misses. > PK/FK stat rescale doesn't work in some cases > - > > Key: HIVE-23082 > URL: https://issues.apache.org/jira/browse/HIVE-23082 > Project: Hive > Issue Type: Bug > Components: Statistics >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Attachments: HIVE-23082.01.patch, HIVE-23082.02.patch, > HIVE-23082.03.patch, HIVE-23082.03.patch, HIVE-23082.03.patch, > HIVE-23082.03.patch > > > As a result, joins may retain the original estimate; see MAPJOIN_33 in this > plan, which retained the estimate of SEL_32 > {code} > ++ > | Explain | > ++ > | Plan optimized by CBO. | > || > | Vertex dependency in root stage| > | Map 1 <- Map 2 (BROADCAST_EDGE)| > || > | Stage-0| > | Fetch Operator | > | limit:12 | > | Stage-1| > | Map 1 vectorized | > | File Output Operator [FS_36] | > | Limit [LIM_35] (rows=12 width=4) | > | Number of rows:12| > | Select Operator [SEL_34] (rows=5040 width=4) | > | Output:["_col0"] | > | Map Join Operator [MAPJOIN_33] (rows=5040 width=8) | > | Conds:SEL_32._col0=RS_30._col0(Inner) | > | <-Map 2 [BROADCAST_EDGE] vectorized| > | BROADCAST [RS_30]| > | PartitionCols:_col0| > | Select Operator [SEL_29] (rows=1 width=8) | > | Output:["_col0"] | > | Filter Operator [FIL_28] (rows=1 width=108) | > | predicate:((r_reason_id = 'reason 66') and r_reason_sk > is not null) | > | TableScan [TS_3] (rows=2 width=108) | > | > default@rx0,reason,Tbl:COMPLETE,Col:COMPLETE,Output:["r_reason_id","r_reason_sk"] > | > | <-Select Operator [SEL_32] (rows=5040 width=7) | > | Output:["_col0"] | > | Filter Operator [FIL_31] (rows=5040 width=7) | > | predicate:sr_reason_sk is not null | > | TableScan [TS_0] (rows=5112 width=7) | > | > default@sr0,store_returns,Tbl:COMPLETE,Col:COMPLETE,Output:["sr_reason_sk"] | > || > ++ > {code} > repro: > {code} > set hive.query.results.cache.enabled=false; > set hive.explain.user=true; > drop table if exists default.rx0; > drop table if exists default.sr0; > create table rx0 (r_reason_id string, r_reason_sk bigint); > create table sr0 (sr_reason_sk bigint); > insert into rx0 values ('',1),('GEAA',70); > insert into sr0 values (NULL),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10), > (11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25), > (26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40), > (41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55), > (56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70); > insert into sr0 select a.* from sr0 a,sr0 b; > -- |sr0| ~ 5112 > explain select 1 > from default.sr0 store_returns , default.rx0 reason > where sr_reason_sk = r_reason_sk > and r_reason_id = 'reason 66' > limit 12; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
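The comment's suggestion, sketched with illustrative names (not the actual Hive HLL code): if values go straight into the sparse map instead of buffering in tempList, the register's size is simply the map's size, so getSize() is constant time with no branching over a pending list:
{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative sparse HLL register, not the actual Hive class.
public class SparseRegisterSketch {
    private final Map<Integer, Byte> sparseMap = new HashMap<>();

    public void add(int registerIdx, byte value) {
        // Keep the max value per register, as HLL semantics require,
        // writing directly into the map with no temp list in between.
        sparseMap.merge(registerIdx, value, (old, nu) -> (byte) Math.max(old, nu));
    }

    public int getSize() {
        // O(1): no tempList.size() plus conditional flush to account for.
        return sparseMap.size();
    }
}
{code}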
[jira] [Updated] (HIVE-23118) Option for exposing compile time counters as tez counters
[ https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23118: - Attachment: HIVE-23118.3.patch > Option for exposing compile time counters as tez counters > - > > Key: HIVE-23118 > URL: https://issues.apache.org/jira/browse/HIVE-23118 > Project: Hive > Issue Type: Improvement >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-23118.1.patch, HIVE-23118.2.patch, > HIVE-23118.3.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > TezCounters currently are runtime only. Some compile time information from > optimizer can be exposed as counters which can then be used by workload > management to make runtime decisions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23118) Option for exposing compile time counters as tez counters
[ https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23118: - Attachment: HIVE-23118.2.patch > Option for exposing compile time counters as tez counters > - > > Key: HIVE-23118 > URL: https://issues.apache.org/jira/browse/HIVE-23118 > Project: Hive > Issue Type: Improvement >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-23118.1.patch, HIVE-23118.2.patch > > Time Spent: 20m > Remaining Estimate: 0h > > TezCounters currently are runtime only. Some compile time information from > optimizer can be exposed as counters which can then be used by workload > management to make runtime decisions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted
[ https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072509#comment-17072509 ] Prasanth Jayachandran commented on HIVE-23110: -- I have partial logs {code:java} hiveserver2 <14>1 2020-03-31T20:52:24.702Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="ql.Driver" level="INFO" operationLogLevel="EXECUTION" queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" thread="HiveServer2-Background-Pool: Thread-74"] Executing command(queryId=hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e) has been interrupted after 133.75 seconds hiveserver2 <14>1 2020-03-31T20:52:24.702Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="ql.Driver" level="INFO" operationLogLevel="EXECUTION" queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" thread="HiveServer2-Background-Pool: Thread-74"] OK hiveserver2 <15>1 2020-03-31T20:52:24.702Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="log.PerfLogger" level="DEBUG" operationLogLevel="EXECUTION" queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" thread="HiveServer2-Background-Pool: Thread-74"] hiveserver2 <14>1 2020-03-31T20:52:24.711Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="common.LogUtils" level="INFO" thread="HiveServer2-Background-Pool: Thread-74"] Unregistered logging context. 
hiveserver2 <14>1 2020-03-31T20:52:24.702Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="lockmgr.DbLockManager" level="INFO" operationLogLevel="EXECUTION" queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" thread="HiveServer2-Background-Pool: Thread-74"] releaseLocks: hiveserver2 <15>1 2020-03-31T20:52:24.703Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="log.PerfLogger" level="DEBUG" operationLogLevel="EXECUTION" queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" thread="HiveServer2-Background-Pool: Thread-74"] hiveserver2 <11>1 2020-03-31T20:52:24.711Z hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="operation.Operation" level="ERROR" operationLogLevel="EXECUTION" queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" thread="HiveServer2-Background-Pool: Thread-74"] Error running hive query: org.apache.hive.service.cli.HiveSQLException: Illegal Operation state transition from CANCELED to FINISHED at org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:97) at org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:103) at org.apache.hive.service.cli.operation.Operation.setState(Operation.java:161) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:248) at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) hiveserver2 2020-03-31 20:52:24,710 Log4j2-TF-1-AsyncLogger[AsyncContext@18b4aac2]-1 ERROR /tmp/hive/operation_logs/94e0ab1a-e5ca-4237-9713-235b5dd2559a/hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e w
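The "Illegal Operation state transition from CANCELED to FINISHED" in the log suggests the background worker declares success without checking whether a concurrent cancel already moved the operation to a terminal state. A self-contained sketch of that kind of guard (the enum mirrors OperationState; everything else is illustrative, not the actual fix):
{code:java}
public class OperationStateSketch {
    enum OperationState { INITIALIZED, RUNNING, FINISHED, CANCELED, CLOSED, ERROR }

    private volatile OperationState state = OperationState.RUNNING;

    // The background worker checks for a terminal state set by a concurrent
    // cancel before declaring the query FINISHED, avoiding the illegal
    // CANCELED -> FINISHED transition seen in the log above.
    synchronized void finishIfStillRunning() {
        if (state == OperationState.CANCELED || state == OperationState.CLOSED) {
            return; // the client aborted; leave the terminal state alone
        }
        state = OperationState.FINISHED;
    }
}
{code}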
[jira] [Commented] (HIVE-23118) Option for exposing compile time counters as tez counters
[ https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072409#comment-17072409 ] Prasanth Jayachandran commented on HIVE-23118: -- [~Sreenath] These are hive side counters merged with dag counters on the client side. These counters will be added to any tez task during hive query compilation. I don't think this will be available on the tez side, as it does not attach to any tez context. It will be accessible to hive hooks though (the hive proto hook can dump it). > Option for exposing compile time counters as tez counters > - > > Key: HIVE-23118 > URL: https://issues.apache.org/jira/browse/HIVE-23118 > Project: Hive > Issue Type: Improvement >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-23118.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > TezCounters currently are runtime only. Some compile time information from > optimizer can be exposed as counters which can then be used by workload > management to make runtime decisions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
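A rough sketch of the client-side merge described above, using Tez's counters API (the counter group and counter names are made up, and incrAllCounters folding one counter set into another is the assumed mechanism):
{code:java}
import org.apache.tez.common.counters.TezCounters;

public class CompileTimeCountersSketch {
    public static void main(String[] args) {
        // Counters produced at compile time, before any task runs.
        TezCounters compileTime = new TezCounters();
        compileTime.findCounter("HIVE_COMPILE", "OPTIMIZER_ESTIMATED_ROWS").setValue(5040L);

        // Counters reported by the DAG at runtime.
        TezCounters dagCounters = new TezCounters();

        // Client-side merge: fold the compile-time set into the runtime set,
        // so hooks (e.g. the proto hook) see both in a single view.
        dagCounters.incrAllCounters(compileTime);
        System.out.println(
            dagCounters.findCounter("HIVE_COMPILE", "OPTIMIZER_ESTIMATED_ROWS").getValue());
    }
}
{code}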
[jira] [Updated] (HIVE-23118) Option for exposing compile time counters as tez counters
[ https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23118: - Status: Patch Available (was: Open) > Option for exposing compile time counters as tez counters > - > > Key: HIVE-23118 > URL: https://issues.apache.org/jira/browse/HIVE-23118 > Project: Hive > Issue Type: Improvement >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-23118.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > TezCounters currently are runtime only. Some compile time information from > optimizer can be exposed as counters which can then be used by workload > management to make runtime decisions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23118) Option for exposing compile time counters as tez counters
[ https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-23118: - Attachment: HIVE-23118.1.patch > Option for exposing compile time counters as tez counters > - > > Key: HIVE-23118 > URL: https://issues.apache.org/jira/browse/HIVE-23118 > Project: Hive > Issue Type: Improvement >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-23118.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > TezCounters currently are runtime only. Some compile time information from > optimizer can be exposed as counters which can then be used by workload > management to make runtime decisions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23118) Option for exposing compile time counters as tez counters
[ https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-23118: > Option for exposing compile time counters as tez counters > - > > Key: HIVE-23118 > URL: https://issues.apache.org/jira/browse/HIVE-23118 > Project: Hive > Issue Type: Improvement >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > > TezCounters currently are runtime only. Some compile time information from > optimizer can be exposed as counters which can then be used by workload > management to make runtime decisions. -- This message was sent by Atlassian Jira (v8.3.4#803005)