[jira] [Resolved] (HIVE-26522) Test for HIVE-22033 and backport to 3.1 and 2.3

2022-11-14 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-26522.
--
Fix Version/s: 2.3.9
   3.2.0
   4.0.0
   Resolution: Fixed

Thanks for the contribution, [~planka]! Patch merged to all branches.

> Test for HIVE-22033 and backport to 3.1 and 2.3
> ---
>
> Key: HIVE-26522
> URL: https://issues.apache.org/jira/browse/HIVE-26522
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 2.3.8, 3.1.3
>Reporter: Pavan Lanka
>Assignee: Pavan Lanka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.9, 3.2.0, 4.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> HIVE-22033 fixes the issue with Hive delegation tokens so that the renewal 
> time actually takes effect.
> This issue adds a test for HIVE-22033 and backports the fix to the 3.1 and 
> 2.3 branches of Hive.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-25646) Thrift metastore URI reverse resolution could fail in some environments

2022-01-19 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-25646.
--
Fix Version/s: 3.2.0
   4.0.0
   Resolution: Fixed

> Thrift metastore URI reverse resolution could fail in some environments
> ---
>
> Key: HIVE-25646
> URL: https://issues.apache.org/jira/browse/HIVE-25646
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When no custom URI resolver is specified, the default Thrift metastore URI 
> goes through DNS reverse resolution (getCanonicalHostname), which is unlikely 
> to resolve correctly when the HMS sits behind load balancers and proxies. This 
> is a change in behaviour from the Hive 2.x branch that isn't required. If 
> reverse resolution is required, a custom URI resolver can be implemented.
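For illustration only (not part of the patch): a minimal sketch of how reverse resolution can rewrite the configured host, assuming a hypothetical load-balancer hostname.

{code:java}
import java.net.InetAddress;
import java.net.URI;

public class UriResolutionSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical metastore URI that points at a load balancer / proxy.
    URI uri = URI.create("thrift://hms-lb.example.com:9083");

    // Using the host exactly as configured keeps the LB/proxy name intact.
    System.out.println("configured host: " + uri.getHost());

    // Reverse resolution (what getCanonicalHostName does) may instead return
    // whatever name the address reverse-resolves to, which is often not
    // reachable or not the intended endpoint behind LBs and proxies.
    InetAddress addr = InetAddress.getByName(uri.getHost());
    System.out.println("canonical host:  " + addr.getCanonicalHostName());
  }
}
{code}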



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-25646) Thrift metastore URI reverse resolution could fail in some environments

2022-01-10 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-25646:


Assignee: Prasanth Jayachandran

> Thrift metastore URI reverse resolution could fail in some environments
> ---
>
> Key: HIVE-25646
> URL: https://issues.apache.org/jira/browse/HIVE-25646
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When no custom URI resolver is specified, the default Thrift metastore URI 
> goes through DNS reverse resolution (getCanonicalHostname), which is unlikely 
> to resolve correctly when the HMS sits behind load balancers and proxies. This 
> is a change in behaviour from the Hive 2.x branch that isn't required. If 
> reverse resolution is required, a custom URI resolver can be implemented.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-24866) FileNotFoundException during alter table concat

2021-03-10 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-24866:



> FileNotFoundException during alter table concat
> ---
>
> Key: HIVE-24866
> URL: https://issues.apache.org/jira/browse/HIVE-24866
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.4.0, 3.2.0, 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> Because of the way the CombineFile InputFormat groups files based on node and 
> rack locality, there are cases where a single big ORC file gets spread across 
> two or more combine Hive splits. When the first task completes, the source ORC 
> file of the concatenation is moved/renamed as part of jobCloseOp, which can 
> lead to FileNotFoundException in subsequent mappers that hold a partial split 
> of that file.
> A simple fix would be for the mapper that has the start of the split to own 
> the entire ORC file for concatenation. If a mapper gets a partial split that 
> is not the start, it can skip the entire file.
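A rough sketch of the ownership rule described above (hypothetical helper, not the actual patch): only the mapper whose split starts at offset 0 treats the file as its own.

{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileSplit;

public final class ConcatSplitOwnership {
  private ConcatSplitOwnership() {}

  // A mapper "owns" the ORC file for concatenation only if its split begins at
  // the start of the file; mappers holding partial splits skip the file, so it
  // is consumed and moved/renamed exactly once.
  public static boolean ownsWholeFile(FileSplit split) {
    return split.getStart() == 0L;
  }

  public static Path fileToConcatenate(FileSplit split) {
    return ownsWholeFile(split) ? split.getPath() : null; // null => skip file
  }
}
{code}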



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods

2021-02-18 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-24786:
-
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> JDBC HttpClient should retry for idempotent and unsent http methods
> ---
>
> Key: HIVE-24786
> URL: https://issues.apache.org/jira/browse/HIVE-24786
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When HiveServer2 is behind multiple proxies, there is a possibility of "broken 
> pipe", "connect timeout" and "read timeout" exceptions if one of the 
> intermediate proxies or load balancers decides to reset the underlying TCP 
> socket after an idle timeout. When the connection is broken and a query is 
> submitted after the idle timeout, from the beeline (or client) perspective the 
> connection is open, but the HTTP methods (POST/GET) fail with socket-related 
> exceptions. Since these methods were not sent to the server, they are safe for 
> client-side retries.
>  
> Also, HIVE-12371 seems to apply the socket timeout only to the binary 
> transport. The same can be passed to the HTTP client as well to avoid retry 
> hangs caused by infinite timeouts.
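For illustration (not the actual Hive change): Apache HttpClient 4.x ships a retry handler along these lines; with requestSentRetryEnabled=false it retries only requests that are idempotent or were never transmitted.

{code:java}
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;
import org.apache.http.impl.client.HttpClients;

public class JdbcHttpRetrySketch {
  public static CloseableHttpClient newClient() {
    // Retry up to 3 times. With requestSentRetryEnabled=false the handler only
    // retries requests that are idempotent or were never sent to the server,
    // which is safe when proxies/load balancers silently drop idle TCP sockets.
    // (By default some exception types, e.g. UnknownHostException, are never
    // retried.)
    return HttpClients.custom()
        .setRetryHandler(new DefaultHttpRequestRetryHandler(3, false))
        .build();
  }
}
{code}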



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods

2021-02-16 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-24786:
-
Description: 
When hiveserver2 is behind multiple proxies there is possibility of "broken 
pipe", "connect timeout" and "read timeout" exceptions if one of the 
intermediate proxies or load balancers decided to reset the underlying tcp 
socket after idle timeout. When the connection is broken and when the query is 
submitted after idle timeout from beeline (or client) perspective the 
connection is open but http methods (POST/GET) fails with socket related 
exceptions. Since these methods are not sent to the server these are safe for 
client side retries. 

 

Also HIVE-12371 seems to apply the socket timeout only to binary transport. 
Same can be passed on to http client as well to avoid retry hang issues with 
infinite timeouts. 

  was:When hiveserver2 is behind multiple proxies there is possibility of 
"broken pipe", "connect timeout" and "read timeout" exceptions if one of the 
intermediate proxies or load balancers decided to reset the underlying tcp 
socket after idle timeout. When the connection is broken and when the query is 
submitted after idle timeout from beeline (or client) perspective the 
connection is open but http methods (POST/GET) fails with socket related 
exceptions. Since these methods are not sent to the server these are safe for 
client side retries. 


> JDBC HttpClient should retry for idempotent and unsent http methods
> ---
>
> Key: HIVE-24786
> URL: https://issues.apache.org/jira/browse/HIVE-24786
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When HiveServer2 is behind multiple proxies, there is a possibility of "broken 
> pipe", "connect timeout" and "read timeout" exceptions if one of the 
> intermediate proxies or load balancers decides to reset the underlying TCP 
> socket after an idle timeout. When the connection is broken and a query is 
> submitted after the idle timeout, from the beeline (or client) perspective the 
> connection is open, but the HTTP methods (POST/GET) fail with socket-related 
> exceptions. Since these methods were not sent to the server, they are safe for 
> client-side retries.
>  
> Also, HIVE-12371 seems to apply the socket timeout only to the binary 
> transport. The same can be passed to the HTTP client as well to avoid retry 
> hangs caused by infinite timeouts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-22196) Socket timeouts happen when other drivers set DriverManager.loginTimeout

2021-02-16 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-22196.
--
Resolution: Fixed

Fixed by HIVE-12371

> Socket timeouts happen when other drivers set DriverManager.loginTimeout
> 
>
> Key: HIVE-22196
> URL: https://issues.apache.org/jira/browse/HIVE-22196
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC, Thrift API
>Affects Versions: 1.2.1, 2.0.0, 3.1.2
> Environment: Any Hive JDBC client that uses other SQL clients besides 
> Hive, or any other kind of JDBC driver (e.g. connection pooling). This can 
> only happen if the other driver writes values to 
> {{DriverManager.setLoginTimeout()}}. HikariCP is one suspect; there are 
> probably others as well.
>Reporter: Nathan Clark
>Priority: Major
>
> There are a few somewhat sketchy things happening in Hive/Thrift code in the 
> JDBC client that result in intermittent "read timed out" (and subsequently 
> "out of sequence") errors when other JDBC drivers are active in the same 
> client JVM that set {{DriverManager.loginTimeout}}.
>  # The login timeout used to initialize a {{HiveConnection}} is populated 
> from {{DriverManager.loginTimeout}} in the core Java JDBC library. This 
> sounds like a nice, orthodox place to get a login timeout from, but it's 
> fundamentally problematic and really shouldn't be used. The reason is that 
> it's a *global* singleton value, and any JDBC Driver (or any other piece of 
> code for that matter) can write to it at will (and is implicitly invited to). 
> The Hive JDBC stack _itself_ writes values to this global setting in a couple 
> of places seemingly unrelated to the client connection setup.
>  # The _read_ timeout for Thrift _socket-level_ reads is actually populated 
> from this _login_ timeout (a.k.a. "connect timeout") setting. (See Thrift's 
> {{TSocket(String host, int port, int timeout)}} and its callers in 
> {{HiveAuthFactory}}. Also note the numerous code comments that speak of 
> setting {{SO_TIMEOUT}} (the socket read timeout) while the actual code 
> references a variable called {{loginTimeout}}.) Socket reads can occur 
> thousands of times in an application that does lots of Hive queries, and 
> their individual workloads are each individually less predictable than simply 
> getting a connection, which typically happens at most a few times. So you 
> have a huge probability that a login timeout setting, which seems to usually 
> receive a reasonable value of 30 seconds if constrained at all, will 
> occasionally (way too often) be inadequate for a socket read.
>  # There seems to be no option to set this login timeout (or the actual read 
> timeout) explicitly as an externalized override setting (but see HIVE-12371). 
> *Summary:* {{DriverManager.loginTimeout}} can be innocently set by any JDBC 
> driver present in the JVM, you can't override it, and it's misused by Hive as 
> a socket read timeout. There's no way to prevent intermittent read timeouts 
> in this scenario unless you're lucky enough to find the JDBC driver and 
> reconfigure its timeout setting to something workable for Hive socket reads.
> An easy, crude patch:
> modify the first line of {{HiveConnection.setupLoginTimeout()}} from:
> {{long timeOut = TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());}}
> to:
> {{long timeOut = TimeUnit.SECONDS.toMillis(0);}}
> This is of course not a robust fix, as server issues during socket reads can 
> result in a hung client thread. Some other hardcoded value might be more 
> advisable, as long as it's long enough to prevent spurious read timeouts.
> The right approach is to prioritize HIVE-12371 (proposed socket timeout 
> override setting that doesn't depend on {{DriverManager.loginTimeout}}) and 
> implement it in all possible versions.
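A tiny sketch of the global-singleton issue described above (illustrative only):

{code:java}
import java.sql.DriverManager;

public class LoginTimeoutSketch {
  public static void main(String[] args) {
    // Some other JDBC driver or connection pool in the same JVM sets a
    // perfectly reasonable login timeout for its own purposes...
    DriverManager.setLoginTimeout(30);

    // ...and a Hive connection created later reads the same global value and,
    // per the description above, ends up using it as the Thrift socket read
    // timeout for every read.
    System.out.println("global login timeout now: "
        + DriverManager.getLoginTimeout() + "s");
  }
}
{code}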



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-2357) Support connection timeout in hive JDBC

2021-02-16 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-2357.
-
Resolution: Fixed

Fixed by HIVE-12371

> Support connection timeout in hive JDBC
> ---
>
> Key: HIVE-2357
> URL: https://issues.apache.org/jira/browse/HIVE-2357
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-14517) Hive JDBC driver login timeout used as socket timeout

2021-02-16 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-14517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-14517.
--
Resolution: Fixed

Fixed by HIVE-12371

> Hive JDBC driver login timeout used as socket timeout
> -
>
> Key: HIVE-14517
> URL: https://issues.apache.org/jira/browse/HIVE-14517
> Project: Hive
>  Issue Type: Bug
>Reporter: Mark Kidwell
>Priority: Major
>
> HIVE-5351 added client timeout support by setting the transport socket read 
> timeout to the JDBC DriverManager login timeout. While useful as a global 
> network IO timeout, it isn't the expected behavior for this timeout setting. 
> It also makes it impossible to, for example, require logins to complete 
> quickly while allowing queries to run for longer periods.
> Ideally multiple timeouts (connect, login and socket read) would be supported 
> as in other JDBC drivers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-12371) Adding a timeout connection parameter for JDBC

2021-02-16 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-12371.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

PR merged to master. Thanks for the contribution!

> Adding a timeout connection parameter for JDBC
> --
>
> Key: HIVE-12371
> URL: https://issues.apache.org/jira/browse/HIVE-12371
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: Nemon Lou
>Assignee: Xi Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are already some timeout settings on the server side:
> HIVE-4766
> HIVE-6679
> Adding a connection timeout parameter for JDBC is useful in some scenarios:
> 1. beeline (which cannot set the timeout manually)
> 2. customizing the timeout for different connections (among Hive or RDBMSs, 
> which cannot be done via DriverManager.setLoginTimeout())
> Just like PostgreSQL:
> {noformat}
> jdbc:postgresql://localhost/test?user=fred&password=secret&ssl=true&connectTimeout=0
> {noformat}
> or mysql
> {noformat}
> jdbc:mysql://xxx.xx.xxx.xxx:3306/database?connectTimeout=6&socketTimeout=6
> {noformat}
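A small usage sketch of the URL-parameter pattern, reusing the PostgreSQL URL quoted above (the exact parameter name the Hive driver ends up with is not repeated here):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;

public class UrlTimeoutSketch {
  public static void main(String[] args) throws Exception {
    // The timeout travels with the URL, so each connection (including ones
    // opened by beeline or a pool) can choose its own value without touching
    // the global DriverManager.setLoginTimeout().
    String url = "jdbc:postgresql://localhost/test?user=fred&password=secret"
        + "&ssl=true&connectTimeout=0";
    try (Connection conn = DriverManager.getConnection(url)) {
      System.out.println("connected: " + !conn.isClosed());
    }
  }
}
{code}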



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods

2021-02-16 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285169#comment-17285169
 ] 

Prasanth Jayachandran commented on HIVE-24786:
--

[~thejas] [~ngangam] Can you please help with reviewing this PR?

> JDBC HttpClient should retry for idempotent and unsent http methods
> ---
>
> Key: HIVE-24786
> URL: https://issues.apache.org/jira/browse/HIVE-24786
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When HiveServer2 is behind multiple proxies, there is a possibility of "broken 
> pipe", "connect timeout" and "read timeout" exceptions if one of the 
> intermediate proxies or load balancers decides to reset the underlying TCP 
> socket after an idle timeout. When the connection is broken and a query is 
> submitted after the idle timeout, from the beeline (or client) perspective the 
> connection is open, but the HTTP methods (POST/GET) fail with socket-related 
> exceptions. Since these methods were not sent to the server, they are safe for 
> client-side retries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods

2021-02-16 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-24786:
-
Description: When hiveserver2 is behind multiple proxies there is 
possibility of "broken pipe", "connect timeout" and "read timeout" exceptions 
if one of the intermediate proxies or load balancers decided to reset the 
underlying tcp socket after idle timeout. When the connection is broken and 
when the query is submitted after idle timeout from beeline (or client) 
perspective the connection is open but http methods (POST/GET) fails with 
socket related exceptions. Since these methods are not sent to the server these 
are safe for client side retries.   (was: When hiveserver2 is behind multiple 
proxies there is possibility of "broken pipe", "connect timeout" and "read 
timeout" exceptions if one of the intermediate proxies or load balancers 
decided to reset the underlying tcp socket after idle timeout. When the 
connection is broken and when the a query is submitted after idle timeout from 
beeline (or client) perspective the connection is open but http methods 
(POST/GET) fails with socket related exceptions. Since these methods are not 
sent to the server these are safe for client side retries. )

> JDBC HttpClient should retry for idempotent and unsent http methods
> ---
>
> Key: HIVE-24786
> URL: https://issues.apache.org/jira/browse/HIVE-24786
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When HiveServer2 is behind multiple proxies, there is a possibility of "broken 
> pipe", "connect timeout" and "read timeout" exceptions if one of the 
> intermediate proxies or load balancers decides to reset the underlying TCP 
> socket after an idle timeout. When the connection is broken and a query is 
> submitted after the idle timeout, from the beeline (or client) perspective the 
> connection is open, but the HTTP methods (POST/GET) fail with socket-related 
> exceptions. Since these methods were not sent to the server, they are safe for 
> client-side retries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods

2021-02-16 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-24786:
-
Status: Patch Available  (was: Open)

> JDBC HttpClient should retry for idempotent and unsent http methods
> ---
>
> Key: HIVE-24786
> URL: https://issues.apache.org/jira/browse/HIVE-24786
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When HiveServer2 is behind multiple proxies, there is a possibility of "broken 
> pipe", "connect timeout" and "read timeout" exceptions if one of the 
> intermediate proxies or load balancers decides to reset the underlying TCP 
> socket after an idle timeout. When the connection is broken and a query is 
> submitted after the idle timeout, from the beeline (or client) perspective the 
> connection is open, but the HTTP methods (POST/GET) fail with socket-related 
> exceptions. Since these methods were not sent to the server, they are safe for 
> client-side retries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods

2021-02-16 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-24786:



> JDBC HttpClient should retry for idempotent and unsent http methods
> ---
>
> Key: HIVE-24786
> URL: https://issues.apache.org/jira/browse/HIVE-24786
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> When HiveServer2 is behind multiple proxies, there is a possibility of "broken 
> pipe", "connect timeout" and "read timeout" exceptions if one of the 
> intermediate proxies or load balancers decides to reset the underlying TCP 
> socket after an idle timeout. When the connection is broken and a query is 
> submitted after the idle timeout, from the beeline (or client) perspective the 
> connection is open, but the HTTP methods (POST/GET) fail with socket-related 
> exceptions. Since these methods were not sent to the server, they are safe for 
> client-side retries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24501) UpdateInputAccessTimeHook should not update stats

2021-02-09 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-24501:
-
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> UpdateInputAccessTimeHook should not update stats
> -
>
> Key: HIVE-24501
> URL: https://issues.apache.org/jira/browse/HIVE-24501
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> UpdateInputAccessTimeHook can fail for transactional tables with following 
> exception.
> The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756) at 
> org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>  at 
> org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296) 
> at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273) at 
> org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155) at 
> org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) at 
> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)Caused by: 
> MetaException(message:Cannot change stats state for a transactional table 
> default.test without providing the transactional write state for verification 
> (new write ID 0, valid write IDs 
> default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null) at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> org.apache.had

[jira] [Commented] (HIVE-24569) LLAP daemon leaks file descriptors/log4j appenders

2021-01-15 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266271#comment-17266271
 ] 

Prasanth Jayachandran commented on HIVE-24569:
--

Left a question about how the Idle purging is triggered (we wanted to avoid 
files being closed too frequently). Looks good otherwise, +1. Thanks for adding 
tests for it! 

> LLAP daemon leaks file descriptors/log4j appenders
> --
>
> Key: HIVE-24569
> URL: https://issues.apache.org/jira/browse/HIVE-24569
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: llap-appender-gc-roots.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With HIVE-9756 query logs in LLAP are directed to different files (file per 
> query) using a Log4j2 routing appender. Without a purge policy in place, 
> appenders are created dynamically by the routing appender, one for each 
> query, and remain in memory forever. The dynamic appenders write to files, so 
> each appender holds on to a file descriptor.
> Further work HIVE-14224 has mitigated the issue by introducing a custom 
> purging policy (LlapRoutingAppenderPurgePolicy) which deletes the dynamic 
> appenders (and closes the respective files) when the query is completed 
> (org.apache.hadoop.hive.llap.daemon.impl.QueryTracker#handleLogOnQueryCompletion).
>  
> However, in the presence of multiple threads appending to the logs there are 
> race conditions. In an internal Hive cluster the number of file descriptors 
> started going up approx one descriptor leaking per query. After some 
> debugging it turns out that one thread (running the 
> QueryTracker#handleLogOnQueryCompletion) signals that the query has finished 
> and thus the purge policy should get rid of the respective appender (and 
> close the file) while another (Task-Executor-0) attempts to append another 
> log message for the same query. The initial appender is closed after the 
> request from the query tracker, but a new one is created to accommodate the 
> message from the task executor, and the latter is never removed, thus creating 
> a leak.
> Similar leaks have been identified and fixed for HS2 with the most similar 
> one being that described 
> [here|https://issues.apache.org/jira/browse/HIVE-22753?focusedCommentId=17021041&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17021041].
>  
> The problem depends on the timing of threads, so it may not manifest in all 
> versions between 2.2.0 and 4.0.0. Usually the leak can be seen via lsof (or 
> another similar command) with the following output:
> {noformat}
> # 1494391 is the PID of the LLAP daemon process
> ls -ltr /proc/1494391/fd
> ...
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 978 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121724_66ce273d-54a9-4dcd-a9fb-20cb5691cef7-dag_1608659125567_0008_194.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 977 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121804_ce53eeb5-c73f-4999-b7a4-b4dd04d4e4de-dag_1608659125567_0008_197.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 974 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224122002_1693bd7d-2f0e-4673-a8d1-b7cb14a02204-dag_1608659125567_0008_204.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 989 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121909_6a56218f-06c7-4906-9907-4b6dd824b100-dag_1608659125567_0008_201.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 984 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121754_78ef49a0-bc23-478f-9a16-87fa25e7a287-dag_1608659125567_0008_196.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 983 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121855_e65b9ebf-b2ec-4159-9570-1904442b7048-dag_1608659125567_0008_200.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 981 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121818_e9051ae3-1316-46af-aabb-22c53ed2fda7-dag_1608659125567_0008_198.log
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 980 -> 
> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121744_fcf37921-4351-4368-95ee-b5be2592d89a-dag_1608659125567_0008_195.log
> lrwx-- 1 hive hadoop 64 Dec 24 12

[jira] [Resolved] (HIVE-24514) UpdateMDatabaseURI does not update managed location URI

2021-01-15 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-24514.
--
Resolution: Fixed

Thanks for the review! Merged to master.

> UpdateMDatabaseURI does not update managed location URI
> ---
>
> Key: HIVE-24514
> URL: https://issues.apache.org/jira/browse/HIVE-24514
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When the FS root is updated using metatool, if the DB has a managed location 
> defined, the updateMDatabaseURI API should update the managed location as 
> well. Currently it only updates the location URI.
>  
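For context, the FS root update mentioned above is typically driven through the metastore tool; a usage sketch with placeholder URIs:

{noformat}
# Rewrite metastore location URIs from the old FS root to the new one
# (placeholder namenode addresses). With this fix, a database's managed
# location should be rewritten along with its location URI.
hive --service metatool -updateLocation hdfs://new-namenode:8020 hdfs://old-namenode:8020
{noformat}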



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24514) UpdateMDatabaseURI does not update managed location URI

2021-01-11 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262611#comment-17262611
 ] 

Prasanth Jayachandran commented on HIVE-24514:
--

[~ngangam] can you please take another look? Addressed your review comment.

> UpdateMDatabaseURI does not update managed location URI
> ---
>
> Key: HIVE-24514
> URL: https://issues.apache.org/jira/browse/HIVE-24514
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the FS root is updated using metatool, if the DB has a managed location 
> defined, the updateMDatabaseURI API should update the managed location as 
> well. Currently it only updates the location URI.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24514) UpdateMDatabaseURI does not update managed location URI

2020-12-10 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247111#comment-17247111
 ] 

Prasanth Jayachandran commented on HIVE-24514:
--

[~ngangam] can you please review this change? 
[https://github.com/apache/hive/pull/1761/files]

 

> UpdateMDatabaseURI does not update managed location URI
> ---
>
> Key: HIVE-24514
> URL: https://issues.apache.org/jira/browse/HIVE-24514
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the FS root is updated using metatool, if the DB has a managed location 
> defined, the updateMDatabaseURI API should update the managed location as 
> well. Currently it only updates the location URI.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24514) UpdateMDatabaseURI does not update managed location URI

2020-12-10 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-24514:



> UpdateMDatabaseURI does not update managed location URI
> ---
>
> Key: HIVE-24514
> URL: https://issues.apache.org/jira/browse/HIVE-24514
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> When the FS root is updated using metatool, if the DB has a managed location 
> defined, the updateMDatabaseURI API should update the managed location as 
> well. Currently it only updates the location URI.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout in cloud environment

2020-12-09 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-24497.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged to master! Thanks for your contribution!

> Node heartbeats from LLAP Daemon to the client are not matching leading to 
> timeout in cloud environment
> ---
>
> Key: HIVE-24497
> URL: https://issues.apache.org/jira/browse/HIVE-24497
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: hive-24497.01.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A node heartbeat contains info about all the tasks that were submitted to that 
> LLAP daemon. In cloud deployments, the client is not able to match these 
> heartbeats due to differences in hostname and port.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24501) UpdateInputAccessTimeHook should not update stats

2020-12-08 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245807#comment-17245807
 ] 

Prasanth Jayachandran commented on HIVE-24501:
--

[~ashutoshc] [~jcamachorodriguez] could someone please help with reviewing this 
small change? 

> UpdateInputAccessTimeHook should not update stats
> -
>
> Key: HIVE-24501
> URL: https://issues.apache.org/jira/browse/HIVE-24501
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UpdateInputAccessTimeHook can fail for transactional tables with following 
> exception.
> The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756) at 
> org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>  at 
> org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296) 
> at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273) at 
> org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155) at 
> org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) at 
> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)Caused by: 
> MetaException(message:Cannot change stats state for a transactional table 
> default.test without providing the transactional write state for verification 
> (new write ID 0, valid write IDs 
> default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null) at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  

[jira] [Updated] (HIVE-24501) UpdateInputAccessTimeHook should not update stats

2020-12-08 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-24501:
-
Status: Patch Available  (was: Open)

> UpdateInputAccessTimeHook should not update stats
> -
>
> Key: HIVE-24501
> URL: https://issues.apache.org/jira/browse/HIVE-24501
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> UpdateInputAccessTimeHook can fail for transactional tables with following 
> exception.
> The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756) at 
> org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>  at 
> org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296) 
> at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273) at 
> org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155) at 
> org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) at 
> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)Caused by: 
> MetaException(message:Cannot change stats state for a transactional table 
> default.test without providing the transactional write state for verification 
> (new write ID 0, valid write IDs 
> default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null) at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result.read(ThriftHiveMetastore.ja

[jira] [Assigned] (HIVE-24501) UpdateInputAccessTimeHook should not update stats

2020-12-08 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-24501:



> UpdateInputAccessTimeHook should not update stats
> -
>
> Key: HIVE-24501
> URL: https://issues.apache.org/jira/browse/HIVE-24501
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> UpdateInputAccessTimeHook can fail for transactional tables with following 
> exception.
> The hook should skip updating the stats and only update the access time.
> {code:java}
> ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)ERROR : FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.HiveException(Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null)org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
> Cannot change stats state for a transactional table default.test without 
> providing the transactional write state for verification (new write ID 0, 
> valid write IDs default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:821) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:769) at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:756) at 
> org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:70)
>  at 
> org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296) 
> at org.apache.hadoop.hive.ql.HookRunner.runPreHooks(HookRunner.java:273) at 
> org.apache.hadoop.hive.ql.Executor.preExecutionActions(Executor.java:155) at 
> org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) at 
> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)Caused by: 
> MetaException(message:Cannot change stats state for a transactional table 
> default.test without providing the transactional write state for verification 
> (new write ID 0, valid write IDs 
> default.test:8:9223372036854775807::1,2,3,4,7; current state 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","name":"true"}}; new state 
> null) at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result$alter_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_req_result.read(ThriftHiveMetastore.java)
>  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveM

[jira] [Resolved] (HIVE-24426) Spark job fails with fixed LlapTaskUmbilicalServer port

2020-11-30 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-24426.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged the PR. Thanks [~ayushtkn] for the contribution!

> Spark job fails with fixed LlapTaskUmbilicalServer port
> ---
>
> Key: HIVE-24426
> URL: https://issues.apache.org/jira/browse/HIVE-24426
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In case of cloud deployments, multiple executors are launched on the same 
> node, and in case a fixed umbilical port is specified using 
> {{spark.hadoop.hive.llap.daemon.umbilical.port=30006}},
> the job fails with BindException.
> {noformat}
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:30006] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:840)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:741)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:605)
>   at org.apache.hadoop.ipc.Server$Listener.(Server.java:1169)
>   at org.apache.hadoop.ipc.Server.(Server.java:3032)
>   at org.apache.hadoop.ipc.RPC$Server.(RPC.java:1039)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Server.(WritableRpcEngine.java:438)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:332)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:848)
>   at 
> org.apache.hadoop.hive.llap.tezplugins.helpers.LlapTaskUmbilicalServer.(LlapTaskUmbilicalServer.java:67)
>   at 
> org.apache.hadoop.hive.llap.ext.LlapTaskUmbilicalExternalClient$SharedUmbilicalServer.(LlapTaskUmbilicalExternalClient.java:122)
>   ... 26 more
> Caused by: java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:220)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:588)
>   ... 34 more{noformat}
> To counter this, it is better to provide a range of ports.
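A minimal sketch of the port-range idea (illustrative only, not the actual patch):

{code:java}
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class UmbilicalPortRangeSketch {
  // Bind to the first free port in [startPort, endPort] so that several
  // executors on the same host can coexist with a fixed, configured range.
  public static ServerSocket bindInRange(int startPort, int endPort) throws IOException {
    for (int port = startPort; port <= endPort; port++) {
      ServerSocket socket = new ServerSocket();
      try {
        socket.bind(new InetSocketAddress(port));
        return socket; // bound successfully
      } catch (BindException e) {
        socket.close(); // port already in use by another executor; try the next
      }
    }
    throw new BindException("No free port in range " + startPort + "-" + endPort);
  }
}
{code}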



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24188) CTLT from MM to External or External to MM are failing with hive.strict.managed.tables & hive.create.as.acid

2020-09-23 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-24188.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged to master. Thanks [~nareshpr] for the contribution!

> CTLT from MM to External or External to MM are failing with 
> hive.strict.managed.tables & hive.create.as.acid
> 
>
> Key: HIVE-24188
> URL: https://issues.apache.org/jira/browse/HIVE-24188
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Repro steps
>  
> {code:java}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create table test_mm(age int, name string) partitioned by(dept string) stored 
> as orc tblproperties('transactional'='true', 
> 'transactional_properties'='default');
> create external table test_external like test_mm LOCATION 
> '${system:test.tmp.dir}/create_like_mm_to_external';
> {code}
> Fails with the exception below:
> {code:java}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
> MetaException(message:default.test_external cannot be declared transactional 
> because it's an external table) (state=08S01,code=1){code}
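
For illustration, a minimal sketch of the kind of property scrubbing the fix needs (hypothetical helper, not the actual patch): when the CTLT target is external, the ACID properties must not be copied from the source table.

{code:java}
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: drop ACID table properties when the CTLT source is
 *  transactional (MM/ACID) but the target is external. Not the actual patch. */
public class CtltPropertyScrub {

  static Map<String, String> scrubForExternalTarget(Map<String, String> sourceProps) {
    Map<String, String> targetProps = new HashMap<>(sourceProps);
    // An external table cannot be declared transactional, so these must not be copied.
    targetProps.remove("transactional");
    targetProps.remove("transactional_properties");
    return targetProps;
  }

  public static void main(String[] args) {
    Map<String, String> mmProps = new HashMap<>();
    mmProps.put("transactional", "true");
    mmProps.put("transactional_properties", "default");
    System.out.println(scrubForExternalTarget(mmProps)); // prints {}
  }
}
{code}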



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-22290) ObjectStore.cleanWriteNotificationEvents and ObjectStore.cleanupEvents OutOfMemory on large number of pending events

2020-09-15 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-22290.
--
Resolution: Fixed

Merged to master. Thanks for the contribution [~nareshpr] !

> ObjectStore.cleanWriteNotificationEvents and ObjectStore.cleanupEvents 
> OutOfMemory on large number of pending events
> 
>
> Key: HIVE-22290
> URL: https://issues.apache.org/jira/browse/HIVE-22290
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, repl
>Affects Versions: 4.0.0
>Reporter: Thomas Prelle
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As in [https://jira.apache.org/jira/browse/HIVE-19430], if there is a large 
> number of events that haven't been cleaned up for some reason, then 
> ObjectStore.cleanWriteNotificationEvents() and ObjectStore.cleanupEvents can 
> run out of memory while they load all the events to be deleted.
> They should fetch events in batches.
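
For illustration, a minimal sketch of the batching idea (hypothetical interfaces, not the actual ObjectStore code): fetch at most a bounded number of expired event ids per round and delete them, so memory stays bounded regardless of the backlog size.

{code:java}
import java.util.List;

/** Hypothetical sketch of batched cleanup: at most batchSize ids are in memory
 *  at any time, instead of materializing every pending event at once. */
public class BatchedEventCleaner {

  interface EventStore {
    List<Long> fetchExpiredEventIds(long olderThanMillis, int maxResults);
    void deleteEvents(List<Long> ids);
  }

  static int cleanExpired(EventStore store, long olderThanMillis, int batchSize) {
    int deleted = 0;
    while (true) {
      List<Long> batch = store.fetchExpiredEventIds(olderThanMillis, batchSize);
      if (batch.isEmpty()) {
        break;                   // nothing left to clean
      }
      store.deleteEvents(batch); // bounded memory: one batch in flight at a time
      deleted += batch.size();
    }
    return deleted;
  }
}
{code}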



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition

2020-08-26 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185392#comment-17185392
 ] 

Prasanth Jayachandran commented on HIVE-24020:
--

Merged to master. Thanks [~vpnvishv] !

> Automatic Compaction not working in existing partitions for Streaming Ingest 
> with Dynamic Partition
> ---
>
> Key: HIVE-24020
> URL: https://issues.apache.org/jira/browse/HIVE-24020
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming, Transactions
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This issue happens when we try to do streaming ingest with dynamic partitions 
> into already existing partitions. I checked the code; we have the following 
> check in AbstractRecordWriter.
>  
> {code:java}
> PartitionInfo partitionInfo = 
> conn.createPartitionIfNotExists(partitionValues);
> // collect the newly added partitions. connection.commitTransaction() will 
> report the dynamically added
> // partitions to TxnHandler
> if (!partitionInfo.isExists()) {
>   addedPartitions.add(partitionInfo.getName());
> } else {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Partition {} already exists for table {}",
> partitionInfo.getName(), fullyQualifiedTableName);
>   }
> }
> {code}
> The above *addedPartitions* is passed to *addDynamicPartitions* during 
> TransactionBatch commit. So in the case of already existing partitions, 
> *addedPartitions* will be empty and *addDynamicPartitions* will not move 
> entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in the 
> Initiator not being able to trigger auto compaction.
> Another issue which has been observed is that we are not clearing 
> *addedPartitions* on writer close, which results in information flowing 
> across transactions.
>  
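
For illustration, a minimal sketch of the writer-side behaviour described above (hypothetical class, not the actual patch): record the partition whether or not it already existed, and clear the bookkeeping when the writer is closed.

{code:java}
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the fix: report the partition for the transaction whether
 *  or not createPartitionIfNotExists() had to create it, and reset the bookkeeping
 *  when the writer is closed. Not the actual patch. */
public class DynamicPartitionTracker {

  private final Set<String> partitionsTouchedInTxn = new HashSet<>();

  /** Called for every partition written to. Previously only brand-new partitions were
   *  collected, so existing partitions never reached addDynamicPartitions() and their
   *  TXN_COMPONENTS rows were never promoted to COMPLETED_TXN_COMPONENTS. */
  void onPartitionTouched(String partitionName) {
    partitionsTouchedInTxn.add(partitionName);
  }

  /** Passed to addDynamicPartitions() at TransactionBatch commit time. */
  Set<String> partitionsForCommit() {
    return partitionsTouchedInTxn;
  }

  /** Clear state on writer close so partition names do not leak across transactions. */
  void onWriterClose() {
    partitionsTouchedInTxn.clear();
  }
}
{code}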



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24061) Improve llap task scheduling for better cache hit rate

2020-08-26 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-24061.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged to master. Thanks [~rajesh.balamohan] !

> Improve llap task scheduling for better cache hit rate 
> ---
>
> Key: HIVE-24061
> URL: https://issues.apache.org/jira/browse/HIVE-24061
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
>  Labels: perfomance, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> TaskInfo is initialized with the "requestTime" and locality delay. When lots 
> of vertices are at the same level, "taskInfo" details are available 
> upfront. By the time it gets to scheduling, "requestTime + localityDelay" is 
> no longer higher than the current time. Due to this, the locality delay is 
> effectively skipped and a random node is chosen, which misses cache hits and 
> reads data from remote storage.
> E.g., this pattern was observed in Q75 of tpcds.
> Related lines of interest in scheduler: 
> [https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
>  
> |https://github.com/apache/hive/blob/master/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java]
> {code:java}
>boolean shouldDelayForLocality = 
> request.shouldDelayForLocality(schedulerAttemptTime);
> ..
> ..
> boolean shouldDelayForLocality(long schedulerAttemptTime) {
>   return localityDelayTimeout > schedulerAttemptTime;
> }
> {code}
>  
> Ideally, "localityDelayTimeout" should be adjusted based on it's first 
> scheduling opportunity.
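
For illustration, a minimal sketch of that adjustment (hypothetical class, not the actual scheduler change): start the locality countdown at the first scheduling attempt instead of at request time, mirroring the shouldDelayForLocality() check quoted above.

{code:java}
/** Hypothetical sketch: anchor the locality delay to the first time the scheduler
 *  actually looks at the task, not to the time the request was created. */
class TaskLocalityDelay {
  private final long localityDelayMs;
  private long localityDeadlineMs = -1;   // set lazily on the first scheduling attempt

  TaskLocalityDelay(long localityDelayMs) {
    this.localityDelayMs = localityDelayMs;
  }

  boolean shouldDelayForLocality(long schedulerAttemptTimeMs) {
    if (localityDeadlineMs < 0) {
      // First opportunity: start the countdown now instead of at request time, so
      // tasks queued long before scheduling still wait for a local (cached) node.
      localityDeadlineMs = schedulerAttemptTimeMs + localityDelayMs;
    }
    return localityDeadlineMs > schedulerAttemptTimeMs;
  }
}
{code}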



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition

2020-08-26 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-24020.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

> Automatic Compaction not working in existing partitions for Streaming Ingest 
> with Dynamic Partition
> ---
>
> Key: HIVE-24020
> URL: https://issues.apache.org/jira/browse/HIVE-24020
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming, Transactions
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This issue happens when we try to do streaming ingest with dynamic partitions 
> into already existing partitions. I checked the code; we have the following 
> check in AbstractRecordWriter.
>  
> {code:java}
> PartitionInfo partitionInfo = 
> conn.createPartitionIfNotExists(partitionValues);
> // collect the newly added partitions. connection.commitTransaction() will 
> report the dynamically added
> // partitions to TxnHandler
> if (!partitionInfo.isExists()) {
>   addedPartitions.add(partitionInfo.getName());
> } else {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Partition {} already exists for table {}",
> partitionInfo.getName(), fullyQualifiedTableName);
>   }
> }
> {code}
> The above *addedPartitions* is passed to *addDynamicPartitions* during 
> TransactionBatch commit. So in the case of already existing partitions, 
> *addedPartitions* will be empty and *addDynamicPartitions* will not move 
> entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in the 
> Initiator not being able to trigger auto compaction.
> Another issue which has been observed is that we are not clearing 
> *addedPartitions* on writer close, which results in information flowing 
> across transactions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission and unmanaged AM failures

2020-08-26 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-24068:
-
Fix Version/s: 4.0.0

> Add re-execution plugin for handling DAG submission and unmanaged AM failures
> -
>
> Key: HIVE-24068
> URL: https://issues.apache.org/jira/browse/HIVE-24068
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> DAG submission failure can also happen in environments where the AM container 
> died, causing DNS issues. DAG submissions are safe to retry, as the DAG hasn't 
> started execution yet. There are retries at the getSession and submitDAG 
> levels individually, but some submitDAG failures have to retry getSession as 
> well because the AM could be unreachable; this can be handled in a 
> re-execution plugin.
> There is already an AM-loss retry execution plugin, but it only handles 
> managed AMs. It can be extended to handle unmanaged AMs as well.
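
For illustration, a minimal sketch of the retry idea (hypothetical interfaces, not Hive's actual re-execution plugin API): when submitDAG fails, fetch a fresh session and resubmit, since the DAG has not started executing yet.

{code:java}
/** Hypothetical sketch: if submitDAG fails because the AM is gone, get a fresh
 *  session and resubmit. Interfaces are illustrative only. */
public class DagSubmissionRetrier {

  interface SessionPool { TezSessionLike getSession() throws Exception; }
  interface TezSessionLike { void submitDag(Object dag) throws Exception; }

  static void submitWithRetry(SessionPool pool, Object dag, int maxAttempts) throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        // A new session each time: the old AM container may have died (e.g. DNS issues).
        pool.getSession().submitDag(dag);
        return;
      } catch (Exception e) {
        last = e;   // safe to retry: submission failed, so the DAG never started
      }
    }
    throw last != null ? last : new IllegalArgumentException("maxAttempts must be >= 1");
  }
}
{code}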



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24068) Add re-execution plugin for handling DAG submission and unmanaged AM failures

2020-08-26 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-24068.
--
Resolution: Fixed

PR merged to master. Thanks [~kgyrtkirk]  for the review!

> Add re-execution plugin for handling DAG submission and unmanaged AM failures
> -
>
> Key: HIVE-24068
> URL: https://issues.apache.org/jira/browse/HIVE-24068
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> DAG submission failure can also happen in environments where the AM container 
> died, causing DNS issues. DAG submissions are safe to retry, as the DAG hasn't 
> started execution yet. There are retries at the getSession and submitDAG 
> levels individually, but some submitDAG failures have to retry getSession as 
> well because the AM could be unreachable; this can be handled in a 
> re-execution plugin.
> There is already an AM-loss retry execution plugin, but it only handles 
> managed AMs. It can be extended to handle unmanaged AMs as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission and unmanaged AM failures

2020-08-25 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-24068:
-
Description: 
DAG submission failure can also happen in environments where the AM container 
died, causing DNS issues. DAG submissions are safe to retry, as the DAG hasn't 
started execution yet. There are retries at the getSession and submitDAG levels 
individually, but some submitDAG failures have to retry getSession as well 
because the AM could be unreachable; this can be handled in a re-execution 
plugin.

There is already an AM-loss retry execution plugin, but it only handles managed 
AMs. It can be extended to handle unmanaged AMs as well.

  was:DAG submission failure can also happen in environments where AM container 
died causing DNS issues. DAG submissions are safe to retry as the DAG hasn't 
started execution yet. There are retries at getSession and submitDAG level 
individually but some submitDAG failure has to retry getSession as well as AM 
could be unreachable, this can be handled in re-execution plugin.


> Add re-execution plugin for handling DAG submission and unmanaged AM failures
> -
>
> Key: HIVE-24068
> URL: https://issues.apache.org/jira/browse/HIVE-24068
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> DAG submission failure can also happen in environments where the AM container 
> died, causing DNS issues. DAG submissions are safe to retry, as the DAG hasn't 
> started execution yet. There are retries at the getSession and submitDAG 
> levels individually, but some submitDAG failures have to retry getSession as 
> well because the AM could be unreachable; this can be handled in a 
> re-execution plugin.
> There is already an AM-loss retry execution plugin, but it only handles 
> managed AMs. It can be extended to handle unmanaged AMs as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission and unmanaged AM failures

2020-08-25 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-24068:
-
Summary: Add re-execution plugin for handling DAG submission and unmanaged 
AM failures  (was: Add re-execution plugin for handling DAG submission failures)

> Add re-execution plugin for handling DAG submission and unmanaged AM failures
> -
>
> Key: HIVE-24068
> URL: https://issues.apache.org/jira/browse/HIVE-24068
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> DAG submission failure can also happen in environments where the AM container 
> died, causing DNS issues. DAG submissions are safe to retry, as the DAG hasn't 
> started execution yet. There are retries at the getSession and submitDAG 
> levels individually, but some submitDAG failures have to retry getSession as 
> well because the AM could be unreachable; this can be handled in a 
> re-execution plugin.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission failures

2020-08-24 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-24068:
-
Description: DAG submission failure can also happen in environments where 
the AM container died, causing DNS issues. DAG submissions are safe to retry, 
as the DAG hasn't started execution yet. There are retries at the getSession 
and submitDAG levels individually, but some submitDAG failures have to retry 
getSession as well because the AM could be unreachable; this can be handled in 
a re-execution plugin.  (was: 
ReExecutionOverlayPlugin handles cases where there is a vertex failure. DAG 
submission failure can also happen in environments where AM container died 
causing DNS issues. DAG submissions are safe to retry as the DAG hasn't started 
execution yet.)

> Add re-execution plugin for handling DAG submission failures
> 
>
> Key: HIVE-24068
> URL: https://issues.apache.org/jira/browse/HIVE-24068
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> DAG submission failure can also happen in environments where the AM container 
> died, causing DNS issues. DAG submissions are safe to retry, as the DAG hasn't 
> started execution yet. There are retries at the getSession and submitDAG 
> levels individually, but some submitDAG failures have to retry getSession as 
> well because the AM could be unreachable; this can be handled in a 
> re-execution plugin.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24068) Add re-execution plugin for handling DAG submission failures

2020-08-24 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-24068:
-
Summary: Add re-execution plugin for handling DAG submission failures  
(was: ReExecutionOverlayPlugin can handle DAG submission failures as well)

> Add re-execution plugin for handling DAG submission failures
> 
>
> Key: HIVE-24068
> URL: https://issues.apache.org/jira/browse/HIVE-24068
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> ReExecutionOverlayPlugin handles cases where there is a vertex failure. DAG 
> submission failure can also happen in environments where the AM container 
> died, causing DNS issues. DAG submissions are safe to retry, as the DAG 
> hasn't started execution yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24068) ReExecutionOverlayPlugin can handle DAG submission failures as well

2020-08-24 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-24068:



> ReExecutionOverlayPlugin can handle DAG submission failures as well
> ---
>
> Key: HIVE-24068
> URL: https://issues.apache.org/jira/browse/HIVE-24068
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> ReExecutionOverlayPlugin handles cases where there is a vertex failure. DAG 
> submission failure can also happen in environments where the AM container 
> died, causing DNS issues. DAG submissions are safe to retry, as the DAG 
> hasn't started execution yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23776) Retire quickstats autocollection

2020-06-29 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148114#comment-17148114
 ] 

Prasanth Jayachandran commented on HIVE-23776:
--

[~pvary] I understand the performance concerns that the basicstats bring, 
especially in cloud environments. But I would like to discuss the alternatives 
instead of just removing it, as there are certainly dependencies on file sizes 
and number of files which cannot be removed. The rawDataSize is good but only 
represents the in-memory representation, which is certainly good for most 
optimizations but not for all. The totalFileSize vs rawDataSize gives 
approximately the compression ratio, which is still beneficial for some 
optimizations (totalFileSize can be used for estimating the splits, estimating 
the number of containers/nodes required without running the scans, etc.). It is 
better to pay the cost of it once upfront during ETL than every time we run a 
query or desc formatted. If the basicstats are published as counters from the 
tasks then the Tez AM can aggregate them at the DAG level 
(https://github.com/apache/hive/blob/6440d93981e6d6aab59ecf2e77ffa45cd84d47de/ql/src/test/results/clientpositive/llap/tez_compile_counters.q.out#L1524-L1530)
 which HS2 can use to store them into the metastore without ever doing file 
listing. This is one such approach, and it can be abstracted out if this is 
required for other engines. We could explore alternative approaches as well. I 
do not think it is a good idea to remove it just because it is slow on one 
cloud filesystem.

> Retire quickstats autocollection
> 
>
> Key: HIVE-23776
> URL: https://issues.apache.org/jira/browse/HIVE-23776
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> this is about:
> * num files
> * datasize (sum of filesizes)
> * num erasure coded files
> right now these are scanned during every BasicStatsTask execution - which 
> means some filesystem reads/etc - for small inserts these are visible in case 
> the fs is a bit slower (s3 and friends)
> I don't think they are really in use...we rely more on columnstats, which are 
> more accurate; and the datasize in this case is the "offline" (on-disk) size, 
> while we should instead calculate with "online" sizes...
> proposal:
> * remove collection and storage of this data
> * collect it on the fly during "desc formatted" statements to provide them 
> for informational purposes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23776) Retire quickstats autocollection

2020-06-29 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148054#comment-17148054
 ] 

Prasanth Jayachandran commented on HIVE-23776:
--

Yes. I know the quickstats part. The workload management triggers can reference 
*any* Hive counter, which includes the following newly added counters.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CompileTimeCounters.java]
 

If text files land in some staging table and there are workload management 
triggers/guardrails that say "if the query scans > 10TB, kill the query", then 
removing these quick stats will break that functionality. In some cases these 
staging tables are never analyzed, so statistics will not be collected for them.
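
For illustration only (names are made up, not Hive's trigger API), the kind of guardrail that depends on a scan-size figure being available even for unanalyzed tables:

{code:java}
/** Hypothetical illustration: a "kill the query if it would scan more than N bytes"
 *  guardrail only works if something still populates a total-file-size figure. */
class ScanSizeGuardrail {
  private final long maxScanBytes;

  ScanSizeGuardrail(long maxScanBytes) {
    this.maxScanBytes = maxScanBytes;
  }

  boolean shouldKill(long estimatedScanBytes) {
    // estimatedScanBytes would come from quickstats (totalSize/numFiles) or a
    // compile-time counter; with neither, the guardrail silently stops firing.
    return estimatedScanBytes > maxScanBytes;
  }
}
{code}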

Just searching the Hive code base and unit tests alone will not be sufficient 
to know whether customers are using it or not. If there is a specific need to 
remove this, put it behind a config, deprecate it, and remove it in iterations 
rather than in one go.

 

> Retire quickstats autocollection
> 
>
> Key: HIVE-23776
> URL: https://issues.apache.org/jira/browse/HIVE-23776
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> this is about:
> * num files
> * datasize (sum of filesizes)
> * num erasure coded files
> right now these are scanned during every BasicStatsTask execution - which 
> means some filesystem reads/etc - for small inserts these are visible in case 
> the fs is a bit slower (s3 and friends)
> I don't think they are really in use...we rely more on columnstats, which are 
> more accurate; and the datasize in this case is the "offline" (on-disk) size, 
> while we should instead calculate with "online" sizes...
> proposal:
> * remove collection and storage of this data
> * collect it on the fly during "desc formatted" statements to provide them 
> for informational purposes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23776) Retire quickstats autocollection

2020-06-29 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147939#comment-17147939
 ] 

Prasanth Jayachandran commented on HIVE-23776:
--

{quote}I don't think they are really in use...
{quote}
It is used in many places. There is stats annotation fallback which relies on 
this. There are compile time counters added for this which can be used for 
workload management guardrails. There are some existing pre-hooks which rely 
on this or could be relying on this. I am -1 on removing this without having 
substantial evidence that this is not used. 

> Retire quickstats autocollection
> 
>
> Key: HIVE-23776
> URL: https://issues.apache.org/jira/browse/HIVE-23776
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> this is about:
> * num files
> * datasize (sum of filesizes)
> * num erasure coded files
> right now these are scanned during every BasicStatsTask execution - which 
> means some filesystem reads/etc - for small inserts these are visible in case 
> the fs is a bit slower (s3 and friends)
> I don't think they are really in use...we rely more on columnstats, which are 
> more accurate; and the datasize in this case is the "offline" (on-disk) size, 
> while we should instead calculate with "online" sizes...
> proposal:
> * remove collection and storage of this data
> * collect it on the fly during "desc formatted" statements to provide them 
> for informational purposes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete

2020-06-22 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142241#comment-17142241
 ] 

Prasanth Jayachandran commented on HIVE-23737:
--

cc/ [~rajesh.balamohan]

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's 
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez 
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we 
> could re-use that feature in LLAP. 
> There are some added advantages of using Tez's dagDelete feature rather than 
> the current LLAP dagDelete feature.
> 1) We can easily extend this feature to accommodate upcoming features 
> such as vertex and failed task attempt shuffle data clean-up. Refer to 
> TEZ-3363 and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from 
> Hive's code path. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22687) Query hangs indefinitely if LLAP daemon registers after the query is submitted

2020-06-13 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134956#comment-17134956
 ] 

Prasanth Jayachandran commented on HIVE-22687:
--

Since it is not a very common scenario, I guess it is ok to commit the patch. 
We can revisit in a follow up if we observe it under different scenarios. 

> Query hangs indefinitely if LLAP daemon registers after the query is submitted
> --
>
> Key: HIVE-22687
> URL: https://issues.apache.org/jira/browse/HIVE-22687
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.1.0
>Reporter: Himanshu Mishra
>Assignee: Himanshu Mishra
>Priority: Major
> Attachments: HIVE-22687.01.patch, HIVE-22687.02.patch
>
>
> If a query is submitted and no LLAP daemon is running, it waits for 1 minute 
> and times out with error {{SERVICE_UNAVAILABLE}}.
> While waiting, if a new LLAP Daemon starts, then the timeout is cancelled, 
> but the tasks still do not get scheduled. As a result, the query hangs 
> indefinitely.
> This is due to the race condition where the LLAP Daemon first registers the 
> LLAP instance at {{.../workers/worker-}}, and afterwards registers 
> {{.../workers/slot-}}. In the gap between the two, the Tez AM gets notified 
> of the worker zk node and, while processing it, checks if the slot zk node is 
> present; if not, it rejects the LLAP Daemon. The error in the Tez AM is:
> {code:java}
> [INFO] [LlapScheduler] |impl.LlapZookeeperRegistryImpl|: Unknown slot for 
> 8ebfdc45-0382-4757-9416-52898885af90{code}
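
For illustration, a minimal sketch of a tolerant handling of that ordering (hypothetical types, not the actual fix): if the slot znode is not visible yet, keep the daemon pending and keep the timeout armed rather than rejecting it.

{code:java}
/** Hypothetical sketch: the daemon registers worker- first and slot- shortly after,
 *  so seeing only the worker znode is transient and should not reject the node. */
class SlotAwareNodeHandler {

  interface Registry { boolean slotZnodeExists(String workerId); }

  enum Decision { ACCEPT, RETRY_LATER }

  Decision onWorkerAdded(Registry registry, String workerId) {
    if (registry.slotZnodeExists(workerId)) {
      return Decision.ACCEPT;          // normal case: both znodes are visible
    }
    // Re-check later instead of dropping the node and cancelling the query's
    // wait-for-daemon timeout, which is what leads to the indefinite hang.
    return Decision.RETRY_LATER;
  }
}
{code}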



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22687) Query hangs indefinitely if LLAP daemon registers after the query is submitted

2020-06-13 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134947#comment-17134947
 ] 

Prasanth Jayachandran commented on HIVE-22687:
--

The issue I saw was a corner case, with just one node. With >1 node I 
didn’t see this issue. 

> Query hangs indefinitely if LLAP daemon registers after the query is submitted
> --
>
> Key: HIVE-22687
> URL: https://issues.apache.org/jira/browse/HIVE-22687
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.1.0
>Reporter: Himanshu Mishra
>Assignee: Himanshu Mishra
>Priority: Major
> Attachments: HIVE-22687.01.patch, HIVE-22687.02.patch
>
>
> If a query is submitted and no LLAP daemon is running, it waits for 1 minute 
> and times out with error {{SERVICE_UNAVAILABLE}}.
> While waiting, if a new LLAP Daemon starts, then the timeout is cancelled, 
> but the tasks still do not get scheduled. As a result, the query hangs 
> indefinitely.
> This is due to the race condition where the LLAP Daemon first registers the 
> LLAP instance at {{.../workers/worker-}}, and afterwards registers 
> {{.../workers/slot-}}. In the gap between the two, the Tez AM gets notified 
> of the worker zk node and, while processing it, checks if the slot zk node is 
> present; if not, it rejects the LLAP Daemon. The error in the Tez AM is:
> {code:java}
> [INFO] [LlapScheduler] |impl.LlapZookeeperRegistryImpl|: Unknown slot for 
> 8ebfdc45-0382-4757-9416-52898885af90{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23582) LLAP: Make SplitLocationProvider impl pluggable

2020-06-06 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23582:
-
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged to master. Thanks Gopal for the review!

> LLAP: Make SplitLocationProvider impl pluggable
> ---
>
> Key: HIVE-23582
> URL: https://issues.apache.org/jira/browse/HIVE-23582
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23582.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LLAP uses the HostAffinitySplitLocationProvider implementation by default. For 
> non-ZooKeeper-based environments, a different split location provider may be 
> used. To facilitate that, make the SplitLocationProvider implementation class 
> pluggable. 
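
For illustration, a minimal sketch of the pluggability (hypothetical config key and interface shape, not the actual patch): load the provider class named in configuration and fall back to the default host-affinity behaviour.

{code:java}
/** Hypothetical sketch: the config key and interface here are illustrative only. */
class SplitLocationProviderFactory {

  interface SplitLocationProvider {
    String[] getLocations(Object split) throws java.io.IOException;
  }

  static SplitLocationProvider create(java.util.Properties conf) throws Exception {
    String impl = conf.getProperty("hive.llap.split.location.provider.class");
    if (impl == null || impl.isEmpty()) {
      // Stand-in for the default provider (the real default would be the
      // ZK-registry based HostAffinitySplitLocationProvider).
      return split -> new String[] {"localhost"};
    }
    // Custom provider for environments without ZooKeeper, loaded by reflection.
    return (SplitLocationProvider) Class.forName(impl)
        .getDeclaredConstructor()
        .newInstance();
  }
}
{code}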



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-19926) Remove deprecated hcatalog streaming

2020-06-02 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-19926.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master. Thanks Ashutosh for reviving this patch and uploading it 
for tests. Thanks Zoltan for ptest run and review. 

> Remove deprecated hcatalog streaming
> 
>
> Key: HIVE-19926
> URL: https://issues.apache.org/jira/browse/HIVE-19926
> Project: Hive
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-19926.1.patch, HIVE-19926.2.patch, 
> HIVE-19926.3.patch, HIVE-19926.4.patch, HIVE-19926.5.patch, HIVE-19926.6.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> hcatalog streaming is deprecated in 3.0.0. We should remove it in 4.0.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21624) LLAP: Cpu metrics at thread level is broken

2020-06-01 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21624:
-
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Ashutosh for the review!

> LLAP: Cpu metrics at thread level is broken
> ---
>
> Key: HIVE-21624
> URL: https://issues.apache.org/jira/browse/HIVE-21624
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Nita Dembla
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21624.1.patch, HIVE-21624.2.patch, 
> HIVE-21624.3.patch, HIVE-21624.4.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ExecutorThreadCPUTime and ExecutorThreadUserTime rely on thread MX bean CPU 
> metrics when available. At some point, the thread name which the metrics 
> publisher looks for changed, causing no metrics to be published for these 
> counters.  
> The above counters look for threads whose name starts with 
> "ContainerExecutor", but the LLAP task executor thread name was changed to 
> "Task-Executor".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21624) LLAP: Cpu metrics at thread level is broken

2020-05-29 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21624:
-
Attachment: HIVE-21624.4.patch

> LLAP: Cpu metrics at thread level is broken
> ---
>
> Key: HIVE-21624
> URL: https://issues.apache.org/jira/browse/HIVE-21624
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Nita Dembla
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21624.1.patch, HIVE-21624.2.patch, 
> HIVE-21624.3.patch, HIVE-21624.4.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ExecutorThreadCPUTime and ExecutorThreadUserTime rely on thread MX bean CPU 
> metrics when available. At some point, the thread name which the metrics 
> publisher looks for changed, causing no metrics to be published for these 
> counters.  
> The above counters look for threads whose name starts with 
> "ContainerExecutor", but the LLAP task executor thread name was changed to 
> "Task-Executor".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23582) LLAP: Make SplitLocationProvider impl pluggable

2020-05-29 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120032#comment-17120032
 ] 

Prasanth Jayachandran commented on HIVE-23582:
--

[~hashutosh] [~gopalv] could you please help review this change?

> LLAP: Make SplitLocationProvider impl pluggable
> ---
>
> Key: HIVE-23582
> URL: https://issues.apache.org/jira/browse/HIVE-23582
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23582.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LLAP uses the HostAffinitySplitLocationProvider implementation by default. For 
> non-ZooKeeper-based environments, a different split location provider may be 
> used. To facilitate that, make the SplitLocationProvider implementation class 
> pluggable. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23582) LLAP: Make SplitLocationProvider impl pluggable

2020-05-29 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23582:
-
Status: Patch Available  (was: Open)

> LLAP: Make SplitLocationProvider impl pluggable
> ---
>
> Key: HIVE-23582
> URL: https://issues.apache.org/jira/browse/HIVE-23582
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23582.1.patch
>
>
> LLAP uses the HostAffinitySplitLocationProvider implementation by default. For 
> non-ZooKeeper-based environments, a different split location provider may be 
> used. To facilitate that, make the SplitLocationProvider implementation class 
> pluggable. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23582) LLAP: Make SplitLocationProvider impl pluggable

2020-05-29 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23582:
-
Attachment: HIVE-23582.1.patch

> LLAP: Make SplitLocationProvider impl pluggable
> ---
>
> Key: HIVE-23582
> URL: https://issues.apache.org/jira/browse/HIVE-23582
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23582.1.patch
>
>
> LLAP uses the HostAffinitySplitLocationProvider implementation by default. For 
> non-ZooKeeper-based environments, a different split location provider may be 
> used. To facilitate that, make the SplitLocationProvider implementation class 
> pluggable. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23582) LLAP: Make SplitLocationProvider impl pluggable

2020-05-29 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-23582:



> LLAP: Make SplitLocationProvider impl pluggable
> ---
>
> Key: HIVE-23582
> URL: https://issues.apache.org/jira/browse/HIVE-23582
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> LLAP uses the HostAffinitySplitLocationProvider implementation by default. For 
> non-ZooKeeper-based environments, a different split location provider may be 
> used. To facilitate that, make the SplitLocationProvider implementation class 
> pluggable. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23068) Error when submitting fragment to LLAP via external client: IllegalStateException: Only a single registration allowed per entity

2020-05-29 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119742#comment-17119742
 ] 

Prasanth Jayachandran commented on HIVE-23068:
--

lgtm, +1.
{quote}(for example speculative execution of a query fragment).
{quote}
Can external clients with speculative execution generate the same fragment id? 
Not sure how external clients generate the full id, but I would expect it to 
have different attempt numbers at least, just so that the different attempts do 
not step on each other during speculative execution. 

> Error when submitting fragment to LLAP via external client: 
> IllegalStateException: Only a single registration allowed per entity
> 
>
> Key: HIVE-23068
> URL: https://issues.apache.org/jira/browse/HIVE-23068
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-23068.1.patch
>
>
> LLAP external client (via hive-warehouse-connector) somehow seems to be 
> sending duplicate submissions for the same fragment/attempt. When the 2nd 
> request is sent, this results in the following error:
> {noformat}
> 2020-03-17T06:49:11,239 WARN  [IPC Server handler 2 on 15001 ()] 
> org.apache.hadoop.ipc.Server: IPC Server handler 2 on 15001, call Call#75 
> Retry#0 
> org.apache.hadoop.hive.llap.protocol.LlapProtocolBlockingPB.submitWork from 
> 19.40.252.114:33906
> java.lang.IllegalStateException: Only a single registration allowed per 
> entity. Duplicate for 
> TaskWrapper{task=attempt_1854104024183112753_6052_0_00_000128_1, 
> inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true, 
> canFinish=true, canFinish(in queue)=true, isGuaranteed=false, 
> firstAttemptStartTime=1584442003327, dagStartTime=1584442003327, 
> withinDagPriority=0, vertexParallelism= 2132, selfAndUpstreamParallelism= 
> 2132, selfAndUpstreamComplete= 0}
> at 
> org.apache.hadoop.hive.llap.daemon.impl.QueryInfo$FinishableStateTracker.registerForUpdates(QueryInfo.java:233)
>  ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.QueryInfo.registerForFinishableStateUpdates(QueryInfo.java:205)
>  ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.QueryFragmentInfo.registerForFinishableStateUpdates(QueryFragmentInfo.java:160)
>  ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.maybeRegisterForFinishedStateNotifications(TaskExecutorService.java:1167)
>  ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.schedule(TaskExecutorService.java:564)
>  ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.schedule(TaskExecutorService.java:93)
>  ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:292)
>  ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:610)
>  ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:122)
>  ~[hive-llap-server-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.26-3]
> at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:22695)
>  ~[hive-exec-3.1.0.3.1.4.26-3.jar:3.1.0.3.1.4.32-1]
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  ~[hadoop-common-3.1.1.3.1.4.26-3.jar:?]
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) 
> ~[hadoop-common-3.1.1.3.1.4.26-3.jar:?]
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) 
> ~[hadoop-common-3.1.1.3.1.4.26-3.jar:?]
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) 
> ~[hadoop-common-3.1.1.3.1.4.26-3.jar:?]
> at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_191]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_191]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  ~[hadoop-common-3.1.1.3.1.4.26-3.jar:?]
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) 
> ~[hadoop-common-3.1.1.3.1.4.26-3.jar:?]
> {noformat}
> I think the issue here is that this error o

[jira] [Commented] (HIVE-21624) LLAP: Cpu metrics at thread level is broken

2020-05-21 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113529#comment-17113529
 ] 

Prasanth Jayachandran commented on HIVE-21624:
--

This feature still needs JDK support for thread CPU metrics. 

> LLAP: Cpu metrics at thread level is broken
> ---
>
> Key: HIVE-21624
> URL: https://issues.apache.org/jira/browse/HIVE-21624
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Nita Dembla
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21624.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ExecutorThreadCPUTime and ExecutorThreadUserTime rely on thread MX bean CPU 
> metrics when available. At some point, the thread name which the metrics 
> publisher looks for changed, causing no metrics to be published for these 
> counters.  
> The above counters look for threads whose name starts with 
> "ContainerExecutor", but the LLAP task executor thread name was changed to 
> "Task-Executor".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21624) LLAP: Cpu metrics at thread level is broken

2020-05-21 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21624:
-
Attachment: HIVE-21624.1.patch

> LLAP: Cpu metrics at thread level is broken
> ---
>
> Key: HIVE-21624
> URL: https://issues.apache.org/jira/browse/HIVE-21624
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Nita Dembla
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21624.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ExecutorThreadCPUTime and ExecutorThreadUserTime rely on thread MX bean CPU 
> metrics when available. At some point, the thread name which the metrics 
> publisher looks for changed, causing no metrics to be published for these 
> counters.  
> The above counters look for threads whose name starts with 
> "ContainerExecutor", but the LLAP task executor thread name was changed to 
> "Task-Executor".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21624) LLAP: Cpu metrics at thread level is broken

2020-05-21 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21624:
-
Status: Patch Available  (was: Open)

> LLAP: Cpu metrics at thread level is broken
> ---
>
> Key: HIVE-21624
> URL: https://issues.apache.org/jira/browse/HIVE-21624
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Nita Dembla
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21624.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ExecutorThreadCPUTime and ExecutorThreadUserTime rely on thread MX bean CPU 
> metrics when available. At some point, the thread name which the metrics 
> publisher looks for changed, causing no metrics to be published for these 
> counters.  
> The above counters look for threads whose name starts with 
> "ContainerExecutor", but the LLAP task executor thread name was changed to 
> "Task-Executor".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23477) LLAP : mmap allocation interruptions fails to notify other threads

2020-05-21 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23477:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master. Thanks Gopal for the review!

> LLAP : mmap allocation interruptions fails to notify other threads
> --
>
> Key: HIVE-23477
> URL: https://issues.apache.org/jira/browse/HIVE-23477
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23477.1.patch, HIVE-23477.2.patch, 
> HIVE-23477.3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> BuddyAllocator always uses lazy allocation if mmap is enabled. If a query 
> fragment is interrupted at the time of arena allocation, a 
> ClosedByInterruptException is thrown. This exception artificially triggers an 
> allocator OutOfMemoryError and fails to notify other threads waiting to 
> allocate arenas. 
> {code:java}
> 2020-05-15 00:03:23.254  WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed 
> trying to allocate memory mapped arena
> java.nio.channels.ClosedByInterruptException
> at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160)
> at 
> org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.r

[jira] [Commented] (HIVE-23500) [Kubernetes] Use Extend NodeId for LLAP registration

2020-05-19 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110955#comment-17110955
 ] 

Prasanth Jayachandran commented on HIVE-23500:
--

HIVE-23466 is the same?

> [Kubernetes] Use Extend NodeId for LLAP registration
> 
>
> Key: HIVE-23500
> URL: https://issues.apache.org/jira/browse/HIVE-23500
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>
> In a Kubernetes environment where pods can have the same host name and port, 
> there can be situations where node trackers retain an old instance of the 
> pod in their cache. In the case of Hive LLAP, where the LLAP Tez task 
> scheduler maintains the membership of nodes based on ZooKeeper registry 
> events, there can be cases where a NODE_ADDED followed by a NODE_REMOVED 
> event could end up removing the node/host from the node trackers because of 
> the stable hostname and service port. The NODE_REMOVED event in this case is 
> an old, stale event of the already dead pod, but ZK will send it only after 
> the session timeout (in case of a non-graceful shutdown). If this sequence of 
> events happens, a node/host is completely lost from the scheduler's 
> perspective. 
> To support this scenario, Tez can extend YARN's NodeId to include a 
> uniqueIdentifier. The LLAP task scheduler can construct the container object 
> with this new NodeId that includes the uniqueIdentifier as well, so that 
> stale events like the above will only remove the host/node that matches the 
> old uniqueIdentifier. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23466) ZK registry base should remove only specific instance instead of host

2020-05-18 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23466:
-
Attachment: HIVE-23466.1.patch

> ZK registry base should remove only specific instance instead of host
> -
>
> Key: HIVE-23466
> URL: https://issues.apache.org/jira/browse/HIVE-23466
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23466.1.patch
>
>
> When ZKRegistryBase detects new ZK nodes, it maintains a path-based cache and 
> a host-based cache. The host-based cache already handles multiple instances 
> running on the same host. But even if a single instance is removed, all 
> instances belonging to the host are removed. 
> Another issue is that, if a single host has multiple instances, it returns a 
> Set with no ordering. Ideally, we want the newest instance to be at the top 
> of the set (use a TreeSet maybe?). 
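
For illustration, a minimal sketch of the host-cache behaviour being asked for (hypothetical types, not the actual ZKRegistryBase code): remove only the instance whose worker id matches the event, and keep the newest registration first.

{code:java}
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical sketch: per-host set of instances, newest registration first. */
class HostInstanceCache {

  static final class Instance {
    final String workerId;
    final long registeredAtMillis;
    Instance(String workerId, long registeredAtMillis) {
      this.workerId = workerId;
      this.registeredAtMillis = registeredAtMillis;
    }
  }

  // newest instance first, tie-broken by worker id
  private static final Comparator<Instance> NEWEST_FIRST =
      Comparator.comparingLong((Instance i) -> i.registeredAtMillis).reversed()
                .thenComparing((Instance i) -> i.workerId);

  private final Map<String, ConcurrentSkipListSet<Instance>> byHost = new ConcurrentHashMap<>();

  void onAdded(String host, Instance instance) {
    byHost.computeIfAbsent(host, h -> new ConcurrentSkipListSet<>(NEWEST_FIRST)).add(instance);
  }

  void onRemoved(String host, String workerId) {
    // Remove only the matching instance, not every instance registered on the host.
    ConcurrentSkipListSet<Instance> instances = byHost.get(host);
    if (instances != null) {
      instances.removeIf(i -> i.workerId.equals(workerId));
    }
  }

  Instance newestOn(String host) {
    ConcurrentSkipListSet<Instance> instances = byHost.get(host);
    return (instances == null || instances.isEmpty()) ? null : instances.first();
  }
}
{code}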



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23466) ZK registry base should remove only specific instance instead of host

2020-05-18 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110708#comment-17110708
 ] 

Prasanth Jayachandran commented on HIVE-23466:
--

This patch requires TEZ-4179 and a new Tez release to make use of the 
ExtendedNodeId API. 

> ZK registry base should remove only specific instance instead of host
> -
>
> Key: HIVE-23466
> URL: https://issues.apache.org/jira/browse/HIVE-23466
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23466.1.patch
>
>
> When ZKRegistryBase detects new ZK nodes, it maintains a path-based cache and 
> a host-based cache. The host-based cache already handles multiple instances 
> running on the same host. But even if a single instance is removed, all 
> instances belonging to the host are removed. 
> Another issue is that, if a single host has multiple instances, it returns a 
> Set with no ordering. Ideally, we want the newest instance to be at the top 
> of the set (use a TreeSet maybe?). 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23477) LLAP : mmap allocation interruptions fails to notify other threads

2020-05-17 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23477:
-
Attachment: HIVE-23477.3.patch

> LLAP : mmap allocation interruptions fails to notify other threads
> --
>
> Key: HIVE-23477
> URL: https://issues.apache.org/jira/browse/HIVE-23477
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23477.1.patch, HIVE-23477.2.patch, 
> HIVE-23477.3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> BuddyAllocator always uses lazy allocation if mmap is enabled. If a query 
> fragment is interrupted at the time of arena allocation, a 
> ClosedByInterruptException is thrown. This exception artificially triggers an 
> allocator OutOfMemoryError and fails to notify other threads waiting to 
> allocate arenas. 
> {code:java}
> 2020-05-15 00:03:23.254  WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed 
> trying to allocate memory mapped arena
> java.nio.channels.ClosedByInterruptException
> at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160)
> at 
> org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.

[jira] [Updated] (HIVE-23443) LLAP speculative task pre-emption seems to be not working

2020-05-15 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23443:
-
Attachment: HIVE-23443.3.patch

> LLAP speculative task pre-emption seems to be not working
> -
>
> Key: HIVE-23443
> URL: https://issues.apache.org/jira/browse/HIVE-23443
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23443.1.patch, HIVE-23443.2.patch, 
> HIVE-23443.3.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I think after HIVE-23210 we are getting a stable sort order and it is causing 
> pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851__119_01_08_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_08_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> Scheduler only peeks at the pre-emption queue and looks at whether it is 
> non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative but state change is not 
> triggering pre-emption queue re-ordering so peek() always returns canFinish 
> task even though non-finishable tasks are in the queue. 
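A minimal sketch of the re-ordering problem described above, using hypothetical names rather than the actual TaskExecutorService internals: a binary heap only re-orders entries on add/remove, so flipping a task's canFinish flag in place leaves the ordering stale and peek() keeps returning a finishable task.

{code:java}
import java.util.PriorityQueue;

// Illustrative sketch only; TaskWrapper and preemptionQueue are hypothetical
// stand-ins for the LLAP scheduler's types, not the real implementation.
class PreemptionQueueSketch {
  static class TaskWrapper {
    final String id;
    volatile boolean canFinish;
    TaskWrapper(String id, boolean canFinish) {
      this.id = id;
      this.canFinish = canFinish;
    }
  }

  // Non-finishable (preemptable) tasks should surface at the head of the queue.
  private final PriorityQueue<TaskWrapper> preemptionQueue =
      new PriorityQueue<>((a, b) -> Boolean.compare(a.canFinish, b.canFinish));

  synchronized void add(TaskWrapper t) {
    preemptionQueue.offer(t);
  }

  // On a finishable-state change, remove and re-offer the task so the heap
  // actually re-sorts it; mutating the flag alone does not re-order the queue.
  synchronized void onFinishableStateChange(TaskWrapper t, boolean canFinishNow) {
    boolean wasQueued = preemptionQueue.remove(t);
    t.canFinish = canFinishNow;
    if (wasQueued) {
      preemptionQueue.offer(t);
    }
  }

  synchronized TaskWrapper peek() {
    return preemptionQueue.peek();
  }
}
{code}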



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23443) LLAP speculative task pre-emption seems to be not working

2020-05-15 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108798#comment-17108798
 ] 

Prasanth Jayachandran commented on HIVE-23443:
--

[~pgaref] non-finishable to finishable is not a problem. But there is a concern 
in the line that you pinged in the PR that double/multiple addition could be 
possible with the pre-emption queue, and I was able to unit test it. Could you 
look at the diff in the PR again?
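As a rough illustration of the double-addition concern (reusing the hypothetical TaskWrapper and preemptionQueue names from the earlier sketch, not the actual TaskExecutorService code), a membership check before offering keeps repeated state-change notifications from enqueuing the same wrapper twice:

{code:java}
// Illustrative sketch only: guard the enqueue so repeated notifications cannot
// add the same task wrapper to the pre-emption queue more than once.
synchronized void addToPreemptionQueue(TaskWrapper taskWrapper) {
  if (!preemptionQueue.contains(taskWrapper)) {
    preemptionQueue.offer(taskWrapper);
  }
}
{code}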

> LLAP speculative task pre-emption seems to be not working
> -
>
> Key: HIVE-23443
> URL: https://issues.apache.org/jira/browse/HIVE-23443
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23443.1.patch, HIVE-23443.2.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I think after HIVE-23210 we are getting a stable sort order and it is causing 
> pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851__119_01_08_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_08_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> Scheduler only peeks at the pre-emption queue and looks at whether it is 
> non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative but state change is not 
> triggering pre-emption queue re-ordering so peek() always returns canFinish 
> task even though non-finishable tasks are in the queue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads

2020-05-15 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23477:
-
Description: 
BuddyAllocator always uses lazy allocation if mmap is enabled. If a query 
fragment is interrupted at the time of arena allocation, 
ClosedByInterruptException is thrown. This exception artificially triggers an 
allocator OutOfMemoryError and fails to notify other threads waiting to 
allocate arenas. 
{code:java}
2020-05-15 00:03:23.254  WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed 
trying to allocate memory mapped arena
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330)
at 
org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257)
at 
org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216)
at 
org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49)
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343)
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238)
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160)
at 
org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
at 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
at 
org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
at 
org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
at 
org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingT

[jira] [Updated] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads

2020-05-15 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23477:
-
Description: 
BuddyAllocator always uses lazy allocation if mmap is enabled. If a query 
fragment is interrupted at the time of arena allocation, 
ClosedByInterruptException is thrown. This exception artificially triggers an 
allocator OutOfMemoryError and fails to notify other threads waiting to 
allocate arenas. 
{code:java}
2020-05-15 00:03:23.254  WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed 
trying to allocate memory mapped arena
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740)
at 
org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330)
at 
org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257)
at 
org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216)
at 
org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49)
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343)
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238)
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160)
at 
org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
at 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
at 
org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
at 
org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
at 
org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecording

[jira] [Commented] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads

2020-05-15 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108776#comment-17108776
 ] 

Prasanth Jayachandran commented on HIVE-23477:
--

[~ashutoshc] / [~gopalv] can you please help review this change?
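For readers without the patch handy, here is a minimal sketch of the behaviour being discussed, with hypothetical names and not the actual BuddyAllocator code: if the mmap call is interrupted, the allocating thread should still wake up threads waiting for the arena and propagate the interruption, instead of letting it surface as an allocator OutOfMemoryError.

{code:java}
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only; not the real allocator.
class ArenaAllocatorSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition arenaReady = lock.newCondition();
  private volatile MappedByteBuffer arena;     // null until mapped
  private volatile boolean allocationFailed;   // set when mapping gave up

  MappedByteBuffer allocateArena(Path backingFile, int arenaSize) throws IOException {
    MappedByteBuffer mapped = null;
    try (FileChannel channel = FileChannel.open(backingFile,
        StandardOpenOption.READ, StandardOpenOption.WRITE)) {
      mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, arenaSize);
      return mapped;
    } catch (ClosedByInterruptException e) {
      Thread.currentThread().interrupt();      // preserve the interrupt status
      throw e;                                 // not an out-of-memory condition
    } finally {
      lock.lock();
      try {
        arena = mapped;
        allocationFailed = (mapped == null);
        arenaReady.signalAll();                // always wake up waiting threads
      } finally {
        lock.unlock();
      }
    }
  }

  MappedByteBuffer waitForArena() throws InterruptedException {
    lock.lock();
    try {
      while (arena == null && !allocationFailed) {
        arenaReady.await();
      }
      return arena;                            // null if allocation failed
    } finally {
      lock.unlock();
    }
  }
}
{code}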

> [LLAP] mmap allocation interruptions fails to notify other threads
> --
>
> Key: HIVE-23477
> URL: https://issues.apache.org/jira/browse/HIVE-23477
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23477.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> BuddyAllocator always uses lazy allocation if mmap is enabled. If a query 
> fragment is interrupted at the time of arena allocation, 
> ClosedByInterruptException is thrown. This exception artificially triggers an 
> allocator OutOfMemoryError and fails to notify other threads waiting to 
> allocate arenas. 
> {code:java}
> 2020-05-15 00:03:23.254  WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed 
> trying to allocate memory mapped arena
> java.nio.channels.ClosedByInterruptException
> at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160)
> at 
> org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)

[jira] [Updated] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads

2020-05-15 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23477:
-
Attachment: (was: HIVE-23476.1.patch)

> [LLAP] mmap allocation interruptions fails to notify other threads
> --
>
> Key: HIVE-23477
> URL: https://issues.apache.org/jira/browse/HIVE-23477
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23477.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> BuddyAllocator always uses lazy allocation if mmap is enabled. If a query 
> fragment is interrupted at the time of arena allocation, 
> ClosedByInterruptException is thrown. This exception artificially triggers an 
> allocator OutOfMemoryError and fails to notify other threads waiting to 
> allocate arenas. 
> {code:java}
> 2020-05-15 00:03:23.254  WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed 
> trying to allocate memory mapped arena
> java.nio.channels.ClosedByInterruptException
> at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160)
> at 
> org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRun

[jira] [Updated] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads

2020-05-15 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23477:
-
Attachment: HIVE-23477.1.patch

> [LLAP] mmap allocation interruptions fails to notify other threads
> --
>
> Key: HIVE-23477
> URL: https://issues.apache.org/jira/browse/HIVE-23477
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23477.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> BuddyAllocator always uses lazy allocation if mmap is enabled. If a query 
> fragment is interrupted at the time of arena allocation, 
> ClosedByInterruptException is thrown. This exception artificially triggers an 
> allocator OutOfMemoryError and fails to notify other threads waiting to 
> allocate arenas. 
> {code:java}
> 2020-05-15 00:03:23.254  WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed 
> trying to allocate memory mapped arena
> java.nio.channels.ClosedByInterruptException
> at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160)
> at 
> org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callabl

[jira] [Updated] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads

2020-05-15 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23477:
-
Status: Patch Available  (was: Open)

> [LLAP] mmap allocation interruptions fails to notify other threads
> --
>
> Key: HIVE-23477
> URL: https://issues.apache.org/jira/browse/HIVE-23477
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23476.1.patch
>
>
> BuddyAllocator always uses lazy allocation if mmap is enabled. If a query 
> fragment is interrupted at the time of arena allocation, 
> ClosedByInterruptException is thrown. This exception artificially triggers an 
> allocator OutOfMemoryError and fails to notify other threads waiting to 
> allocate arenas. 
> {code:java}
> 2020-05-15 00:03:23.254  WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed 
> trying to allocate memory mapped arena
> java.nio.channels.ClosedByInterruptException
> at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160)
> at 
> org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2C

[jira] [Updated] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads

2020-05-15 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23477:
-
Attachment: HIVE-23476.1.patch

> [LLAP] mmap allocation interruptions fails to notify other threads
> --
>
> Key: HIVE-23477
> URL: https://issues.apache.org/jira/browse/HIVE-23477
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23476.1.patch
>
>
> BuddyAllocator always uses lazy allocation if mmap is enabled. If a query 
> fragment is interrupted at the time of arena allocation, 
> ClosedByInterruptException is thrown. This exception artificially triggers an 
> allocator OutOfMemoryError and fails to notify other threads waiting to 
> allocate arenas. 
> {code:java}
> 2020-05-15 00:03:23.254  WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed 
> trying to allocate memory mapped arena
> java.nio.channels.ClosedByInterruptException
> at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160)
> at 
> org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callabl

[jira] [Commented] (HIVE-23476) [LLAP] Preallocate arenas for mmap case as well

2020-05-15 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108739#comment-17108739
 ] 

Prasanth Jayachandran commented on HIVE-23476:
--

[~hashutosh]/[~gopalv] can you please review the change?

> [LLAP] Preallocate arenas for mmap case as well
> ---
>
> Key: HIVE-23476
> URL: https://issues.apache.org/jira/browse/HIVE-23476
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23476.1.patch
>
>
> BuddyAllocator pre-allocation of arenas does not happen for the mmap cache 
> case. Since we are not filling up the mmap'ed buffers, the upfront allocations 
> in the constructor are cheap. This can avoid the lock-free allocation of 
> arenas later in the code. 
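A rough sketch of the idea, with hypothetical names and not the actual BuddyAllocator constructor: with mmap, creating the mappings up front only reserves address space, so all arenas can be mapped eagerly in the constructor and the on-demand allocation path is avoided later.

{code:java}
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch only; physical pages are faulted in lazily, so mapping
// every arena here is cheap even for a large cache.
class PreallocatedArenasSketch {
  private final MappedByteBuffer[] arenas;

  PreallocatedArenasSketch(Path cacheDir, int arenaCount, int arenaSize) throws IOException {
    arenas = new MappedByteBuffer[arenaCount];
    Path backingFile = Files.createTempFile(cacheDir, "arena", ".cache");
    try (FileChannel channel = FileChannel.open(backingFile,
        StandardOpenOption.READ, StandardOpenOption.WRITE)) {
      for (int i = 0; i < arenaCount; i++) {
        // A READ_WRITE mapping grows the backing file to cover the mapped region.
        arenas[i] = channel.map(FileChannel.MapMode.READ_WRITE,
            (long) i * arenaSize, arenaSize);
      }
    }
  }

  MappedByteBuffer arena(int i) {
    return arenas[i];
  }
}
{code}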



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23476) [LLAP] Preallocate arenas for mmap case as well

2020-05-15 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23476:
-
Status: Patch Available  (was: Open)

> [LLAP] Preallocate arenas for mmap case as well
> ---
>
> Key: HIVE-23476
> URL: https://issues.apache.org/jira/browse/HIVE-23476
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23476.1.patch
>
>
> BuddyAllocator pre-allocation of arenas does not happen for the mmap cache 
> case. Since we are not filling up the mmap'ed buffers, the upfront allocations 
> in the constructor are cheap. This can avoid the lock-free allocation of 
> arenas later in the code. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23476) [LLAP] Preallocate arenas for mmap case as well

2020-05-15 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23476:
-
Attachment: HIVE-23476.1.patch

> [LLAP] Preallocate arenas for mmap case as well
> ---
>
> Key: HIVE-23476
> URL: https://issues.apache.org/jira/browse/HIVE-23476
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23476.1.patch
>
>
> BuddyAllocator pre-allocation of arenas does not happen for the mmap cache 
> case. Since we are not filling up the mmap'ed buffers, the upfront allocations 
> in the constructor are cheap. This can avoid the lock-free allocation of 
> arenas later in the code. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23477) [LLAP] mmap allocation interruptions fails to notify other threads

2020-05-15 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-23477:



> [LLAP] mmap allocation interruptions fails to notify other threads
> --
>
> Key: HIVE-23477
> URL: https://issues.apache.org/jira/browse/HIVE-23477
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> BuddyAllocator always uses lazy allocation if mmap is enabled. If a query 
> fragment is interrupted at the time of arena allocation, 
> ClosedByInterruptException is thrown. This exception artificially triggers an 
> allocator OutOfMemoryError and fails to notify other threads waiting to 
> allocate arenas. 
> {code:java}
> 2020-05-15 00:03:23.254  WARN [TezTR-128417_1_3_1_1_0] LlapIoImpl: Failed 
> trying to allocate memory mapped arena
> java.nio.channels.ClosedByInterruptException
> at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:970)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.preallocateArenaBuffer(BuddyAllocator.java:867)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.access$1100(BuddyAllocator.java:69)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.init(BuddyAllocator.java:900)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.allocateWithExpand(BuddyAllocator.java:1458)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator$Arena.access$800(BuddyAllocator.java:884)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateWithExpand(BuddyAllocator.java:740)
> at 
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(BuddyAllocator.java:330)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.wrapBbForFile(MetadataCache.java:257)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:216)
> at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:49)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:343)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:238)
> at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.(VectorizedParquetRecordReader.java:160)
> at 
> org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:427)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:82)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:532)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:178)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
> at java.security.AccessController.doPrivileged(Nati

[jira] [Assigned] (HIVE-23476) [LLAP] Preallocate arenas for mmap case as well

2020-05-15 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-23476:



> [LLAP] Preallocate arenas for mmap case as well
> ---
>
> Key: HIVE-23476
> URL: https://issues.apache.org/jira/browse/HIVE-23476
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> BuddyAllocator pre-allocation of arenas does not happen for the mmap cache 
> case. Since we are not filling up the mmap'ed buffers, the upfront allocations 
> in the constructor are cheap. This can avoid the lock-free allocation of 
> arenas later in the code. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23443) LLAP speculative task pre-emption seems to be not working

2020-05-14 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107565#comment-17107565
 ] 

Prasanth Jayachandran commented on HIVE-23443:
--

I was able to repro the issue with a unit test. Included that in the .2 patch.

[~pgaref] The guaranteed updates are a hairy piece to touch for now, so I am not 
doing that in this ticket. 

The .2 patch is the same as .1 with added JUnit tests. Could you please take a look?

> LLAP speculative task pre-emption seems to be not working
> -
>
> Key: HIVE-23443
> URL: https://issues.apache.org/jira/browse/HIVE-23443
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23443.1.patch, HIVE-23443.2.patch
>
>
> I think after HIVE-23210 we are getting a stable sort order and it is causing 
> pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851__119_01_08_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_08_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> Scheduler only peeks at the pre-emption queue and looks at whether it is 
> non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative but state change is not 
> triggering pre-emption queue re-ordering so peek() always returns canFinish 
> task even though non-finishable tasks are in the queue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23443) LLAP speculative task pre-emption seems to be not working

2020-05-14 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23443:
-
Attachment: HIVE-23443.2.patch

> LLAP speculative task pre-emption seems to be not working
> -
>
> Key: HIVE-23443
> URL: https://issues.apache.org/jira/browse/HIVE-23443
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23443.1.patch, HIVE-23443.2.patch
>
>
> I think after HIVE-23210 we are getting a stable sort order and it is causing 
> pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851__119_01_08_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_08_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> Scheduler only peeks at the pre-emption queue and looks at whether it is 
> non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative but state change is not 
> triggering pre-emption queue re-ordering so peek() always returns canFinish 
> task even though non-finishable tasks are in the queue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23443) LLAP speculative task pre-emption seems to be not working

2020-05-14 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107490#comment-17107490
 ] 

Prasanth Jayachandran commented on HIVE-23443:
--

Created HIVE-23472 to handle the guaranteed state update, which is tied to WLM. 
For now the WLM issue is kept separate and will be handled in HIVE-23472. 

In this ticket I will specifically handle the finishable state updates, and will 
add more unit tests to the .1 patch. 

> LLAP speculative task pre-emption seems to be not working
> -
>
> Key: HIVE-23443
> URL: https://issues.apache.org/jira/browse/HIVE-23443
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23443.1.patch
>
>
> I think after HIVE-23210 we are getting a stable sort order and it is causing 
> pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851__119_01_08_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_08_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> Scheduler only peeks at the pre-emption queue and looks at whether it is 
> non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative but state change is not 
> triggering pre-emption queue re-ordering so peek() always returns canFinish 
> task even though non-finishable tasks are in the queue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23466) ZK registry base should remove only specific instance instead of host

2020-05-13 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-23466:



> ZK registry base should remove only specific instance instead of host
> -
>
> Key: HIVE-23466
> URL: https://issues.apache.org/jira/browse/HIVE-23466
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> When ZKRegistryBase detects new ZK nodes it maintains a path based cache and 
> a host based cache. The host based cache already handles multiple instances 
> running on the same host. But even if a single instance is removed, all 
> instances belonging to the host are removed. 
> Another issue is that, if a single host has multiple instances, it returns a 
> Set with no ordering. Ideally, we want the newest instance to be at the top 
> of the set (use TreeSet maybe?). 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23443) LLAP speculative task pre-emption seems to be not working

2020-05-11 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105031#comment-17105031
 ] 

Prasanth Jayachandran commented on HIVE-23443:
--

[~gopalv]/[~pgaref] I simplified the logic of pre-emption queue handling to the 
following 2 conditions:

1) If guaranteed or finishable, the task should not be in the pre-emption queue.

2) If speculative or non-finishable, the task should be in the pre-emption queue.

I hope I am not missing any other conditions. Could you please take another 
look?

[~pgaref] I changed the test cases based on the above conditions. Let me know 
if I missed any case. 

> LLAP speculative task pre-emption seems to be not working
> -
>
> Key: HIVE-23443
> URL: https://issues.apache.org/jira/browse/HIVE-23443
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23443.1.patch
>
>
> I think after HIVE-23210 we are getting a stable sort order and it is causing 
> pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851__119_01_08_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_08_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> Scheduler only peeks at the pre-emption queue and looks at whether it is 
> non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative but state change is not 
> triggering pre-emption queue re-ordering so peek() always returns canFinish 
> task even though non-finishable tasks are in the queue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23443) LLAP speculative task pre-emption seems to be not working

2020-05-11 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104926#comment-17104926
 ] 

Prasanth Jayachandran commented on HIVE-23443:
--

Good catch. I will update the PR and pull in the test case. Thanks!

> LLAP speculative task pre-emption seems to be not working
> -
>
> Key: HIVE-23443
> URL: https://issues.apache.org/jira/browse/HIVE-23443
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23443.1.patch
>
>
> I think after HIVE-23210 we are getting a stable sort order and it is causing 
> pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851__119_01_08_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_08_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> The scheduler only peeks at the pre-emption queue and checks whether the head 
> task is non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative, but a state change does not 
> trigger re-ordering of the pre-emption queue, so peek() always returns a 
> canFinish task even though non-finishable tasks are in the queue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23443) LLAP speculative task pre-emption seems to be not working

2020-05-11 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104868#comment-17104868
 ] 

Prasanth Jayachandran commented on HIVE-23443:
--

The patch is still pending testing with some workloads where the issue is 
reproducible. I will update here once it is done. The patch is ready for review 
though. 

cc/ [~gopalv] [~rbalamohan] [~pgaref]

> LLAP speculative task pre-emption seems to be not working
> -
>
> Key: HIVE-23443
> URL: https://issues.apache.org/jira/browse/HIVE-23443
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23443.1.patch
>
>
> I think after HIVE-23210 we are getting a stable sort order and it is causing 
> pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851__119_01_08_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_08_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> The scheduler only peeks at the pre-emption queue and checks whether the head 
> task is non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative, but a state change does not 
> trigger re-ordering of the pre-emption queue, so peek() always returns a 
> canFinish task even though non-finishable tasks are in the queue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23443) LLAP speculative task pre-emption seems to be not working

2020-05-11 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23443:
-
Status: Patch Available  (was: Open)

> LLAP speculative task pre-emption seems to be not working
> -
>
> Key: HIVE-23443
> URL: https://issues.apache.org/jira/browse/HIVE-23443
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23443.1.patch
>
>
> I think after HIVE-23210 we are getting a stable sort order and it is causing 
> pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851__119_01_08_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_08_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> The scheduler only peeks at the pre-emption queue and checks whether the head 
> task is non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative, but a state change does not 
> trigger re-ordering of the pre-emption queue, so peek() always returns a 
> canFinish task even though non-finishable tasks are in the queue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23443) LLAP speculative task pre-emption seems to be not working

2020-05-11 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23443:
-
Attachment: HIVE-23443.1.patch

> LLAP speculative task pre-emption seems to be not working
> -
>
> Key: HIVE-23443
> URL: https://issues.apache.org/jira/browse/HIVE-23443
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-23443.1.patch
>
>
> I think after HIVE-23210 we are getting a stable sort order and it is causing 
> pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851__119_01_08_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_08_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> The scheduler only peeks at the pre-emption queue and checks whether the head 
> task is non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative, but a state change does not 
> trigger re-ordering of the pre-emption queue, so peek() always returns a 
> canFinish task even though non-finishable tasks are in the queue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23443) LLAP speculative task pre-emption seems to be not working

2020-05-11 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23443:
-
Description: 
I think after HIVE-23210 we are getting a stable sort order and it is causing 
pre-emption to not work in certain cases.
{code:java}
"attempt_1589167813851__119_01_08_0 
(hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started at 
2020-05-11 05:59:22, in preemption queue, can finish)", 
"attempt_1589167813851_0008_84_01_08_1 
(hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started at 
2020-05-11 06:00:23, in preemption queue, can finish)" {code}
The scheduler only peeks at the pre-emption queue and checks whether the head 
task is non-finishable. 

[https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]

In the above case, all tasks are speculative, but a state change does not trigger 
re-ordering of the pre-emption queue, so peek() always returns a canFinish task 
even though non-finishable tasks are in the queue. 

  was:
I think after HIVE-23210 we are getting a stable sort order in pre-emption 
queue and it is causing pre-emption to not work in certain cases.
{code:java}
"attempt_1589167813851__119_01_08_0 
(hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started at 
2020-05-11 05:59:22, in preemption queue, can finish)", 
"attempt_1589167813851_0008_84_01_08_1 
(hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started at 
2020-05-11 06:00:23, in preemption queue, can finish)" {code}
The scheduler only peeks at the pre-emption queue and checks whether the head 
task is non-finishable. 

[https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]

In the above case, all tasks are speculative, but a state change does not trigger 
re-ordering of the pre-emption queue, so peek() always returns a canFinish task 
even though non-finishable tasks are in the queue. 


> LLAP speculative task pre-emption seems to be not working
> -
>
> Key: HIVE-23443
> URL: https://issues.apache.org/jira/browse/HIVE-23443
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> I think after HIVE-23210 we are getting a stable sort order and it is causing 
> pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851__119_01_08_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_08_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> The scheduler only peeks at the pre-emption queue and checks whether the head 
> task is non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative, but a state change does not 
> trigger re-ordering of the pre-emption queue, so peek() always returns a 
> canFinish task even though non-finishable tasks are in the queue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23443) LLAP speculative task pre-emption seems to be not working

2020-05-11 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-23443:


Assignee: Prasanth Jayachandran

> LLAP speculative task pre-emption seems to be not working
> -
>
> Key: HIVE-23443
> URL: https://issues.apache.org/jira/browse/HIVE-23443
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> I think after HIVE-23210 we are getting a stable sort order in pre-emption 
> queue and it is causing pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851__119_01_08_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_08_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> The scheduler only peeks at the pre-emption queue and checks whether the head 
> task is non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative, but a state change does not 
> trigger re-ordering of the pre-emption queue, so peek() always returns a 
> canFinish task even though non-finishable tasks are in the queue. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23441) Support foreground option for running llap scripts

2020-05-11 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-23441:



> Support foreground option for running llap scripts
> --
>
> Key: HIVE-23441
> URL: https://issues.apache.org/jira/browse/HIVE-23441
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> LLAP scripts always run in the background. To make them container friendly, 
> support foreground execution of the script as an option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23151) LLAP: default hive.llap.file.cleanup.delay.seconds=0s

2020-04-07 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077453#comment-17077453
 ] 

Prasanth Jayachandran commented on HIVE-23151:
--

+1, pending test. 

> LLAP: default hive.llap.file.cleanup.delay.seconds=0s
> -
>
> Key: HIVE-23151
> URL: https://issues.apache.org/jira/browse/HIVE-23151
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23151.01.patch
>
>
> The current default value (300s) reflects a debugging scenario; let's 
> set this to 0s so that shuffle local files are cleaned up immediately 
> after the DAG completes.
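> A minimal sketch of setting the delay explicitly (the property name comes from 
> this issue's title; conf.set is the standard HiveConf/Configuration API):
> {code:java}
> // org.apache.hadoop.hive.conf.HiveConf
> HiveConf conf = new HiveConf();
> // 0s means shuffle local files are removed as soon as the DAG completes
> conf.set("hive.llap.file.cleanup.delay.seconds", "0s");
> {code}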



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23111) MsckPartitionExpressionProxy should filter partitions

2020-04-06 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076788#comment-17076788
 ] 

Prasanth Jayachandran commented on HIVE-23111:
--

+1, pending tests

> MsckPartitionExpressionProxy should filter partitions
> -
>
> Key: HIVE-23111
> URL: https://issues.apache.org/jira/browse/HIVE-23111
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sam An
>Assignee: Sam An
>Priority: Major
> Attachments: Hive-23111.1.patch, Hive-23111.2.patch, 
> Hive-23111.3.patch, Hive-23111.4.patch
>
>
> Currently MsckPartitionExpressionProxy does not filter partition names; this 
> causes problems for partition auto discovery. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23144) LLAP: Let QueryTracker cleanup on serviceStop

2020-04-06 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076528#comment-17076528
 ] 

Prasanth Jayachandran commented on HIVE-23144:
--

+1, pending tests

> LLAP: Let QueryTracker cleanup on serviceStop
> -
>
> Key: HIVE-23144
> URL: https://issues.apache.org/jira/browse/HIVE-23144
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: HIVE-23144.01.patch
>
>
> QueryTracker's executor service basically runs cleanup tasks: 
> ExternalQueryCleanerCallable, DagMapCleanerCallable, FileCleanerCallable. 
> Changing the shutdown behavior to 
> [.shutdown()|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html#shutdown--]
>  from 
> [.shutdownNow()|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html#shutdownNow--]
>  would let QueryTracker clean up its garbage, for example shuffle local 
> files:
> https://github.com/apache/hive/blob/c3ec20dd4f5b5fbde4007041844f6aed8c262ca1/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java#L440
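> A minimal sketch of the proposed change (the executor field and timeout are 
> assumptions for illustration, not the actual QueryTracker code):
> {code:java}
> // shutdown() lets already-queued cleanup callables run; shutdownNow() drops them.
> private void stopCleanupExecutor(ExecutorService executorService) throws InterruptedException {
>   executorService.shutdown();
>   if (!executorService.awaitTermination(30, TimeUnit.SECONDS)) {
>     executorService.shutdownNow(); // fall back if cleanup does not finish in time
>   }
> }
> {code}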



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23144) LLAP: Let QueryTracker cleanup on serviceStop

2020-04-06 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076499#comment-17076499
 ] 

Prasanth Jayachandran commented on HIVE-23144:
--

The interrupted exception e is being dropped. Can you log the interrupted 
exception as well, so we know the stack trace that caused the interruption? 
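
A minimal sketch of what is being asked for (the logger name and surrounding 
call are assumptions for illustration):

{code:java}
try {
  executorService.awaitTermination(30, TimeUnit.SECONDS);
} catch (InterruptedException e) {
  // pass the exception itself so its stack trace ends up in the log
  LOG.warn("Interrupted while stopping QueryTracker executor", e);
  Thread.currentThread().interrupt(); // restore the interrupt flag
}
{code}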

 

> LLAP: Let QueryTracker cleanup on serviceStop
> -
>
> Key: HIVE-23144
> URL: https://issues.apache.org/jira/browse/HIVE-23144
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
> Attachments: HIVE-23144.01.patch
>
>
> QueryTracker's executor service basically runs cleanup tasks: 
> ExternalQueryCleanerCallable, DagMapCleanerCallable, 
> FileCleanerCallable...changing the shutdown behavior to 
> [.shutdown()|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html#shutdown--]
>  from 
> [.shutdownNow()|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html#shutdownNow--]
>  would let QueryTracker clean up its garbage, for example shuffle local 
> files:
> https://github.com/apache/hive/blob/c3ec20dd4f5b5fbde4007041844f6aed8c262ca1/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java#L440



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23095) NDV might be overestimated for a table with ~70 value

2020-04-06 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076480#comment-17076480
 ] 

Prasanth Jayachandran commented on HIVE-23095:
--

I think fastutil was deliberately not included in the past, as it is an 18+ MB jar.

Can you use a Map instead?

Also, can you include merge() performance in the JMH benchmark, as it is also in 
the critical hot path?  
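
A minimal JMH sketch for covering merge() (the HyperLogLog builder and method 
names below are assumptions based on this discussion, not taken from the patch):

{code:java}
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class HllMergeBench {
  private HyperLogLog left;
  private HyperLogLog right;

  @Setup
  public void setup() {
    left = HyperLogLog.builder().build();
    right = HyperLogLog.builder().build();
    for (long i = 0; i < 10_000; i++) {
      left.addLong(i);
      right.addLong(i + 5_000); // partial overlap between the two sketches
    }
  }

  @Benchmark
  public HyperLogLog merge() {
    HyperLogLog target = HyperLogLog.builder().build();
    target.merge(left);
    target.merge(right);
    return target;
  }
}
{code}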

> NDV might be overestimated for a table with ~70 value
> -
>
> Key: HIVE-23095
> URL: https://issues.apache.org/jira/browse/HIVE-23095
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23095.01.patch, HIVE-23095.02.patch, 
> HIVE-23095.03.patch, HIVE-23095.04.patch, HIVE-23095.04.patch, 
> HIVE-23095.04.patch, HIVE-23095.05.patch, hll-bench.md
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> uncovered while looking into HIVE-23082
> https://issues.apache.org/jira/browse/HIVE-23082?focusedCommentId=17067773&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17067773



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23118) Option for exposing compile time counters as tez counters

2020-04-03 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23118:
-
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> Option for exposing compile time counters as tez counters
> -
>
> Key: HIVE-23118
> URL: https://issues.apache.org/jira/browse/HIVE-23118
> Project: Hive
>  Issue Type: Improvement
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23118.1.patch, HIVE-23118.2.patch, 
> HIVE-23118.3.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> TezCounters currently are runtime only. Some compile time information from 
> optimizer can be exposed as counters which can then be used by workload 
> management to make runtime decisions. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23082) PK/FK stat rescale doesn't work in some cases

2020-04-03 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074992#comment-17074992
 ] 

Prasanth Jayachandran commented on HIVE-23082:
--

Left a comment on the GitHub PR. I think we should remove tempList from the sparse 
register and insert directly into the sparse map, so that getSize() becomes a 
constant-time operation with less branching/fewer branch misses. 
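
A minimal sketch of that suggestion (field and method names are assumptions for 
illustration, not the actual HLL sparse register code):

{code:java}
// Insert encoded values straight into the sorted map instead of buffering them
// in a temp list, so the size query never has to flush pending entries.
private final TreeMap<Integer, Byte> sparseMap = new TreeMap<>();

public void add(int encodedHash, byte registerValue) {
  // keep the larger run length when the same index is seen again
  sparseMap.merge(encodedHash, registerValue,
      (oldVal, newVal) -> (byte) Math.max(oldVal, newVal));
}

public int getSize() {
  return sparseMap.size(); // constant time, no temp list to account for
}
{code}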

> PK/FK stat rescale doesn't work in some cases
> -
>
> Key: HIVE-23082
> URL: https://issues.apache.org/jira/browse/HIVE-23082
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-23082.01.patch, HIVE-23082.02.patch, 
> HIVE-23082.03.patch, HIVE-23082.03.patch, HIVE-23082.03.patch, 
> HIVE-23082.03.patch
>
>
> As a result in Joins may retain the original estimate; see MAPJOIN_33 in this 
> plan ; which retained the estimate of SEL_32
> {code}
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Map 1 <- Map 2 (BROADCAST_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:12   |
> | Stage-1|
> |   Map 1 vectorized |
> |   File Output Operator [FS_36] |
> | Limit [LIM_35] (rows=12 width=4)   |
> |   Number of rows:12|
> |   Select Operator [SEL_34] (rows=5040 width=4) |
> | Output:["_col0"]   |
> | Map Join Operator [MAPJOIN_33] (rows=5040 width=8) |
> |   Conds:SEL_32._col0=RS_30._col0(Inner) |
> | <-Map 2 [BROADCAST_EDGE] vectorized|
> |   BROADCAST [RS_30]|
> | PartitionCols:_col0|
> | Select Operator [SEL_29] (rows=1 width=8) |
> |   Output:["_col0"] |
> |   Filter Operator [FIL_28] (rows=1 width=108) |
> | predicate:((r_reason_id = 'reason 66') and r_reason_sk 
> is not null) |
> | TableScan [TS_3] (rows=2 width=108) |
> |   
> default@rx0,reason,Tbl:COMPLETE,Col:COMPLETE,Output:["r_reason_id","r_reason_sk"]
>  |
> | <-Select Operator [SEL_32] (rows=5040 width=7) |
> | Output:["_col0"]   |
> | Filter Operator [FIL_31] (rows=5040 width=7) |
> |   predicate:sr_reason_sk is not null |
> |   TableScan [TS_0] (rows=5112 width=7) |
> | 
> default@sr0,store_returns,Tbl:COMPLETE,Col:COMPLETE,Output:["sr_reason_sk"] |
> ||
> ++
> {code}
> repro:
> {code}
> set hive.query.results.cache.enabled=false;
> set hive.explain.user=true;
> drop table if exists default.rx0;
> drop table if exists default.sr0;
> create table rx0 (r_reason_id string, r_reason_sk bigint);
> create table sr0 (sr_reason_sk bigint);
> insert into rx0 values ('',1),('GEAA',70);
> insert into sr0 values (NULL),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
> (11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),
> (26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),
> (41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),
> (56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70);
> insert into sr0 select a.* from sr0 a,sr0 b;
> -- |sr0| ~ 5112
> explain select 1
> from default.sr0  store_returns , default.rx0 reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_id = 'reason 66'
> limit 12;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23118) Option for exposing compile time counters as tez counters

2020-04-02 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23118:
-
Attachment: HIVE-23118.3.patch

> Option for exposing compile time counters as tez counters
> -
>
> Key: HIVE-23118
> URL: https://issues.apache.org/jira/browse/HIVE-23118
> Project: Hive
>  Issue Type: Improvement
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-23118.1.patch, HIVE-23118.2.patch, 
> HIVE-23118.3.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> TezCounters currently are runtime only. Some compile time information from 
> optimizer can be exposed as counters which can then be used by workload 
> management to make runtime decisions. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23118) Option for exposing compile time counters as tez counters

2020-04-02 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23118:
-
Attachment: HIVE-23118.2.patch

> Option for exposing compile time counters as tez counters
> -
>
> Key: HIVE-23118
> URL: https://issues.apache.org/jira/browse/HIVE-23118
> Project: Hive
>  Issue Type: Improvement
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-23118.1.patch, HIVE-23118.2.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TezCounters currently are runtime only. Some compile time information from 
> optimizer can be exposed as counters which can then be used by workload 
> management to make runtime decisions. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23110) Prevent NPE in ReExecDriver if the processing is aborted

2020-04-01 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072509#comment-17072509
 ] 

Prasanth Jayachandran commented on HIVE-23110:
--

I have partial logs
{code:java}
hiveserver2 <14>1 2020-03-31T20:52:24.702Z 
hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local 
hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="ql.Driver" 
level="INFO" operationLogLevel="EXECUTION" 
queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" 
sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" 
thread="HiveServer2-Background-Pool: Thread-74"] Executing 
command(queryId=hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e) has 
been interrupted after 133.75 seconds
hiveserver2 <14>1 2020-03-31T20:52:24.702Z 
hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local 
hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 class="ql.Driver" 
level="INFO" operationLogLevel="EXECUTION" 
queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" 
sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" 
thread="HiveServer2-Background-Pool: Thread-74"] OK
hiveserver2 <15>1 2020-03-31T20:52:24.702Z 
hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local 
hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 
class="log.PerfLogger" level="DEBUG" operationLogLevel="EXECUTION" 
queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" 
sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" 
thread="HiveServer2-Background-Pool: Thread-74"] 
hiveserver2 <14>1 2020-03-31T20:52:24.711Z 
hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local 
hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 
class="common.LogUtils" level="INFO" thread="HiveServer2-Background-Pool: 
Thread-74"] Unregistered logging context.
hiveserver2 <14>1 2020-03-31T20:52:24.702Z 
hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local 
hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 
class="lockmgr.DbLockManager" level="INFO" operationLogLevel="EXECUTION" 
queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" 
sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" 
thread="HiveServer2-Background-Pool: Thread-74"] releaseLocks: 
hiveserver2 <15>1 2020-03-31T20:52:24.703Z 
hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local 
hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 
class="log.PerfLogger" level="DEBUG" operationLogLevel="EXECUTION" 
queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" 
sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" 
thread="HiveServer2-Background-Pool: Thread-74"] 
hiveserver2 <11>1 2020-03-31T20:52:24.711Z 
hiveserver2-0.hiveserver2-service.compute-1585643974-lwrg.svc.cluster.local 
hiveserver2 1 6ba03ff1-251f-4878-81ea-1ba72d36c465 [mdc@18060 
class="operation.Operation" level="ERROR" operationLogLevel="EXECUTION" 
queryId="hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e" 
sessionId="94e0ab1a-e5ca-4237-9713-235b5dd2559a" 
thread="HiveServer2-Background-Pool: Thread-74"] Error running hive query: 
org.apache.hive.service.cli.HiveSQLException: Illegal Operation state 
transition from CANCELED to FINISHED
at 
org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:97)
at 
org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:103)
at 
org.apache.hive.service.cli.operation.Operation.setState(Operation.java:161)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:248)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
hiveserver2 2020-03-31 20:52:24,710 
Log4j2-TF-1-AsyncLogger[AsyncContext@18b4aac2]-1 ERROR 
/tmp/hive/operation_logs/94e0ab1a-e5ca-4237-9713-235b5dd2559a/hive_20200331205007_6397e486-03a9-41ec-a56b-e0c4ff1ff26e
 w

[jira] [Commented] (HIVE-23118) Option for exposing compile time counters as tez counters

2020-03-31 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072409#comment-17072409
 ] 

Prasanth Jayachandran commented on HIVE-23118:
--

[~Sreenath] These are Hive-side counters merged with the DAG counters on the client 
side. These counters will be added to any Tez task during Hive query 
compilation. I don't think they will be available on the Tez side, as they do not 
attach to any Tez context. They will be accessible to Hive hooks though (the Hive 
proto hook can dump them). 
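
A minimal sketch of publishing and reading such a counter (the group, counter and 
variable names are made up for illustration; the calls follow the standard 
TezCounters API):

{code:java}
// Compile side: record a compile-time statistic as a counter.
TezCounters compileTimeCounters = new TezCounters();
compileTimeCounters.findCounter("HIVE_COMPILE", "MAP_JOIN_COUNT").increment(3);

// Hook side: the counters arrive merged with the DAG counters on the client,
// so a hook can read them back by the same group/counter name.
long mapJoins = mergedDagCounters.findCounter("HIVE_COMPILE", "MAP_JOIN_COUNT").getValue();
{code}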

> Option for exposing compile time counters as tez counters
> -
>
> Key: HIVE-23118
> URL: https://issues.apache.org/jira/browse/HIVE-23118
> Project: Hive
>  Issue Type: Improvement
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-23118.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TezCounters currently are runtime only. Some compile time information from 
> optimizer can be exposed as counters which can then be used by workload 
> management to make runtime decisions. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23118) Option for exposing compile time counters as tez counters

2020-03-31 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23118:
-
Status: Patch Available  (was: Open)

> Option for exposing compile time counters as tez counters
> -
>
> Key: HIVE-23118
> URL: https://issues.apache.org/jira/browse/HIVE-23118
> Project: Hive
>  Issue Type: Improvement
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-23118.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TezCounters currently are runtime only. Some compile time information from 
> optimizer can be exposed as counters which can then be used by workload 
> management to make runtime decisions. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23118) Option for exposing compile time counters as tez counters

2020-03-31 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-23118:
-
Attachment: HIVE-23118.1.patch

> Option for exposing compile time counters as tez counters
> -
>
> Key: HIVE-23118
> URL: https://issues.apache.org/jira/browse/HIVE-23118
> Project: Hive
>  Issue Type: Improvement
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-23118.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TezCounters currently are runtime only. Some compile time information from 
> optimizer can be exposed as counters which can then be used by workload 
> management to make runtime decisions. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23118) Option for exposing compile time counters as tez counters

2020-03-31 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-23118:



> Option for exposing compile time counters as tez counters
> -
>
> Key: HIVE-23118
> URL: https://issues.apache.org/jira/browse/HIVE-23118
> Project: Hive
>  Issue Type: Improvement
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
>
> TezCounters currently are runtime only. Some compile time information from 
> optimizer can be exposed as counters which can then be used by workload 
> management to make runtime decisions. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

