[jira] [Resolved] (HIVE-26677) Constrain available processors to Jetty during test runs to prevent thread exhaustion.

2022-11-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita resolved HIVE-26677.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

+1, merged to master. Thanks for the contribution, Chris.

> Constrain available processors to Jetty during test runs to prevent thread 
> exhaustion.
> --
>
> Key: HIVE-26677
> URL: https://issues.apache.org/jira/browse/HIVE-26677
> Project: Hive
> Issue Type: Test
> Components: Test
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> As described during a [release candidate 
> vote|https://lists.apache.org/thread/8qjf7x9t9v09d79hlzh712ls4zthdwrh]:
> HIVE-24484 introduced a change to limit {{hive.server2.webui.max.threads}} to 
> 4. Jetty enforces thread leasing to warn or abort if there aren't enough 
> threads available [1]. During startup, it attempts to lease a thread per NIO 
> selector [2]. By default, the number of NIO selectors to use is determined 
> based on available CPUs [3]. This is mostly a passthrough to 
> {{Runtime.availableProcessors()}} [4]. In my case, running on a machine with 
> 16 CPUs, this ended up creating more than 4 selectors, therefore requiring 
> more than 4 threads and violating the lease check. I was able to work around 
> this by passing the {{JETTY_AVAILABLE_PROCESSORS}} system property to 
> constrain the number of CPUs available to Jetty.
> Since we are intentionally constraining the pool to 4 threads during itests, 
> let's also limit {{JETTY_AVAILABLE_PROCESSORS}} in {{maven.test.jvm.args}} of 
> the root pom.xml, so that others don't run into this problem later.
> [1] 
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-util/src/main/java/org/eclipse/jetty/util/thread/ThreadPoolBudget.java#L165
> [2] 
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-io/src/main/java/org/eclipse/jetty/io/SelectorManager.java#L255
> [3] 
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-io/src/main/java/org/eclipse/jetty/io/SelectorManager.java#L79
> [4] 
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-util/src/main/java/org/eclipse/jetty/util/ProcessorUtils.java#L45
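
The failure mode described above can be sketched in a few lines of Java. This is a simplified model written for illustration, not Jetty's code: the class name, the helper methods, the "half the CPUs" selector default, and the rule that leases must leave at least one pool thread free are all assumptions. Only the {{JETTY_AVAILABLE_PROCESSORS}} name, the fallback to the JVM's CPU count, and the pool size of 4 come from the issue text.

```java
import java.util.Arrays;

/** Simplified model of the lease failure in this issue; not Jetty's actual code. */
class JettyThreadBudgetSketch {
    // Name taken from the issue description.
    static final String AVAILABLE_PROCESSORS = "JETTY_AVAILABLE_PROCESSORS";

    /** Returns the override if it parses as an integer, else the JVM's CPU count. */
    static int resolveProcessors(String override, int runtimeCpus) {
        if (override != null) {
            try {
                return Integer.parseInt(override.trim());
            } catch (NumberFormatException ignored) {
                // Unparseable override: fall back to the JVM's view.
            }
        }
        return runtimeCpus;
    }

    /** Assumed rule: the leased threads must leave at least one pool thread free. */
    static boolean leasesFit(int maxThreads, int... leasedThreads) {
        int required = Arrays.stream(leasedThreads).sum();
        return maxThreads - required > 0;
    }

    public static void main(String[] args) {
        // System property first, then environment variable, then the JVM.
        String override = System.getProperty(
                AVAILABLE_PROCESSORS, System.getenv(AVAILABLE_PROCESSORS));
        int cpus = resolveProcessors(override, Runtime.getRuntime().availableProcessors());

        int maxThreads = 4;                    // hive.server2.webui.max.threads in itests
        int selectors = Math.max(1, cpus / 2); // hypothetical CPU-based default

        System.out.println("cpus=" + cpus
                + " selectors=" + selectors
                + " fits=" + leasesFit(maxThreads, selectors));
    }
}
```

Under these assumptions, running the sketch with -DJETTY_AVAILABLE_PROCESSORS=4 on a many-core machine keeps the modelled selector count inside the 4-thread pool, which is the effect the proposed {{maven.test.jvm.args}} change is after.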



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26677) Constrain available processors to Jetty during test runs to prevent thread exhaustion.

2022-11-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26677?focusedWorklogId=825526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-825526
 ]

ASF GitHub Bot logged work on HIVE-26677:
-

Author: ASF GitHub Bot
Created on: 12/Nov/22 21:25
Start Date: 12/Nov/22 21:25
Worklog Time Spent: 10m 
  Work Description: szlta merged PR #3713:
URL: https://github.com/apache/hive/pull/3713




Issue Time Tracking
---

Worklog Id: (was: 825526)
Time Spent: 0.5h  (was: 20m)



[jira] [Commented] (HIVE-26699) Iceberg: S3 fadvise can hurt JSON parsing significantly in DWX

2022-11-12 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17632676#comment-17632676
 ] 

Steve Loughran commented on HIVE-26699:
---

You should be using the openFile() API call and setting the read policy option 
to whole-file (assuming that is the intent), and ideally passing in the file 
status, or at least the file length, which is enough for s3a to skip the HEAD 
request, though not for abfs.
See org.apache.hadoop.util.JsonSerialization for its max-performance JSON load, 
which the s3a and manifest committers both use.

> Iceberg: S3 fadvise can hurt JSON parsing significantly in DWX
> --
>
> Key: HIVE-26699
> URL: https://issues.apache.org/jira/browse/HIVE-26699
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Priority: Major
>
> Hive reads JSON metadata information (TableMetadataParser::read()) multiple 
> times, e.g. during query compilation, AM split computation, stats computation, 
> during commits, etc.
>
> With large JSON files (due to multiple inserts), these reads take much longer 
> on the S3 FS when "fs.s3a.experimental.input.fadvise" is set to "random" (e.g. 
> on the order of 10x). To be on the safe side, it would be good to set this to 
> "normal" mode in configs when reading Iceberg tables.


