[ 
https://issues.apache.org/jira/browse/YARN-11530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11530:
----------------------------------
    Labels: pull-request-available  (was: )

> Server$Listener throws "too many open files" when 
> ipc.server.read.threadpool.size is set too large
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11530
>                 URL: https://issues.apache.org/jira/browse/YARN-11530
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: ConfX
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: reproduce.sh
>
>
> h2. What happened?
> org.apache.hadoop.yarn.TestRPCFactories#test fails with an IOException 
> stating "Too many open files".
> h2. Where's the bug?
> In the constructor of org.apache.hadoop.ipc.Server$Listener, the listener 
> creates and starts a pool of Reader threads:
> {code:java}
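>       // readThreads is taken directly from ipc.server.read.threadpool.size;
>       // each Reader opens a java.nio Selector, consuming file descriptors.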
>       readers = new Reader[readThreads];
>       for (int i = 0; i < readThreads; i++) {
>         Reader reader = new Reader(
>             "Socket Reader #" + (i + 1) + " for port " + port);
>         readers[i] = reader;
>         reader.start();
>       }
> {code}
> without validating the value of readThreads. When 
> ipc.server.read.threadpool.size is set large enough, the process exhausts 
> its file-descriptor limit partway through creating the readers, since each 
> Reader opens a selector. The listener should catch exceptions thrown during 
> the creation of the readers instead of letting them escape.
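> A minimal sketch of the guard suggested above (illustrative only, not an 
> actual patch; the wrapping and error message are assumptions). Note that the 
> failure in this report surfaces as an ExceptionInInitializerError from a 
> static initializer, so a complete fix may also need to bound readThreads up 
> front rather than rely on catching IOException alone:
> {code:java}
>       readers = new Reader[readThreads];
>       for (int i = 0; i < readThreads; i++) {
>         try {
>           Reader reader = new Reader(
>               "Socket Reader #" + (i + 1) + " for port " + port);
>           readers[i] = reader;
>           reader.start();
>         } catch (IOException e) {
>           // Surface a configuration hint instead of an opaque
>           // "too many open files" failure mid-construction.
>           throw new IOException("Failed to start socket reader " + (i + 1)
>               + " of " + readThreads + "; consider lowering "
>               + "ipc.server.read.threadpool.size", e);
>         }
>       }
> {code}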
> h3. Stacktrace
> {code}
> java.lang.ExceptionInInitializerError
>         ...
> Caused by: java.io.IOException: Too many open files
>       at java.base/sun.nio.ch.FileDispatcherImpl.init(Native Method)
>       at 
> java.base/sun.nio.ch.FileDispatcherImpl.<clinit>(FileDispatcherImpl.java:38)
>         ...
> {code}
> h2. How to reproduce?
> (1) set ipc.server.read.threadpool.size to 50000
> (2) run org.apache.hadoop.yarn.TestRPCFactories#test
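> The same condition can also be triggered directly against the IPC server; a 
> minimal sketch, assuming a trivial test protocol (TestProtocol and the main 
> class are illustrative, not part of the attached reproduce.sh):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.ipc.RPC;
> import org.apache.hadoop.ipc.Server;
> 
> public class TooManyReadersRepro {
>   // Hypothetical minimal RPC protocol.
>   public interface TestProtocol {
>     long versionID = 1L;
>   }
> 
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     // Each Reader opens a java.nio Selector, so a value far above the
>     // process file-descriptor limit exhausts it during construction.
>     conf.setInt("ipc.server.read.threadpool.size", 50000);
>     Server server = new RPC.Builder(conf)
>         .setProtocol(TestProtocol.class)
>         .setInstance(new TestProtocol() {})
>         .setBindAddress("0.0.0.0")
>         .setPort(0)
>         .build(); // fails here with "Too many open files"
>     server.stop();
>   }
> }
> {code}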
> You can also use the reproduce.sh in the attachment to reproduce the bug 
> easily.
> We have tested this bug on both Ubuntu and macOS. *The bug is volatile and 
> manifests differently on the two operating systems we tested.* On macOS the 
> "too many open files" error is printed to stderr; on Ubuntu the forked JVM 
> crashes outright: 
> {code}
> [WARNING] Corrupted STDOUT by directly writing to native stream in forked JVM 1.
> ...
> ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
> ...
> Error occurred in starting fork, check output in log
> Process Exit Code: 1
> Crashed tests:
> org.apache.hadoop.yarn.TestRPCFactories
> ...
> Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
> {code}
> We are happy to provide a patch after this issue is confirmed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
