Hi Adam,

comments inline...

On 9/23/25 10:46, Herring, Adam wrote:
Hi Apache MINA team,

I'm seeking advice on how to handle an apparent race condition encountered 
using mina 2.0.27, whether my analysis is correct and if this is a known issue 
(I haven't identified one open in 
https://issues.apache.org/jira/projects/DIRMINA/issues/DIRMINA-1186?filter=allopenissues).

Issue Description

Our application server uses MINA for NIO between a data cache and several 
client applications. Recently we have seen increased incidence of the following 
warning-level message in the data cache logs:

Create a new selector. Selected is 0, delta = 0

I found this is logged from the AbstractPollingIoProcessor and is accompanied 
by a comment block explaining that there is an Epoll race condition that can 
cause file descriptors not to be considered available and the selector must be 
closed and a new one registered to prevent 100% CPU use. It's not entirely 
clear whether this issue is in MINA, the JDK or the operating system, but 
either way this all seems fine and most of our application instances do not 
encounter a further problem.

It's a JDK problem, and I don't think it was ever fixed: https://bugs.openjdk.org/browse/JDK-8011538
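
For readers unfamiliar with the workaround, here is a minimal sketch of the "close the selector and register a new one" idea using plain java.nio. It is not MINA's actual code: SelectorRebuilder and rebuild() are hypothetical names, and the sketch assumes it runs on the only thread that touches the old selector.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Hypothetical helper, not MINA code: swaps a spinning selector for a fresh one. */
final class SelectorRebuilder {

    /**
     * Opens a new selector, re-registers every still-valid key of the old
     * selector with the same interest ops and attachment, then closes the
     * old selector. Assumes no other thread uses oldSelector concurrently.
     */
    static Selector rebuild(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();
        for (SelectionKey key : oldSelector.keys()) {
            if (!key.isValid()) {
                continue; // cancelled keys are not carried over
            }
            SelectableChannel channel = key.channel();
            channel.register(newSelector, key.interestOps(), key.attachment());
        }
        // Closing the old selector invalidates every key registered on it.
        oldSelector.close();
        return newSelector;
    }
}
```

Any thread that still holds a key from the old selector after this call is working with an invalidated key, which is where the second problem described below comes from.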



On one production instance of our application it looks like a second race condition 
occurs when the selector is closed and replaced. The greater the frequency of the 
"Create a new selector. Selected is 0, delta = 0" message, the more likely that 
we encounter a ClosedSelectorException on another thread. Full stack trace and logs 
before and after:

WARN 2025-09-10 21:58:29,212 [NioProcessor-2]-service.IoProcessor: Create a new selector. Selected is 0, delta = 0
WARN 2025-09-10 21:58:29,220 [pool-3-thread-2]-server.DsrvServerIoHandler: null
java.nio.channels.ClosedSelectorException: null
    at sun.nio.ch.EPollSelectorImpl.ensureOpen(EPollSelectorImpl.java:98) ~[?:?]
    at sun.nio.ch.EPollSelectorImpl.setEventOps(EPollSelectorImpl.java:243) ~[?:?]
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:101) ~[?:?]
    at org.apache.mina.transport.socket.nio.NioProcessor.setInterestedInWrite(NioProcessor.java:371) ~[mina-core-2.0.27.jar:?]
    at org.apache.mina.transport.socket.nio.NioProcessor.setInterestedInWrite(NioProcessor.java:47) ~[mina-core-2.0.27.jar:?]
    at org.apache.mina.core.polling.AbstractPollingIoProcessor.updateTrafficControl(AbstractPollingIoProcessor.java:585) [mina-core-2.0.27.jar:?]
    at org.apache.mina.core.polling.AbstractPollingIoProcessor.updateTrafficControl(AbstractPollingIoProcessor.java:68) [mina-core-2.0.27.jar:?]
    at org.apache.mina.core.service.SimpleIoProcessorPool.updateTrafficControl(SimpleIoProcessorPool.java:294) [mina-core-2.0.27.jar:?]
    at org.apache.mina.core.service.SimpleIoProcessorPool.updateTrafficControl(SimpleIoProcessorPool.java:80) [mina-core-2.0.27.jar:?]
    at org.apache.mina.core.session.AbstractIoSession.resumeRead(AbstractIoSession.java:748) [mina-core-2.0.27.jar:?]
    at j4sf.connect.dsrv.endpoint.IoSessionEndPoint.resumeRead(IoSessionEndPoint.java:73) [DataServer-11.0.2.jar:?]
    at j4sf.connect.dsrv.server.SerialExecutor.scheduleNext(SerialExecutor.java:94) [DataServer-11.0.2.jar:?]
    at j4sf.connect.dsrv.server.SerialExecutor$LocalTask.run(SerialExecutor.java:140) [DataServer-11.0.2.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
INFO 2025-09-10 21:58:29,235 [NioProcessor-2]-server.DsrvServerIoHandler: Session Closed

Our code in j4sf.connect.dsrv.server.SerialExecutor and 
j4sf.connect.dsrv.endpoint.IoSessionEndPoint handles reading incoming 
messages on the IoSession. There is a message processing queue with a high and 
low watermark. At some point we exceeded the high watermark and called 
suspendRead() on the IoSession. The ClosedSelectorException above was thrown 
when we called ioSession.resumeRead().

Our DsrvServerIoHandler extends org.apache.mina.core.service.IoHandlerAdapter - 
the exception is caught here where we override 
IoHandlerAdapter.exceptionCaught(IoSession, Throwable). At this point we log 
the exception and close the session, resulting in the final log message above 
after the session has closed.
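
The watermark scheme described above can be sketched roughly as follows. This is an illustration only, not the application's real code: ReadControl is a hypothetical stand-in for the suspendRead()/resumeRead() calls on the IoSession, WatermarkQueue is an invented name, and the exact threshold comparisons are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical stand-in for the two IoSession calls discussed in this thread. */
interface ReadControl {
    void suspendRead();
    void resumeRead();
}

/** Sketch of a message queue that throttles reads between two watermarks. */
final class WatermarkQueue<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final int low;
    private final int high;
    private final ReadControl session;
    private boolean suspended;

    WatermarkQueue(int low, int high, ReadControl session) {
        this.low = low;
        this.high = high;
        this.session = session;
    }

    /** Called when a message arrives; suspends reads once the queue exceeds the high watermark. */
    synchronized void offer(T message) {
        queue.add(message);
        if (!suspended && queue.size() > high) {
            suspended = true;
            session.suspendRead();
        }
    }

    /** Called by a worker; resumes reads once the queue has drained to the low watermark. */
    synchronized T poll() {
        T message = queue.poll();
        if (suspended && queue.size() <= low) {
            suspended = false;
            // In the stack trace above, this is the call that runs on a pool
            // thread and ends up touching the selection key.
            session.resumeRead();
        }
        return message;
    }
}
```

The point of the sketch is that resumeRead() is issued from a worker thread rather than from the NioProcessor thread that owns the selector.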

We've been unable to reproduce this on a hardware/software platform under our 
control.

I don't see any mention of ClosedSelectorException in your mail archive since 
handling of them was fixed in 
DIRMINA-978<https://issues.apache.org/jira/browse/DIRMINA-978>

Questions:


   1.  Could this be a race condition in which resumeRead causes the 
NioProcessor to retrieve the old selection key just before it was closed, and 
then to attempt the read after it has been closed?


Yes, because the thread that deals with the selector replacement is different from the thread processing the messages.




   2.  The IoSession is already suspended when this happens. Is there any 
reason inherent in MINA why we shouldn't ignore this exception, at least once, 
and simply attempt to resumeRead() again? If I could reproduce the issue 
outside prod, I would of course try this myself.
If not, do you have any other advice on how the issue might be worked around?


I *think* (but this is off the top of my head) that the necessary step to fix the issue is to re-register the channel on the new selector, but it has to be present. So as soon as the previously identified race condition is fixed, this issue should not occur. It would probably be enough to add a locked section where we spawn a new selector, one that blocks any operation on the selector. Maybe an AtomicReference to the selector itself could help?
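
To make the locking idea concrete, here is a rough sketch in plain java.nio. It is not a patch for MINA and has not been tried against the real NioProcessor: GuardedSelector and its methods are hypothetical names, and the sketch assumes every interest-ops change goes through this class instead of using a cached SelectionKey.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: serializes selector replacement against key updates. */
final class GuardedSelector {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Selector selector;

    GuardedSelector() throws IOException {
        this.selector = Selector.open();
    }

    /** Returns the current selector; the select loop would call this before each select(). */
    Selector current() {
        lock.readLock().lock();
        try {
            return selector;
        } finally {
            lock.readLock().unlock();
        }
    }

    SelectionKey register(SelectableChannel channel, int ops) throws IOException {
        lock.readLock().lock();
        try {
            return channel.register(selector, ops);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Safe to call from worker threads: always resolves the key on the current selector. */
    void setInterestOps(SelectableChannel channel, int ops) {
        lock.readLock().lock();
        try {
            SelectionKey key = channel.keyFor(selector);
            if (key != null && key.isValid()) {
                key.interestOps(ops);
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Replaces the selector while no key update is in flight. */
    void replace() throws IOException {
        lock.writeLock().lock();
        try {
            Selector fresh = Selector.open();
            for (SelectionKey key : selector.keys()) {
                if (key.isValid()) {
                    key.channel().register(fresh, key.interestOps(), key.attachment());
                }
            }
            selector.close();
            selector = fresh;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Readers share the read lock, so ordinary key updates do not block one another; only the (rare) replacement takes the write lock. The select loop must not hold either lock while blocked in select(), otherwise replacement would stall.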





   3.  Might upgrading to MINA 2.1 or 2.2 help? I see nothing obvious in the release 
notes that suggests it would. We have already upgraded from 2.0.21 to 2.0.27, which 
seems to have reduced the frequency of the issue, perhaps due to the inclusion of 
DIRMINA-1169<https://issues.apache.org/jira/browse/DIRMINA-1169> in 2.0.24.

No, I'm afraid that won't help. I do encourage you to migrate, though: 2.0.X is pretty old (2.1 was released 6 years ago).





   4.  Finally, do you know anything about likely causes of the initial Epoll 
race condition the MINA code is working around? If it's caused by Java, is 
there an OpenJDK issue open related to it? I couldn't find anything that seems 
to match. FWIW the system the issue occurs on is running RHEL 9.5 and the JDK 
is also Red Hat's Java 11: Red_Hat-11.0.17.0.8-2.el7openjdkportable

As I mentioned above, it's a JDK bug. MINA is not the only NIO framework handling a spinning selector this way :/




Thanks!
Adam


