Hello,
we've been using MINA for a huge game server application for 3 years, and so
far it has served us well. However, a day ago it started acting weirdly: 1-3
hours after the application is started, it begins hogging the CPU (100% on all
cores).
Here are two example stack traces (approximately a third of all NioProcessors
are stuck or busy in one or the other wakeup method):
"NioProcessor-56" prio=6 tid=0x000000004264c000 nid=0xa9c runnable [0x000000004c09f000]
  java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.WindowsSelectorImpl.resetWakeupSocket0(Native Method)
    at sun.nio.ch.WindowsSelectorImpl.resetWakeupSocket(WindowsSelectorImpl.java:473)
    - locked <0x0000000200b91f40> (a java.lang.Object)
    at sun.nio.ch.WindowsSelectorImpl.doSelect(WindowsSelectorImpl.java:174)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x0000000200b90990> (a sun.nio.ch.Util$2)
    - locked <0x0000000200b90980> (a java.util.Collections$UnmodifiableSet)
    - locked <0x0000000200b7b8b0> (a sun.nio.ch.WindowsSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:72)
    at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1077)
    at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
  Locked ownable synchronizers:
    - <0x00000006bc378ec8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
-----------------
"NioProcessor-87" prio=6 tid=0x0000000042649000 nid=0xaa4 waiting for monitor entry [0x0000000049b1e000]
  java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.nio.ch.WindowsSelectorImpl.wakeup(WindowsSelectorImpl.java:604)
    - waiting to lock <0x00000002009bcc98> (a java.lang.Object)
    at org.apache.mina.transport.socket.nio.NioProcessor.wakeup(NioProcessor.java:88)
    at org.apache.mina.core.polling.AbstractPollingIoProcessor.flush(AbstractPollingIoProcessor.java:430)
    at org.apache.mina.core.polling.AbstractPollingIoProcessor.write(AbstractPollingIoProcessor.java:418)
    at org.apache.mina.core.polling.AbstractPollingIoProcessor.write(AbstractPollingIoProcessor.java:67)
    at org.apache.mina.core.service.SimpleIoProcessorPool.write(SimpleIoProcessorPool.java:251)
    .... more write related instruction
    .... write request issued by our application
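
The estimate that roughly a third of the NioProcessor threads are busy comes from reading thread dumps. It could be cross-checked by sampling per-thread CPU time from inside the JVM. The following is only a minimal sketch using the standard `ThreadMXBean`; the class name `ThreadCpuSampler` and the "name#id" key format are hypothetical and not part of MINA or our application. It assumes the JVM supports and has enabled thread CPU time measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of MINA: measures how much CPU time each
// live thread consumes during a short sampling window.
final class ThreadCpuSampler {

    /**
     * Returns the CPU nanoseconds consumed per thread during the window,
     * keyed by "threadName#threadId". Threads that die during the window,
     * or for which no CPU time is available, are skipped.
     */
    static Map<String, Long> sample(long windowMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time not supported by this JVM");
        }

        long[] ids = mx.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            before[i] = mx.getThreadCpuTime(ids[i]); // -1 if unavailable
        }

        Thread.sleep(windowMillis);

        Map<String, Long> deltas = new LinkedHashMap<>();
        for (int i = 0; i < ids.length; i++) {
            long after = mx.getThreadCpuTime(ids[i]);
            ThreadInfo info = mx.getThreadInfo(ids[i]); // null if the thread has exited
            if (before[i] < 0 || after < 0 || info == null) {
                continue;
            }
            deltas.put(info.getThreadName() + "#" + ids[i], after - before[i]);
        }
        return deltas;
    }
}
```

Calling `ThreadCpuSampler.sample(1000)` and looking at the entries whose names start with "NioProcessor-" should show which processors consume close to a full second of CPU per second of wall time.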
-----------------
Configuration info:
OS: Windows Server 2012 Standard x64
Java: 1.7.0_51 (the JRE included in the server JDK)
MINA: 2.0.7
CPU: Intel Xeon E5-1650 @ 3.2 GHz (6 Cores, 12 Threads)
RAM: 32 GB, JVM's heap size is set to 24 GB. At the time of the crash it's
using about 10 GB.
-----------------
When it starts, there are about 750 active clients (the server can usually
hold 2500 easily, but because of the crashes the number of concurrent
players is lower). It doesn't stop hogging the CPU even when all connections are
dropped; it only ends when the process is terminated and restarted. We tried
rebooting the machine several times, but it didn't help.
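
One possible explanation for CPU usage that persists with no connections is a selector loop that keeps returning from select() immediately with nothing to do. This is an assumption; the traces only show threads inside select() and wakeup(). A counter like the one below could be wrapped around a select call to test that theory. It is a minimal sketch: `SelectSpinDetector`, `selectAndCheck`, and the thresholds are hypothetical names and values, not MINA APIs.

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical diagnostic, not part of MINA: flags a selector loop in which
// select() keeps returning zero keys much sooner than its timeout.
final class SelectSpinDetector {
    private final int threshold;       // consecutive premature returns needed to flag a spin
    private final long minBlockNanos;  // a zero-key return faster than this counts as premature
    private int consecutive;

    SelectSpinDetector(int threshold, long minBlockNanos) {
        this.threshold = threshold;
        this.minBlockNanos = minBlockNanos;
    }

    /**
     * Records the outcome of one select() call. Returns true once at least
     * `threshold` consecutive calls returned no keys in under minBlockNanos.
     * Any call that selects keys or blocks long enough resets the counter.
     */
    boolean record(int selectedKeys, long elapsedNanos) {
        if (selectedKeys == 0 && elapsedNanos < minBlockNanos) {
            consecutive++;
        } else {
            consecutive = 0;
        }
        return consecutive >= threshold;
    }

    /** Times a single select() call and feeds the result to the detector. */
    static boolean selectAndCheck(Selector selector, long timeoutMillis,
                                  SelectSpinDetector detector) throws IOException {
        long start = System.nanoTime();
        int selected = selector.select(timeoutMillis);
        return detector.record(selected, System.nanoTime() - start);
    }
}
```

Legitimate wakeup() calls also make select() return early with zero keys (the second trace shows wakeup() being called on the write path), so the threshold would need to be fairly high to avoid false positives on a busy server.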
We haven't changed our software configuration recently, and there have been no
changes to the application source code related to networking. What did change,
though, is the hardware: We had our host install our drives into a different
server with the same specifications a few days ago (the old one had a hardware
defect that caused occasional BSODs).
It might also be noteworthy that script kiddies have been DDoS-attacking us
recently, but those attacks seem to focus on web services and not the actual
game server. The traffic is filtered by DDoS protection, and the OS shouldn't
see any of it. We did suspect that a different kind of attack is being used
against the game service, but I was not able to find proof of that. netstat
shows no abnormal number of connections, and it's not a SYN flood either. I ran
Wireshark as well, and nothing seemed out of the ordinary when it started
happening (though I have to say, there's a lot of noise and I don't exactly
know what to look for).
I'd appreciate any hints about what could be causing this behaviour, and
possible solutions.
Regards,
TheHiddenOne