Hello,

We've been using MINA for a huge game server application for 3 years, and so 
far it has served us well. However, a day ago, it started acting weirdly. About 
1-3 hours after the application starts, it begins hogging the CPU (100% on all 
cores).

Here are two example stack traces (approximately a third of all NioProcessors 
are stuck or busy in one or the other of these wakeup-related methods):
"NioProcessor-56" prio=6 tid=0x000000004264c000 nid=0xa9c runnable 
[0x000000004c09f000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.WindowsSelectorImpl.resetWakeupSocket0(Native Method)
    at sun.nio.ch.WindowsSelectorImpl.resetWakeupSocket(WindowsSelectorImpl.java:473)
    - locked <0x0000000200b91f40> (a java.lang.Object)
    at sun.nio.ch.WindowsSelectorImpl.doSelect(WindowsSelectorImpl.java:174)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x0000000200b90990> (a sun.nio.ch.Util$2)
    - locked <0x0000000200b90980> (a java.util.Collections$UnmodifiableSet)
    - locked <0x0000000200b7b8b0> (a sun.nio.ch.WindowsSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:72)
    at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1077)
    at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

   Locked ownable synchronizers:
    - <0x00000006bc378ec8> (a java.util.concurrent.ThreadPoolExecutor$Worker)

-----------------

"NioProcessor-87" prio=6 tid=0x0000000042649000 nid=0xaa4 waiting for monitor 
entry [0x0000000049b1e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.nio.ch.WindowsSelectorImpl.wakeup(WindowsSelectorImpl.java:604)
    - waiting to lock <0x00000002009bcc98> (a java.lang.Object)
    at org.apache.mina.transport.socket.nio.NioProcessor.wakeup(NioProcessor.java:88)
    at org.apache.mina.core.polling.AbstractPollingIoProcessor.flush(AbstractPollingIoProcessor.java:430)
    at org.apache.mina.core.polling.AbstractPollingIoProcessor.write(AbstractPollingIoProcessor.java:418)
    at org.apache.mina.core.polling.AbstractPollingIoProcessor.write(AbstractPollingIoProcessor.java:67)
    at org.apache.mina.core.service.SimpleIoProcessorPool.write(SimpleIoProcessorPool.java:251)
        .... more write related instruction
        .... write request issued by our application
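
For anyone who wants to confirm which threads are actually burning CPU (a thread dump alone only shows state, not consumption), here is a minimal sketch that samples per-thread CPU time through the standard ThreadMXBean. The class name ThreadCpuSampler and the "NioProcessor-" prefix filter are assumptions made for illustration; this is not part of MINA and is not how the traces above were captured.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: measures CPU time consumed per thread over an interval.
public class ThreadCpuSampler {

    /** Returns CPU nanoseconds used during the interval, keyed by thread name. */
    public static Map<String, Long> sample(String namePrefix, long intervalMillis)
            throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time not supported");
        }

        // First reading: cumulative CPU time of every live thread.
        Map<Long, Long> before = new HashMap<Long, Long>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }

        Thread.sleep(intervalMillis);

        // Second reading: keep the delta for threads matching the prefix.
        Map<String, Long> delta = new HashMap<String, Long>();
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            Long start = before.get(id);
            long now = mx.getThreadCpuTime(id);
            // -1 means the thread died or CPU time measurement is disabled.
            if (info == null || start == null || start < 0 || now < 0) {
                continue;
            }
            if (info.getThreadName().startsWith(namePrefix)) {
                delta.put(info.getThreadName(), now - start);
            }
        }
        return delta;
    }

    public static void main(String[] args) throws InterruptedException {
        // Only sees threads of the JVM it runs in, so it would have to be
        // called from inside the server process (e.g. from an admin command).
        for (Map.Entry<String, Long> e : sample("NioProcessor-", 1000).entrySet()) {
            System.out.printf("%s used %d ms CPU in 1 s%n",
                    e.getKey(), e.getValue() / 1000000L);
        }
    }
}
```

A thread that reports close to 1000 ms of CPU per second of wall time is spinning rather than blocking in select().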

-----------------

Configuration info:
OS: Windows Server 2012 Standard x64
Java: 1.7.0_51 (the JRE included in the server JDK)
MINA: 2.0.7
CPU: Intel Xeon E5-1650 @ 3.2 GHz (6 Cores, 12 Threads)
RAM: 32 GB, JVM's heap size is set to 24 GB. At the time of the crash it's 
using about 10 GB.

-----------------

When the problem starts, there are about 750 active clients (the server can 
usually hold 2500 easily, but the number of concurrent players is lower because 
of the crashes). The CPU hogging doesn't stop even when all connections are 
dropped; it only ends when the process is terminated and restarted. We tried 
rebooting the machine several times, but it didn't help.
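
Since the CPU stays pegged even with no connections, one thing that could be checked is whether select() is returning immediately with no ready keys over and over. Below is a minimal sketch of a counter for that. SelectSpinDetector, its threshold and its time limit are hypothetical names and values chosen for illustration, and I am only assuming that a spinning selector is what is happening here.

```java
// Hypothetical helper: flags a run of select() calls that return
// no ready keys without having blocked for a meaningful time.
public class SelectSpinDetector {

    private final int threshold;       // consecutive fast empty returns that count as spinning
    private final long minBlockNanos;  // anything faster than this did not really block
    private int consecutive;

    public SelectSpinDetector(int threshold, long minBlockNanos) {
        this.threshold = threshold;
        this.minBlockNanos = minBlockNanos;
    }

    /**
     * Record the outcome of one select() call.
     *
     * @param selectedKeys value returned by Selector.select(timeout)
     * @param elapsedNanos wall time the call took, e.g. via System.nanoTime()
     * @return true once the threshold of consecutive fast, empty returns is reached
     */
    public boolean record(int selectedKeys, long elapsedNanos) {
        if (selectedKeys == 0 && elapsedNanos < minBlockNanos) {
            consecutive++;
        } else {
            consecutive = 0; // a real event or a real wait breaks the streak
        }
        return consecutive >= threshold;
    }
}
```

It would be fed from the loop that calls select(), timing each call with System.nanoTime(). Note that Selector.wakeup() legitimately causes empty returns (every write in the second trace above triggers one), so the threshold would have to be generous to avoid false alarms.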

We haven't changed our software configuration recently, and there have been no 
changes to the application source code related to networking. What did change, 
though, is the hardware: We had our host install our drives into a different 
server with the same specifications a few days ago (the old one had a hardware 
defect that caused occasional BSODs).

It might also be noteworthy that script kiddies have been DDoS-attacking us 
recently, but those attacks seem to focus on web services and not the actual 
game server. The traffic is filtered by a DDoS protection service and the OS 
shouldn't see any of it. We did suspect that a different kind of attack is 
being used against the game service, but I was not able to find proof of that. 
netstat shows no abnormal number of connections, and it's not a SYN flood 
either. I ran Wireshark as well, and nothing seemed out of the ordinary when it 
started happening (though I have to say, there's a lot of noise and I don't 
know exactly what to look for).

I'd appreciate any hints about what could be causing this behaviour, and 
possible solutions.

Regards,
TheHiddenOne
