Can you reproduce this by, say, running nodeprobe ring in a bash while loop?
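
On the report quoted below that NodeProbe creates a JMXConnector but never calls close(): here is a minimal sketch of what a client that does close its connector could look like. It is not NodeProbe's actual code. The class and method names (JmxCloseSketch, queryMBeanCount, probeLoop) are made up for illustration, and it talks to a throwaway in-process connector server rather than a Cassandra node, so treat it as a sketch under those assumptions.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical example class; not part of Cassandra.
final class JmxCloseSketch {

    // Opens a JMX connection, performs one query, and always closes the
    // connector, even if the query throws.
    static int queryMBeanCount(JMXServiceURL url) throws Exception {
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            return mbs.getMBeanCount();
        } finally {
            connector.close();
        }
    }

    // Starts an in-process RMI connector server (a stand-in for a real node),
    // connects to it repeatedly, and returns how many probes reported at
    // least one MBean.
    static int probeLoop(int iterations) throws Exception {
        MBeanServer platform = ManagementFactory.getPlatformMBeanServer();
        JMXServiceURL bindUrl = new JMXServiceURL("service:jmx:rmi://");
        JMXConnectorServer server =
            JMXConnectorServerFactory.newJMXConnectorServer(bindUrl, null, platform);
        server.start();
        try {
            JMXServiceURL address = server.getAddress();
            int successes = 0;
            for (int i = 0; i < iterations; i++) {
                if (queryMBeanCount(address) > 0) {
                    successes++;
                }
            }
            // Report how many client connections the server still tracks.
            System.out.println("connections still open: "
                + server.getConnectionIds().length);
            return successes;
        } finally {
            server.stop();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("successful probes: " + probeLoop(20));
    }
}
```

The try/finally is the point here: every connect() is paired with a close(), so the client side does not hold connections open after each probe. Whether that alone would stop the "JMX server connection timeout" threads from piling up on the server is something a thread dump taken before and after the loop would have to confirm.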

On Wed, Oct 20, 2010 at 3:09 PM, Bill Au <bill.w...@gmail.com> wrote:
> One of my Cassandra servers crashed with the following:
>
> ERROR [ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn] 2010-10-19 00:25:10,419
> CassandraDaemon.java (line 82) Uncaught exception in thread
> Thread[ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn,5,main]
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:597)
>         at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:533)
>
>
> I took thread dumps of the JVM on all the other Cassandra servers in my
> cluster. They all have thousands of threads that look like this:
>
> "JMX server connection timeout 183373" daemon prio=10 tid=0x00002aad230db800
> nid=0x5cf6 in Object.wait() [0x00002aad7a316000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150)
>         - locked <0x00002aab056ccee0> (a [I)
>         at java.lang.Thread.run(Thread.java:619)
>
> It seems to me that there is a JMX thread leak in Cassandra. NodeProbe
> creates a JMXConnector but never calls its close() method. I tried setting
> jmx.remote.x.server.connection.timeout to 0, hoping that would disable the
> JMX server connection timeout threads, but that did not make any
> difference.
>
> Has anyone else seen this?
>
> Bill
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com