Romain, yes, I had to kill -9 to stop it.
INFO [RequestResponseStage:54] 2013-10-31 11:59:10,413 Gossiper.java (line 789) InetAddress /192.168.0.197 is now UP INFO [GossipStage:1] 2013-10-31 11:59:10,706 StorageService.java (line 1298) Node /192.168.0.197 state jump to normal INFO [GossipStage:1] 2013-10-31 12:00:55,905 Gossiper.java (line 806) InetAddress /192.168.0.251 is now DOWN INFO [HANDSHAKE-/192.168.0.251] 2013-10-31 12:01:35,848 OutboundTcpConnection.java (line 386) Handshaking version with / 192.168.0.251 INFO [GossipStage:1] 2013-10-31 12:01:35,978 Gossiper.java (line 824) Node /192.168.0.251 has restarted, now UP INFO [MemoryMeter:1] 2013-10-31 12:01:35,980 Memtable.java (line 442) CFS(Keyspace='system', ColumnFamily='peers') liveRatio is 17.361702127659573 (just-counted was 17.361702127659573). calculation took 0m s for 10 columns INFO [HANDSHAKE-/192.168.0.251] 2013-10-31 12:01:35,989 OutboundTcpConnection.java (line 386) Handshaking version with / 192.168.0.251 INFO [RequestResponseStage:55] 2013-10-31 12:01:36,065 Gossiper.java (line 789) InetAddress /192.168.0.251 is now UP ERROR [MigrationStage:2] 2013-10-31 12:01:36,176 CassandraDaemon.java (line 185) Exception in thread Thread[MigrationStage:2,5,main] java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-jb-95-Index.db (Too many open files) at org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:102) at org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:90) at org.apache.cassandra.io.sstable.SSTableReader.openIndexReader(SSTableReader.java:1337) at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:68) at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1115) at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:69) at org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator(ColumnFamilyStore.java:1507) at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1626) at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1564) at org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:722) at org.apache.cassandra.db.SystemKeyspace.serializeSchema(SystemKeyspace.java:743) at org.apache.cassandra.db.SystemKeyspace.serializeSchema(SystemKeyspace.java:733) at org.apache.cassandra.db.MigrationRequestVerbHandler.doVerb(MigrationRequestVerbHandler.java:42) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Caused by: java.io.FileNotFoundException: /var/lib/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-jb-95-Index.db (Too many open files) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233) at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58) at org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:98) ... 16 more INFO [GossipStage:1] 2013-10-31 12:01:36,549 StorageService.java (line 1298) Node /192.168.0.251 state jump to normal INFO [StorageServiceShutdownHook] 2013-10-31 12:02:38,067 ThriftServer.java (line 141) Stop listening to thrift clients INFO [StorageServiceShutdownHook] 2013-10-31 12:02:38,174 Server.java (line 174) Stop listening for CQL clients INFO [StorageServiceShutdownHook] 2013-10-31 12:02:38,175 Gossiper.java (line 1129) Announcing shutdown INFO [StorageServiceShutdownHook] 2013-10-31 12:02:40,175 MessagingService.java (line 665) Waiting for messaging service to quiesce INFO [ACCEPT-/192.168.0.232] 2013-10-31 12:02:40,176 MessagingService.java (line 846) MessagingService shutting down server thread. And there are a lot of such exceptions (the above one is the last before it hanged). There are no exceptions after upgrade and startup. Looks like C* is leaking file handles (nothing else is running on that VM). I will try to debug it. Thanks for pointing me to the logs. Regards, Mikhail. 2013/10/31 Romain HARDOUIN <romain.hardo...@urssaf.fr> > Hi, > > So you had to kill -9 the process? > Is there something interesting in system.log? > Can you restart the node or are there any errors on startup? > > Romain > > Mikhail Mazursky <ash...@gmail.com> a écrit sur 31/10/2013 08:02:22 : > > > De : Mikhail Mazursky <ash...@gmail.com> > > A : user@cassandra.apache.org, > > Date : 31/10/2013 08:04 > > Objet : Node hang on shutdown > > > > Hi. > > > I was upgrading my 3 node testing cluster from 2.0.1 to 2.0.2. I > > successfully upgraded two nodes but the last one did not shutdown > > properly. Does somebody see anything suspicious in the attached thread > dump? > > > Regards, > > Mikhail. >