RE: Cassandra nodes are down
Hello,

Please check the debug logs for a detailed trace; the exact reason can't be figured out from the INFO/WARN lines below. Try your luck there.

From: Mokkapati, Bhargav (Nokia - IN/Chennai) [mailto:bhargav.mokkap...@nokia.com]
Sent: Monday, January 29, 2018 11:09 PM
To: user@cassandra.apache.org
Cc: mbhargavna...@gmail.com
Subject: Cassandra nodes are down

Hi Team,

I'm getting the below warnings. Please help me out to clear these issues.

Apache Cassandra version: 3.0.13, 5-node cluster

INFO  [main] 2018-01-29 16:58:19,487 NativeLibrary.java:167 - JNA mlockall successful
WARN  [main] 2018-01-29 16:58:19,488 StartupChecks.java:121 - jemalloc shared library could not be preloaded to speed up memory allocations
INFO  [main] 2018-01-29 16:58:19,488 StartupChecks.java:160 - JMX is enabled to receive remote connections on port: 8002
WARN  [main] 2018-01-29 16:58:19,488 StartupChecks.java:178 - OpenJDK is not recommended. Please upgrade to the newest Oracle Java release
INFO  [main] 2018-01-29 16:58:19,490 SigarLibrary.java:44 - Initializing SIGAR library
WARN  [main] 2018-01-29 16:58:19,498 SigarLibrary.java:174 - Cassandra server running in degraded mode. Is swap disabled? : true, Address space adequate? : true, nofile limit adequate? : false, nproc limit adequate? : true
WARN  [main] 2018-01-29 16:58:19,500 StartupChecks.java:246 - Maximum number of memory map areas per process (vm.max_map_count) 65530 is too low, recommended value: 1048575, you can change it with sysctl.
WARN  [main] 2018-01-29 17:05:07,844 SystemKeyspace.java:1042 - No host ID found, created 2dc59352-e98e-4e77-a5f2-289697e467c7 (Note: This should happen exactly once per node).
INFO  [main] 2018-01-29 17:05:16,421 Server.java:160 - Starting listening for CQL clients on /10.50.21.22:9042 (unencrypted)...
INFO  [main] 2018-01-29 17:05:16,449 CassandraDaemon.java:488 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
INFO  [OptionalTasks:1] 2018-01-29 17:05:18,443 CassandraRoleManager.java:350 - Created default superuser role 'cassandra'
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,737 HintsService.java:212 - Paused hints dispatch
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,740 Server.java:180 - Stop listening for CQL clients
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,740 Gossiper.java:1490 - Announcing shutdown
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,741 StorageService.java:1991 - Node /10.50.21.22 state jump to shutdown
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:57,743 MessagingService.java:811 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.50.21.22] 2018-01-29 17:09:57,743 MessagingService.java:1110 - MessagingService has terminated the accept() thread
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:57,797 HintsService.java:212 - Paused hints dispatch

Thanks,
Bhargav M.
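The two startup-check warnings above (nofile limit inadequate, vm.max_map_count too low) can be verified directly on the node. The following is a read-only sketch for a typical Linux box (procfs paths assumed; the remediation commands are shown as comments because they need root):

```shell
#!/bin/sh
# Inspect the two limits Cassandra's startup checks flagged.

echo "vm.max_map_count: $(cat /proc/sys/vm/max_map_count)"  # WARN fires below 1048575
echo "nofile (soft):    $(ulimit -n)"                        # 'nofile limit adequate? : false'

# To fix (as root):
#   sysctl -w vm.max_map_count=1048575     # persist in /etc/sysctl.conf
# and raise nofile for the cassandra user in /etc/security/limits.conf, e.g.:
#   cassandra - nofile 100000
```

Note these warnings explain the "degraded mode" line, not the shutdown itself; the shutdown-hook lines at 17:09 are a separate event.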
Re: Cassandra nodes are down
Something is invoking the shutdown hook (i.e., something is calling kill on the process). It may be your config management or similar.

-- Jeff Jirsa

> On Jan 29, 2018, at 9:38 AM, Mokkapati, Bhargav (Nokia - IN/Chennai) wrote:
> [original message quoted in full; see above]
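If it isn't obvious what is sending the kill, a first pass might check the usual suspects before reaching for syscall auditing. This is a hedged sketch for standard Linux (cron paths and dmesg availability vary by distro):

```shell
#!/bin/sh
# Quick, read-only checks for common external stoppers of a Cassandra process.

# 1. Cron jobs that mention cassandra (e.g. a misfiring restart script):
grep -rl cassandra /etc/cron* 2>/dev/null || echo "no cron entries mention cassandra"

# 2. Kernel OOM kills (would also explain an abrupt stop):
dmesg 2>/dev/null | grep -i 'killed process' || echo "no OOM kills visible in dmesg"

# 3. For a definitive answer, audit kill(2) syscalls (root + auditd required):
#    auditctl -a always,exit -F arch=b64 -S kill -k cassandra_kill
#    ausearch -k cassandra_kill    # run after the next unexplained shutdown
```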
Cassandra nodes are down
Hi Team,

I'm getting the below warnings. Please help me out to clear these issues.

Apache Cassandra version: 3.0.13, 5-node cluster

INFO  [main] 2018-01-29 16:58:19,487 NativeLibrary.java:167 - JNA mlockall successful
WARN  [main] 2018-01-29 16:58:19,488 StartupChecks.java:121 - jemalloc shared library could not be preloaded to speed up memory allocations
INFO  [main] 2018-01-29 16:58:19,488 StartupChecks.java:160 - JMX is enabled to receive remote connections on port: 8002
WARN  [main] 2018-01-29 16:58:19,488 StartupChecks.java:178 - OpenJDK is not recommended. Please upgrade to the newest Oracle Java release
INFO  [main] 2018-01-29 16:58:19,490 SigarLibrary.java:44 - Initializing SIGAR library
WARN  [main] 2018-01-29 16:58:19,498 SigarLibrary.java:174 - Cassandra server running in degraded mode. Is swap disabled? : true, Address space adequate? : true, nofile limit adequate? : false, nproc limit adequate? : true
WARN  [main] 2018-01-29 16:58:19,500 StartupChecks.java:246 - Maximum number of memory map areas per process (vm.max_map_count) 65530 is too low, recommended value: 1048575, you can change it with sysctl.
WARN  [main] 2018-01-29 17:05:07,844 SystemKeyspace.java:1042 - No host ID found, created 2dc59352-e98e-4e77-a5f2-289697e467c7 (Note: This should happen exactly once per node).
INFO  [main] 2018-01-29 17:05:16,421 Server.java:160 - Starting listening for CQL clients on /10.50.21.22:9042 (unencrypted)...
INFO  [main] 2018-01-29 17:05:16,449 CassandraDaemon.java:488 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
INFO  [OptionalTasks:1] 2018-01-29 17:05:18,443 CassandraRoleManager.java:350 - Created default superuser role 'cassandra'
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,737 HintsService.java:212 - Paused hints dispatch
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,740 Server.java:180 - Stop listening for CQL clients
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,740 Gossiper.java:1490 - Announcing shutdown
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,741 StorageService.java:1991 - Node /10.50.21.22 state jump to shutdown
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:57,743 MessagingService.java:811 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.50.21.22] 2018-01-29 17:09:57,743 MessagingService.java:1110 - MessagingService has terminated the accept() thread
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:57,797 HintsService.java:212 - Paused hints dispatch

Thanks,
Bhargav M.
Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?
Hi,

I am setting up a cluster on a Linux box. Everything seems to be working great and I am watching the ring with:

watch -d -n 2 nodetool -h localhost ring

Suddenly, I see that one of the nodes just went down (at 14:07): status changed from Up to Down. 13 minutes later (without any intervention) the node comes back Up by itself. I check the logs on that node (see end of message) and see that there is nothing in the log from 14:07 until 14:20 (13 minutes later). I also notice that a ConcurrentMarkSweep GC took 13 minutes. Here are my questions:

[1] Is this behavior normal?
[2] Has it been observed by someone else before?
[3] The node being down means that nodetool, and any other client, won't be able to connect to it (clients should use other nodes in the cluster to write data). Correct?
[4] Is ConcurrentMarkSweep GC a stop-the-world situation, where the JVM cannot do anything else, and hence the node is technically Down? Correct?
[5] Why is this GC taking such a long time? (See JVM args posted below.)
[6] Any JVM args (switches) I can use to prevent this?

--
JVM_OPTS= \
  -Dprog=Cassandra \
  -ea \
  -Xms12G \
  -Xmx12G \
  -XX:+UseParNewGC \
  -XX:+UseConcMarkSweepGC \
  -XX:+CMSParallelRemarkEnabled \
  -XX:SurvivorRatio=8 \
  -XX:MaxTenuringThreshold=1 \
  -XX:+HeapDumpOnOutOfMemoryError \
  -Dcom.sun.management.jmxremote.port=8080 \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false

Log Extract ##
INFO [GC inspection] 2010-08-22 14:06:48,622 GCInspector.java (line 116) GC for ParNew: 235 ms, 134504976 reclaimed leaving 12721498296 used; max is 13005881344
INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java (line 357) HintsColumnFamily has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/nes/data1/cassandra_commitlog/CommitLog-1282500306160.log', position=55517352)
INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java (line 609) Enqueuing flush of memtable-hintscolumnfam...@1935604258(3147 bytes, 433 operations)
INFO [FLUSH-WRITER-POOL:1] 2010-08-22 14:19:45,430 Memtable.java (line 148) Writing memtable-hintscolumnfam...@1935604258(3147 bytes, 433 operations)
INFO [GC inspection] 2010-08-22 14:19:45,917 GCInspector.java (line 116) GC for ParNew: 215 ms, 130254256 reclaimed leaving 12742982208 used; max is 13005881344
INFO [GC inspection] 2010-08-22 14:19:45,973 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 775679 ms, 12685881488 reclaimed leaving 196692400 used; max is 13005881344
--
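For what it's worth, the reported ConcurrentMarkSweep time lines up with the gap in the log: 775679 ms is the same roughly-13-minute window between the 14:06:48 entry and the 14:19:45 entries.

```shell
# Convert the logged CMS duration (775679 ms) to whole minutes:
echo $((775679 / 60000))   # → 12, i.e. ~13 minutes, matching the log gap
```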
Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?
GCs never take that long unless you're swapping.

On Sun, Aug 22, 2010 at 2:11 PM, Moleza Moleza <mole...@gmail.com> wrote:
> [original question quoted in full; see above]

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
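A minimal way to check for swapping on the node (standard Linux procfs assumed) is to look at the swap counters; a 12G heap close to physical RAM plus any swap activity is the classic recipe for minute-long CMS pauses:

```shell
#!/bin/sh
# Is swap configured, and is any of it in use?
# SwapTotal > SwapFree means pages have been swapped out.
grep -E '^(SwapTotal|SwapFree)' /proc/meminfo

# If swap is in use, disable it (as root):
#   swapoff -a     # and remove the swap entries from /etc/fstab
```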
Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?
http://riptano.blip.tv/file/4012133/

On Sun, Aug 22, 2010 at 12:11 PM, Moleza Moleza <mole...@gmail.com> wrote:
> [original question quoted in full; see above]
Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?
> [4] Is ConcurrentMarkSweep GC a stop-the-world situation, where the JVM cannot do anything else, and hence the node is technically Down? Correct?

No; the concurrent mark/sweep phase runs concurrently with your application. CMS will cause a stop-the-world full pause if it fails to complete a sweep in time and you hit the maximum heap size (a concurrent mode failure), but unless that happens, CMS runs concurrently. There are stop-the-world pauses involved (initial mark and remark), but they are typically very short; the mark/sweep phase itself is concurrent.

As jbellis pointed out, you're almost certainly swapping.

--
/ Peter Schuller
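To see which CMS phases actually pause the application, and for how long, GC logging can be enabled. The flags below are a sketch for the HotSpot JVMs of that era (pre-Java 9 unified logging); the log path is an assumption, adjust to taste:

```shell
# Append GC diagnostics to the JVM_OPTS from the original message:
JVM_OPTS="$JVM_OPTS \
  -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps \
  -XX:+PrintGCApplicationStoppedTime \
  -Xloggc:/var/log/cassandra/gc.log"
echo "$JVM_OPTS"
```

If the node is swapping, the stopped-time lines will attribute a huge pause to a normally short phase such as remark, which is the usual signature of heap pages being faulted back in from disk.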