RE: Cassandra nodes are down

2018-01-30 Thread Amit Singh
Hello,

 

Please check the debug logs for a detailed trace; the exact reason can't be
figured out from these messages. Try your luck there.

 

From: Mokkapati, Bhargav (Nokia - IN/Chennai)
[mailto:bhargav.mokkap...@nokia.com] 
Sent: Monday, January 29, 2018 11:09 PM
To: user@cassandra.apache.org
Cc: mbhargavna...@gmail.com
Subject: Cassandra nodes are down

 

Hi Team,

 

I'm getting the below warnings. Please help me out to clear these issues.

 

Apache Cassandra version : 3.0.13, 5 Node cluster

 

INFO  [main] 2018-01-29 16:58:19,487 NativeLibrary.java:167 - JNA mlockall
successful

WARN  [main] 2018-01-29 16:58:19,488 StartupChecks.java:121 - jemalloc
shared library could not be preloaded to speed up memory allocations

INFO  [main] 2018-01-29 16:58:19,488 StartupChecks.java:160 - JMX is enabled
to receive remote connections on port: 8002

WARN  [main] 2018-01-29 16:58:19,488 StartupChecks.java:178 - OpenJDK is not
recommended. Please upgrade to the newest Oracle Java release

INFO  [main] 2018-01-29 16:58:19,490 SigarLibrary.java:44 - Initializing
SIGAR library

WARN  [main] 2018-01-29 16:58:19,498 SigarLibrary.java:174 - Cassandra
server running in degraded mode. Is swap disabled? : true,  Address space
adequate? : true,  nofile limit adequate? : false, nproc limit adequate? :
true

WARN  [main] 2018-01-29 16:58:19,500 StartupChecks.java:246 - Maximum number
of memory map areas per process (vm.max_map_count) 65530 is too low,
recommended value: 1048575, you can change it with sysctl.

 

WARN  [main] 2018-01-29 17:05:07,844 SystemKeyspace.java:1042 - No host ID
found, created 2dc59352-e98e-4e77-a5f2-289697e467c7 (Note: This should
happen exactly once per node).

INFO  [main] 2018-01-29 17:05:16,421 Server.java:160 - Starting listening
for CQL clients on /10.50.21.22:9042 (unencrypted)...

INFO  [main] 2018-01-29 17:05:16,449 CassandraDaemon.java:488 - Not starting
RPC server as requested. Use JMX (StorageService->startRPCServer()) or
nodetool (enablethrift) to start it

INFO  [OptionalTasks:1] 2018-01-29 17:05:18,443
CassandraRoleManager.java:350 - Created default superuser role 'cassandra'

INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,737
HintsService.java:212 - Paused hints dispatch

INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,740 Server.java:180 -
Stop listening for CQL clients

INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,740
Gossiper.java:1490 - Announcing shutdown

INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,741
StorageService.java:1991 - Node /10.50.21.22 state jump to shutdown

INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:57,743
MessagingService.java:811 - Waiting for messaging service to quiesce

INFO  [ACCEPT-/10.50.21.22] 2018-01-29 17:09:57,743
MessagingService.java:1110 - MessagingService has terminated the accept()
thread

INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:57,797
HintsService.java:212 - Paused hints dispatch

 

Thanks,

Bhargav M.



Re: Cassandra nodes are down

2018-01-29 Thread Jeff Jirsa
Something is invoking the shutdown hook (calling kill). It may be your config 
management or similar.

-- 
Jeff Jirsa
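
A minimal sketch of how one might pin down when the shutdown hook first
fired, so the timestamp can be compared against config-management or cron
activity. It assumes the log line layout shown in this thread; the function
name and the sample list are illustrative only, not part of Cassandra.

```python
import re
from datetime import datetime

# Header of a log entry as it appears in this thread:
# LEVEL  [thread] yyyy-mm-dd HH:MM:SS,mmm ...
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
)


def first_shutdown_event(lines):
    """Return the timestamp of the first StorageServiceShutdownHook entry, or None."""
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("thread") == "StorageServiceShutdownHook":
            return datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return None


# Entry headers copied from the log excerpt in the original message.
sample = [
    "INFO  [main] 2018-01-29 17:05:16,421 Server.java:160 - Starting listening",
    "INFO  [OptionalTasks:1] 2018-01-29 17:05:18,443",
    "INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,737",
    "INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,740 Server.java:180 -",
]

print(first_shutdown_event(sample))
```

The resulting time can then be looked up in whatever scheduler or
orchestration logs exist on the host; which ones to check depends on the
environment.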




Cassandra nodes are down

2018-01-29 Thread Mokkapati, Bhargav (Nokia - IN/Chennai)
Hi Team,

I'm getting the below warnings. Please help me out to clear these issues.

Apache Cassandra version : 3.0.13, 5 Node cluster

INFO  [main] 2018-01-29 16:58:19,487 NativeLibrary.java:167 - JNA mlockall 
successful
WARN  [main] 2018-01-29 16:58:19,488 StartupChecks.java:121 - jemalloc shared 
library could not be preloaded to speed up memory allocations
INFO  [main] 2018-01-29 16:58:19,488 StartupChecks.java:160 - JMX is enabled to 
receive remote connections on port: 8002
WARN  [main] 2018-01-29 16:58:19,488 StartupChecks.java:178 - OpenJDK is not 
recommended. Please upgrade to the newest Oracle Java release
INFO  [main] 2018-01-29 16:58:19,490 SigarLibrary.java:44 - Initializing SIGAR 
library
WARN  [main] 2018-01-29 16:58:19,498 SigarLibrary.java:174 - Cassandra server 
running in degraded mode. Is swap disabled? : true,  Address space adequate? : 
true,  nofile limit adequate? : false, nproc limit adequate? : true
WARN  [main] 2018-01-29 16:58:19,500 StartupChecks.java:246 - Maximum number of 
memory map areas per process (vm.max_map_count) 65530 is too low, recommended 
value: 1048575, you can change it with sysctl.

WARN  [main] 2018-01-29 17:05:07,844 SystemKeyspace.java:1042 - No host ID 
found, created 2dc59352-e98e-4e77-a5f2-289697e467c7 (Note: This should happen 
exactly once per node).
INFO  [main] 2018-01-29 17:05:16,421 Server.java:160 - Starting listening for 
CQL clients on /10.50.21.22:9042 (unencrypted)...
INFO  [main] 2018-01-29 17:05:16,449 CassandraDaemon.java:488 - Not starting 
RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool 
(enablethrift) to start it
INFO  [OptionalTasks:1] 2018-01-29 17:05:18,443 CassandraRoleManager.java:350 - 
Created default superuser role 'cassandra'
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,737 
HintsService.java:212 - Paused hints dispatch
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,740 Server.java:180 - 
Stop listening for CQL clients
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,740 Gossiper.java:1490 - 
Announcing shutdown
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:55,741 
StorageService.java:1991 - Node /10.50.21.22 state jump to shutdown
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:57,743 
MessagingService.java:811 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.50.21.22] 2018-01-29 17:09:57,743 MessagingService.java:1110 
- MessagingService has terminated the accept() thread
INFO  [StorageServiceShutdownHook] 2018-01-29 17:09:57,797 
HintsService.java:212 - Paused hints dispatch

Thanks,
Bhargav M.
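
The two limit-related warnings above (nofile and vm.max_map_count) can be
checked outside Cassandra. The following is a rough sketch, Linux only. The
1048575 figure is the one recommended in the WARN line; MIN_NOFILE is an
assumed threshold chosen for illustration and does not come from this
thread. The function names are hypothetical.

```python
# 1048575 is the value recommended in the StartupChecks WARN line above.
RECOMMENDED_MAX_MAP_COUNT = 1048575
# Assumption for illustration only; not taken from the thread.
MIN_NOFILE = 100000


def limit_warnings(max_map_count, nofile_soft):
    """Return human-readable warnings for limits below the thresholds."""
    warnings = []
    if max_map_count < RECOMMENDED_MAX_MAP_COUNT:
        warnings.append(
            f"vm.max_map_count {max_map_count} is below {RECOMMENDED_MAX_MAP_COUNT}"
        )
    if nofile_soft < MIN_NOFILE:
        warnings.append(f"nofile soft limit {nofile_soft} is below {MIN_NOFILE}")
    return warnings


def read_current_limits():
    """Read the limits of the current process (Linux only)."""
    import resource  # Unix-only module, imported lazily

    with open("/proc/sys/vm/max_map_count") as f:
        max_map_count = int(f.read().strip())
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        soft = float("inf")  # unlimited is always adequate
    return max_map_count, soft


# Value 65530 is from the log above; 1024 is a hypothetical nofile limit.
for warning in limit_warnings(65530, 1024):
    print(warning)
```

Limits are per user and per process, so read_current_limits() is only
meaningful when run as the same user that starts Cassandra.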


Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Moleza Moleza
Hi,
I am setting up a cluster on a linux box.
Everything seems to be working great and I am watching the ring with:
watch -d -n 2 nodetool -h localhost ring
Suddenly, I see that one of the nodes just went down (at 14:07):
Status changed from Up to Down.
13 minutes later (without any intervention) the node comes back Up (by itself).
I check the logs (see at end of text) on that node and see that there
is nothing in the log from 14:07 until 14:20 (13 minutes later).
I also notice the GC ConcurrentMarkSweep took 13 minutes.
Here are my questions:
[1] Is this behavior normal?
[2] Has it been observed by someone else before?
[3] The node being down means that nodetool, and any other client,
won't be able to connect to it (clients should use other nodes in the
cluster to write data). Correct?
[4] Is GC ConcurrentMarkSweep a Stop-The-World situation, where the
JVM cannot do anything else? Hence the node is technically Down?
Correct?
[5] Why is this GC taking such a long time? (see JVM args posted below).
[6] Any JVM args (switches) I can use to prevent this?
--
JVM_OPTS= \
   -Dprog=Cassandra \
   -ea \
   -Xms12G \
   -Xmx12G \
   -XX:+UseParNewGC \
   -XX:+UseConcMarkSweepGC \
   -XX:+CMSParallelRemarkEnabled \
   -XX:SurvivorRatio=8 \
   -XX:MaxTenuringThreshold=1 \
   -XX:+HeapDumpOnOutOfMemoryError \
   -Dcom.sun.management.jmxremote.port=8080 \
   -Dcom.sun.management.jmxremote.ssl=false \
   -Dcom.sun.management.jmxremote.authenticate=false


## Log Extract ##
INFO [GC inspection] 2010-08-22 14:06:48,622 GCInspector.java (line
116) GC for ParNew: 235 ms, 134504976 reclaimed leaving 12721498296
used; max is 13005881344
INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java
(line 357)HintsColumnFamily has reached its threshold; switching in a
fresh Memtable at
CommitLogContext(file='/var/nes/data1/cassandra_commitlog/CommitLog-1282500306160.log',
position=55517352)
INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java
(line 609) Enqueuing flush of
memtable-hintscolumnfam...@1935604258(3147 bytes, 433 operations)
INFO [FLUSH-WRITER-POOL:1] 2010-08-22 14:19:45,430 Memtable.java (line
148) Writing memtable-hintscolumnfam...@1935604258(3147 bytes, 433
operations)
INFO [GC inspection] 2010-08-22 14:19:45,917 GCInspector.java (line
116) GC for ParNew: 215 ms, 130254256 reclaimed leaving 12742982208
used; max is 13005881344
INFO [GC inspection] 2010-08-22 14:19:45,973 GCInspector.java (line
116)GC for ConcurrentMarkSweep: 775679 ms, 12685881488 reclaimed
leaving 196692400 used; max is 13005881344
--


Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Jonathan Ellis
GCs never take that long unless you're swapping.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
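
To spot pauses like this one without reading the log by eye, a small parser
over the GCInspector lines is enough. This is a sketch that assumes the
message layout shown in the log extract of the original post; the function
name and the 1000 ms threshold are arbitrary choices for illustration.

```python
import re

# Matches the GCInspector message format shown in the log extract.
GC_RE = re.compile(
    r"GC for (\w+): (\d+) ms, (\d+) reclaimed leaving (\d+) used; max is (\d+)"
)


def long_gcs(log_text, threshold_ms=1000):
    """Return (collector, duration_ms) for every GC at or above threshold_ms."""
    # Entries in the mail are wrapped across lines, so collapse whitespace first.
    text = " ".join(log_text.split())
    return [
        (name, int(ms))
        for name, ms, *_ in GC_RE.findall(text)
        if int(ms) >= threshold_ms
    ]


# Lines copied from the log extract in the original post.
sample = """INFO [GC inspection] 2010-08-22 14:19:45,917 GCInspector.java (line
116) GC for ParNew: 215 ms, 130254256 reclaimed leaving 12742982208
used; max is 13005881344
INFO [GC inspection] 2010-08-22 14:19:45,973 GCInspector.java (line
116)GC for ConcurrentMarkSweep: 775679 ms, 12685881488 reclaimed
leaving 196692400 used; max is 13005881344"""

for name, ms in long_gcs(sample):
    print(f"{name}: {ms} ms ({ms / 60000:.1f} min)")
```

The 775679 ms reported for ConcurrentMarkSweep is about 12.9 minutes, which
lines up with the gap in the log between 14:06:48 and 14:19:45.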


Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Benjamin Black
http://riptano.blip.tv/file/4012133/


Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Peter Schuller
> [4] Is GC ConcurrentMarkSweep a Stop-The-World situation, where the
> JVM cannot do anything else? Hence the node is technically Down?
> Correct?

No; the concurrent mark/sweep phase runs concurrently with your
application. CMS will cause a stop-the-world full pause if it fails to
complete a CMS sweep in time and you hit the maximum heap size, but
unless that happens, CMS will run concurrently (there are
stop-the-world pauses involved, but they are typically very short; the
mark/sweep phase itself is concurrent).

As jbellis pointed out, you're almost certainly swapping.


-- 
/ Peter Schuller
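
One quick way to test the swapping hypothesis on Linux is to compare
SwapTotal and SwapFree in /proc/meminfo. The sketch below parses text in
that format; the function name is hypothetical and the sample numbers are
made up for illustration, not measurements from the poster's machine.

```python
def swap_used_kb(meminfo_text):
    """Return swap in use (kB) given text in /proc/meminfo format."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


# Hypothetical values, for illustration only.
sample = (
    "MemTotal:       16384000 kB\n"
    "SwapTotal:       2097148 kB\n"
    "SwapFree:        1048572 kB\n"
)

print(swap_used_kb(sample), "kB of swap in use")
```

On a live node the input would be the contents of /proc/meminfo. A nonzero
result only shows that something has been swapped out at some point;
watching the value over time, or the si/so columns of vmstat, gives a
better picture of whether the JVM heap is being paged during a collection.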