Hi Amit,

I’m seeing “Not marking nodes down” messages in the logs, like this one:

WARN  [GossipTasks:1] 2016-12-29 08:48:02,665 FailureDetector.java:287 - Not 
marking nodes down due to local pause of 6641241564 > 5000000000
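
For scale: as far as I can tell, both numbers in that message are nanoseconds (the 5000000000 threshold lines up with a 5 second limit), which would make this pause about 6.6 seconds. A minimal Python sketch that pulls the two values out of such a line and converts them to seconds. The function name is made up for illustration, and the nanosecond unit is an assumption:

```python
import re

# Matches the FailureDetector message format shown in the log line above.
PAUSE_RE = re.compile(r"local pause of (\d+) > (\d+)")

def parse_local_pause(line):
    """Return (pause_seconds, threshold_seconds), or None if the line does not match.

    Assumes both numbers in the message are nanoseconds.
    """
    match = PAUSE_RE.search(line)
    if match is None:
        return None
    pause_ns, threshold_ns = (int(group) for group in match.groups())
    return pause_ns / 1e9, threshold_ns / 1e9

line = ("WARN  [GossipTasks:1] 2016-12-29 08:48:02,665 FailureDetector.java:287 - "
        "Not marking nodes down due to local pause of 6641241564 > 5000000000")
pause_s, threshold_s = parse_local_pause(line)
print(f"paused {pause_s:.2f}s, threshold {threshold_s:.2f}s")
```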

Now the ends of the system.log files on all three nodes in one of the data 
centers are full of NullPointerExceptions and AssertionErrors like the ones 
below. Would these errors be the cause or a symptom?


WARN  [SharedPool-Worker-1] 2017-01-02 07:13:56,441 
AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
Thread[SharedPool-Worker-1,5,main]: {}
java.lang.NullPointerException: null
WARN  [SharedPool-Worker-1] 2017-01-02 07:15:02,865 
AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
Thread[SharedPool-Worker-1,5,main]: {}
java.lang.AssertionError: null
                at 
org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:49) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.rows.BufferCell.tombstone(BufferCell.java:88) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.rows.BufferCell.tombstone(BufferCell.java:83) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.rows.BufferCell.purge(BufferCell.java:175) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.rows.ComplexColumnData.lambda$purge$107(ComplexColumnData.java:165)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.utils.btree.BTree$FiltrationTracker.apply(BTree.java:650) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:693) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:668) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.rows.ComplexColumnData.transformAndFilter(ComplexColumnData.java:170)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.rows.ComplexColumnData.purge(ComplexColumnData.java:165)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.rows.ComplexColumnData.purge(ComplexColumnData.java:43) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.rows.BTreeRow.lambda$purge$102(BTreeRow.java:333) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.utils.btree.BTree$FiltrationTracker.apply(BTree.java:650) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:693) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:668) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.rows.BTreeRow.transformAndFilter(BTreeRow.java:338) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.rows.BTreeRow.purge(BTreeRow.java:333) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.partitions.PurgeFunction.applyToRow(PurgeFunction.java:88)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:116) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:133)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:89)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:79)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:294)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:292) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:50)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) 
~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_111]
                at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
 [apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
[apache-cassandra-3.3.0.jar:3.3.0]
                at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
WARN  [SharedPool-Worker-2] 2017-01-02 07:15:03,132 
AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
Thread[SharedPool-Worker-2,5,main]: {}
java.lang.RuntimeException: java.lang.NullPointerException
                at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2461)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_111]
                at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
 [apache-cassandra-3.3.0.jar:3.3.0]
                at 
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
[apache-cassandra-3.3.0.jar:3.3.0]
                at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
Caused by: java.lang.NullPointerException: null
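
One way to approach the cause-or-symptom question might be to line the exceptions up in time against the pause warnings. A rough Python sketch that counts top-level exceptions per minute, assuming log headers look like the excerpts above (level, thread in brackets, then a timestamp) and that the exception class starts the line that follows. The function name is hypothetical, and "Caused by:" lines are deliberately not counted:

```python
import re
from collections import Counter

# Header layout assumed from the excerpts above: LEVEL  [thread] yyyy-MM-dd HH:mm:ss,SSS
HEADER_RE = re.compile(
    r"^(?:WARN|ERROR|INFO|DEBUG)\s+\[[^\]]+\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3}"
)
# A line that starts with a fully qualified java.* exception or error class.
EXC_RE = re.compile(r"^(java\.[\w.]+(?:Exception|Error))\b")

def exceptions_per_minute(lines):
    """Count (minute, exception class) pairs, using the most recent header's timestamp."""
    counts = Counter()
    minute = None
    for line in lines:
        header = HEADER_RE.match(line)
        if header:
            minute = header.group(1)
            continue
        exc = EXC_RE.match(line)
        if exc and minute is not None:
            counts[(minute, exc.group(1))] += 1
    return counts

def report(path):
    """Print one line per (minute, exception class) found in the log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for (minute, name), count in sorted(exceptions_per_minute(handle).items()):
            print(minute, name, count)
```

If the exceptions only start after the first long pause, that would point toward them being a symptom; if they show up well before, they look more like a cause. Calling `report` on each node's system.log gives the per-minute counts to compare.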


RICHARD NEY
TECHNICAL DIRECTOR, RESEARCH & DEVELOPMENT
+1 (978) 848.6640 WORK
+1 (916) 846.2353 MOBILE
UNITED STATES
richard....@aspect.com<mailto:richard....@aspect.com>
aspect.com<http://www.aspect.com/>

From: Amit Singh F <amit.f.si...@ericsson.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, January 2, 2017 at 4:34 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: Trying to find cause of exception

Hello,

A few pointers:


a.)    Can you check system.log on the node that gives the error for messages 
like “marking as down”? If there are any, please check for GC pauses. Heavy 
load is one of the reasons for this.

b.)    Can you try connecting with cqlsh to that node once you get these kinds 
of messages? Are you able to connect?
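
If cqlsh is not handy on the box, a plain TCP check can at least show whether the node is accepting client connections. A small Python sketch, assuming the default native transport port 9042 (adjust if your cassandra.yaml uses a different one). The helper name is made up, and this only tests that the port accepts a connection, not that CQL queries succeed:

```python
import socket

def can_connect(host, port=9042, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    9042 is assumed to be the native transport (CQL) port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `can_connect` with the node's address while the timeouts are happening gives a quick yes or no before digging further with cqlsh itself.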


Regards
Amit

From: Ney, Richard [mailto:richard....@aspect.com]
Sent: Monday, January 02, 2017 3:30 PM
To: user@cassandra.apache.org
Subject: Trying to find cause of exception

My development team has been trying to track down the cause of the read 
timeout exception below (the timeouts run 30 seconds or more at times). We’re 
running a 2 data center deployment with 3 nodes in each data center. Our tables 
are set up with replication factor = 2, and we have 16G dedicated to the heap 
with G1GC for garbage collection. Our systems are AWS M4.2xlarge with 8 CPUs 
and 32GB of RAM, and we have 2 general purpose EBS volumes of 500GB each on 
each node. Once we start getting these timeouts, the cluster doesn’t recover 
and we are required to shut all Cassandra nodes down and restart them. If 
anyone has any tips on where to look or what commands to run to help us 
diagnose this issue, we’d be eternally grateful.

2017-01-02 04:33:35.161 [ERROR] 
[report-compute.ffbec924-ce44-11e6-9e21-0adb9d2dd624] [reportCompute] 
[ahlworkerslave2.bos.manhattan.aspect-cloud.net:31312] [WorktypeMetrics] 
Persistence failure when replaying events for persistenceId 
[/fsms/pens/worktypes/bmwbpy.314]. Last known sequence number [0]
java.util.concurrent.ExecutionException: 
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout 
during read query at consistency ONE (1 responses were required but only 0 
replica responded)
    at 
com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
    at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
    at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at 
akka.persistence.cassandra.package$$anon$1$$anonfun$run$1.apply(package.scala:17)
    at scala.util.Try$.apply(Try.scala:192)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra 
timeout during read query at consistency ONE (1 responses were required but 
only 0 replica responded)
    at 
com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:115)
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:124)
    at 
com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:477)
    at 
com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1005)
    at 
com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:928)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra 
timeout during read query at consistency ONE (1 responses were required but 
only 0 replica responded)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:62)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:266)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:246)
    at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)

