Re: how to debug slowdowns from these log snippets-more info 2

aaron morton Thu, 21 Feb 2013 09:15:49 -0800

Some things to consider: 

Check for contention around the switch lock. This can happen if you get a lot 
of tables flushing at the same time, or if you have a lot of secondary indexes. 
It shows up as a pattern in the logs. As soon a the writer starts flushing a 
memtable another will be queued. Probably not happening here but can be a pain 
when a lot of memtables are flushed.


I would turn on GC logging in cassandra-env.sh and watch that. After a full CMS 
flush how full / empty is the tenured heap ? If it is still got a lot in it 
then you are running with too much cache / bloom filter / index sampling. 

You can also experiment with the Max Tenuring Threshold, try turning it up to 4 
to start with. The GC logs will show you how much data is at each tenuring 
level. You can then see how much data is being tenuring, and if premature 
tenuring was an issue. I've seen premature tenuring cause issues with wide rows 
/ long reads. 

Hope that helps. 


-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/02/2013, at 4:35 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:

> Oh, and my startup command that cassandra logged was
> 
> a2.bigde.nrel.gov: xss =  -ea -javaagent:/opt/cassandra/lib/jamm-0.2.5.jar
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8021M -Xmx8021M
> -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
> 
> And I remember from docs you don't want to go above 8G or java GC doesn't
> work out so well.  I am not sure why this is not working out though.
> 
> Dean
> 
> On 2/20/13 7:16 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:
> 
>> Here is the printout before that log which is probably important as
>> wellŠ..
>> 
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 GCInspector.java (line
>> 122) GC for ConcurrentMarkSweep: 3618 ms for 2 collections, 7038159096
>> used; max is 8243904512
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 StatusLogger.java (line
>> 57) Pool Name                    Active   Pending   Blocked
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 StatusLogger.java (line
>> 72) ReadStage                        11       264         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
>> 72) RequestResponseStage              0         0         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
>> 72) ReadRepairStage                   0         0         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
>> 72) MutationStage                    12        88         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
>> 72) ReplicateOnWriteStage             0         0         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
>> 72) GossipStage                       1         7         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,376 StatusLogger.java (line
>> 72) AntiEntropyStage                  0         0         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
>> 72) MigrationStage                    0         0         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
>> 72) StreamStage                       0         0         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
>> 72) MemtablePostFlusher               0         0         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
>> 72) FlushWriter                       0         0         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
>> 72) MiscStage                         0         0         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,377 StatusLogger.java (line
>> 72) commitlog_archiver                0         0         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>> 72) InternalResponseStage             0         0         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>> 72) HintedHandoff                     0         0         0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>> 77) CompactionManager                 4         5
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>> 89) MessagingService                n/a    10,127
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>> 99) Cache Type                     Size                 Capacity
>>   KeysToSave    
>> Provider
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>> 100) KeyCache                    1310719                  1310719
>>           all
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>> 106) RowCache                          0                        0
>>           all   
>> org.apache.cassandra.cache.SerializingCacheProvider
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,378 StatusLogger.java (line
>> 113) ColumnFamily                Memtable ops,data
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>> 116) databus5.ServersThatLog                   0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>> 116) databus5.DecimalIndice             971,885311
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>> 116) databus5.EntityGroupXref                  0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>> 116) databus5.IntegerIndice            3256,620947
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>> 116) databus5.KeyToTableName                   0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>> 116) databus5.SdiTable                         0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>> 116) databus5.DboTableMeta               339,41537
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,379 StatusLogger.java (line
>> 116) databus5.nreldata                4528,3261763
>> INFO [GossipTasks:1] 2013-02-20 07:14:00,375 Gossiper.java (line 830)
>> InetAddress /10.10.43.5 is now dead.
>> INFO [GossipTasks:1] 2013-02-20 07:14:00,380 Gossiper.java (line 830)
>> InetAddress /10.10.43.8 is now dead.
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,380 StatusLogger.java (line
>> 116) databus5.DboColumnMeta             809,112672
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,380 StatusLogger.java (line
>> 116) databus5.DboDatabaseMeta                  0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
>> 116) databus5.WebNodeDbo                 2310,4470
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
>> 116) databus5.User                             0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
>> 116) databus5.RoleMapping                      0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
>> 116) databus5.LogEvent                         0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,381 StatusLogger.java (line
>> 116) databus5.SecureResourceGroupXref                 0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
>> 116) databus5.SecurityGroup                    0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
>> 116) databus5.StreamAggregation                 0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
>> 116) databus5.AppProperty                      0,0
>> INFO [GossipTasks:1] 2013-02-20 07:14:00,382 Gossiper.java (line 830)
>> InetAddress /10.10.43.3 is now dead.
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
>> 116) databus5.StringIndice               290,27231
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
>> 116) databus5.MonitorDbo                   75,2983
>> INFO [GossipTasks:1] 2013-02-20 07:14:00,382 Gossiper.java (line 830)
>> InetAddress /10.10.43.7 is now dead.
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,382 StatusLogger.java (line
>> 116) databus5.SdiColumn                        0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>> 116) system.NodeIdInfo                         0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>> 116) system.IndexInfo                          0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>> 116) system.LocationInfo                       0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>> 116) system.Versions                           0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>> 116) system.schema_keyspaces                   0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>> 116) system.Migrations                         0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>> 116) system.schema_columnfamilies                 0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,383 StatusLogger.java (line
>> 116) system.schema_columns                     0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,384 StatusLogger.java (line
>> 116) system.HintsColumnFamily                  0,0
>> INFO [ScheduledTasks:1] 2013-02-20 07:14:00,384 StatusLogger.java (line
>> 116) system.Schema                             0,0
>> WARN [ScheduledTasks:1] 2013-02-20 07:14:00,384 GCInspector.java (line
>> 145) Heap is 0.8537409774403752 full.  You may need to reduce memtable
>> and/or cache sizes.  Cassandra will now flush up to the two largest
>> memtables to free up memory.  Adjust flush_largest_memtables_at threshold
>> in cassandra.yaml if you don't want Cassandra to do this automatically
>> WARN [ScheduledTasks:1] 2013-02-20 07:14:00,384 StorageService.java (line
>> 2855) Flushing CFS(Keyspace='databus5', ColumnFamily='nreldata') to
>> relieve memory pressure
>> 
>> 
>> 
>> On 2/20/13 7:10 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:
>> 
>>> Cassandra version 1.1.4
>>> 
>>> I captured all the logs of node causing timeouts (in a 6 node cluster).
>>> We seem to get these slowdowns every once in a while and it causes our
>>> whole website to be 10 times slower.  Since PlayOrm actually logs the
>>> rows being accessed we know exactly which row the timeout occurred on and
>>> asytanax logged the node which was nice.
>>> 
>>> Each time we get this website being 10 times slower, this is in the logs
>>> 
>>> WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 GCInspector.java (line
>>> 145) Heap is 0.8532751700072093 full.  You may need to reduce memtable
>>> and/or cache sizes.  Cassandra will now flush up to the two largest
>>> memtables to free up memory.  Adjust flush_largest_memtables_at threshold
>>> in cassandra.yaml if you don't want Cassandra to do this automatically
>>> WARN [ScheduledTasks:1] 2013-02-20 06:52:15,929 StorageService.java
>>> (line 2855) Flushing CFS(Keyspace='databus5',
>>> ColumnFamily='IntegerIndice') to relieve memory pressure
>>> 
>>> I seem to remember reading about we should turn off swap which I have not
>>> gotten around to at this point :(Šwill that help in that the node is
>>> taken out of the cluster or?  As you can see from top, this node has been
>>> running a 5,000,000 hours below(What's up with thatŠ.maybe the h does not
>>> actually mean hours) though I know it has been running for months.
>>> 
>>> Here is a top commandŠ.
>>> 
>>> top - 07:02:37 up 148 days, 21:11,  1 user,  load average: 1.10, 1.09,
>>> 1.09
>>> Tasks: 379 total,   2 running, 377 sleeping,   0 stopped,   0 zombie
>>> Cpu(s): 15.9%us,  1.0%sy,  2.9%ni, 79.9%id,  0.1%wa,  0.0%hi,  0.2%si,
>>> 0.0%st
>>> Mem:  32854680k total, 32582688k used,   271992k free,       44k buffers
>>> Swap: 33554424k total,    52180k used, 33502244k free, 22653528k cached
>>> 
>>> PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>> 701 cassandr  20   0 63.3g  10g 1.9g S 838.4 32.2  5125794h java
>>> 
>>> What can I dump to see why we need to "relieve memory pressure"?
>>> 
>>> I ran jmap like so but it can't attach for some reason
>>> 
>>> [cassandra@a4 ~]$ jmap 701 -heap:format=b
>>> Attaching to core -heap:format=b from executable 701, please wait...
>>> Error attaching to core file: Can't attach to the core file
>>> 
>>> Jstack is not helping much either Š.
>>> 
>>> [cassandra@a4 ~]$ jstack -l 701 > threads.txt
>>> 701: Unable to open socket file: target process not responding or HotSpot
>>> VM not loaded
>>> The -F option can be used when the target process is not responding
>>> [cassandra@a4 ~]$ jstack -F -l 701 > threads.txt
>>> Attaching to process ID 701, please wait...
>>> Debugger attached successfully.
>>> Server compiler detected.
>>> JVM version is 20.7-b02
>>> java.lang.NullPointerException
>>> at 
>>> sun.jvm.hotspot.oops.InstanceKlass.computeSubtypeOf(InstanceKlass.java:42
>>> 6
>>> )
>>> at sun.jvm.hotspot.oops.Klass.isSubtypeOf(Klass.java:137)
>>> at sun.jvm.hotspot.oops.Oop.isA(Oop.java:100)
>>> at 
>>> sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:93)
>>> at 
>>> sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
>>> at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52)
>>> at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
>>> at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
>>> at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
>>> at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at 
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java
>>> :
>>> 39)
>>> at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI
>>> m
>>> pl.java:25)
>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>> at sun.tools.jstack.JStack.runJStackTool(JStack.java:118)
>>> at sun.tools.jstack.JStack.main(JStack.java:84)
>>> 
>>> Thanks,
>>> Dean
>> 
>

Re: how to debug slowdowns from these log snippets-more info 2

Reply via email to