Love the Mars lander analogies :)

On Sep 23, 2012, at 5:39 PM, aaron morton wrote:
>> In other words, Cassandra will lock down all tables until all pending
>> flush requests fit in the pending queue.
> This was the first issue I looked at in my Cassandra SF talk
> http://www.datastax.com/events/cassandrasummit2012/presentations
>
> I've seen it occur more often with lots-o-secondary indexes.
>
>> We had plenty of write I/O available. We also had free memory. I increased
>> the memtable_flush_writers to "2" and memtable_flush_queue_size to "8". We
>> haven't had any timeouts for a number of days now.
> Cool.
>
> Cheers
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 24/09/2012, at 6:09 AM, Michael Theroux <mthero...@yahoo.com> wrote:
>
>> There were no errors in the log (other than the "messages dropped" message
>> pasted below), and the node does recover. We have only a small number of
>> secondary indexes (3 in the whole system).
>>
>> However, I went through the Cassandra code, and I believe I've worked
>> through this problem.
>>
>> Just to finish out this thread, I realized that when you see:
>>
>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) FlushWriter                       1         5         0
>>
>> it is an issue. Cassandra will at various times enqueue many memtables for
>> flushing. By default, the queue size for this is 4. If more than 5
>> memtables get queued for flushing (4, plus 1 for the one currently being
>> flushed), a lock will be acquired and held across all tables until all
>> memtables that need to be flushed can be enqueued. If it takes more than
>> rpc_timeout_in_ms to flush enough memtables to let all the pending
>> memtables be enqueued, a "messages dropped" will occur. In other words,
>> Cassandra will lock down all tables until all pending flush requests fit
>> in the pending queue. If your queue size is 4, and 8 tables need to be
>> flushed, Cassandra will lock down all tables until a minimum of 3
>> memtables are flushed.
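(Worked out with the numbers above:

    queue capacity     = memtable_flush_queue_size + 1 in-flight flush = 4 + 1 = 5
    memtables to flush                                                 = 8
    flushes that must complete before all 8 fit                        = 8 - 5 = 3

so writes on every table stay blocked until those 3 flushes finish.)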
>>
>> With this in mind, I went through the Cassandra log and found this was
>> indeed the case, looking at log entries similar to these:
>>
>> INFO [OptionalTasks:1] 2012-09-16 05:54:29,750 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-p@1525015234(18686281/341486464 serialized/live bytes, 29553 ops)
>> ...
>> INFO [FlushWriter:29] 2012-09-16 05:54:29,768 Memtable.java (line 266) Writing Memtable-p@1525015234(18686281/341486464 serialized/live bytes, 29553 ops)
>> ...
>> INFO [FlushWriter:29] 2012-09-16 05:54:30,254 Memtable.java (line 307) Completed flushing /data/cassandra/data/open/people/open-p-hd-441-Data.db
>>
>> I was able to figure out what rpc_timeout_in_ms needed to be to
>> temporarily prevent the problem.
>>
>> We had plenty of write I/O available. We also had free memory. I increased
>> the memtable_flush_writers to "2" and memtable_flush_queue_size to "8". We
>> haven't had any timeouts for a number of days now.
>>
>> Thanks for your help,
>> -Mike
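(For reference, both knobs live in cassandra.yaml; a minimal sketch of the
values described above, with the 1.1 defaults noted in comments:

    # cassandra.yaml
    memtable_flush_writers: 2     # default: one per data directory
    memtable_flush_queue_size: 8  # default: 4; see the yaml comments if you use secondary indexes

rpc_timeout_in_ms is set in the same file.)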
>>
>> On Sep 18, 2012, at 5:14 AM, aaron morton wrote:
>>
>>> Any errors in the log ?
>>>
>>> The node recovers ?
>>>
>>> Do you use secondary indexes ? If so, check the comments for
>>> memtable_flush_queue_size in the yaml. If this value is too low, writes
>>> may back up. But I would not expect it to cause dropped messages.
>>>
>>>> nodetool info also shows we have over a gig of available memory on the
>>>> JVM heap of each node.
>>>
>>> Not all memory is created equal :)
>>> ParNew is kicking in to GC the Eden space in the New Heap.
>>>
>>> It may just be that the node is getting hammered by something and IO is
>>> getting overwhelmed. If you can put the logs up, someone might take a look.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 18/09/2012, at 3:46 PM, Michael Theroux <mthero...@yahoo.com> wrote:
>>>
>>>> Thanks for the response.
>>>>
>>>> We are on version 1.1.2. We don't see the MutationStage back up. The
>>>> dump from the "messages dropped" error doesn't show a backup, and
>>>> watching "nodetool tpstats" doesn't show any backup there either.
>>>>
>>>> nodetool info also shows we have over a gig of available memory on the
>>>> JVM heap of each node.
>>>>
>>>> The earliest GCInspector traces I see before one of the more recent
>>>> incidents in which messages were dropped are:
>>>>
>>>> INFO [ScheduledTasks:1] 2012-09-18 02:25:53,928 GCInspector.java (line 122) GC for ParNew: 396 ms for 1 collections, 2064505088 used; max is 4253024256
>>>>
>>>> INFO [ScheduledTasks:1] 2012-09-18 02:25:55,929 GCInspector.java (line 122) GC for ParNew: 485 ms for 1 collections, 1961875064 used; max is 4253024256
>>>>
>>>> INFO [ScheduledTasks:1] 2012-09-18 02:25:57,930 GCInspector.java (line 122) GC for ParNew: 265 ms for 1 collections, 1968074096 used; max is 4253024256
>>>>
>>>> But this was 45 minutes before messages were dropped.
>>>>
>>>> It's appreciated,
>>>> -Mike
>>>>
>>>> On Sep 17, 2012, at 11:27 PM, aaron morton wrote:
>>>>
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) MemtablePostFlusher               1         5         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) FlushWriter                       1         5         0
>>>>> Looks suspiciously like
>>>>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201209.mbox/%3c9fb0e801-b1ed-41c4-9939-bafbddf15...@thelastpickle.com%3E
>>>>>
>>>>> What version are you on ?
>>>>>
>>>>> Are there any ERROR log messages before this ?
>>>>>
>>>>> Are you seeing MutationStage back up ?
>>>>>
>>>>> Are you seeing log messages from GCInspector ?
>>>>>
>>>>> Cheers
>>>>>
>>>>> -----------------
>>>>> Aaron Morton
>>>>> Freelance Developer
>>>>> @aaronmorton
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> On 18/09/2012, at 2:16 AM, Michael Theroux <mthero...@yahoo.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> While under load, we have occasionally been seeing "messages dropped"
>>>>>> errors in our Cassandra log. Doing some research, I understand this is
>>>>>> part of Cassandra's design to shed load, and we should look at the
>>>>>> tpstats-like output to determine what should be done to resolve the
>>>>>> situation. Typically, you will see lots of messages blocked or pending,
>>>>>> and that might be an indicator that a specific piece of hardware needs
>>>>>> to be improved/tuned/upgraded.
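(The per-stage Active/Pending/Blocked counters that StatusLogger prints below
can also be pulled from a live node at any time; the host here is illustrative:

    nodetool -h localhost tpstats
)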
>>>>>>
>>>>>> However, looking at the output we are getting, I'm finding it difficult
>>>>>> to see what needs to be tuned, as it looks to me like Cassandra is
>>>>>> handling the load within the mutation stage:
>>>>>>
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,266 MessagingService.java (line 658) 119 MUTATION messages dropped in last 5000ms
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,645 StatusLogger.java (line 57) Pool Name                    Active   Pending   Blocked
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,836 StatusLogger.java (line 72) ReadStage                         3         3         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) RequestResponseStage              0         0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) ReadRepairStage                   0         0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) MutationStage                     0         0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) ReplicateOnWriteStage             0         0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) GossipStage                       0         0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) AntiEntropyStage                  0         0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) MigrationStage                    0         0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) StreamStage                       0         0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) MemtablePostFlusher               1         5         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) FlushWriter                       1         5         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) MiscStage                         0         0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) commitlog_archiver                0         0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) InternalResponseStage             0         0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) AntiEntropySessions               0         0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 72) HintedHandoff                     0         0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 77) CompactionManager                 0         0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 89) MessagingService                n/a       0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 99) Cache Type     Size     Capacity     KeysToSave     Provider
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 100) KeyCache     2184533     2184533     all
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 106) RowCache     0     0     all     org.apache.cassandra.cache.SerializingCacheProvider
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 113) ColumnFamily                    Memtable ops,data
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 116) system.NodeIdInfo                 0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) system.IndexInfo                  0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) system.LocationInfo               0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) system.Versions                   0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) system.schema_keyspaces           0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) system.Migrations                 0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) system.schema_columnfamilies      0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java (line 116) system.schema_columns             0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java (line 116) system.HintsColumnFamily          0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java (line 116) system.Schema                     0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) prod.comp                         0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) prod.byp                          224,61863
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) prod.byn                          30,6106
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) prod.p                            3354,3792954
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,858 StatusLogger.java (line 116) prod.rel                          788,153294
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,858 StatusLogger.java (line 116) prod.i                            46,271517
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 116) prod.users                        18489,5650405
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 116) prod.caches                       40,7512998
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 116) prod.sessions                     8085,13063121
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 116) prod.content                      0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,860 StatusLogger.java (line 116) prod.enr                          436,341190
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,860 StatusLogger.java (line 116) prod.byt                          23800,9880339
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 116) prod.aliases                      63603,10475769
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 116) prod.ran                          1500,329387
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 116) prod.cmap                         0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 116) prod.pmap                         73032,61237450
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,862 StatusLogger.java (line 116) prod.pic                          5421,4438066
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,862 StatusLogger.java (line 116) prod.watchdog                     0,0
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,863 StatusLogger.java (line 116) prod.s                            31,40924
>>>>>> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,863 StatusLogger.java (line 116) prod.m                            21690,64262284
>>>>>>
>>>>>> We are using a write consistency level of LOCAL_QUORUM over a six node
>>>>>> cluster. Any observations are welcome,
>>>>>>
>>>>>> Thanks,
>>>>>> -Mike
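(A closing note on the LOCAL_QUORUM mention, assuming the common replication
factor of 3, which the thread does not state:

    local quorum = floor(RF / 2) + 1 = floor(3 / 2) + 1 = 2

so each write needs acks from 2 local replicas before rpc_timeout_in_ms
expires, which is why replicas that drop MUTATION messages surface as
client-side timeouts.)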