Hi, This one is old, do you still need help there? Sorry we missed it.
1. What Cassandra version do you use? 2. What does "nodetool tpstats" show you. Any dropped or pending message? 3. Is your error a full heap memory issue or native one? 4. What configurations did you change from default and you think might be related to this (memtable size, GC config, bloom filters...)? the cluster starts flapping between being down and up Using AWS, even more with small instances make sure to use phi_convict_threshold set to about 12, just in case. This will prevent node from flapping that much. C*heers, ----------------------- Alain Rodriguez - al...@thelastpickle.com France The Last Pickle - Apache Cassandra Consulting http://www.thelastpickle.com 2016-04-12 19:53 GMT+02:00 Bo Finnerup Madsen <bo.gunder...@gmail.com>: > Hi, > > We have an application that reads data from a set of external sources and > loads them into our cassandra cluster. The load goes ok for some time > (~24h) and then some servers in the cluster starts flapping between being > down and up, and finally they go out of memory. > The cluster consists of 5 m4.xlarge machines with 16gb memory, cassandra > has an 8gb heap. All machines have a high load while data is being written, > with a load between 6 and 20. > > I have tried sifting through the information available from nodetool, but > I am unable to find anything helping me determine what is causing the oom. > I am quite new to cassandra, so I might very well overlook the obvious. So > any pointers on how to proceed with identifying the problem will be much > appriciated :) > > In the following I have included information from 10.61.70.110 when it was > flapping. > > Status for ddp keyspace(only keyspace containing any real data): > > ----------------------------------------------------------------------------- > Datacenter: datacenter1 > ======================= > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns (effective) Host ID > Rack > UN 10.61.70.108 65.97 GB 256 59,1% > de79a554-9296-4575-8b79-2089f92069cd rack1 > UN 10.61.70.110 58.95 GB 256 63,3% > 310460f6-b7ce-45a7-be63-a7dd409f6b17 rack1 > UN 10.61.70.72 58.17 GB 256 60,3% > 44fd4f8e-18cd-4487-8174-3a22fb9ed24f rack1 > UN 10.61.70.107 58.69 GB 256 58,5% > f8118fc2-e340-45db-a06e-a5842107d6c8 rack1 > UN 10.61.70.64 68 GB 256 58,7% > 84bee9fe-2adc-48aa-915c-f43d972f5a2f rack1 > > ----------------------------------------------------------------------------- > > > Snippet from system.log: > > ----------------------------------------------------------------------------- > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,547 MessagingService.java:980 > - MUTATION messages were dropped in last 5000 ms: 5776 for internal timeout > and 0 for cross node timeout > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,547 StatusLogger.java:52 - > Pool Name Active Pending Completed Blocked All > Time Blocked > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,551 StatusLogger.java:56 - > MutationStage 32 4881705 17826061870 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,552 StatusLogger.java:56 - > ViewMutationStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,552 StatusLogger.java:56 - > ReadStage 0 0 3266887 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,553 StatusLogger.java:56 - > RequestResponseStage 0 0 389429305 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,553 StatusLogger.java:56 - > ReadRepairStage 0 0 322804 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,554 StatusLogger.java:56 - > CounterMutationStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,554 StatusLogger.java:56 - > MiscStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,554 StatusLogger.java:56 - > CompactionExecutor 4 54 31305 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,555 StatusLogger.java:56 - > MemtableReclaimMemory 0 0 3310 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,555 StatusLogger.java:56 - > PendingRangeCalculator 0 0 10 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,555 StatusLogger.java:56 - > GossipStage 0 0 338170 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,555 StatusLogger.java:56 - > SecondaryIndexManagement 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,556 StatusLogger.java:56 - > HintsDispatcher 1 4 6264 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,558 StatusLogger.java:56 - > MigrationStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,559 StatusLogger.java:56 - > MemtablePostFlush 0 0 3451 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,560 StatusLogger.java:56 - > ValidationExecutor 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,561 StatusLogger.java:56 - > Sampler 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,563 StatusLogger.java:56 - > MemtableFlushWriter 0 0 3310 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,564 StatusLogger.java:56 - > InternalResponseStage 0 0 1873184 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,565 StatusLogger.java:56 - > AntiEntropyStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,565 StatusLogger.java:56 - > CacheCleanupExecutor 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,566 StatusLogger.java:56 - > Native-Transport-Requests 3 2 212872837 0 > 1796 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,566 StatusLogger.java:66 - > CompactionManager 4 32 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,566 StatusLogger.java:78 - > MessagingService n/a 0/0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,567 StatusLogger.java:88 - > Cache Type Size Capacity > KeysToSave > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,568 StatusLogger.java:90 - > KeyCache 88654200 104857600 > all > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,568 StatusLogger.java:96 - > RowCache 0 0 > all > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,568 StatusLogger.java:103 - > Table Memtable ops,data > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,569 StatusLogger.java:106 - > ddp.fingerprint_by_content_uuid_mv 99624,17726363 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,569 StatusLogger.java:106 - > ddp.sync 4747,1551 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,569 StatusLogger.java:106 - > ddp.log_portal 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,570 StatusLogger.java:106 - > ddp.meta_data 2083,3829424 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,570 StatusLogger.java:106 - > ddp.configuration 199,13915 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,570 StatusLogger.java:106 - > ddp.uuids_by_related_uuid 414431,17721698 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,570 StatusLogger.java:106 - > ddp.file_by_file_id 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,571 StatusLogger.java:106 - > ddp.fingerprint_by_content_type_mv 83007,3317002 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,571 StatusLogger.java:106 - > ddp.heartbeat 35855,13407 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,574 StatusLogger.java:106 - > ddp.concept 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,575 StatusLogger.java:106 - > ddp.classification_scheme 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,576 StatusLogger.java:106 - > ddp.rendering_relations_mv 9665,1201123 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,576 StatusLogger.java:106 - > ddp.file 41,19826 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,576 StatusLogger.java:106 - > ddp.rendering_relations 276049,13465909 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,576 StatusLogger.java:106 - > ddp.semantic_group 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,577 StatusLogger.java:106 - > ddp.rendering 23344,68861340 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,577 StatusLogger.java:106 - > ddp.thesauri 44,8 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,577 StatusLogger.java:106 - > ddp.file_download 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,577 StatusLogger.java:106 - > ddp.uuids_by_related_uuid_mv 172394,14524712 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,578 StatusLogger.java:106 - > ddp.fingerprint 124884,54955730 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,578 StatusLogger.java:106 - > ddp.ddp_status 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,578 StatusLogger.java:106 - > ddp.log_portal_mv 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,578 StatusLogger.java:106 - > system_distributed.parent_repair_history 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,579 StatusLogger.java:106 - > system_distributed.repair_history 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,579 StatusLogger.java:106 - > system.compaction_history 20,4595 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,580 StatusLogger.java:106 - > system.hints 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,580 StatusLogger.java:106 - > system.schema_aggregates 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,580 StatusLogger.java:106 - > system.IndexInfo 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,581 StatusLogger.java:106 - > system.schema_columnfamilies 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,581 StatusLogger.java:106 - > system.schema_triggers 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,582 StatusLogger.java:106 - > system.size_estimates 50400,764904 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,582 StatusLogger.java:106 - > system.schema_functions 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,582 StatusLogger.java:106 - > system.paxos 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,582 StatusLogger.java:106 - > system.views_builds_in_progress 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,583 StatusLogger.java:106 - > system.built_views 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,583 StatusLogger.java:106 - > system.peer_events 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,583 StatusLogger.java:106 - > system.range_xfers 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,583 StatusLogger.java:106 - > system.peers 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,584 StatusLogger.java:106 - > system.batches 188441,33975764 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,584 StatusLogger.java:106 - > system.schema_keyspaces 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,585 StatusLogger.java:106 - > system.schema_usertypes 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,585 StatusLogger.java:106 - > system.local 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,585 StatusLogger.java:106 - > system.sstable_activity 1376,24807 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,587 StatusLogger.java:106 - > system.available_ranges 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,587 StatusLogger.java:106 - > system.batchlog 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,588 StatusLogger.java:106 - > system.schema_columns 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,589 StatusLogger.java:106 - > system_schema.columns 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,589 StatusLogger.java:106 - > system_schema.types 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,590 StatusLogger.java:106 - > system_schema.indexes 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,590 StatusLogger.java:106 - > system_schema.keyspaces 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,590 StatusLogger.java:106 - > system_schema.dropped_columns 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,592 StatusLogger.java:106 - > system_schema.aggregates 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,592 StatusLogger.java:106 - > system_schema.triggers 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,592 StatusLogger.java:106 - > system_schema.tables 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,592 StatusLogger.java:106 - > system_schema.views 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,593 StatusLogger.java:106 - > system_schema.functions 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,593 StatusLogger.java:106 - > system_auth.roles 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,593 StatusLogger.java:106 - > system_auth.role_members 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,593 StatusLogger.java:106 - > system_auth.resource_role_permissons_index 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,594 StatusLogger.java:106 - > system_auth.role_permissions 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,594 StatusLogger.java:106 - > system_traces.sessions 0,0 > INFO [ScheduledTasks:1] 2016-04-12 17:35:35,594 StatusLogger.java:106 - > system_traces.events 0,0 > > ----------------------------------------------------------------------------- > > compactionstats: > > ----------------------------------------------------------------------------- > pending tasks: 32 > id compaction type keyspace > table completed total unit progress > 3bf89310-00d1-11e6-a5a3-f125ce747d55 Compaction ddp > rendering_relations 1,09 GB 5,43 GB bytes 20,10% > 9c5bd6b1-00d4-11e6-a5a3-f125ce747d55 Compaction ddp > meta_data 243,64 MB 338,56 MB bytes 71,96% > 50308cb0-00d2-11e6-a5a3-f125ce747d55 Compaction ddp > uuids_by_related_uuid 1,37 GB 2,17 GB bytes 63,11% > e8965d90-00c3-11e6-a5a3-f125ce747d55 Compaction ddp > rendering 19,02 GB 24,09 GB bytes 78,96% > > ----------------------------------------------------------------------------- > > Thank you in advance :) > > Yours sincerely, > Bo Madsen >