New ticket for backporting, referencing the existing. On Mon., 13 Aug. 2018, 22:50 Steinmaurer, Thomas, < thomas.steinmau...@dynatrace.com> wrote:
> Thanks Kurt. > > > > What is the proper workflow here to get this accepted? Create a new ticket > dedicated for the backport referencing 11540 or re-open 11540? > > > > Thanks for your help. > > > > Thomas > > > > *From:* kurt greaves <k...@instaclustr.com> > *Sent:* Montag, 13. August 2018 13:24 > *To:* User <user@cassandra.apache.org> > *Subject:* Re: Data Corruption due to multiple Cassandra 2.1 processes? > > > > Yeah that's not ideal and could lead to problems. I think corruption is > only likely if compactions occur, but seems like data loss is a potential > not to mention all sorts of other possible nasties that could occur running > two C*'s at once. Seems to me that 11540 should have gone to 2.1 in the > first place, but it just got missed. Very simple patch so I think a > backport should be accepted. > > > > On 7 August 2018 at 15:57, Steinmaurer, Thomas < > thomas.steinmau...@dynatrace.com> wrote: > > Hello, > > > > with 2.1, in case a second Cassandra process/instance is started on a host > (by accident), may this result in some sort of corruption, although > Cassandra will exit at some point in time due to not being able to bind TCP > ports already in use? > > > > What we have seen in this scenario is something like that: > > > > ERROR [main] 2018-08-05 21:10:24,046 CassandraDaemon.java:120 - Error > starting local jmx server: > > java.rmi.server.ExportException: Port already in use: 7199; nested > exception is: > > java.net.BindException: Address already in use (Bind > failed) > > … > > > > But then continuing with stuff like opening system and even user tables: > > > > INFO [main] 2018-08-05 21:10:24,060 CacheService.java:110 - Initializing > key cache with capacity of 100 MBs. > > INFO [main] 2018-08-05 21:10:24,067 CacheService.java:132 - Initializing > row cache with capacity of 0 MBs > > INFO [main] 2018-08-05 21:10:24,073 CacheService.java:149 - Initializing > counter cache with capacity of 50 MBs > > INFO [main] 2018-08-05 21:10:24,074 CacheService.java:160 - Scheduling > counter cache save to every 7200 seconds (going to save all keys). > > INFO [main] 2018-08-05 21:10:24,161 ColumnFamilyStore.java:365 - > Initializing system.sstable_activity > > INFO [SSTableBatchOpen:2] 2018-08-05 21:10:24,692 SSTableReader.java:475 > - Opening > /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-165 > (2023 bytes) > > INFO [SSTableBatchOpen:3] 2018-08-05 21:10:24,692 SSTableReader.java:475 > - Opening > /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-167 > (2336 bytes) > > INFO [SSTableBatchOpen:1] 2018-08-05 21:10:24,692 SSTableReader.java:475 > - Opening > /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-166 > (2686 bytes) > > INFO [main] 2018-08-05 21:10:24,755 ColumnFamilyStore.java:365 - > Initializing system.hints > > INFO [SSTableBatchOpen:1] 2018-08-05 21:10:24,758 SSTableReader.java:475 > - Opening > /var/opt/xxx-managed/cassandra/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-377 > (46210621 bytes) > > INFO [main] 2018-08-05 21:10:24,766 ColumnFamilyStore.java:365 - > Initializing system.compaction_history > > INFO [SSTableBatchOpen:1] 2018-08-05 21:10:24,768 SSTableReader.java:475 > - Opening > /var/opt/xxx-managed/cassandra/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-129 > (91269 bytes) > > … > > > > Replaying commit logs: > > > > … > > INFO [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:267 - > Replaying > /var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log > > INFO [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:270 - > Replaying > /var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log > (CL version 4, messaging version 8) > > … > > > > Even writing memtables already (below just pasted system tables, but also > user tables): > > > > … > > INFO [MemtableFlushWriter:4] 2018-08-05 21:11:52,524 Memtable.java:347 - > Writing Memtable-size_estimates@1941663179(2.655MiB serialized bytes, > 325710 ops, 2%/0% of on/off-heap limit) > > INFO [MemtableFlushWriter:3] 2018-08-05 21:11:52,552 Memtable.java:347 - > Writing Memtable-peer_events@1474667699(0.199KiB serialized bytes, 4 ops, > 0%/0% of on/off-heap limit) > > … > > > > Until it comes to a point where it can’t bind ports like the storage port > 7000: > > > > ERROR [main] 2018-08-05 21:11:54,350 CassandraDaemon.java:395 - Fatal > configuration error > > org.apache.cassandra.exceptions.ConfigurationException: /XXX:7000 is in > use by another process. Change listen_address:storage_port in > cassandra.yaml to values that do not conflict with other services > > at org.apache.cassandra.net > <https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Forg.apache.cassandra.net&data=01%7C01%7Cthomas.steinmaurer%40dynatrace.com%7Cc9c83c6a323b4b24cffa08d60110530c%7C70ebe3a35b30435d9d677716d74ca190%7C1&sdata=0ZMpjJBhOWznJH3SfvW%2BHBflSq%2F1Q4CCYvdy8jlcbT0%3D&reserved=0>.MessagingService.getServerSockets(MessagingService.java:495) > ~[apache-cassandra-2.1.18.jar:2.1.18] > > … > > > > Until Cassandra stops: > > > > … > > INFO [StorageServiceShutdownHook] 2018-08-05 21:11:54,361 > Gossiper.java:1454 - Announcing shutdown > > … > > > > > > So, we have around 2 minutes where Cassandra is mangling with existing > data, although it shouldn’t. > > > > Sounds like a potential candidate for data corruption, right? E.g. later > on we then see things like (still while being in progress to shutdown?): > > > > WARN [SharedPool-Worker-1] 2018-08-05 21:11:58,181 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > > java.lang.RuntimeException: java.io.FileNotFoundException: > /var/opt/xxx-managed/cassandra/xxx/xxx-fdc68b70950611e8ad7179f2d5bfa3cf/xxx-xxx-ka-15-Data.db > (No such file or directory) > > at org.apache.cassandra.io > <https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Forg.apache.cassandra.io&data=01%7C01%7Cthomas.steinmaurer%40dynatrace.com%7Cc9c83c6a323b4b24cffa08d60110530c%7C70ebe3a35b30435d9d677716d74ca190%7C1&sdata=emcFyFAYwIDkaS1DVC3f%2FZyY%2Bb77Qj4T%2Bdgr0mnjqEE%3D&reserved=0>.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:52) > ~[apache-cassandra-2.1.18.jar:2.1.18] > > at > org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createPooledReader(CompressedPoolingSegmentedFile.java:95) > ~[apache-cassandra-2.1.18.jar:2.1.18] > > at > org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:62) > ~[apache-cassandra-2.1.18.jar:2.1.18] > > … > > > > > > I found this one here: > https://issues.apache.org/jira/browse/CASSANDRA-11540 > <https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCASSANDRA-11540&data=01%7C01%7Cthomas.steinmaurer%40dynatrace.com%7Cc9c83c6a323b4b24cffa08d60110530c%7C70ebe3a35b30435d9d677716d74ca190%7C1&sdata=K0fVRupYMv4%2B0uVjMQJ8jpd1dHIbYvFI%2BOGukAwJn7o%3D&reserved=0> > > > > So, if this all leads to corruption, might this be a candidate for a > backport for a 2.1 bugfix release? > > > > Thanks a lot! > > > > Thomas > > > > The contents of this e-mail are intended for the named addressee only. It > contains information that may be confidential. Unless you are the named > addressee or an authorized designee, you may not copy or use it, or > disclose it to anyone else. If you received it in error please notify us > immediately and then destroy it. Dynatrace Austria GmbH (registration > number FN 91482h) is a company registered in Linz whose registered office > is at 4040 Linz, Austria, Freistädterstraße 313 > > > The contents of this e-mail are intended for the named addressee only. It > contains information that may be confidential. Unless you are the named > addressee or an authorized designee, you may not copy or use it, or > disclose it to anyone else. If you received it in error please notify us > immediately and then destroy it. Dynatrace Austria GmbH (registration > number FN 91482h) is a company registered in Linz whose registered office > is at 4040 Linz, Austria, Freistädterstraße 313 >