[jira] [Commented] (CASSANDRA-14355) Memory leak
[ https://issues.apache.org/jira/browse/CASSANDRA-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17623385#comment-17623385 ]

Doug Whitfield commented on CASSANDRA-14355:
--------------------------------------------

It has been over two years since there has been any comment on this. Is it safe to assume this was fixed in 3.11.5?

> Memory leak
> -----------
>
>                 Key: CASSANDRA-14355
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14355
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>         Environment: Debian Jessie, OpenJDK 1.8.0_151
>            Reporter: Eric Evans
>            Priority: Normal
>             Fix For: 3.11.x
>
>         Attachments: 01_Screenshot from 2018-04-04 14-24-00.png, 02_Screenshot from 2018-04-04 14-28-33.png, 03_Screenshot from 2018-04-04 14-24-50.png, LongGC_Dominator-Tree.png, LongGC_Histogram.png, LongGC_Problem-Suspect-1_FastThreadLocalThread.png, LongGC_nodetool_info.txt
>
> We're seeing regular, frequent {{OutOfMemoryError}} exceptions. Similar to CASSANDRA-13754, an analysis of the heap dumps shows the heap consumed by the {{threadLocals}} member of the instances of {{io.netty.util.concurrent.FastThreadLocalThread}}.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
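A minimal, self-contained Java sketch of the failure mode described in the report above: per-thread state that is set but never removed on long-lived threads. This is illustrative only — the class and field names are invented, and plain `java.lang.ThreadLocal` stands in for Netty's `FastThreadLocal`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration (not Cassandra or Netty code): a value set via ThreadLocal
// on a long-lived thread stays reachable until the thread dies or
// remove() is called, so per-request slots accumulate on pooled threads.
public class ThreadLocalRetention {
    // One ThreadLocal per "request"; values are never removed.
    static final List<ThreadLocal<byte[]>> SLOTS = new ArrayList<>();

    static void handleRequest() {
        ThreadLocal<byte[]> slot = new ThreadLocal<>();
        slot.set(new byte[1024]);  // retained by the current thread's map
        SLOTS.add(slot);           // and by this list -> never collected
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleRequest();       // same long-lived thread: state piles up
        }
        System.out.println("live thread-local slots: " + SLOTS.size());
    }
}
```

Netty provides `FastThreadLocal.removeAll()` to clear such per-thread state when a thread is handed back to a pool, which is why leaks like the one reported tend to point at values that are set on `FastThreadLocalThread` instances but never cleaned up.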
[jira] [Commented] (CASSANDRA-14401) Attempted serializing to buffer exceeded maximum of 65535 bytes
[ https://issues.apache.org/jira/browse/CASSANDRA-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542057#comment-17542057 ]

Doug Whitfield commented on CASSANDRA-14401:
--------------------------------------------

Has anyone seen this in anything after 3.11.4? I think we are seeing this in 3.11.5.

> Attempted serializing to buffer exceeded maximum of 65535 bytes
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-14401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14401
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>            Reporter: Artem Rokhin
>            Priority: Normal
>              Labels: remove-reopen
>
> Cassandra version: 3.11.2
> 3-node cluster
> The following exception appears on all 3 nodes, and after a while the cluster becomes unresponsive.
>
> {code}
> java.lang.AssertionError: Attempted serializing to buffer exceeded maximum of 65535 bytes: 67661
> 	at org.apache.cassandra.utils.ByteBufferUtil.writeWithShortLength(ByteBufferUtil.java:309) ~[apache-cassandra-3.11.2.jar:3.11.2]
> 	at org.apache.cassandra.db.filter.RowFilter$Expression$Serializer.serialize(RowFilter.java:547) ~[apache-cassandra-3.11.2.jar:3.11.2]
> 	at org.apache.cassandra.db.filter.RowFilter$Serializer.serialize(RowFilter.java:1143) ~[apache-cassandra-3.11.2.jar:3.11.2]
> 	at org.apache.cassandra.db.ReadCommand$Serializer.serialize(ReadCommand.java:726) ~[apache-cassandra-3.11.2.jar:3.11.2]
> 	at org.apache.cassandra.db.ReadCommand$Serializer.serialize(ReadCommand.java:683) ~[apache-cassandra-3.11.2.jar:3.11.2]
> 	at org.apache.cassandra.io.ForwardingVersionedSerializer.serialize(ForwardingVersionedSerializer.java:45) ~[apache-cassandra-3.11.2.jar:3.11.2]
> 	at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:120) ~[apache-cassandra-3.11.2.jar:3.11.2]
> 	at org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:385) [apache-cassandra-3.11.2.jar:3.11.2]
> 	at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:337) [apache-cassandra-3.11.2.jar:3.11.2]
> 	at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:263) [apache-cassandra-3.11.2.jar:3.11.2]
> {code}
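The 65535 in the assertion is the maximum value of the 2-byte unsigned-short length prefix that `ByteBufferUtil.writeWithShortLength` emits before a value, so any single length-prefixed value larger than that fails. Below is a simplified sketch of that invariant — the real method's signature and buffer types differ; this is not the Cassandra implementation:

```java
import java.nio.ByteBuffer;

// Simplified sketch (not Cassandra's actual API) of a short-length-prefixed
// write: a 2-byte unsigned length field caps the payload at 65535 bytes.
public class ShortLengthWrite {
    static final int MAX_UNSIGNED_SHORT = 0xFFFF; // 65535

    static void writeWithShortLength(ByteBuffer out, byte[] bytes) {
        if (bytes.length > MAX_UNSIGNED_SHORT)
            throw new AssertionError(
                "Attempted serializing to buffer exceeded maximum of "
                + MAX_UNSIGNED_SHORT + " bytes: " + bytes.length);
        out.putShort((short) bytes.length); // safe: value fits in 16 bits
        out.put(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(200);
        writeWithShortLength(buf, new byte[100]); // fine: 100 <= 65535
        try {
            // 67661 is the size from the stack trace above
            writeWithShortLength(buf, new byte[67661]);
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The stack trace shows the oversized value is a serialized `RowFilter` expression, i.e. a filter value in a read command exceeded the 16-bit length field, not the network buffer itself.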
[jira] [Commented] (CASSANDRA-13365) Nodes entering GC loop, does not recover
[ https://issues.apache.org/jira/browse/CASSANDRA-13365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528902#comment-17528902 ]

Doug Whitfield commented on CASSANDRA-13365:
--------------------------------------------

Do we know if this is a problem in the 4.x series? Also, I don't see any reference to a version after 3.11.3, and the last comment was in 2018. Do we think maybe this got fixed by accident with some of the other improvements?

> Nodes entering GC loop, does not recover
> ----------------------------------------
>
>                 Key: CASSANDRA-13365
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13365
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>         Environment: 34-node cluster over 4 DCs
>                      Linux CentOS 7.2 x86
>                      Mix of 64GB/128GB RAM / node
>                      Mix of 32/40 hardware threads / node, Xeon ~2.4Ghz
>                      High read volume, low write volume, occasional sstable bulk loading
>            Reporter: Mina Naguib
>            Priority: Normal
>
> Over the last week we've been observing two related problems affecting our Cassandra cluster.
>
> Problem 1: 1-few nodes per DC entering GC loop, not recovering
>
> Checking the heap usage stats, there's a sudden jump of 1-3GB. Some nodes recover, but some don't and log this:
> {noformat}
> 2017-03-21T11:23:02.957-0400: 54099.519: [Full GC (Allocation Failure) 13G->11G(14G), 29.4127307 secs]
> 2017-03-21T11:23:45.270-0400: 54141.833: [Full GC (Allocation Failure) 13G->12G(14G), 28.1561881 secs]
> 2017-03-21T11:24:20.307-0400: 54176.869: [Full GC (Allocation Failure) 13G->13G(14G), 27.7019501 secs]
> 2017-03-21T11:24:50.528-0400: 54207.090: [Full GC (Allocation Failure) 13G->13G(14G), 27.1372267 secs]
> 2017-03-21T11:25:19.190-0400: 54235.752: [Full GC (Allocation Failure) 13G->13G(14G), 27.0703975 secs]
> 2017-03-21T11:25:46.711-0400: 54263.273: [Full GC (Allocation Failure) 13G->13G(14G), 27.3187768 secs]
> 2017-03-21T11:26:15.419-0400: 54291.981: [Full GC (Allocation Failure) 13G->13G(14G), 26.9493405 secs]
> 2017-03-21T11:26:43.399-0400: 54319.961: [Full GC (Allocation Failure) 13G->13G(14G), 27.5222085 secs]
> 2017-03-21T11:27:11.383-0400: 54347.945: [Full GC (Allocation Failure) 13G->13G(14G), 27.1769581 secs]
> 2017-03-21T11:27:40.174-0400: 54376.737: [Full GC (Allocation Failure) 13G->13G(14G), 27.4639031 secs]
> 2017-03-21T11:28:08.946-0400: 54405.508: [Full GC (Allocation Failure) 13G->13G(14G), 30.3480523 secs]
> 2017-03-21T11:28:40.117-0400: 54436.680: [Full GC (Allocation Failure) 13G->13G(14G), 27.8220513 secs]
> 2017-03-21T11:29:08.459-0400: 54465.022: [Full GC (Allocation Failure) 13G->13G(14G), 27.4691271 secs]
> 2017-03-21T11:29:37.114-0400: 54493.676: [Full GC (Allocation Failure) 13G->13G(14G), 27.0275733 secs]
> 2017-03-21T11:30:04.635-0400: 54521.198: [Full GC (Allocation Failure) 13G->13G(14G), 27.1902627 secs]
> 2017-03-21T11:30:32.114-0400: 54548.676: [Full GC (Allocation Failure) 13G->13G(14G), 27.8872850 secs]
> 2017-03-21T11:31:01.430-0400: 54577.993: [Full GC (Allocation Failure) 13G->13G(14G), 27.1609706 secs]
> 2017-03-21T11:31:29.024-0400: 54605.587: [Full GC (Allocation Failure) 13G->13G(14G), 27.3635138 secs]
> 2017-03-21T11:31:57.303-0400: 54633.865: [Full GC (Allocation Failure) 13G->13G(14G), 27.4143510 secs]
> 2017-03-21T11:32:25.110-0400: 54661.672: [Full GC (Allocation Failure) 13G->13G(14G), 27.8595986 secs]
> 2017-03-21T11:32:53.922-0400: 54690.485: [Full GC (Allocation Failure) 13G->13G(14G), 27.5242543 secs]
> 2017-03-21T11:33:21.867-0400: 54718.429: [Full GC (Allocation Failure) 13G->13G(14G), 30.8930130 secs]
> 2017-03-21T11:33:53.712-0400: 54750.275: [Full GC (Allocation Failure) 13G->13G(14G), 27.6523013 secs]
> 2017-03-21T11:34:21.760-0400: 54778.322: [Full GC (Allocation Failure) 13G->13G(14G), 27.3030198 secs]
> 2017-03-21T11:34:50.073-0400: 54806.635: [Full GC (Allocation Failure) 13G->13G(14G), 27.1594154 secs]
> 2017-03-21T11:35:17.743-0400: 54834.306: [Full GC (Allocation Failure) 13G->13G(14G), 27.3766949 secs]
> 2017-03-21T11:35:45.797-0400: 54862.360: [Full GC (Allocation Failure) 13G->13G(14G), 27.5756770 secs]
> 2017-03-21T11:36:13.816-0400: 54890.378: [Full GC (Allocation Failure) 13G->13G(14G), 27.5541813 secs]
> 2017-03-21T11:36:41.926-0400: 54918.488: [Full GC (Allocation Failure) 13G->13G(14G), 33.7510103 secs]
> 2017-03-21T11:37:16.132-0400: 54952.695: [Full GC (Allocation Failure) 13G->13G(14G), 27.4856611 secs]
> 2017-03-21T11:37:44.454-0400: 54981.017: [Full GC (Allocation Failure) 13G->13G(14G), 28.1269335 secs]
> 2017-03-21T11:38:12.774-0400: 55009.337: [Full GC (Allocation Failure) 13G->13G(14G), 27.7830448 secs]
> 2017-03-21T11:38:40.840-0400: 55037.402: [Full GC (Allocation Failure) 13G->13G(14G), 27.3527326 secs]
> 2017-03-21T11:39:08.610-0400:
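One quick way to quantify the loop in the log above is to total the pause times. The throwaway parser below assumes the exact `[Full GC ... , <secs> secs]` line shape shown and is not a Cassandra tool; applied to all 33 complete lines above it reports over 900 seconds of stop-the-world pauses in a roughly 16-minute window, i.e. the node is doing almost nothing but Full GC.

```java
// Rough pause accounting for HotSpot-style GC log lines of the form
// "<timestamp>: <uptime>: [Full GC (Allocation Failure) 13G->13G(14G), 27.35 secs]"
// (assumed format; this is a demonstration, not a supported parser).
public class GcPauseShare {
    static double totalPauseSeconds(String[] gcLines) {
        double total = 0;
        for (String line : gcLines) {
            int end = line.lastIndexOf(" secs]");       // end of pause field
            int start = line.lastIndexOf(", ", end) + 2; // start of pause field
            total += Double.parseDouble(line.substring(start, end));
        }
        return total;
    }

    public static void main(String[] args) {
        String[] lines = {
            "2017-03-21T11:23:02.957-0400: 54099.519: [Full GC (Allocation Failure) 13G->11G(14G), 29.4127307 secs]",
            "2017-03-21T11:23:45.270-0400: 54141.833: [Full GC (Allocation Failure) 13G->12G(14G), 28.1561881 secs]",
        };
        System.out.printf("total pause: %.1f s%n", totalPauseSeconds(lines));
    }
}
```

Note also that the heap figures plateau at `13G->13G(14G)`: each Full GC reclaims essentially nothing, which is the signature of live-set exhaustion rather than a fragmentation or throughput problem.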
[jira] [Comment Edited] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528376#comment-17528376 ]

Doug Whitfield edited comment on CASSANDRA-16619 at 4/26/22 7:47 PM:
---------------------------------------------------------------------

Oops, I was wanting to do a search for things since 3.11.9 but I changed this bug. I clearly need more coffee. Going to see if I can figure out what it was.

UPDATE: Unfortunately, this is not on Wayback... digging.

UPDATE: History tab to the rescue.

was (Author: douglasawh):
Oops, I was wanting to do a search for things since 3.11.9 but I changed this bug. I clearly need more coffee. Going to see if I can figure out what it was.

UPDATE: Unfortunately, this is not on Wayback... digging.

> Loss of commit log data possible after sstable ingest
> -----------------------------------------------------
>
>                 Key: CASSANDRA-16619
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains the commit log positions of the sstable. These positions are used to filter out mutations from the commit log on restart, and only make sense for the node on which the data was flushed.
> If an sstable is moved between nodes, it may cover regions that the receiving node has not yet flushed, resulting in valid data being lost should those sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata field (StatsMetadata) - originatingHostId (UUID), which is the local host id of the node on which the sstable was created, or null if not known. Commit log intervals from an sstable are taken into account during commit log replay only when the originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's local hostId.
> For compacted sstables the originatingHostId is set according to StorageService's local hostId, and only commit log intervals from local sstables are preserved in the resulting sstable.
> Discovered by [~jakubzytka]
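The replay rule described in the ticket boils down to a single predicate. The sketch below is a hedged simplification, not Cassandra's actual code — the class and method names are invented to illustrate the decision:

```java
import java.util.UUID;

// Simplified sketch of the replay-filtering rule from CASSANDRA-16619:
// an sstable's commit log intervals may only suppress commit log replay
// when the sstable was created on this very node. A null originatingHostId
// means the sstable predates the metadata field, so its origin is unknown
// and its intervals must not be trusted.
public class ReplayFilter {
    static boolean useCommitLogIntervals(UUID originatingHostId, UUID localHostId) {
        return originatingHostId != null && originatingHostId.equals(localHostId);
    }

    public static void main(String[] args) {
        UUID local = UUID.randomUUID();
        UUID remote = UUID.randomUUID();
        System.out.println(useCommitLogIntervals(local, local));  // locally flushed: trusted
        System.out.println(useCommitLogIntervals(remote, local)); // ingested sstable: ignored
        System.out.println(useCommitLogIntervals(null, local));   // origin unknown: ignored
    }
}
```

Erring toward replaying (returning false when in doubt) is the safe direction here: re-applying a mutation is idempotent, while skipping one loses data.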
[jira] [Updated] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Whitfield updated CASSANDRA-16619:
---------------------------------------
    Since Version: 0.3  (was: 3.11.9)
[jira] [Comment Edited] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528376#comment-17528376 ]

Doug Whitfield edited comment on CASSANDRA-16619 at 4/26/22 7:45 PM:
---------------------------------------------------------------------

Oops, I was wanting to do a search for things since 3.11.9 but I changed this bug. I clearly need more coffee. Going to see if I can figure out what it was.

UPDATE: Unfortunately, this is not on Wayback... digging.

was (Author: douglasawh):
Oops, I was wanting to do a search for things since 3.11.9 but I changed this bug. I clearly need more coffee. Going to see if I can figure out what it was.
[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528376#comment-17528376 ]

Doug Whitfield commented on CASSANDRA-16619:
--------------------------------------------

Oops, I was wanting to do a search for things since 3.11.9 but I changed this bug. I clearly need more coffee. Going to see if I can figure out what it was.
[jira] [Updated] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest
[ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Whitfield updated CASSANDRA-16619:
---------------------------------------
    Since Version: 3.11.9  (was: 0.3)