[jira] [Commented] (CASSANDRA-14355) Memory leak

2022-10-24 Thread Doug Whitfield (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17623385#comment-17623385
 ] 

Doug Whitfield commented on CASSANDRA-14355:


It has been over two years since there has been any comment on this. Is it safe 
to assume this was fixed in 3.11.5?

> Memory leak
> ---
>
> Key: CASSANDRA-14355
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14355
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
> Environment: Debian Jessie, OpenJDK 1.8.0_151
>Reporter: Eric Evans
>Priority: Normal
> Fix For: 3.11.x
>
> Attachments: 01_Screenshot from 2018-04-04 14-24-00.png, 
> 02_Screenshot from 2018-04-04 14-28-33.png, 03_Screenshot from 2018-04-04 
> 14-24-50.png, LongGC_Dominator-Tree.png, LongGC_Histogram.png, 
> LongGC_Problem-Suspect-1_FastThreadLocalThread.png, LongGC_nodetool_info.txt
>
>
> We're seeing regular, frequent {{OutOfMemoryError}} exceptions.  Similar to 
> CASSANDRA-13754, an analysis of the heap dumps shows the heap consumed by the 
> {{threadLocals}} member of the instances of 
> {{io.netty.util.concurrent.FastThreadLocalThread}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14401) Attempted serializing to buffer exceeded maximum of 65535 bytes

2022-05-25 Thread Doug Whitfield (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542057#comment-17542057
 ] 

Doug Whitfield commented on CASSANDRA-14401:


Has anyone seen this in anything after 3.11.4? I think we are seeing this in 
3.11.5.

>  Attempted serializing to buffer exceeded maximum of 65535 bytes
> 
>
> Key: CASSANDRA-14401
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14401
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Artem Rokhin
>Priority: Normal
>  Labels: remove-reopen
>
> Cassandra version: 3.11.2
> 3 nodes cluster 
> The following exception appears on all 3 nodes and after awhile cluster 
> becomes unreposnsive 
>  
> {code}
> java.lang.AssertionError: Attempted serializing to buffer exceeded maximum of 
> 65535 bytes: 67661
>  at 
> org.apache.cassandra.utils.ByteBufferUtil.writeWithShortLength(ByteBufferUtil.java:309)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>  at 
> org.apache.cassandra.db.filter.RowFilter$Expression$Serializer.serialize(RowFilter.java:547)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>  at 
> org.apache.cassandra.db.filter.RowFilter$Serializer.serialize(RowFilter.java:1143)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>  at 
> org.apache.cassandra.db.ReadCommand$Serializer.serialize(ReadCommand.java:726)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>  at 
> org.apache.cassandra.db.ReadCommand$Serializer.serialize(ReadCommand.java:683)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>  at 
> org.apache.cassandra.io.ForwardingVersionedSerializer.serialize(ForwardingVersionedSerializer.java:45)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
>  at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:120) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
>  at 
> org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:385)
>  [apache-cassandra-3.11.2.jar:3.11.2]
>  at 
> org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:337)
>  [apache-cassandra-3.11.2.jar:3.11.2]
>  at 
> org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:263)
>  [apache-cassandra-3.11.2.jar:3.11.2]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13365) Nodes entering GC loop, does not recover

2022-04-27 Thread Doug Whitfield (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528902#comment-17528902
 ] 

Doug Whitfield commented on CASSANDRA-13365:


Do we know if this is a problem in the 4.x series?

Also, I don't see any reference to a version after 3.11.3 and the last comment 
was in 2018. Do we think maybe this got fixed by accident with some of the 
other improvements?

> Nodes entering GC loop, does not recover
> 
>
> Key: CASSANDRA-13365
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13365
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
> Environment: 34-node cluster over 4 DCs
> Linux CentOS 7.2 x86
> Mix of 64GB/128GB RAM / node
> Mix of 32/40 hardware threads / node, Xeon ~2.4Ghz
> High read volume, low write volume, occasional sstable bulk loading
>Reporter: Mina Naguib
>Priority: Normal
>
> Over the last week we've been observing two related problems affecting our 
> Cassandra cluster
> Problem 1: 1-few nodes per DC entering GC loop, not recovering
> Checking the heap usage stats, there's a sudden jump of 1-3GB. Some nodes 
> recover, but some don't and log this:
> {noformat}
> 2017-03-21T11:23:02.957-0400: 54099.519: [Full GC (Allocation Failure)  
> 13G->11G(14G), 29.4127307 secs]
> 2017-03-21T11:23:45.270-0400: 54141.833: [Full GC (Allocation Failure)  
> 13G->12G(14G), 28.1561881 secs]
> 2017-03-21T11:24:20.307-0400: 54176.869: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.7019501 secs]
> 2017-03-21T11:24:50.528-0400: 54207.090: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.1372267 secs]
> 2017-03-21T11:25:19.190-0400: 54235.752: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.0703975 secs]
> 2017-03-21T11:25:46.711-0400: 54263.273: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.3187768 secs]
> 2017-03-21T11:26:15.419-0400: 54291.981: [Full GC (Allocation Failure)  
> 13G->13G(14G), 26.9493405 secs]
> 2017-03-21T11:26:43.399-0400: 54319.961: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.5222085 secs]
> 2017-03-21T11:27:11.383-0400: 54347.945: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.1769581 secs]
> 2017-03-21T11:27:40.174-0400: 54376.737: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.4639031 secs]
> 2017-03-21T11:28:08.946-0400: 54405.508: [Full GC (Allocation Failure)  
> 13G->13G(14G), 30.3480523 secs]
> 2017-03-21T11:28:40.117-0400: 54436.680: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.8220513 secs]
> 2017-03-21T11:29:08.459-0400: 54465.022: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.4691271 secs]
> 2017-03-21T11:29:37.114-0400: 54493.676: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.0275733 secs]
> 2017-03-21T11:30:04.635-0400: 54521.198: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.1902627 secs]
> 2017-03-21T11:30:32.114-0400: 54548.676: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.8872850 secs]
> 2017-03-21T11:31:01.430-0400: 54577.993: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.1609706 secs]
> 2017-03-21T11:31:29.024-0400: 54605.587: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.3635138 secs]
> 2017-03-21T11:31:57.303-0400: 54633.865: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.4143510 secs]
> 2017-03-21T11:32:25.110-0400: 54661.672: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.8595986 secs]
> 2017-03-21T11:32:53.922-0400: 54690.485: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.5242543 secs]
> 2017-03-21T11:33:21.867-0400: 54718.429: [Full GC (Allocation Failure)  
> 13G->13G(14G), 30.8930130 secs]
> 2017-03-21T11:33:53.712-0400: 54750.275: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.6523013 secs]
> 2017-03-21T11:34:21.760-0400: 54778.322: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.3030198 secs]
> 2017-03-21T11:34:50.073-0400: 54806.635: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.1594154 secs]
> 2017-03-21T11:35:17.743-0400: 54834.306: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.3766949 secs]
> 2017-03-21T11:35:45.797-0400: 54862.360: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.5756770 secs]
> 2017-03-21T11:36:13.816-0400: 54890.378: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.5541813 secs]
> 2017-03-21T11:36:41.926-0400: 54918.488: [Full GC (Allocation Failure)  
> 13G->13G(14G), 33.7510103 secs]
> 2017-03-21T11:37:16.132-0400: 54952.695: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.4856611 secs]
> 2017-03-21T11:37:44.454-0400: 54981.017: [Full GC (Allocation Failure)  
> 13G->13G(14G), 28.1269335 secs]
> 2017-03-21T11:38:12.774-0400: 55009.337: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.7830448 secs]
> 2017-03-21T11:38:40.840-0400: 55037.402: [Full GC (Allocation Failure)  
> 13G->13G(14G), 27.3527326 secs]
> 2017-03-21T11:39:08.610-0400: 

[jira] [Comment Edited] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2022-04-26 Thread Doug Whitfield (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528376#comment-17528376
 ] 

Doug Whitfield edited comment on CASSANDRA-16619 at 4/26/22 7:47 PM:
-

oops, I was wanting to do a search for things since 3.11.9 but I changed this 
bug. I clearly need more coffee. going to see if I can figure out what it was.

UPDATE: Unfortunately, this is not on Wayback...digging.

UPDATE: History tab to the rescue


was (Author: douglasawh):
oops, I was wanting to do a search for things since 3.11.9 but I changed this 
bug. I clearly need more coffee. going to see if I can figure out what it was.

UPDATE: Unfortunately, this is not on Wayback...digging.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2022-04-26 Thread Doug Whitfield (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Whitfield updated CASSANDRA-16619:
---
Since Version: 0.3  (was: 3.11.9)

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2022-04-26 Thread Doug Whitfield (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528376#comment-17528376
 ] 

Doug Whitfield edited comment on CASSANDRA-16619 at 4/26/22 7:45 PM:
-

oops, I was wanting to do a search for things since 3.11.9 but I changed this 
bug. I clearly need more coffee. going to see if I can figure out what it was.

UPDATE: Unfortunately, this is not on Wayback...digging.


was (Author: douglasawh):
oops, I was wanting to do a search for things since 3.11.9 but I changed this 
bug. I clearly need more coffee. going to see if I can figure out what it was.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2022-04-26 Thread Doug Whitfield (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528376#comment-17528376
 ] 

Doug Whitfield commented on CASSANDRA-16619:


oops, I was wanting to do a search for things since 3.11.9 but I changed this 
bug. I clearly need more coffee. going to see if I can figure out what it was.

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

2022-04-26 Thread Doug Whitfield (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Whitfield updated CASSANDRA-16619:
---
Since Version: 3.11.9  (was: 0.3)

> Loss of commit log data possible after sstable ingest
> -
>
> Key: CASSANDRA-16619
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These 
> positions are used to filter out mutations from the commit log on restart and 
> only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes they may cover regions that the 
> receiving node has not yet flushed, and result in valid data being lost 
> should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata (StatsMetadata) - 
> originatingHostId (UUID), which is the local host id of the node on which the 
> sstable was created, or null if not known. Commit log intervals from an 
> sstable are taken into account during Commit Log replay only when the 
> originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's 
> local hostId.
> For compacted sstables the originatingHostId set according to 
> StorageService's local hostId, and only commit log intervals from local 
> sstables is preserved in the resulting sstable.
> discovered by [~jakubzytka]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org