[jira] [Updated] (CASSANDRA-18786) Javadoc BigFormat

2023-09-24 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi updated CASSANDRA-18786:

Source Control Link: 
https://github.com/apache/cassandra/commit/9aa2109803a6dd53db36b058e89e7b431762ded2
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Javadoc BigFormat
> -
>
> Key: CASSANDRA-18786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18786
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation/Javadoc
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0.x
>
> Attachments: screenshot-1.png
>
>
> This ticket intends to go through the current sstables code and javadoc the 
> format at a high level.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18786) Javadoc BigFormat

2023-09-24 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768534#comment-17768534
 ] 

Berenguer Blasi commented on CASSANDRA-18786:
-

Thx for the reviews!




[cassandra] branch trunk updated: Javadoc BigFormat

2023-09-24 Thread bereng
This is an automated email from the ASF dual-hosted git repository.

bereng pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 9aa2109803 Javadoc BigFormat
9aa2109803 is described below

commit 9aa2109803a6dd53db36b058e89e7b431762ded2
Author: Bereng 
AuthorDate: Thu Aug 24 11:19:57 2023 +0200

Javadoc BigFormat

patch by Berenguer Blasi; reviewed by Ling Mao, Stefan Miklosovic for CASSANDRA-18786
---
 .../cassandra/io/sstable/format/big/BigFormat.java | 97 +-
 .../io/sstable/indexsummary/IndexSummary.java  |  2 +
 2 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java b/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java
index 4de58b4b6b..d40d6a6f07 100644
--- a/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java
+++ b/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java
@@ -38,13 +38,14 @@ import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.DecoratedKey;
 import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
+import org.apache.cassandra.db.memtable.Flushing;
 import org.apache.cassandra.dht.IPartitioner;
 import org.apache.cassandra.io.sstable.Component;
-import org.apache.cassandra.io.sstable.SSTable;
 import org.apache.cassandra.io.sstable.Descriptor;
 import org.apache.cassandra.io.sstable.GaugeProvider;
 import org.apache.cassandra.io.sstable.IScrubber;
 import org.apache.cassandra.io.sstable.MetricsProviders;
+import org.apache.cassandra.io.sstable.SSTable;
 import org.apache.cassandra.io.sstable.filter.BloomFilterMetrics;
 import org.apache.cassandra.io.sstable.format.AbstractSSTableFormat;
 import org.apache.cassandra.io.sstable.format.SSTableFormat;
@@ -66,7 +67,99 @@ import org.apache.cassandra.utils.Pair;
 import static org.apache.cassandra.io.sstable.format.SSTableFormat.Components.DATA;
 
 /**
- * Legacy bigtable format
+ * Legacy bigtable format. Components and approximate lifecycle:
+ * 
+ * {@link SSTableFormat.Components}
+ * 
+ * {@link Components#ALL_COMPONENTS}
+ *  
+ * 
+ *   {@link Components#SUMMARY}: When searching for a PK we go here for a first approximation on where to look in the index file. It is
+ *   a small sampling of the Index entries intended for a first fast search in-memory.
+ *   
+ *   {@link org.apache.cassandra.io.sstable.indexsummary.IndexSummary}
+ *   
+ *   {@link IndexSummaryComponent}
+ *   
+ * 
+ * 
+ *   {@link Components#PRIMARY_INDEX}: We'll land here in the approximate area where to look for the PK thanks to the Summary. Now we'll search for
+ *   the exact PK to get it's exact position in the data file.
+ *   
+ *   {@link BigTableWriter#indexWriter}
+ *   
+ *   {@link RowIndexEntry}
+ *   
+ *   {@link org.apache.cassandra.io.sstable.IndexInfo}
+ *   
+ *   {@link org.apache.cassandra.io.sstable.format.IndexComponent}
+ *   
+ * 
+ * 
+ *   {@link Components#DATA}: The actual data/partitions file as an array or partitions. Each partition has the form:
+ *   
+ *   A partition header
+ *   Maybe a static row
+ *   Rows or range tombstone
+ *   
+ *   I.e. upon flush {@link Flushing.FlushRunnable#writeSortedContents}
+ *   
+ *   Down to {@link org.apache.cassandra.io.sstable.format.SortedTableWriter#startPartition}
+ *   
+ *   Down to {@link org.apache.cassandra.io.sstable.format.SortedTablePartitionWriter#start}
+ *   
+ *   {@link org.apache.cassandra.io.sstable.format.DataComponent}
+ *   
+ * 
+ * 
+ *   {@link Components#STATS}: Stats on the data such as min timestamps to later vint encode TTL, markForDeleteAt, etc
+ *   
+ *   {@link org.apache.cassandra.db.rows.EncodingStats}
+ *   
+ *   {@link org.apache.cassandra.io.sstable.format.StatsComponent}
+ *   
+ * 
+ * 
+ *   {@link Components#COMPRESSION_INFO}: Contains compresion metadata
+ *   
+ *   {@link org.apache.cassandra.io.compress.CompressedSequentialWriter}
+ *   
+ *   {@link org.apache.cassandra.io.compress.CompressionMetadata}
+ *   
+ *   {@link org.apache.cassandra.io.sstable.format.CompressionInfoComponent}
+ *   
+ * 
+ * 
+ *   {@link Components#DIGEST}: The digest supporting the compression
+ *   
+ *   {@link org.apache.cassandra.io.compress.CompressedSequentialWriter}
+ *   
+ *   {@link org.apache.cassandra.io.util.ChecksumWriter}
+ *   
+ * 
+ * 
+ *   {@link Components#FILTER}: Bloom filter for data files
+ *   
+ *   {@link org.apache.cassandra.io.sstable.format.FilterComponent}
+ *   
+ *   {@link 
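
The read path that the Javadoc above describes (Summary first, then the primary index, then the data file) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: keys are plain strings compared lexicographically (Cassandra orders partitions by token), everything lives in memory, and the names BigFormatLookupSketch, IndexEntry and positionOf are hypothetical and are not part of the Cassandra code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a Summary -> Index -> Data lookup. All names are hypothetical. */
class BigFormatLookupSketch
{
    /** One primary index entry: a partition key and its offset in the data file. */
    static final class IndexEntry
    {
        final String key;
        final long dataPosition;

        IndexEntry(String key, long dataPosition)
        {
            this.key = key;
            this.dataPosition = dataPosition;
        }
    }

    // Stand-in for the primary index: every entry, sorted by key.
    private final List<IndexEntry> index = new ArrayList<>();
    // Stand-in for the summary: every Nth key mapped to its slot in the index.
    private final TreeMap<String, Integer> summary = new TreeMap<>();

    BigFormatLookupSketch(List<IndexEntry> sortedEntries, int samplingInterval)
    {
        index.addAll(sortedEntries);
        for (int i = 0; i < index.size(); i += samplingInterval)
            summary.put(index.get(i).key, i);
    }

    /** Returns the data file position of the key, or -1 if the key is absent. */
    long positionOf(String key)
    {
        // Step 1: the summary gives an approximate starting slot in the index.
        Map.Entry<String, Integer> floor = summary.floorEntry(key);
        if (floor == null)
            return -1; // the key sorts before the first indexed key

        // Step 2: scan the index from that slot for the exact key.
        for (int i = floor.getValue(); i < index.size(); i++)
        {
            int cmp = index.get(i).key.compareTo(key);
            if (cmp == 0)
                return index.get(i).dataPosition; // Step 3: read the data file at this offset
            if (cmp > 0)
                break; // passed the place where the key would be
        }
        return -1;
    }

    public static void main(String[] args)
    {
        List<IndexEntry> entries = new ArrayList<>();
        entries.add(new IndexEntry("a", 0L));
        entries.add(new IndexEntry("c", 100L));
        entries.add(new IndexEntry("e", 250L));
        entries.add(new IndexEntry("g", 400L));
        BigFormatLookupSketch sstable = new BigFormatLookupSketch(entries, 2);
        System.out.println(sstable.positionOf("g"));
        System.out.println(sstable.positionOf("d"));
    }
}
```

A smaller sampling interval makes the in-memory summary larger but shortens the index scan in step 2; that trade-off is the reason the summary is only a sampling.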

[jira] [Updated] (CASSANDRA-18866) Node sends multiple inflight echos

2023-09-24 Thread Erick Ramirez (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Ramirez updated CASSANDRA-18866:
--
Change Category: Operability
 Complexity: Normal
Component/s: Cluster/Gossip
 Status: Open  (was: Triage Needed)

> Node sends multiple inflight echos
> --
>
> Key: CASSANDRA-18866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18866
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip
>Reporter: Cameron Zemek
>Assignee: Cameron Zemek
>Priority: Normal
> Attachments: 18866-regression.patch, duplicates.log, echo.log
>
>
> CASSANDRA-18854 rolled back the changes from CASSANDRA-18845. In particular, 
> 18845 had a change to only allow 1 inflight ECHO request at a time. As per 
> 18854, some tests have an error rate due to this change. Creating this ticket 
> to discuss this further. The current state also does not have retry logic; 
> it just allows multiple ECHO requests inflight at the same time, so it is less 
> likely that all ECHOs will time out or get lost.
> With the change from 18845 plus some extra logging to track what is 
> going on, I do see it retrying ECHOs. Likewise, I patched a node to drop ECHO 
> requests from a node and also see it retrying ECHOs when it doesn't get a 
> reply.
> Therefore, I think the problem is more specific than the dropping of one ECHO 
> request. Yes, there is no retry logic for failed ECHO requests, but this is the 
> case both before and after 18845. ECHO requests are only sent via gossip 
> verb handlers calling applyStateLocally. In these failed tests I am therefore 
> assuming there are cases where it won't call markAlive when other nodes consider 
> the node UP but it's marked DOWN by a node.
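
For readers unfamiliar with the "only 1 inflight ECHO at a time" idea, a minimal sketch follows. It is not the 18845 patch: the class InflightEchoGuard and its methods are hypothetical, endpoints are plain strings, and no messaging takes place. It only shows how a concurrent set can reject a second ECHO to an endpoint while the first is still outstanding.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: allow at most one in-flight ECHO per endpoint. */
class InflightEchoGuard
{
    // Endpoints with an ECHO sent and no reply (or final failure) seen yet.
    private final Set<String> inflight = ConcurrentHashMap.newKeySet();

    /** Returns true if the caller may send an ECHO now, false if one is already outstanding. */
    boolean tryBegin(String endpoint)
    {
        // Set.add is atomic here and returns false when the endpoint is already present.
        return inflight.add(endpoint);
    }

    /** To be called when the reply arrives or the request is abandoned. */
    void complete(String endpoint)
    {
        inflight.remove(endpoint);
    }

    boolean isInflight(String endpoint)
    {
        return inflight.contains(endpoint);
    }

    public static void main(String[] args)
    {
        InflightEchoGuard guard = new InflightEchoGuard();
        System.out.println(guard.tryBegin("10.0.0.1")); // first ECHO is allowed
        System.out.println(guard.tryBegin("10.0.0.1")); // second is rejected while the first is pending
        guard.complete("10.0.0.1");
        System.out.println(guard.tryBegin("10.0.0.1")); // allowed again after completion
    }
}
```

With such a guard, an entry that is never completed blocks all further ECHOs to that endpoint, so every failure path has to call complete or retry.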






[jira] [Assigned] (CASSANDRA-18866) Node sends multiple inflight echos

2023-09-24 Thread Erick Ramirez (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Ramirez reassigned CASSANDRA-18866:
-

Assignee: Cameron Zemek




[jira] [Comment Edited] (CASSANDRA-18866) Node sends multiple inflight echos

2023-09-24 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768462#comment-17768462
 ] 

Cameron Zemek edited comment on CASSANDRA-18866 at 9/24/23 11:47 PM:
-

Had to make the following change for some more dtests:

Previous:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    logger.trace("Resending ECHO_REQ to {}", addr);
    Message echoMessage = Message.out(ECHO_REQ, noPayload);
    MessagingService.instance().sendWithCallback(echoMessage, addr, this);
}
{code}
After:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    if (isEnabled())
    {
        logger.trace("Resending ECHO_REQ to {}", addr);
        Message echoMessage = Message.out(ECHO_REQ, noPayload);
        MessagingService.instance().sendWithCallback(echoMessage, addr, this);
    }
    else
    {
        logger.trace("Failed ECHO_REQ to {}, aborting due to disabled gossip", addr);
        inflightEcho.remove(addr);
    }
}
{code}
[instaclustr/cassandra at CASSANDRA-18866-regressiontest (github.com)|https://github.com/instaclustr/cassandra/tree/CASSANDRA-18866-regressiontest]
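
The behaviour of the "After" callback can be tried out in isolation with a small simulation. This is a sketch under stated assumptions, not the Gossiper code: EchoRetrySketch, Transport and the maxAttempts bound are hypothetical, the retry is a synchronous loop instead of an asynchronous callback, and endpoints are plain strings. It shows the two outcomes on failure: resend while gossip is enabled, or give up and clear the in-flight entry once gossip is disabled.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical simulation of retry-on-failure with an "enabled" check. */
class EchoRetrySketch
{
    /** Stand-in for the messaging layer: returns true on reply, false on failure. */
    interface Transport
    {
        boolean send(String addr);
    }

    private final Set<String> inflightEcho = ConcurrentHashMap.newKeySet();
    private volatile boolean enabled = true;

    void setEnabled(boolean enabled)
    {
        this.enabled = enabled;
    }

    boolean isInflight(String addr)
    {
        return inflightEcho.contains(addr);
    }

    /**
     * Sends an ECHO and retries on failure. Returns true if a reply was received,
     * false if an ECHO was already in flight, gossip was disabled, or the
     * (hypothetical) attempt bound was reached.
     */
    boolean echo(String addr, Transport transport, int maxAttempts)
    {
        if (!inflightEcho.add(addr))
            return false; // one in-flight ECHO per endpoint

        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (transport.send(addr))
            {
                inflightEcho.remove(addr); // reply received
                return true;
            }
            // Failure path, mirroring the enabled check in onFailure above.
            if (!enabled)
            {
                inflightEcho.remove(addr); // abort and clear the entry
                return false;
            }
            // Otherwise loop around and resend.
        }
        inflightEcho.remove(addr); // bound reached; the bound exists only in this sketch
        return false;
    }

    public static void main(String[] args)
    {
        EchoRetrySketch gossip = new EchoRetrySketch();
        int[] calls = { 0 };
        // The transport fails twice, then replies.
        boolean acked = gossip.echo("10.0.0.1", addr -> ++calls[0] >= 3, 10);
        System.out.println(acked + " after " + calls[0] + " sends");
    }
}
```

Without the remove call in the disabled branch, the endpoint would stay marked as in flight and no later ECHO could be sent to it, which is what the last edit to the comment addresses.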


was (Author: cam1982):
Had to make the following change for some more dtests:

Previous:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    logger.trace("Resending ECHO_REQ to {}", addr);
    Message echoMessage = Message.out(ECHO_REQ, noPayload);
    MessagingService.instance().sendWithCallback(echoMessage, addr, this);
}
{code}
After:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    if (isEnabled())
    {
        logger.trace("Resending ECHO_REQ to {}", addr);
        Message echoMessage = Message.out(ECHO_REQ, noPayload);
        MessagingService.instance().sendWithCallback(echoMessage, addr, this);
    }
    else
    {
        logger.trace("Failed ECHO_REQ to {}, aborting due to disabled gossip", addr);
    }
}
{code}
[instaclustr/cassandra at CASSANDRA-18866-regressiontest (github.com)|https://github.com/instaclustr/cassandra/tree/CASSANDRA-18866-regressiontest]




[jira] [Comment Edited] (CASSANDRA-18866) Node sends multiple inflight echos

2023-09-24 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768462#comment-17768462
 ] 

Cameron Zemek edited comment on CASSANDRA-18866 at 9/24/23 11:42 PM:
-

Had to make the following change for some more dtests:

Previous:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    logger.trace("Resending ECHO_REQ to {}", addr);
    Message echoMessage = Message.out(ECHO_REQ, noPayload);
    MessagingService.instance().sendWithCallback(echoMessage, addr, this);
}
{code}
After:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    if (isEnabled())
    {
        logger.trace("Resending ECHO_REQ to {}", addr);
        Message echoMessage = Message.out(ECHO_REQ, noPayload);
        MessagingService.instance().sendWithCallback(echoMessage, addr, this);
    }
    else
    {
        logger.trace("Failed ECHO_REQ to {}, aborting due to disabled gossip", addr);
    }
}
{code}
[instaclustr/cassandra at CASSANDRA-18866-regressiontest (github.com)|https://github.com/instaclustr/cassandra/tree/CASSANDRA-18866-regressiontest]


was (Author: cam1982):
Had to make the following change for some more dtests:

Previous:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    logger.trace("Resending ECHO_REQ to {}", addr);
    Message echoMessage = Message.out(ECHO_REQ, noPayload);
    MessagingService.instance().sendWithCallback(echoMessage, addr, this);
}
{code}
After:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    if (isEnabled())
    {
        logger.trace("Resending ECHO_REQ to {}", addr);
        Message echoMessage = Message.out(ECHO_REQ, noPayload);
        MessagingService.instance().sendWithCallback(echoMessage, addr, this);
    }
    else
    {
        logger.trace("Failed ECHO_REQ to {}, aborting due to disabled gossip", addr);
    }
}
{code}




[jira] [Commented] (CASSANDRA-18866) Node sends multiple inflight echos

2023-09-24 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768462#comment-17768462
 ] 

Cameron Zemek commented on CASSANDRA-18866:
---

Had to make the following change for some more dtests:

Previous:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    logger.trace("Resending ECHO_REQ to {}", addr);
    Message echoMessage = Message.out(ECHO_REQ, noPayload);
    MessagingService.instance().sendWithCallback(echoMessage, addr, this);
}
{code}
After:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    if (isEnabled())
    {
        logger.trace("Resending ECHO_REQ to {}", addr);
        Message echoMessage = Message.out(ECHO_REQ, noPayload);
        MessagingService.instance().sendWithCallback(echoMessage, addr, this);
    }
    else
    {
        logger.trace("Failed ECHO_REQ to {}, aborting due to disabled gossip", addr);
    }
}
{code}




[jira] [Updated] (CASSANDRA-17992) Upgrade Netty on 5.0

2023-09-24 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-17992:

Test and Documentation Plan: Run CI and check regressions; check release notes  (was: Run CI and check regressions)

> Upgrade Netty on 5.0
> 
>
> Key: CASSANDRA-17992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17992
> Project: Cassandra
>  Issue Type: Task
>  Components: Dependencies
>Reporter: Ekaterina Dimitrova
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 5.0, 5.0-alpha1
>
> Attachments: important-netty-inter-releases.md, 
> netty-release-notes-filtered.md, netty-release-notes.md, signature.asc
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I haven't been able to identify from the Netty docs which was the lowest 
> version where JDK17 support was added, but we are about 40 versions behind in 
> netty 4, so I suspect we had better update. 
> -We need to consider there was an issue with class cast exceptions when 
> building with JDK17 with newer versions of netty (the newest available in 
> March 2022). For the record, we didn't see those when running CI on JDK8 and 
> JDK11. We also need to carefully revise the changes between the netty 
> versions. -->- CASSANDRA-18180
> Upgrading will cover also a fix in netty that was discussed in 
> [this|https://the-asf.slack.com/archives/CK23JSY2K/p1665567660202989] ASF 
> Slack thread. 
> CC [~benedict] , [~aleksey] 






[jira] [Updated] (CASSANDRA-17992) Upgrade Netty on 5.0

2023-09-24 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-17992:

Test and Documentation Plan: Run CI and check regressions  (was: Run regressions)




[jira] [Comment Edited] (CASSANDRA-18877) remove bytebuddy / byteman from production classpath and remove compress-lzf dependency from build deps

2023-09-24 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768416#comment-17768416
 ] 

Stefan Miklosovic edited comment on CASSANDRA-18877 at 9/24/23 4:34 PM:


In (1), it was using LZFOutputStream in StreamWriter which just got removed. 
That library was also purposefully removed from the repository. I am not sure 
how it got back, but it is suspicious that it was never removed from the 
dependency management (2), and then when the Maven / Ant resolver stuff was 
introduced it was probably just resurrected because of that. 

(1) 
https://github.com/apache/cassandra/commit/fc92db2b9b56c143516026ba29cecdec37e286bb
(2) 
https://github.com/apache/cassandra/blob/fc92db2b9b56c143516026ba29cecdec37e286bb/build.xml#L362


was (Author: smiklosovic):
In (1), it was using LZFOutputStream in StreamWriter which just got removed. 
That library was also purposefully removed from the repository. I am not sure 
how it got back, but it is suspicious that it was never removed from the 
dependency management (2), and then when the Maven / Ant resolver stuff was 
introduced it was just resurrected. 

(1) 
https://github.com/apache/cassandra/commit/fc92db2b9b56c143516026ba29cecdec37e286bb
(2) 
https://github.com/apache/cassandra/blob/fc92db2b9b56c143516026ba29cecdec37e286bb/build.xml#L362

> remove bytebuddy / byteman from production classpath and remove compress-lzf 
> dependency from build deps
> ---
>
> Key: CASSANDRA-18877
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18877
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>
> I was digging in the project deps and if you compare all libs in the "libs" dir 
> and all libs in "build/lib/jars", there are indeed some differences which are 
> OK; however, in build/lib/jars there are also libraries for byteman and 
> byte-buddy. This is clearly wrong as these dependencies should not be 
> accessible from the production code, only from tests.
> The reason they are accessible in prod code is that there is the class 
> TestRateLimiter (1). I do not have a clue why that class is in the prod code 
> in the first place. The only place it is referenced is here (2), but that 
> byteman script is not loaded anywhere in tests. I was also checking Python 
> dtests.
> I think this is some leftover or something like "I will keep it here when I 
> need it", but as nobody seems to need it, I strongly advocate for removing it and 
> making bytebuddy and byteman test-scoped dependencies only, as they should be.
> A reader who pays attention notices that these dependencies are of provided 
> scope, which is a trick to have the code compile without putting the libraries in 
> the production runtime. It does not do any harm, as it is never invoked 
> from the production code (if it was, it would fail on missing imports). 
> Nevertheless, this is still an issue which should be addressed. We were doing 
> something similar with the assertj dependency recently.
> The second issue is that there is a dependency on compress-lzf in the build 
> dependencies. This is not necessary either, as that library was removed from 
> the repository in (3), but it still somehow leaked into the build process again. 
> (1) 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/TestRateLimiter.java
> (2) 
> https://github.com/apache/cassandra/blob/trunk/test/resources/byteman/mutation_limiter.btm
> (3) 
> https://github.com/apache/cassandra/commit/fc92db2b9b56c143516026ba29cecdec37e286bb






[jira] [Commented] (CASSANDRA-18877) remove bytebuddy / byteman from production classpath and remove compress-lzf dependency from build deps

2023-09-24 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768416#comment-17768416
 ] 

Stefan Miklosovic commented on CASSANDRA-18877:
---

In (1), it was using LZFOutputStream in StreamWriter which just got removed. 
That library was also purposefully removed from the repository. I am not sure 
how it got back, but it is suspicious that it was never removed from the 
dependency management (2), and then when the Maven / Ant resolver stuff was 
introduced it was just resurrected. 

(1) 
https://github.com/apache/cassandra/commit/fc92db2b9b56c143516026ba29cecdec37e286bb
(2) 
https://github.com/apache/cassandra/blob/fc92db2b9b56c143516026ba29cecdec37e286bb/build.xml#L362

> remove bytebuddy / byteman from production classpath and remove compress-lzf 
> dependency from build deps
> ---
>
> Key: CASSANDRA-18877
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18877
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>
> I was digging in the project deps and if you compare all libs in "libs" dir 
> and all libs in "build/lib/jars", there are indeed some differences which are 
> OK. However, in build/lib/jars there are also libraries for byteman and 
> byte-buddy. This is clearly wrong as these dependencies should not be 
> accessible from the production code, only from tests.
> The reason they are accessible in prod code is that there is the class 
> TestRateLimiter (1). I do not have a clue why that class is in the prod code 
> in the first place. The only place it is referenced is here (2), but that 
> byteman script is not loaded anywhere in tests. I also checked the Python 
> dtests.
> I think this is some leftover or something like "I will keep it here for when 
> I need it", but as nobody seems to need it, I strongly advocate for removing 
> it and making bytebuddy and byteman test-scoped dependencies only, as they 
> should be.
> A reader who pays attention will notice that these dependencies are of 
> provided scope, which is a trick to keep the code compilable without having 
> the libraries in the production runtime. It does not do any harm as it is 
> never invoked from the production code (if it was, it would fail on missing 
> imports). Nevertheless, this is still an issue which should be addressed. We 
> were doing something similar with the assertj dependency recently.
> The second issue is that there is a dependency on compress-lzf in the build 
> dependencies. This is not necessary either, as that library was removed from 
> the repository in (3) but it still somehow leaked into the build process again. 
> (1) 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/TestRateLimiter.java
> (2) 
> https://github.com/apache/cassandra/blob/trunk/test/resources/byteman/mutation_limiter.btm
> (3) 
> https://github.com/apache/cassandra/commit/fc92db2b9b56c143516026ba29cecdec37e286bb
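
The directory comparison mentioned in the description can be reproduced with a few lines of Python. This is only a rough sketch: the helper names are made up for illustration, the two directory names are the ones quoted in the description, and it compares jar file names only (so two different versions of the same library show up as a difference).

```python
from pathlib import Path


def jar_names(directory):
    """Return the set of *.jar file names found directly in the directory."""
    return {p.name for p in Path(directory).glob("*.jar")}


def only_in_build(lib_dir, build_dir):
    """Jar names present in build_dir but missing from lib_dir, sorted."""
    return sorted(jar_names(build_dir) - jar_names(lib_dir))


if __name__ == "__main__":
    # Directory names as quoted in the description; adjust to the checkout.
    for name in only_in_build("libs", "build/lib/jars"):
        print(name)
```

Anything printed is a candidate for a closer look; some differences are expected, as the description says.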






[jira] [Updated] (CASSANDRA-18873) Fix broken JMH benchmarks

2023-09-24 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-18873:
--
Description: 
The following benchmarks are broken:
* {{ZeroCopyStreamingBench}}
* {{MutationBench}}
* {{FastThreadLocalBench}}
* {{AtomicBTreePartitionUpdateBench}} (OOM on Jenkins)
* {{ReadSmallPartitionsBench}}

Additionally, those benchmarks take too much time to run:
* {{BTreeUpdateBench}} ~ 58 hours
* {{AtomicBTreePartitionUpdateBench}} ~ 5 hours
* {{BTreeTransformBench}} ~ 2.5 hours

Here is the complete list of estimated benchmark times:
{noformat}
Estimated time for CacheLoaderBench: ~5 s
Estimated time for LatencyTrackingBench: ~26 s
Estimated time for SampleBench: ~30 s
Estimated time for ReadWriteBench: ~30 s
Estimated time for MutationBench: ~30 s
Estimated time for CompactionBench: ~35 s
Estimated time for DiagnosticEventPersistenceBench: ~40 s
Estimated time for ZeroCopyStreamingBench: ~44 s
Estimated time for BatchStatementBench: ~110 s
Estimated time for DiagnosticEventServiceBench: ~120 s
Estimated time for MessageOutBench: ~144 s
Estimated time for BloomFilterSerializerBench: ~144 s
Estimated time for FastThreadLocalBench: ~156 s
Estimated time for HashingBench: ~156 s
Estimated time for ChecksumBench: ~208 s
Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s
Estimated time for PendingRangesBench: ~ 5 m
Estimated time for DirectorySizerBench: ~ 5 m
Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m
Estimated time for PreaggregatedByteBufsBench: ~ 7 m
Estimated time for AutoBoxingBench: ~ 8 m
Estimated time for OutputStreamBench: ~ 13 m
Estimated time for BTreeBuildBench: ~ 13 m
Estimated time for StringsEncodeBench: ~ 20 m
Estimated time for instance.ReadWidePartitionsBench: ~ 21 m
Estimated time for btree.BTreeBuildBench: ~ 30 m
Estimated time for BTreeSearchIteratorBench: ~ 31 m
Estimated time for btree.BTreeTransformBench: ~ 138 m
Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 288 m
Estimated time for btree.BTreeUpdateBench: ~58 h
Total estimated time: ~69 h
{noformat}

I'd like to add a test which estimates the benchmark times and fails if any 
single benchmark's estimated run time is longer than xxx minutes.


  was:
The following benchmarks are broken:
* {{ZeroCopyStreamingBench}}
* {{MutationBench}}
* {{FastThreadLocalBench}}
* {{AtomicBTreePartitionUpdateBench}} (OOM on Jenkins)

Additionally, those benchmarks take too much time to run:
* {{BTreeUpdateBench}} ~ 58 hours
* {{AtomicBTreePartitionUpdateBench}} ~ 5 hours
* {{BTreeTransformBench}} ~ 2.5 hours

Here is the complete list of estimated benchmark times:
{noformat}
Estimated time for CacheLoaderBench: ~5 s
Estimated time for LatencyTrackingBench: ~26 s
Estimated time for SampleBench: ~30 s
Estimated time for ReadWriteBench: ~30 s
Estimated time for MutationBench: ~30 s
Estimated time for CompactionBench: ~35 s
Estimated time for DiagnosticEventPersistenceBench: ~40 s
Estimated time for ZeroCopyStreamingBench: ~44 s
Estimated time for BatchStatementBench: ~110 s
Estimated time for DiagnosticEventServiceBench: ~120 s
Estimated time for MessageOutBench: ~144 s
Estimated time for BloomFilterSerializerBench: ~144 s
Estimated time for FastThreadLocalBench: ~156 s
Estimated time for HashingBench: ~156 s
Estimated time for ChecksumBench: ~208 s
Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s
Estimated time for PendingRangesBench: ~ 5 m
Estimated time for DirectorySizerBench: ~ 5 m
Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m
Estimated time for PreaggregatedByteBufsBench: ~ 7 m
Estimated time for AutoBoxingBench: ~ 8 m
Estimated time for OutputStreamBench: ~ 13 m
Estimated time for BTreeBuildBench: ~ 13 m
Estimated time for StringsEncodeBench: ~ 20 m
Estimated time for instance.ReadWidePartitionsBench: ~ 21 m
Estimated time for btree.BTreeBuildBench: ~ 30 m
Estimated time for BTreeSearchIteratorBench: ~ 31 m
Estimated time for btree.BTreeTransformBench: ~ 138 m
Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 288 m
Estimated time for btree.BTreeUpdateBench: ~58 h
Total estimated time: ~69 h
{noformat}

I'd like to add a test which estimates the benchmark times and fails if any 
single benchmark's estimated run time is longer than xxx minutes.



> Fix broken JMH benchmarks
> -
>
> Key: CASSANDRA-18873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18873
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Jacek Lewandowski
>Priority: Normal
> Attachments: BenchTimeTest.java
>
>
> The following benchmarks are broken:
> * {{ZeroCopyStreamingBench}}
> * {{MutationBench}}
> * {{FastThreadLocalBench}}
> * {{AtomicBTreePartitionUpdateBench}} (OOM on Jenkins)
> * {{ReadSmallPartitionsBench}}
> 

[jira] [Commented] (CASSANDRA-18877) remove bytebuddy / byteman from production classpath and remove compress-lzf dependency from build deps

2023-09-24 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768367#comment-17768367
 ] 

Michael Semb Wever commented on CASSANDRA-18877:


bq. The second issue is that there is a dependency on compress-lzf

note, the LZ4Compressor uses lz4-java

> remove bytebuddy / byteman from production classpath and remove compress-lzf 
> dependency from build deps
> ---
>
> Key: CASSANDRA-18877
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18877
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>
> I was digging in the project deps and if you compare all libs in "libs" dir 
> and all libs in "build/lib/jars", there are indeed some differences which are 
> OK. However, in build/lib/jars there are also libraries for byteman and 
> byte-buddy. This is clearly wrong as these dependencies should not be 
> accessible from the production code, only from tests.
> The reason they are accessible in prod code is that there is the class 
> TestRateLimiter (1). I do not have a clue why that class is in the prod code 
> in the first place. The only place it is referenced is here (2), but that 
> byteman script is not loaded anywhere in tests. I also checked the Python 
> dtests.
> I think this is some leftover or something like "I will keep it here for when 
> I need it", but as nobody seems to need it, I strongly advocate for removing 
> it and making bytebuddy and byteman test-scoped dependencies only, as they 
> should be.
> A reader who pays attention will notice that these dependencies are of 
> provided scope, which is a trick to keep the code compilable without having 
> the libraries in the production runtime. It does not do any harm as it is 
> never invoked from the production code (if it was, it would fail on missing 
> imports). Nevertheless, this is still an issue which should be addressed. We 
> were doing something similar with the assertj dependency recently.
> The second issue is that there is a dependency on compress-lzf in the build 
> dependencies. This is not necessary either, as that library was removed from 
> the repository in (3) but it still somehow leaked into the build process again. 
> (1) 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/TestRateLimiter.java
> (2) 
> https://github.com/apache/cassandra/blob/trunk/test/resources/byteman/mutation_limiter.btm
> (3) 
> https://github.com/apache/cassandra/commit/fc92db2b9b56c143516026ba29cecdec37e286bb






[jira] [Commented] (CASSANDRA-18877) remove bytebuddy / byteman from production classpath and remove compress-lzf dependency from build deps

2023-09-24 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768366#comment-17768366
 ] 

Michael Semb Wever commented on CASSANDRA-18877:


Are we sure it wasn't (or isn't) needed as part of the build process?  
build/lib/jars will always contain more for this reason.

> remove bytebuddy / byteman from production classpath and remove compress-lzf 
> dependency from build deps
> ---
>
> Key: CASSANDRA-18877
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18877
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>
> I was digging in the project deps and if you compare all libs in "libs" dir 
> and all libs in "build/lib/jars", there are indeed some differences which are 
> OK. However, in build/lib/jars there are also libraries for byteman and 
> byte-buddy. This is clearly wrong as these dependencies should not be 
> accessible from the production code, only from tests.
> The reason they are accessible in prod code is that there is the class 
> TestRateLimiter (1). I do not have a clue why that class is in the prod code 
> in the first place. The only place it is referenced is here (2), but that 
> byteman script is not loaded anywhere in tests. I also checked the Python 
> dtests.
> I think this is some leftover or something like "I will keep it here for when 
> I need it", but as nobody seems to need it, I strongly advocate for removing 
> it and making bytebuddy and byteman test-scoped dependencies only, as they 
> should be.
> A reader who pays attention will notice that these dependencies are of 
> provided scope, which is a trick to keep the code compilable without having 
> the libraries in the production runtime. It does not do any harm as it is 
> never invoked from the production code (if it was, it would fail on missing 
> imports). Nevertheless, this is still an issue which should be addressed. We 
> were doing something similar with the assertj dependency recently.
> The second issue is that there is a dependency on compress-lzf in the build 
> dependencies. This is not necessary either, as that library was removed from 
> the repository in (3) but it still somehow leaked into the build process again. 
> (1) 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/TestRateLimiter.java
> (2) 
> https://github.com/apache/cassandra/blob/trunk/test/resources/byteman/mutation_limiter.btm
> (3) 
> https://github.com/apache/cassandra/commit/fc92db2b9b56c143516026ba29cecdec37e286bb






[jira] [Commented] (CASSANDRA-14667) Upgrade Dropwizard Metrics to 4.x

2023-09-24 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768364#comment-17768364
 ] 

Michael Semb Wever commented on CASSANDRA-14667:


bq. ant resolver plugin has some bugs in it 

https://issues.apache.org/jira/browse/CASSANDRA-18049?focusedCommentId=17706782=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17706782

> Upgrade Dropwizard Metrics to 4.x
> -
>
> Key: CASSANDRA-14667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14667
> Project: Cassandra
>  Issue Type: Task
>  Components: Observability/Metrics
>Reporter: Stig Rohde Døssing
>Assignee: Maxim Muzafarov
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
> Attachments: signature.asc, signature.asc, signature.asc, 
> signature.asc
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Cassandra currently uses Metrics 3.1.5. Version 4.0.0 added some fixes for 
> Java 9 compatibility. It would be good to upgrade the Metrics library as part 
> of the version of Cassandra that adds Java 9 compatibility 
> (https://issues.apache.org/jira/browse/CASSANDRA-9608). 






[jira] [Comment Edited] (CASSANDRA-18871) JMH benchmark improvements

2023-09-24 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768331#comment-17768331
 ] 

Jacek Lewandowski edited comment on CASSANDRA-18871 at 9/24/23 7:23 AM:


So the build failed, probably because it was running for too long. I looked 
into the logs to figure out why it takes so long and learned that there are 
benchmarks which take extremely long to run. 

{{btree.BTreeTransformBench}}, params = 7x9x2=63, x4 forks = 252, x3 methods = 
756 tests x 11s each ~= 2h 20m
{{btree.BTreeUpdateBench}}, params = 7x7x3x2x2x3=1764, x4 forks = 7056, x1 
method = 7056 tests x 30s each ~= 59h
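
The arithmetic above can be checked with a short Python sketch. This is not the estimation test mentioned below; the function and parameter names are made up for illustration, and the inputs are the totals quoted above (63 and 1764 parameter combinations, 4 forks, 11 s and 30 s per run).

```python
def estimated_seconds(param_combinations, forks, methods, seconds_per_run):
    """Total runs = combinations x forks x methods; each run takes seconds_per_run."""
    return param_combinations * forks * methods * seconds_per_run


# Figures taken from the two lines above.
transform_seconds = estimated_seconds(63, 4, 3, 11)   # 756 runs x 11 s
update_seconds = estimated_seconds(1764, 4, 1, 30)    # 7056 runs x 30 s

print(f"btree.BTreeTransformBench: ~{transform_seconds // 60} m")    # ~138 m
print(f"btree.BTreeUpdateBench: ~{update_seconds / 3600:.1f} h")     # ~58.8 h
```

The run count grows multiplicatively with every parameter, which is why reducing either the parameter space or the fork count shortens the run so quickly.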

I'm going to exclude the longest benchmarks for now and create a ticket to fix 
it later - see CASSANDRA-18873. Those benchmarks are still ok to run locally 
with {{-Dbenchmark.name=...}}
 
btw. I've implemented a test which estimates benchmark run times according to 
the number of forks, the number and duration of warmup and measurement 
iterations, and the number of parameter combinations. The results are as follows: 

{noformat}
Estimated time for CacheLoaderBench: ~5 s
Estimated time for LatencyTrackingBench: ~26 s
Estimated time for SampleBench: ~30 s
Estimated time for ReadWriteBench: ~30 s
Estimated time for MutationBench: ~30 s
Estimated time for CompactionBench: ~35 s
Estimated time for DiagnosticEventPersistenceBench: ~40 s
Estimated time for ZeroCopyStreamingBench: ~44 s
Estimated time for BatchStatementBench: ~110 s
Estimated time for DiagnosticEventServiceBench: ~120 s
Estimated time for MessageOutBench: ~144 s
Estimated time for BloomFilterSerializerBench: ~144 s
Estimated time for FastThreadLocalBench: ~156 s
Estimated time for HashingBench: ~156 s
Estimated time for ChecksumBench: ~208 s
Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s
Estimated time for PendingRangesBench: ~ 5 m
Estimated time for DirectorySizerBench: ~ 5 m
Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m
Estimated time for PreaggregatedByteBufsBench: ~ 7 m
Estimated time for AutoBoxingBench: ~ 8 m
Estimated time for OutputStreamBench: ~ 13 m
Estimated time for BTreeBuildBench: ~ 13 m
Estimated time for StringsEncodeBench: ~ 20 m
Estimated time for instance.ReadWidePartitionsBench: ~ 21 m
Estimated time for btree.BTreeBuildBench: ~ 30 m
Estimated time for BTreeSearchIteratorBench: ~ 31 m
Estimated time for btree.BTreeTransformBench: ~ 138 m
Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 288 m
Estimated time for btree.BTreeUpdateBench: ~58 h
Total estimated time: ~69 h
{noformat}

We can make it assert that no benchmark is planned to run longer than 30 
minutes (but as said, a separate ticket)
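
A minimal sketch of such an assertion, assuming the estimates are already available as minutes per benchmark. The dictionary copies a few values from the list above; the helper name and the way the limit is passed are illustrative and not taken from any actual test.

```python
LIMIT_MINUTES = 30  # threshold suggested in this comment

# A few estimates from the list above, converted to minutes.
estimates_minutes = {
    "btree.BTreeBuildBench": 30,
    "BTreeSearchIteratorBench": 31,
    "btree.BTreeTransformBench": 138,
    "btree.AtomicBTreePartitionUpdateBench": 288,
    "btree.BTreeUpdateBench": 58 * 60,
}


def over_limit(estimates, limit_minutes):
    """Return the names of benchmarks whose estimate exceeds the limit, sorted."""
    return sorted(name for name, minutes in estimates.items()
                  if minutes > limit_minutes)


offenders = over_limit(estimates_minutes, LIMIT_MINUTES)
for name in offenders:
    print(f"{name}: ~{estimates_minutes[name]} m exceeds {LIMIT_MINUTES} m")
```

A real test would fail when `offenders` is non-empty; here the names are only printed so that the sketch runs to completion with the current numbers.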


was (Author: jlewandowski):
So the build failed, probably because it was running for too long. I looked 
into the logs to figure out why it takes so long and learned that there are 
benchmarks which take extremely long to run. 

{{btree.BTreeTransformBench}}, params = 7x9x2=63, x4 forks = 252, x3 methods = 
756 tests x 11s each ~= 2h 20m
{{btree.BTreeUpdateBench}}, params = 7x7x3x2x2x3=1764, x4 forks = 7056, x1 
method = 7056 tests x 30s each ~= 59h

To me, the latter is unacceptable for CI. We need to reduce the number of 
parameters and also set the number of forks to 1 (probably for each test). 

I'm going to exclude the benchmark for now and create a ticket to fix it later 
(I'm going to do that for each benchmark which causes CI to fail). Those 
benchmarks are still ok to run locally with {{-Dbenchmark.name=...}}
 
btw. I've implemented a test which estimates benchmark run times according to 
the number of forks, the number and duration of warmup and measurement 
iterations, and the number of parameter combinations. The results are as follows: 

{noformat}
Estimated time for CacheLoaderBench: ~5 s
Estimated time for LatencyTrackingBench: ~26 s
Estimated time for SampleBench: ~30 s
Estimated time for ReadWriteBench: ~30 s
Estimated time for MutationBench: ~30 s
Estimated time for CompactionBench: ~35 s
Estimated time for DiagnosticEventPersistenceBench: ~40 s
Estimated time for ZeroCopyStreamingBench: ~44 s
Estimated time for BatchStatementBench: ~110 s
Estimated time for DiagnosticEventServiceBench: ~120 s
Estimated time for MessageOutBench: ~144 s
Estimated time for BloomFilterSerializerBench: ~144 s
Estimated time for FastThreadLocalBench: ~156 s
Estimated time for HashingBench: ~156 s
Estimated time for ChecksumBench: ~208 s
Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s
Estimated time for PendingRangesBench: ~ 5 m
Estimated time for DirectorySizerBench: ~ 5 m
Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m
Estimated time for PreaggregatedByteBufsBench: ~ 7 m
Estimated time for AutoBoxingBench: ~ 8 m
Estimated time for OutputStreamBench: ~ 13 m
Estimated time for BTreeBuildBench: ~ 13 m
Estimated time for StringsEncodeBench: ~ 20 m
Estimated time for 

[jira] [Updated] (CASSANDRA-18873) Fix broken JMH benchmarks

2023-09-24 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-18873:
--
Attachment: BenchTimeTest.java

> Fix broken JMH benchmarks
> -
>
> Key: CASSANDRA-18873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18873
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Jacek Lewandowski
>Priority: Normal
> Attachments: BenchTimeTest.java
>
>
> The following benchmarks are broken:
> * {{ZeroCopyStreamingBench}}
> * {{MutationBench}}
> * {{FastThreadLocalBench}}
> * {{AtomicBTreePartitionUpdateBench}} (OOM on Jenkins)
> Additionally, those benchmarks take too much time to run:
> * {{BTreeUpdateBench}} ~ 58 hours
> * {{AtomicBTreePartitionUpdateBench}} ~ 5 hours
> * {{BTreeTransformBench}} ~ 2.5 hours
> Here is the complete list of estimated benchmark times:
> {noformat}
> Estimated time for CacheLoaderBench: ~5 s
> Estimated time for LatencyTrackingBench: ~26 s
> Estimated time for SampleBench: ~30 s
> Estimated time for ReadWriteBench: ~30 s
> Estimated time for MutationBench: ~30 s
> Estimated time for CompactionBench: ~35 s
> Estimated time for DiagnosticEventPersistenceBench: ~40 s
> Estimated time for ZeroCopyStreamingBench: ~44 s
> Estimated time for BatchStatementBench: ~110 s
> Estimated time for DiagnosticEventServiceBench: ~120 s
> Estimated time for MessageOutBench: ~144 s
> Estimated time for BloomFilterSerializerBench: ~144 s
> Estimated time for FastThreadLocalBench: ~156 s
> Estimated time for HashingBench: ~156 s
> Estimated time for ChecksumBench: ~208 s
> Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s
> Estimated time for PendingRangesBench: ~ 5 m
> Estimated time for DirectorySizerBench: ~ 5 m
> Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m
> Estimated time for PreaggregatedByteBufsBench: ~ 7 m
> Estimated time for AutoBoxingBench: ~ 8 m
> Estimated time for OutputStreamBench: ~ 13 m
> Estimated time for BTreeBuildBench: ~ 13 m
> Estimated time for StringsEncodeBench: ~ 20 m
> Estimated time for instance.ReadWidePartitionsBench: ~ 21 m
> Estimated time for btree.BTreeBuildBench: ~ 30 m
> Estimated time for BTreeSearchIteratorBench: ~ 31 m
> Estimated time for btree.BTreeTransformBench: ~ 138 m
> Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 288 m
> Estimated time for btree.BTreeUpdateBench: ~58 h
> Total estimated time: ~69 h
> {noformat}
> I'd like to add a test which estimates the benchmark times and fails if any 
> single benchmark's estimated run time is longer than xxx minutes.






[jira] [Updated] (CASSANDRA-18873) Fix broken JMH benchmarks

2023-09-24 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-18873:
--
Description: 
The following benchmarks are broken:
* {{ZeroCopyStreamingBench}}
* {{MutationBench}}
* {{FastThreadLocalBench}}
* {{AtomicBTreePartitionUpdateBench}} (OOM on Jenkins)

Additionally, those benchmarks take too much time to run:
* {{BTreeUpdateBench}} ~ 58 hours
* {{AtomicBTreePartitionUpdateBench}} ~ 5 hours
* {{BTreeTransformBench}} ~ 2.5 hours

Here is the complete list of estimated benchmark times:
{noformat}
Estimated time for CacheLoaderBench: ~5 s
Estimated time for LatencyTrackingBench: ~26 s
Estimated time for SampleBench: ~30 s
Estimated time for ReadWriteBench: ~30 s
Estimated time for MutationBench: ~30 s
Estimated time for CompactionBench: ~35 s
Estimated time for DiagnosticEventPersistenceBench: ~40 s
Estimated time for ZeroCopyStreamingBench: ~44 s
Estimated time for BatchStatementBench: ~110 s
Estimated time for DiagnosticEventServiceBench: ~120 s
Estimated time for MessageOutBench: ~144 s
Estimated time for BloomFilterSerializerBench: ~144 s
Estimated time for FastThreadLocalBench: ~156 s
Estimated time for HashingBench: ~156 s
Estimated time for ChecksumBench: ~208 s
Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s
Estimated time for PendingRangesBench: ~ 5 m
Estimated time for DirectorySizerBench: ~ 5 m
Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m
Estimated time for PreaggregatedByteBufsBench: ~ 7 m
Estimated time for AutoBoxingBench: ~ 8 m
Estimated time for OutputStreamBench: ~ 13 m
Estimated time for BTreeBuildBench: ~ 13 m
Estimated time for StringsEncodeBench: ~ 20 m
Estimated time for instance.ReadWidePartitionsBench: ~ 21 m
Estimated time for btree.BTreeBuildBench: ~ 30 m
Estimated time for BTreeSearchIteratorBench: ~ 31 m
Estimated time for btree.BTreeTransformBench: ~ 138 m
Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 288 m
Estimated time for btree.BTreeUpdateBench: ~58 h
Total estimated time: ~69 h
{noformat}

I'd like to add a test which estimates the benchmark times and fails if any 
single benchmark's estimated run time is longer than xxx minutes.


  was:
ZeroCopyStreamingBench
MutationBench
FastThreadLocalBench
AtomicBTreePartitionUpdateBench (OOM on Jenkins)



> Fix broken JMH benchmarks
> -
>
> Key: CASSANDRA-18873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18873
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Jacek Lewandowski
>Priority: Normal
>
> The following benchmarks are broken:
> * {{ZeroCopyStreamingBench}}
> * {{MutationBench}}
> * {{FastThreadLocalBench}}
> * {{AtomicBTreePartitionUpdateBench}} (OOM on Jenkins)
> Additionally, those benchmarks take too much time to run:
> * {{BTreeUpdateBench}} ~ 58 hours
> * {{AtomicBTreePartitionUpdateBench}} ~ 5 hours
> * {{BTreeTransformBench}} ~ 2.5 hours
> Here is the complete list of estimated benchmark times:
> {noformat}
> Estimated time for CacheLoaderBench: ~5 s
> Estimated time for LatencyTrackingBench: ~26 s
> Estimated time for SampleBench: ~30 s
> Estimated time for ReadWriteBench: ~30 s
> Estimated time for MutationBench: ~30 s
> Estimated time for CompactionBench: ~35 s
> Estimated time for DiagnosticEventPersistenceBench: ~40 s
> Estimated time for ZeroCopyStreamingBench: ~44 s
> Estimated time for BatchStatementBench: ~110 s
> Estimated time for DiagnosticEventServiceBench: ~120 s
> Estimated time for MessageOutBench: ~144 s
> Estimated time for BloomFilterSerializerBench: ~144 s
> Estimated time for FastThreadLocalBench: ~156 s
> Estimated time for HashingBench: ~156 s
> Estimated time for ChecksumBench: ~208 s
> Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s
> Estimated time for PendingRangesBench: ~ 5 m
> Estimated time for DirectorySizerBench: ~ 5 m
> Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m
> Estimated time for PreaggregatedByteBufsBench: ~ 7 m
> Estimated time for AutoBoxingBench: ~ 8 m
> Estimated time for OutputStreamBench: ~ 13 m
> Estimated time for BTreeBuildBench: ~ 13 m
> Estimated time for StringsEncodeBench: ~ 20 m
> Estimated time for instance.ReadWidePartitionsBench: ~ 21 m
> Estimated time for btree.BTreeBuildBench: ~ 30 m
> Estimated time for BTreeSearchIteratorBench: ~ 31 m
> Estimated time for btree.BTreeTransformBench: ~ 138 m
> Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 288 m
> Estimated time for btree.BTreeUpdateBench: ~58 h
> Total estimated time: ~69 h
> {noformat}
> I'd like to add a test which estimates the benchmark times and fails if any 
> single benchmark's estimated run time is longer than xxx minutes.





[jira] [Comment Edited] (CASSANDRA-18871) JMH benchmark improvements

2023-09-24 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768331#comment-17768331
 ] 

Jacek Lewandowski edited comment on CASSANDRA-18871 at 9/24/23 7:03 AM:


So the build failed, probably because it was running for too long. I looked 
into the logs to figure out why it takes so long and learned that there are 
benchmarks which take extremely long to run. 

{{btree.BTreeTransformBench}}, params = 7x9x2=63, x4 forks = 252, x3 methods = 
756 tests x 11s each ~= 2h 20m
{{btree.BTreeUpdateBench}}, params = 7x7x3x2x2x3=1764, x4 forks = 7056, x1 
method = 7056 tests x 30s each ~= 59h

To me, the latter is unacceptable for CI. We need to reduce the number of 
parameters and also set the number of forks to 1 (probably for each test). 

I'm going to exclude the benchmark for now and create a ticket to fix it later 
(I'm going to do that for each benchmark which causes CI to fail). Those 
benchmarks are still ok to run locally with {{-Dbenchmark.name=...}}
 
btw. I've implemented a test which estimates benchmark run times according to 
the number of forks, the number and duration of warmup and measurement 
iterations, and the number of parameter combinations. The results are as follows: 

{noformat}
Estimated time for CacheLoaderBench: ~5 s
Estimated time for LatencyTrackingBench: ~26 s
Estimated time for SampleBench: ~30 s
Estimated time for ReadWriteBench: ~30 s
Estimated time for MutationBench: ~30 s
Estimated time for CompactionBench: ~35 s
Estimated time for DiagnosticEventPersistenceBench: ~40 s
Estimated time for ZeroCopyStreamingBench: ~44 s
Estimated time for BatchStatementBench: ~110 s
Estimated time for DiagnosticEventServiceBench: ~120 s
Estimated time for MessageOutBench: ~144 s
Estimated time for BloomFilterSerializerBench: ~144 s
Estimated time for FastThreadLocalBench: ~156 s
Estimated time for HashingBench: ~156 s
Estimated time for ChecksumBench: ~208 s
Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s
Estimated time for PendingRangesBench: ~ 5 m
Estimated time for DirectorySizerBench: ~ 5 m
Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m
Estimated time for PreaggregatedByteBufsBench: ~ 7 m
Estimated time for AutoBoxingBench: ~ 8 m
Estimated time for OutputStreamBench: ~ 13 m
Estimated time for BTreeBuildBench: ~ 13 m
Estimated time for StringsEncodeBench: ~ 20 m
Estimated time for instance.ReadWidePartitionsBench: ~ 21 m
Estimated time for btree.BTreeBuildBench: ~ 30 m
Estimated time for BTreeSearchIteratorBench: ~ 31 m
Estimated time for btree.BTreeTransformBench: ~ 138 m
Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 288 m
Estimated time for btree.BTreeUpdateBench: ~58 h
Total estimated time: ~69 h
{noformat}

We can make it assert that no benchmark is planned to run longer than 30 
minutes (but as said, a separate ticket)


was (Author: jlewandowski):
So the build failed, probably because it was running for too long. I looked 
into the logs to figure out why it takes so long and learned that there are 
benchmarks which take extremely long to run. 

{{btree.BTreeTransformBench}}, params = 7x9x2=63, x4 forks = 252, x3 methods = 
756 tests x 11s each ~= 2h 20m
{{btree.BTreeUpdateBench}}, params = 7x7x3x2x2x3=1764, x4 forks = 7056, x1 
method = 7056 tests x 30s each ~= 59h

To me, the latter is unacceptable for CI. We need to reduce the number of 
parameters and also set the number of forks to 1 (probably for each test). 

I'm going to exclude the benchmark for now and create a ticket to fix it later 
(I'm going to do that for each benchmark which causes CI to fail). Those 
benchmarks are still ok to run locally with {{-Dbenchmark.name=...}}
 

> JMH benchmark improvements
> --
>
> Key: CASSANDRA-18871
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18871
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build, Legacy/Tools
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> 1. CASSANDRA-12586 introduced the {{build-jmh}} task, which builds an uber 
> jar for JMH benchmarks that is then not used by the {{ant microbench}} task. 
> It is used, though, by the {{test/bin/jmh}} script. 
> In fact, I have no idea why we should use an uber jar if JMH can run 
> perfectly well with a regular classpath. Maybe that had something to do with 
> the older JMH version that was used at that time. Building uber jars takes 
> time and is annoying. Since it seems to be redundant anyway, I'm going to 
> remove it and fix {{test/bin/jmh}} to use a regular classpath. 
> 2. I'll add support for async profiler in benchmarks. That is, the 
> {{microbench}} target automatically fetches the async profiler