[jira] [Updated] (CASSANDRA-18786) Javadoc BigFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-18786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Berenguer Blasi updated CASSANDRA-18786:
----------------------------------------
Source Control Link: https://github.com/apache/cassandra/commit/9aa2109803a6dd53db36b058e89e7b431762ded2
Resolution: Fixed
Status: Resolved (was: Ready to Commit)

> Javadoc BigFormat
> -----------------
>
> Key: CASSANDRA-18786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18786
> Project: Cassandra
> Issue Type: Improvement
> Components: Documentation/Javadoc
> Reporter: Berenguer Blasi
> Assignee: Berenguer Blasi
> Priority: Normal
> Fix For: 5.0.x
>
> Attachments: screenshot-1.png
>
>
> This ticket intends to go through the current sstables code and javadoc the
> format at a high level.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18786) Javadoc BigFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-18786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768534#comment-17768534 ]

Berenguer Blasi commented on CASSANDRA-18786:
---------------------------------------------

Thx for the reviews!
[cassandra] branch trunk updated: Javadoc BigFormat
This is an automated email from the ASF dual-hosted git repository.

bereng pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

The following commit(s) were added to refs/heads/trunk by this push:
     new 9aa2109803 Javadoc BigFormat
9aa2109803 is described below

commit 9aa2109803a6dd53db36b058e89e7b431762ded2
Author: Bereng
AuthorDate: Thu Aug 24 11:19:57 2023 +0200

    Javadoc BigFormat

    patch by Berenguer Blasi; reviewed by Ling Mao, Stefan Miklosovic for CASSANDRA-18786
---
 .../cassandra/io/sstable/format/big/BigFormat.java | 97 +-
 .../io/sstable/indexsummary/IndexSummary.java      |  2 +
 2 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java b/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java
index 4de58b4b6b..d40d6a6f07 100644
--- a/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java
+++ b/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java
@@ -38,13 +38,14 @@ import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.DecoratedKey;
 import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
+import org.apache.cassandra.db.memtable.Flushing;
 import org.apache.cassandra.dht.IPartitioner;
 import org.apache.cassandra.io.sstable.Component;
-import org.apache.cassandra.io.sstable.SSTable;
 import org.apache.cassandra.io.sstable.Descriptor;
 import org.apache.cassandra.io.sstable.GaugeProvider;
 import org.apache.cassandra.io.sstable.IScrubber;
 import org.apache.cassandra.io.sstable.MetricsProviders;
+import org.apache.cassandra.io.sstable.SSTable;
 import org.apache.cassandra.io.sstable.filter.BloomFilterMetrics;
 import org.apache.cassandra.io.sstable.format.AbstractSSTableFormat;
 import org.apache.cassandra.io.sstable.format.SSTableFormat;
@@ -66,7 +67,99 @@ import org.apache.cassandra.utils.Pair;
 import static org.apache.cassandra.io.sstable.format.SSTableFormat.Components.DATA;
 /**
- * Legacy bigtable format
+ * Legacy bigtable format. Components and approximate lifecycle:
+ *
+ * {@link SSTableFormat.Components}
+ *
+ * {@link Components#ALL_COMPONENTS}
+ *
+ *
+ * {@link Components#SUMMARY}: When searching for a PK we go here for a first approximation on where to look in the index file. It is
+ * a small sampling of the Index entries intended for a first fast search in-memory.
+ *
+ * {@link org.apache.cassandra.io.sstable.indexsummary.IndexSummary}
+ *
+ * {@link IndexSummaryComponent}
+ *
+ *
+ *
+ * {@link Components#PRIMARY_INDEX}: We'll land here in the approximate area where to look for the PK thanks to the Summary. Now we'll search for
+ * the exact PK to get it's exact position in the data file.
+ *
+ * {@link BigTableWriter#indexWriter}
+ *
+ * {@link RowIndexEntry}
+ *
+ * {@link org.apache.cassandra.io.sstable.IndexInfo}
+ *
+ * {@link org.apache.cassandra.io.sstable.format.IndexComponent}
+ *
+ *
+ *
+ * {@link Components#DATA}: The actual data/partitions file as an array or partitions. Each partition has the form:
+ *
+ * A partition header
+ * Maybe a static row
+ * Rows or range tombstone
+ *
+ * I.e. upon flush {@link Flushing.FlushRunnable#writeSortedContents}
+ *
+ * Down to {@link org.apache.cassandra.io.sstable.format.SortedTableWriter#startPartition}
+ *
+ * Down to {@link org.apache.cassandra.io.sstable.format.SortedTablePartitionWriter#start}
+ *
+ * {@link org.apache.cassandra.io.sstable.format.DataComponent}
+ *
+ *
+ *
+ * {@link Components#STATS}: Stats on the data such as min timestamps to later vint encode TTL, markForDeleteAt, etc
+ *
+ * {@link org.apache.cassandra.db.rows.EncodingStats}
+ *
+ * {@link org.apache.cassandra.io.sstable.format.StatsComponent}
+ *
+ *
+ *
+ * {@link Components#COMPRESSION_INFO}: Contains compresion metadata
+ *
+ * {@link org.apache.cassandra.io.compress.CompressedSequentialWriter}
+ *
+ * {@link org.apache.cassandra.io.compress.CompressionMetadata}
+ *
+ * {@link org.apache.cassandra.io.sstable.format.CompressionInfoComponent}
+ *
+ *
+ *
+ * {@link Components#DIGEST}: The digest supporting the compression
+ *
+ * {@link org.apache.cassandra.io.compress.CompressedSequentialWriter}
+ *
+ * {@link org.apache.cassandra.io.util.ChecksumWriter}
+ *
+ *
+ *
+ * {@link Components#FILTER}: Bloom filter for data files
+ *
+ * {@link org.apache.cassandra.io.sstable.format.FilterComponent}
+ *
+ * {@link
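
The read path this Javadoc outlines (summary first, then primary index, then data file) can be illustrated with a toy model. The sketch below is not Cassandra code: `ToySSTableLookup`, its fields, the sampling interval, and the plain string keys are all hypothetical, and it only mirrors the idea of a small in-memory sample of index entries that narrows the search before the exact key is located.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: names and layout are hypothetical, not Cassandra's on-disk format. */
final class ToySSTableLookup
{
    /** Stand-in for the index file: sorted (key, position in data file) entries. */
    private final List<Map.Entry<String, Long>> index = new ArrayList<>();

    /** Stand-in for the summary: every Nth index key mapped to its slot in the index. */
    private final TreeMap<String, Integer> summary = new TreeMap<>();

    ToySSTableLookup(TreeMap<String, Long> sortedKeysToDataPositions, int samplingInterval)
    {
        int slot = 0;
        for (Map.Entry<String, Long> e : sortedKeysToDataPositions.entrySet())
        {
            index.add(Map.entry(e.getKey(), e.getValue()));
            if (slot % samplingInterval == 0)
                summary.put(e.getKey(), slot); // sample this entry into the summary
            slot++;
        }
    }

    /** Returns the data-file position for the key, or -1 if the key is absent. */
    long dataPosition(String key)
    {
        // Step 1: the summary gives a first approximation of where to look in the index.
        Map.Entry<String, Integer> sample = summary.floorEntry(key);
        if (sample == null)
            return -1; // the key sorts before the first indexed key

        // Step 2: scan the index forward from the sampled slot to find the exact key.
        for (int i = sample.getValue(); i < index.size(); i++)
        {
            int cmp = index.get(i).getKey().compareTo(key);
            if (cmp == 0)
                return index.get(i).getValue(); // Step 3: where to seek in the data file
            if (cmp > 0)
                break; // passed the place where the key would have been
        }
        return -1;
    }
}
```

With five keys and a sampling interval of 2, the summary holds three entries, so a lookup scans at most two index entries after the in-memory search. Real sstables order partitions by token rather than by raw key, which this toy ignores.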
[jira] [Updated] (CASSANDRA-18866) Node sends multiple inflight echos
[ https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Ramirez updated CASSANDRA-18866:
--------------------------------------
Change Category: Operability
Complexity: Normal
Component/s: Cluster/Gossip
Status: Open (was: Triage Needed)

> Node sends multiple inflight echos
> ----------------------------------
>
> Key: CASSANDRA-18866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18866
> Project: Cassandra
> Issue Type: Improvement
> Components: Cluster/Gossip
> Reporter: Cameron Zemek
> Assignee: Cameron Zemek
> Priority: Normal
> Attachments: 18866-regression.patch, duplicates.log, echo.log
>
>
> CASSANDRA-18854 rolled back the changes from CASSANDRA-18845. In particular,
> 18845 had a change to only allow 1 inflight ECHO request at a time. As per
> 18854, some tests have an error rate due to this change. Creating this ticket
> to discuss this further. The current state also does not have retry logic;
> it just allows multiple ECHO requests inflight at the same time, so it is less
> likely that all ECHOs will time out or get lost.
> With the change from 18845 applied and some extra logging added to track what
> is going on, I do see it retrying ECHOs. Likewise, I patched a node to drop
> ECHO requests from a node and also see it retrying ECHOs when it doesn't get a
> reply.
> Therefore, I think the problem is more specific than the dropping of one ECHO
> request. Yes, there is no retry logic for failed ECHO requests, but this is
> the case both before and after 18845. ECHO requests are only sent via gossip
> verb handlers calling applyStateLocally. For these failed tests I am therefore
> assuming there are cases where it won't call markAlive when other nodes
> consider the node UP but it is marked DOWN by a node.
[jira] [Assigned] (CASSANDRA-18866) Node sends multiple inflight echos
[ https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Ramirez reassigned CASSANDRA-18866:
-----------------------------------------
Assignee: Cameron Zemek
[jira] [Comment Edited] (CASSANDRA-18866) Node sends multiple inflight echos
[ https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768462#comment-17768462 ]

Cameron Zemek edited comment on CASSANDRA-18866 at 9/24/23 11:47 PM:
--------------------------------------------------------------------

Had to make the following change for some more dtests:

Previous:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    logger.trace("Resending ECHO_REQ to {}", addr);
    Message echoMessage = Message.out(ECHO_REQ, noPayload);
    MessagingService.instance().sendWithCallback(echoMessage, addr, this);
}
{code}
After:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    if (isEnabled())
    {
        logger.trace("Resending ECHO_REQ to {}", addr);
        Message echoMessage = Message.out(ECHO_REQ, noPayload);
        MessagingService.instance().sendWithCallback(echoMessage, addr, this);
    }
    else
    {
        logger.trace("Failed ECHO_REQ to {}, aborting due to disabled gossip", addr);
        inflightEcho.remove(addr);
    }
}
{code}
[instaclustr/cassandra at CASSANDRA-18866-regressiontest (github.com)|https://github.com/instaclustr/cassandra/tree/CASSANDRA-18866-regressiontest]

was (Author: cam1982):
Had to make the following change for some more dtests:

Previous:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    logger.trace("Resending ECHO_REQ to {}", addr);
    Message echoMessage = Message.out(ECHO_REQ, noPayload);
    MessagingService.instance().sendWithCallback(echoMessage, addr, this);
}
{code}
After:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    if (isEnabled())
    {
        logger.trace("Resending ECHO_REQ to {}", addr);
        Message echoMessage = Message.out(ECHO_REQ, noPayload);
        MessagingService.instance().sendWithCallback(echoMessage, addr, this);
    }
    else
    {
        logger.trace("Failed ECHO_REQ to {}, aborting due to disabled gossip", addr);
    }
}
{code}
[instaclustr/cassandra at CASSANDRA-18866-regressiontest (github.com)|https://github.com/instaclustr/cassandra/tree/CASSANDRA-18866-regressiontest]
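
To make the effect of the added `inflightEcho.remove(addr)` easier to follow, here is a minimal, self-contained sketch of the bookkeeping as it can be inferred from this thread: at most one ECHO in flight per endpoint, a resend on failure while gossip is enabled, and a cleanup when gossip is disabled. It is not the Gossiper code. The class `EchoTracker`, the use of plain strings for endpoints, the `Set` type chosen for `inflightEcho`, and the `sender` callback standing in for `MessagingService.instance().sendWithCallback(...)` are all assumptions made for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Sketch only: a stand-in for the ECHO bookkeeping discussed here, not Cassandra's Gossiper. */
final class EchoTracker
{
    /** Endpoints with an ECHO currently in flight (the element type is an assumption). */
    private final Set<String> inflightEcho = ConcurrentHashMap.newKeySet();
    private final BooleanSupplier gossipEnabled; // plays the role of isEnabled()
    private final Consumer<String> sender;       // plays the role of sendWithCallback(...)

    EchoTracker(BooleanSupplier gossipEnabled, Consumer<String> sender)
    {
        this.gossipEnabled = gossipEnabled;
        this.sender = sender;
    }

    /** Sends an ECHO unless one is already in flight; returns true if a request was sent. */
    boolean maybeSendEcho(String addr)
    {
        if (!inflightEcho.add(addr))
            return false; // only one inflight ECHO per endpoint
        sender.accept(addr);
        return true;
    }

    /** A reply arrived, so the endpoint no longer has an ECHO in flight. */
    void onResponse(String addr)
    {
        inflightEcho.remove(addr);
    }

    /** Mirrors the patched onFailure: retry while gossip is enabled, otherwise clean up. */
    void onFailure(String addr)
    {
        if (gossipEnabled.getAsBoolean())
            sender.accept(addr);       // resend; the entry stays in flight
        else
            inflightEcho.remove(addr); // abort, so a later ECHO is not blocked forever
    }

    boolean isInflight(String addr)
    {
        return inflightEcho.contains(addr);
    }
}
```

In this model, leaving out the `remove` in the disabled branch would keep the endpoint marked as in flight indefinitely, so `maybeSendEcho` would never send to it again after gossip is re-enabled. That is one plausible reading of why the extra line was needed for the dtests; the thread itself does not spell out the reason.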
[jira] [Comment Edited] (CASSANDRA-18866) Node sends multiple inflight echos
[ https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768462#comment-17768462 ]

Cameron Zemek edited comment on CASSANDRA-18866 at 9/24/23 11:42 PM:
--------------------------------------------------------------------

Had to make the following change for some more dtests:

Previous:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    logger.trace("Resending ECHO_REQ to {}", addr);
    Message echoMessage = Message.out(ECHO_REQ, noPayload);
    MessagingService.instance().sendWithCallback(echoMessage, addr, this);
}
{code}
After:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    if (isEnabled())
    {
        logger.trace("Resending ECHO_REQ to {}", addr);
        Message echoMessage = Message.out(ECHO_REQ, noPayload);
        MessagingService.instance().sendWithCallback(echoMessage, addr, this);
    }
    else
    {
        logger.trace("Failed ECHO_REQ to {}, aborting due to disabled gossip", addr);
    }
}
{code}
[instaclustr/cassandra at CASSANDRA-18866-regressiontest (github.com)|https://github.com/instaclustr/cassandra/tree/CASSANDRA-18866-regressiontest]

was (Author: cam1982):
Had to make the following change for some more dtests:

Previous:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    logger.trace("Resending ECHO_REQ to {}", addr);
    Message echoMessage = Message.out(ECHO_REQ, noPayload);
    MessagingService.instance().sendWithCallback(echoMessage, addr, this);
}
{code}
After:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    if (isEnabled())
    {
        logger.trace("Resending ECHO_REQ to {}", addr);
        Message echoMessage = Message.out(ECHO_REQ, noPayload);
        MessagingService.instance().sendWithCallback(echoMessage, addr, this);
    }
    else
    {
        logger.trace("Failed ECHO_REQ to {}, aborting due to disabled gossip", addr);
    }
}
{code}
[jira] [Commented] (CASSANDRA-18866) Node sends multiple inflight echos
[ https://issues.apache.org/jira/browse/CASSANDRA-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768462#comment-17768462 ]

Cameron Zemek commented on CASSANDRA-18866:
-------------------------------------------

Had to make the following change for some more dtests:

Previous:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    logger.trace("Resending ECHO_REQ to {}", addr);
    Message echoMessage = Message.out(ECHO_REQ, noPayload);
    MessagingService.instance().sendWithCallback(echoMessage, addr, this);
}
{code}
After:
{code:java}
@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)
{
    if (isEnabled())
    {
        logger.trace("Resending ECHO_REQ to {}", addr);
        Message echoMessage = Message.out(ECHO_REQ, noPayload);
        MessagingService.instance().sendWithCallback(echoMessage, addr, this);
    }
    else
    {
        logger.trace("Failed ECHO_REQ to {}, aborting due to disabled gossip", addr);
    }
}
{code}
[jira] [Updated] (CASSANDRA-17992) Upgrade Netty on 5.0
[ https://issues.apache.org/jira/browse/CASSANDRA-17992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ekaterina Dimitrova updated CASSANDRA-17992:
--------------------------------------------
Test and Documentation Plan: Run CI and check regressions; check release notes (was: Run CI and check regressions)

> Upgrade Netty on 5.0
> --------------------
>
> Key: CASSANDRA-17992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17992
> Project: Cassandra
> Issue Type: Task
> Components: Dependencies
> Reporter: Ekaterina Dimitrova
> Assignee: Jacek Lewandowski
> Priority: Normal
> Fix For: 5.0, 5.0-alpha1
>
> Attachments: important-netty-inter-releases.md,
> netty-release-notes-filtered.md, netty-release-notes.md, signature.asc
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> I haven't been able to identify from the Netty docs which was the lowest
> version where JDK17 support was added, but we are about 40 versions behind in
> netty 4, so I suspect we had better update.
> -We need to consider there was an issue with class cast exceptions when
> building with JDK17 with newer versions of netty (the newest available in
> March 2022). For the record, we didn't see those when running CI on JDK8 and
> JDK11. We also need to carefully revise the changes between the netty
> versions. -->- CASSANDRA-18180
> Upgrading will also cover a fix in netty that was discussed in
> [this|https://the-asf.slack.com/archives/CK23JSY2K/p1665567660202989] ASF
> Slack thread.
> CC [~benedict] , [~aleksey]
[jira] [Updated] (CASSANDRA-17992) Upgrade Netty on 5.0
[ https://issues.apache.org/jira/browse/CASSANDRA-17992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ekaterina Dimitrova updated CASSANDRA-17992:
--------------------------------------------
Test and Documentation Plan: Run CI and check regressions (was: Run regressions)
[jira] [Comment Edited] (CASSANDRA-18877) remove bytebuddy / byteman from production classpath and remove compress-lzf dependency from build deps
[ https://issues.apache.org/jira/browse/CASSANDRA-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768416#comment-17768416 ]

Stefan Miklosovic edited comment on CASSANDRA-18877 at 9/24/23 4:34 PM:
------------------------------------------------------------------------

In (1), it was using LZFOutputStream in StreamWriter which just got removed. That library was also purposefully removed from the repository. I am not sure how it got back, but it is suspicious that it was never removed from the dependency management (2), and then when the Maven / Ant resolver stuff was introduced it was probably just resurrected because of that.

(1) https://github.com/apache/cassandra/commit/fc92db2b9b56c143516026ba29cecdec37e286bb
(2) https://github.com/apache/cassandra/blob/fc92db2b9b56c143516026ba29cecdec37e286bb/build.xml#L362

was (Author: smiklosovic):
In (1), it was using LZFOutputStream in StreamWriter which just got removed. That library was also purposefully removed from the repository. I am not sure how it got back, but it is suspicious that it was never removed from the dependency management (2), and then when the Maven / Ant resolver stuff was introduced it was just resurrected.

(1) https://github.com/apache/cassandra/commit/fc92db2b9b56c143516026ba29cecdec37e286bb
(2) https://github.com/apache/cassandra/blob/fc92db2b9b56c143516026ba29cecdec37e286bb/build.xml#L362

> remove bytebuddy / byteman from production classpath and remove compress-lzf
> dependency from build deps
> -----------------------------------------------------------------------------
>
> Key: CASSANDRA-18877
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18877
> Project: Cassandra
> Issue Type: Task
> Components: Build
> Reporter: Stefan Miklosovic
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 5.x
>
>
> I was digging in the project deps and if you compare all libs in "libs" dir
> and all libs in "build/lib/jars", there are indeed some differences which are
> OK; however, in build/lib/jars there are also libraries for byteman and
> byte-buddy. This is clearly wrong as these dependencies should not be
> accessible from the production code, only from tests.
> The reason they are accessible in prod code is that there is the class
> TestRateLimiter (1). I do not have a clue why that class is in the prod code
> in the first place. The only place it is referenced in is here (2) but that
> byteman script is not loaded anywhere in tests. I was also checking Python
> dtests.
> I think this is some leftover or something like "I will keep it here when I
> need it", but as nobody seems to need it, I strongly advocate for removing it
> and making bytebuddy and byteman test-scoped dependencies only, as they
> should be.
> A reader who pays attention notices that these dependencies are of provided
> scope, which is a trick to have it compilable but not among the libraries in
> the production runtime, and it does not do any harm as it is never invoked
> from the production code (if it was, it would fail on missing imports).
> Nevertheless, this is still an issue which should be addressed. We were doing
> something similar with the assertj dependency recently.
> The second issue is that there is a dependency on compress-lzf in build
> dependencies. This is not necessary either as that library was removed from
> the repository in (3) but it still somehow leaked to the build process again.
> (1)
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/TestRateLimiter.java
> (2)
> https://github.com/apache/cassandra/blob/trunk/test/resources/byteman/mutation_limiter.btm
> (3)
> https://github.com/apache/cassandra/commit/fc92db2b9b56c143516026ba29cecdec37e286bb
[jira] [Commented] (CASSANDRA-18877) remove bytebuddy / byteman from production classpath and remove compress-lzf dependency from build deps
[ https://issues.apache.org/jira/browse/CASSANDRA-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768416#comment-17768416 ]

Stefan Miklosovic commented on CASSANDRA-18877:
-----------------------------------------------

In (1), it was using LZFOutputStream in StreamWriter which just got removed. That library was also purposefully removed from the repository. I am not sure how it got back, but it is suspicious that it was never removed from the dependency management (2), and then when the Maven / Ant resolver stuff was introduced it was just resurrected.

(1) https://github.com/apache/cassandra/commit/fc92db2b9b56c143516026ba29cecdec37e286bb
(2) https://github.com/apache/cassandra/blob/fc92db2b9b56c143516026ba29cecdec37e286bb/build.xml#L362
[jira] [Updated] (CASSANDRA-18873) Fix broken JMH benchmarks
[ https://issues.apache.org/jira/browse/CASSANDRA-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacek Lewandowski updated CASSANDRA-18873:
------------------------------------------
Description:
The following benchmarks are broken:
* {{ZeroCopyStreamingBench}}
* {{MutationBench}}
* {{FastThreadLocalBench}}
* {{AtomicBTreePartitionUpdateBench}} (OOM on Jenkins)
* {{ReadSmallPartitionsBench}}

Additionally, these benchmarks take too much time to run:
* {{BTreeUpdateBench}} ~ 58 hours
* {{AtomicBTreePartitionUpdateBench}} ~ 5 hours
* {{BTreeTransformBench}} ~ 2.5 hours

Here is the complete list of estimated benchmark times:
{noformat}
Estimated time for CacheLoaderBench: ~5 s
Estimated time for LatencyTrackingBench: ~26 s
Estimated time for SampleBench: ~30 s
Estimated time for ReadWriteBench: ~30 s
Estimated time for MutationBench: ~30 s
Estimated time for CompactionBench: ~35 s
Estimated time for DiagnosticEventPersistenceBench: ~40 s
Estimated time for ZeroCopyStreamingBench: ~44 s
Estimated time for BatchStatementBench: ~110 s
Estimated time for DiagnosticEventServiceBench: ~120 s
Estimated time for MessageOutBench: ~144 s
Estimated time for BloomFilterSerializerBench: ~144 s
Estimated time for FastThreadLocalBench: ~156 s
Estimated time for HashingBench: ~156 s
Estimated time for ChecksumBench: ~208 s
Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s
Estimated time for PendingRangesBench: ~ 5 m
Estimated time for DirectorySizerBench: ~ 5 m
Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m
Estimated time for PreaggregatedByteBufsBench: ~ 7 m
Estimated time for AutoBoxingBench: ~ 8 m
Estimated time for OutputStreamBench: ~ 13 m
Estimated time for BTreeBuildBench: ~ 13 m
Estimated time for StringsEncodeBench: ~ 20 m
Estimated time for instance.ReadWidePartitionsBench: ~ 21 m
Estimated time for btree.BTreeBuildBench: ~ 30 m
Estimated time for BTreeSearchIteratorBench: ~ 31 m
Estimated time for btree.BTreeTransformBench: ~ 138 m
Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 288 m
Estimated time for btree.BTreeUpdateBench: ~58 h
Total estimated time: ~69 h
{noformat}
I'd like to add a test which estimates the benchmark times and fails if a single benchmark's estimated run time is longer than xxx minutes.

was:
The following benchmarks are broken:
* {{ZeroCopyStreamingBench}}
* {{MutationBench}}
* {{FastThreadLocalBench}}
* {{AtomicBTreePartitionUpdateBench}} (OOM on Jenkins)

Additionally, these benchmarks take too much time to run:
* {{BTreeUpdateBench}} ~ 58 hours
* {{AtomicBTreePartitionUpdateBench}} ~ 5 hours
* {{BTreeTransformBench}} ~ 2.5 hours

Here is the complete list of estimated benchmark times:
{noformat}
Estimated time for CacheLoaderBench: ~5 s
Estimated time for LatencyTrackingBench: ~26 s
Estimated time for SampleBench: ~30 s
Estimated time for ReadWriteBench: ~30 s
Estimated time for MutationBench: ~30 s
Estimated time for CompactionBench: ~35 s
Estimated time for DiagnosticEventPersistenceBench: ~40 s
Estimated time for ZeroCopyStreamingBench: ~44 s
Estimated time for BatchStatementBench: ~110 s
Estimated time for DiagnosticEventServiceBench: ~120 s
Estimated time for MessageOutBench: ~144 s
Estimated time for BloomFilterSerializerBench: ~144 s
Estimated time for FastThreadLocalBench: ~156 s
Estimated time for HashingBench: ~156 s
Estimated time for ChecksumBench: ~208 s
Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s
Estimated time for PendingRangesBench: ~ 5 m
Estimated time for DirectorySizerBench: ~ 5 m
Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m
Estimated time for PreaggregatedByteBufsBench: ~ 7 m
Estimated time for AutoBoxingBench: ~ 8 m
Estimated time for OutputStreamBench: ~ 13 m
Estimated time for BTreeBuildBench: ~ 13 m
Estimated time for StringsEncodeBench: ~ 20 m
Estimated time for instance.ReadWidePartitionsBench: ~ 21 m
Estimated time for btree.BTreeBuildBench: ~ 30 m
Estimated time for BTreeSearchIteratorBench: ~ 31 m
Estimated time for
btree.BTreeTransformBench: ~ 138 m Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 288 m Estimated time for btree.BTreeUpdateBench: ~58 h Total estimated time: ~69 h {noformat} I'd like to add a test which estimates the benchmark times and fails if a single benchmark estimated run time is longer than xxx minutes. > Fix broken JMH benchmarks > - > > Key: CASSANDRA-18873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18873 > Project: Cassandra > Issue Type: Bug > Components: Test/benchmark >Reporter: Jacek Lewandowski >Priority: Normal > Attachments: BenchTimeTest.java > > > The following benchmarks are broken: > * {{ZeroCopyStreamingBench}} > * {{MutationBench}} > * {{FastThreadLocalBench}} > * {{AtomicBTreePartitionUpdateBench}} (OOM on Jenkins) > * {{ReadSmallPartitionsBench}} >
[jira] [Commented] (CASSANDRA-18877) remove bytebuddy / byteman from production classpath and remove compress-lzf dependency from build deps
[ https://issues.apache.org/jira/browse/CASSANDRA-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768367#comment-17768367 ] Michael Semb Wever commented on CASSANDRA-18877: bq. The second issue is that there is a dependency on compress-lzf Note: the LZ4Compressor uses lz4-java. > remove bytebuddy / byteman from production classpath and remove compress-lzf > dependency from build deps > --- > > Key: CASSANDRA-18877 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18877 > Project: Cassandra > Issue Type: Task > Components: Build >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > > I was digging in the project deps and if you compare all libs in "libs" dir > and all libs in "build/lib/jars", there are indeed some differences which are > OK; however, in build/lib/jars there are also libraries for byteman and > byte-buddy. This is clearly wrong as these dependencies should not be > accessible from the production code, only from tests. > The reason they are accessible in prod code is that there is the class > TestRateLimiter (1). I do not have a clue why that class is in the prod code > in the first place. The only place it is referenced is here (2) but that > byteman script is not loaded anywhere in tests. I was also checking Python > dtests. > I think this is some leftover or something like "I will keep it here when I > need it", but as nobody seems to need it, I strongly advocate for removing it and > making bytebuddy and byteman test-scoped dependencies only, as they should be. > A reader who pays attention notices that these dependencies are of provided > scope, which is a trick to have it compilable but not among the libraries in > the production runtime, and it does not do any harm as it is never invoked > from the production code (if it was, it would fail on missing imports); > nevertheless this is still an issue which should be addressed. 
We were doing > something similar with assertj dependency recently. > The second issue is that there is a dependency on compress-lzf in build > dependencies. This is not necessary either as that library was removed from > the repository in (3) but it still somehow leaked to the build process again. > (1) > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/TestRateLimiter.java > (2) > https://github.com/apache/cassandra/blob/trunk/test/resources/byteman/mutation_limiter.btm > (3) > https://github.com/apache/cassandra/commit/fc92db2b9b56c143516026ba29cecdec37e286bb -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18877) remove bytebuddy / byteman from production classpath and remove compress-lzf dependency from build deps
[ https://issues.apache.org/jira/browse/CASSANDRA-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768366#comment-17768366 ] Michael Semb Wever commented on CASSANDRA-18877: Are we sure it wasn't (or isn't) needed as part of the build process? build/lib/jars will always contain more for this reason. > remove bytebuddy / byteman from production classpath and remove compress-lzf > dependency from build deps > --- > > Key: CASSANDRA-18877 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18877 > Project: Cassandra > Issue Type: Task > Components: Build >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > > I was digging in the project deps and if you compare all libs in "libs" dir > and all libs in "build/lib/jars", there are indeed some differences which are > OK; however, in build/lib/jars there are also libraries for byteman and > byte-buddy. This is clearly wrong as these dependencies should not be > accessible from the production code, only from tests. > The reason they are accessible in prod code is that there is the class > TestRateLimiter (1). I do not have a clue why that class is in the prod code > in the first place. The only place it is referenced is here (2) but that > byteman script is not loaded anywhere in tests. I was also checking Python > dtests. > I think this is some leftover or something like "I will keep it here when I > need it", but as nobody seems to need it, I strongly advocate for removing it and > making bytebuddy and byteman test-scoped dependencies only, as they should be. > A reader who pays attention notices that these dependencies are of provided > scope, which is a trick to have it compilable but not among the libraries in > the production runtime, and it does not do any harm as it is never invoked > from the production code (if it was, it would fail on missing imports); > nevertheless this is still an issue which should be addressed. 
We were doing > something similar with assertj dependency recently. > The second issue is that there is a dependency on compress-lzf in build > dependencies. This is not necessary either as that library was removed from > the repository in (3) but it still somehow leaked to the build process again. > (1) > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/TestRateLimiter.java > (2) > https://github.com/apache/cassandra/blob/trunk/test/resources/byteman/mutation_limiter.btm > (3) > https://github.com/apache/cassandra/commit/fc92db2b9b56c143516026ba29cecdec37e286bb
[jira] [Commented] (CASSANDRA-14667) Upgrade Dropwizard Metrics to 4.x
[ https://issues.apache.org/jira/browse/CASSANDRA-14667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768364#comment-17768364 ] Michael Semb Wever commented on CASSANDRA-14667: bq. ant resolver plugin has some bugs in it https://issues.apache.org/jira/browse/CASSANDRA-18049?focusedCommentId=17706782=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17706782 > Upgrade Dropwizard Metrics to 4.x > - > > Key: CASSANDRA-14667 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14667 > Project: Cassandra > Issue Type: Task > Components: Observability/Metrics >Reporter: Stig Rohde Døssing >Assignee: Maxim Muzafarov >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: signature.asc, signature.asc, signature.asc, > signature.asc > > Time Spent: 1.5h > Remaining Estimate: 0h > > Cassandra currently uses Metrics 3.1.5. Version 4.0.0 added some fixes for > Java 9 compatibility. It would be good to upgrade the Metrics library as part > of the version of Cassandra that adds Java 9 compatibility > (https://issues.apache.org/jira/browse/CASSANDRA-9608).
[jira] [Comment Edited] (CASSANDRA-18871) JMH benchmark improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-18871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768331#comment-17768331 ] Jacek Lewandowski edited comment on CASSANDRA-18871 at 9/24/23 7:23 AM: So the build failed, probably because it was running for too long. I looked into the logs to figure out why it takes so long and learned that some benchmarks take extremely long to run. {{btree.BTreeTransformBench}}, params = 7x9x2=63, x4 forks = 252, x3 methods = 756 tests x 11s each ~= 2h 20m {{btree.BTreeUpdateBench}}, params = 7x7x3x2x2x3=1764, x4 forks = 7056, x1 method = 7056 tests x 30s each ~= 59h I'm going to exclude the longest benchmarks for now and create a ticket to fix them later - see CASSANDRA-18873. Those benchmarks are still ok to run locally with {{-Dbenchmark.name=...}} btw. I've implemented a test which estimates benchmark run times from the number of forks, the number and duration of warmup and measurement iterations, and the number of parameter combinations. 
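The arithmetic quoted in this comment can be reproduced with a short sketch. It is only an illustration of the multiplication described (parameter combinations x forks x benchmark methods x seconds per run); the class and method names are hypothetical, and it is not the attached BenchTimeTest.java nor any JMH API. The comment quotes 63 combinations for {{btree.BTreeTransformBench}}, and the sketch takes that figure as given rather than recomputing it.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not part of Cassandra or JMH.
final class BenchTimeEstimateSketch
{
    /** Number of parameter combinations: the product of the value counts of each parameter. */
    static long combinations(int... valuesPerParam)
    {
        long product = 1;
        for (int n : valuesPerParam)
            product *= n;
        return product;
    }

    /** Estimated wall time: combinations x forks x benchmark methods x seconds spent per run. */
    static Duration estimate(long combinations, int forks, int methods, long secondsPerRun)
    {
        return Duration.ofSeconds(combinations * forks * methods * secondsPerRun);
    }

    public static void main(String[] args)
    {
        // btree.BTreeUpdateBench: 7x7x3x2x2x3 combinations, 4 forks, 1 method, 30 s each.
        long updateCombos = combinations(7, 7, 3, 2, 2, 3);
        Duration update = estimate(updateCombos, 4, 1, 30);
        System.out.println("BTreeUpdateBench: " + updateCombos + " combinations, ~" + update.toHours() + " h");

        // btree.BTreeTransformBench: 63 combinations as quoted, 4 forks, 3 methods, 11 s each.
        Duration transform = estimate(63, 4, 3, 11);
        System.out.println("BTreeTransformBench: ~" + transform.toMinutes() + " m");
    }
}
```

With the quoted inputs this yields 1764 combinations and roughly 58 h for the update benchmark, and roughly 138 m for the transform benchmark, which agrees with the rounded figures in the comment and in the estimate list.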
The results are as follows: {noformat} Estimated time for CacheLoaderBench: ~5 s Estimated time for LatencyTrackingBench: ~26 s Estimated time for SampleBench: ~30 s Estimated time for ReadWriteBench: ~30 s Estimated time for MutationBench: ~30 s Estimated time for CompactionBench: ~35 s Estimated time for DiagnosticEventPersistenceBench: ~40 s Estimated time for ZeroCopyStreamingBench: ~44 s Estimated time for BatchStatementBench: ~110 s Estimated time for DiagnosticEventServiceBench: ~120 s Estimated time for MessageOutBench: ~144 s Estimated time for BloomFilterSerializerBench: ~144 s Estimated time for FastThreadLocalBench: ~156 s Estimated time for HashingBench: ~156 s Estimated time for ChecksumBench: ~208 s Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s Estimated time for PendingRangesBench: ~ 5 m Estimated time for DirectorySizerBench: ~ 5 m Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m Estimated time for PreaggregatedByteBufsBench: ~ 7 m Estimated time for AutoBoxingBench: ~ 8 m Estimated time for OutputStreamBench: ~ 13 m Estimated time for BTreeBuildBench: ~ 13 m Estimated time for StringsEncodeBench: ~ 20 m Estimated time for instance.ReadWidePartitionsBench: ~ 21 m Estimated time for btree.BTreeBuildBench: ~ 30 m Estimated time for BTreeSearchIteratorBench: ~ 31 m Estimated time for btree.BTreeTransformBench: ~ 138 m Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 288 m Estimated time for btree.BTreeUpdateBench: ~58 h Total estimated time: ~69 h {noformat} We can make it assert that no benchmark is planned to run longer than 30 minutes (but as said, in a separate ticket). was (Author: jlewandowski): So the build failed, probably because it was running for too long. I looked into the logs to figure out why it takes so long and learned that some benchmarks take extremely long to run. 
{{btree.BTreeTransformBench}}, params = 7x9x2=63, x4 forks = 252, x3 methods = 756 tests x 11s each ~= 2h 20m {{btree.BTreeUpdateBench}}, params = 7x7x3x2x2x3=1764, x4 forks = 7056, x1 method = 7056 tests x 30s each ~= 59h To me, the latter is unacceptable for CI. We need to reduce the number of parameters, and also set the number of forks to 1 (probably for each test). I'm going to exclude the benchmark for now and create a ticket to fix it later (I'm going to do that for each benchmark which causes CI to fail). Those benchmarks are still ok to run locally with {{-Dbenchmark.name=...}} btw. I've implemented a test which estimates benchmark run times from the number of forks, the number and duration of warmup and measurement iterations, and the number of parameter combinations. The results are as follows: {noformat} Estimated time for CacheLoaderBench: ~5 s Estimated time for LatencyTrackingBench: ~26 s Estimated time for SampleBench: ~30 s Estimated time for ReadWriteBench: ~30 s Estimated time for MutationBench: ~30 s Estimated time for CompactionBench: ~35 s Estimated time for DiagnosticEventPersistenceBench: ~40 s Estimated time for ZeroCopyStreamingBench: ~44 s Estimated time for BatchStatementBench: ~110 s Estimated time for DiagnosticEventServiceBench: ~120 s Estimated time for MessageOutBench: ~144 s Estimated time for BloomFilterSerializerBench: ~144 s Estimated time for FastThreadLocalBench: ~156 s Estimated time for HashingBench: ~156 s Estimated time for ChecksumBench: ~208 s Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s Estimated time for PendingRangesBench: ~ 5 m Estimated time for DirectorySizerBench: ~ 5 m Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m Estimated time for PreaggregatedByteBufsBench: ~ 7 m Estimated time for AutoBoxingBench: ~ 8 m Estimated time for OutputStreamBench: ~ 13 m Estimated time for BTreeBuildBench: ~ 13 m Estimated time for StringsEncodeBench: ~ 20 m Estimated time for
[jira] [Updated] (CASSANDRA-18873) Fix broken JMH benchmarks
[ https://issues.apache.org/jira/browse/CASSANDRA-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacek Lewandowski updated CASSANDRA-18873: -- Attachment: BenchTimeTest.java > Fix broken JMH benchmarks > - > > Key: CASSANDRA-18873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18873 > Project: Cassandra > Issue Type: Bug > Components: Test/benchmark >Reporter: Jacek Lewandowski >Priority: Normal > Attachments: BenchTimeTest.java > > > The following benchmarks are broken: > * {{ZeroCopyStreamingBench}} > * {{MutationBench}} > * {{FastThreadLocalBench}} > * {{AtomicBTreePartitionUpdateBench}} (OOM on Jenkins) > Additionally, those benchmarks take too much time to run: > * {{BTreeUpdateBench}} ~ 58 hours > * {{AtomicBTreePartitionUpdateBench}} ~ 5 hours > * {{BTreeTransformBench}} ~ 2.5 hours > Here the complete list of estimated benchmark times: > {noformat} > Estimated time for CacheLoaderBench: ~5 s > Estimated time for LatencyTrackingBench: ~26 s > Estimated time for SampleBench: ~30 s > Estimated time for ReadWriteBench: ~30 s > Estimated time for MutationBench: ~30 s > Estimated time for CompactionBench: ~35 s > Estimated time for DiagnosticEventPersistenceBench: ~40 s > Estimated time for ZeroCopyStreamingBench: ~44 s > Estimated time for BatchStatementBench: ~110 s > Estimated time for DiagnosticEventServiceBench: ~120 s > Estimated time for MessageOutBench: ~144 s > Estimated time for BloomFilterSerializerBench: ~144 s > Estimated time for FastThreadLocalBench: ~156 s > Estimated time for HashingBench: ~156 s > Estimated time for ChecksumBench: ~208 s > Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s > Estimated time for PendingRangesBench: ~ 5 m > Estimated time for DirectorySizerBench: ~ 5 m > Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m > Estimated time for PreaggregatedByteBufsBench: ~ 7 m > Estimated time for AutoBoxingBench: ~ 8 m > Estimated time for OutputStreamBench: ~ 13 m > 
Estimated time for BTreeBuildBench: ~ 13 m > Estimated time for StringsEncodeBench: ~ 20 m > Estimated time for instance.ReadWidePartitionsBench: ~ 21 m > Estimated time for btree.BTreeBuildBench: ~ 30 m > Estimated time for BTreeSearchIteratorBench: ~ 31 m > Estimated time for btree.BTreeTransformBench: ~ 138 m > Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 288 m > Estimated time for btree.BTreeUpdateBench: ~58 h > Total estimated time: ~69 h > {noformat} > I'd like to add a test which estimates the benchmark times and fails if a > single benchmark estimated run time is longer than xxx minutes.
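The check proposed at the end of this description (fail when a single benchmark's estimated run time exceeds a limit) could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the class and method names are hypothetical, the estimates are passed in as a plain map instead of being derived from JMH annotations, and the limit is a parameter because the ticket leaves it as "xxx minutes". It is not the attached BenchTimeTest.java.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Cassandra or JMH.
final class BenchTimeLimitSketch
{
    /** Returns the names of benchmarks whose estimate is strictly longer than the limit, in map iteration order. */
    static List<String> overLimit(Map<String, Duration> estimates, Duration limit)
    {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, Duration> entry : estimates.entrySet())
            if (entry.getValue().compareTo(limit) > 0)
                offenders.add(entry.getKey());
        return offenders;
    }

    /** Throws AssertionError listing every benchmark estimated to run longer than the limit. */
    static void assertWithinLimit(Map<String, Duration> estimates, Duration limit)
    {
        List<String> offenders = overLimit(estimates, limit);
        if (!offenders.isEmpty())
            throw new AssertionError("Benchmarks estimated to exceed " + limit.toMinutes() + " m: " + offenders);
    }
}
```

Reporting all offenders at once, rather than failing on the first, would let one CI run show every benchmark that needs its parameters or fork count reduced.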
[jira] [Updated] (CASSANDRA-18873) Fix broken JMH benchmarks
[ https://issues.apache.org/jira/browse/CASSANDRA-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacek Lewandowski updated CASSANDRA-18873: -- Description: The following benchmarks are broken: * {{ZeroCopyStreamingBench}} * {{MutationBench}} * {{FastThreadLocalBench}} * {{AtomicBTreePartitionUpdateBench}} (OOM on Jenkins) Additionally, those benchmarks take too much time to run: * {{BTreeUpdateBench}} ~ 58 hours * {{AtomicBTreePartitionUpdateBench}} ~ 5 hours * {{BTreeTransformBench}} ~ 2.5 hours Here the complete list of estimated benchmark times: {noformat} Estimated time for CacheLoaderBench: ~5 s Estimated time for LatencyTrackingBench: ~26 s Estimated time for SampleBench: ~30 s Estimated time for ReadWriteBench: ~30 s Estimated time for MutationBench: ~30 s Estimated time for CompactionBench: ~35 s Estimated time for DiagnosticEventPersistenceBench: ~40 s Estimated time for ZeroCopyStreamingBench: ~44 s Estimated time for BatchStatementBench: ~110 s Estimated time for DiagnosticEventServiceBench: ~120 s Estimated time for MessageOutBench: ~144 s Estimated time for BloomFilterSerializerBench: ~144 s Estimated time for FastThreadLocalBench: ~156 s Estimated time for HashingBench: ~156 s Estimated time for ChecksumBench: ~208 s Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s Estimated time for PendingRangesBench: ~ 5 m Estimated time for DirectorySizerBench: ~ 5 m Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m Estimated time for PreaggregatedByteBufsBench: ~ 7 m Estimated time for AutoBoxingBench: ~ 8 m Estimated time for OutputStreamBench: ~ 13 m Estimated time for BTreeBuildBench: ~ 13 m Estimated time for StringsEncodeBench: ~ 20 m Estimated time for instance.ReadWidePartitionsBench: ~ 21 m Estimated time for btree.BTreeBuildBench: ~ 30 m Estimated time for BTreeSearchIteratorBench: ~ 31 m Estimated time for btree.BTreeTransformBench: ~ 138 m Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 
288 m Estimated time for btree.BTreeUpdateBench: ~58 h Total estimated time: ~69 h {noformat} I'd like to add a test which estimates the benchmark times and fails if a single benchmark estimated run time is longer than xxx minutes. was: ZeroCopyStreamingBench MutationBench FastThreadLocalBench AtomicBTreePartitionUpdateBench (OOM on Jenkins) > Fix broken JMH benchmarks > - > > Key: CASSANDRA-18873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18873 > Project: Cassandra > Issue Type: Bug > Components: Test/benchmark >Reporter: Jacek Lewandowski >Priority: Normal > > The following benchmarks are broken: > * {{ZeroCopyStreamingBench}} > * {{MutationBench}} > * {{FastThreadLocalBench}} > * {{AtomicBTreePartitionUpdateBench}} (OOM on Jenkins) > Additionally, those benchmarks take too much time to run: > * {{BTreeUpdateBench}} ~ 58 hours > * {{AtomicBTreePartitionUpdateBench}} ~ 5 hours > * {{BTreeTransformBench}} ~ 2.5 hours > Here the complete list of estimated benchmark times: > {noformat} > Estimated time for CacheLoaderBench: ~5 s > Estimated time for LatencyTrackingBench: ~26 s > Estimated time for SampleBench: ~30 s > Estimated time for ReadWriteBench: ~30 s > Estimated time for MutationBench: ~30 s > Estimated time for CompactionBench: ~35 s > Estimated time for DiagnosticEventPersistenceBench: ~40 s > Estimated time for ZeroCopyStreamingBench: ~44 s > Estimated time for BatchStatementBench: ~110 s > Estimated time for DiagnosticEventServiceBench: ~120 s > Estimated time for MessageOutBench: ~144 s > Estimated time for BloomFilterSerializerBench: ~144 s > Estimated time for FastThreadLocalBench: ~156 s > Estimated time for HashingBench: ~156 s > Estimated time for ChecksumBench: ~208 s > Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s > Estimated time for PendingRangesBench: ~ 5 m > Estimated time for DirectorySizerBench: ~ 5 m > Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m > Estimated time for 
PreaggregatedByteBufsBench: ~ 7 m > Estimated time for AutoBoxingBench: ~ 8 m > Estimated time for OutputStreamBench: ~ 13 m > Estimated time for BTreeBuildBench: ~ 13 m > Estimated time for StringsEncodeBench: ~ 20 m > Estimated time for instance.ReadWidePartitionsBench: ~ 21 m > Estimated time for btree.BTreeBuildBench: ~ 30 m > Estimated time for BTreeSearchIteratorBench: ~ 31 m > Estimated time for btree.BTreeTransformBench: ~ 138 m > Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 288 m > Estimated time for btree.BTreeUpdateBench: ~58 h > Total estimated time: ~69 h > {noformat} > I'd like to add a test which estimates the benchmark times and fails if a > single benchmark estimated run time is longer than xxx minutes.
[jira] [Comment Edited] (CASSANDRA-18871) JMH benchmark improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-18871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768331#comment-17768331 ] Jacek Lewandowski edited comment on CASSANDRA-18871 at 9/24/23 7:03 AM: So the build failed, probably because it was running for too long. I looked into the logs to figure out why it takes so long and learned that some benchmarks take extremely long to run. {{btree.BTreeTransformBench}}, params = 7x9x2=63, x4 forks = 252, x3 methods = 756 tests x 11s each ~= 2h 20m {{btree.BTreeUpdateBench}}, params = 7x7x3x2x2x3=1764, x4 forks = 7056, x1 method = 7056 tests x 30s each ~= 59h To me, the latter is unacceptable for CI. We need to reduce the number of parameters, and also set the number of forks to 1 (probably for each test). I'm going to exclude the benchmark for now and create a ticket to fix it later (I'm going to do that for each benchmark which causes CI to fail). Those benchmarks are still ok to run locally with {{-Dbenchmark.name=...}} btw. I've implemented a test which estimates benchmark run times from the number of forks, the number and duration of warmup and measurement iterations, and the number of parameter combinations. 
The results are as follows: {noformat} Estimated time for CacheLoaderBench: ~5 s Estimated time for LatencyTrackingBench: ~26 s Estimated time for SampleBench: ~30 s Estimated time for ReadWriteBench: ~30 s Estimated time for MutationBench: ~30 s Estimated time for CompactionBench: ~35 s Estimated time for DiagnosticEventPersistenceBench: ~40 s Estimated time for ZeroCopyStreamingBench: ~44 s Estimated time for BatchStatementBench: ~110 s Estimated time for DiagnosticEventServiceBench: ~120 s Estimated time for MessageOutBench: ~144 s Estimated time for BloomFilterSerializerBench: ~144 s Estimated time for FastThreadLocalBench: ~156 s Estimated time for HashingBench: ~156 s Estimated time for ChecksumBench: ~208 s Estimated time for StreamingTombstoneHistogramBuilderBench: ~208 s Estimated time for PendingRangesBench: ~ 5 m Estimated time for DirectorySizerBench: ~ 5 m Estimated time for instance.ReadSmallPartitionsBench: ~ 5 m Estimated time for PreaggregatedByteBufsBench: ~ 7 m Estimated time for AutoBoxingBench: ~ 8 m Estimated time for OutputStreamBench: ~ 13 m Estimated time for BTreeBuildBench: ~ 13 m Estimated time for StringsEncodeBench: ~ 20 m Estimated time for instance.ReadWidePartitionsBench: ~ 21 m Estimated time for btree.BTreeBuildBench: ~ 30 m Estimated time for BTreeSearchIteratorBench: ~ 31 m Estimated time for btree.BTreeTransformBench: ~ 138 m Estimated time for btree.AtomicBTreePartitionUpdateBench: ~ 288 m Estimated time for btree.BTreeUpdateBench: ~58 h Total estimated time: ~69 h {noformat} We can make it assert that no benchmark is planned to run longer than 30 minutes (but as said, in a separate ticket). was (Author: jlewandowski): So the build failed, probably because it was running for too long. I looked into the logs to figure out why it takes so long and learned that some benchmarks take extremely long to run. 
{{btree.BTreeTransformBench}}, params = 7x9x2=63, x4 forks = 252, x3 methods = 756 tests x 11s each ~= 2h 20m {{btree.BTreeUpdateBench}}, params = 7x7x3x2x2x3=1764, x4 forks = 7056, x1 method = 7056 tests x 30s each ~= 59h To me, the latter is unacceptable for CI. We need to reduce the number of parameters, and also set the number of forks to 1 (probably for each test). I'm going to exclude the benchmark for now and create a ticket to fix it later (I'm going to do that for each benchmark which causes CI to fail). Those benchmarks are still ok to run locally with {{-Dbenchmark.name=...}} > JMH benchmark improvements > -- > > Key: CASSANDRA-18871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18871 > Project: Cassandra > Issue Type: Improvement > Components: Build, Legacy/Tools >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.1 > > Time Spent: 50m > Remaining Estimate: 0h > > 1. CASSANDRA-12586 introduced the {{build-jmh}} task, which builds an uber jar for > JMH benchmarks; that jar is not used by the {{ant microbench}} task. It is > used, though, by the {{test/bin/jmh}} script. > In fact, I have no idea why we should use an uber jar if JMH can run perfectly well > with a regular classpath. Maybe that had something to do with the older JMH > version which was used at that time. Building uber jars takes time and is > annoying. Since it seems to be redundant anyway, I'm going to remove it and > fix {{test/bin/jmh}} to use a regular classpath. > 2. I'll add support for async profiler in benchmarks. That is, the > {{microbench}} target automatically fetches the async profiler