[jira] [Commented] (CASSANDRA-19579) threads lingering after driver shutdown: session close starts thread and doesn't await its stop

2024-04-29 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841964#comment-17841964
 ] 

Brandon Williams commented on CASSANDRA-19579:
--

/cc [~absurdfarce] (sorry not sure how better to get these on the driver radar)

> threads lingering after driver shutdown: session close starts thread and 
> doesn't await its stop
> ---
>
> Key: CASSANDRA-19579
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19579
> Project: Cassandra
>  Issue Type: Bug
>  Components: Client/java-driver
>Reporter: Thomas Klambauer
>Priority: Normal
>
> We are checking for remaining/lingering threads during shutdown.
> We noticed some matching the naming pattern/thread factory 
> "globalEventExecutor-1-2" (Id=146, TIMED_WAITING).
> This one seems to be created during shutdown / session close and is not 
> awaited/shut down:
> {noformat}
> addTask:156, GlobalEventExecutor (io.netty.util.concurrent)
> execute0:225, GlobalEventExecutor (io.netty.util.concurrent)
> execute:221, GlobalEventExecutor (io.netty.util.concurrent)
> onClose:188, DefaultNettyOptions 
> (com.datastax.oss.driver.internal.core.context)
> onChildrenClosed:589, DefaultSession$SingleThreaded 
> (com.datastax.oss.driver.internal.core.session)
> lambda$close$9:552, DefaultSession$SingleThreaded 
> (com.datastax.oss.driver.internal.core.session)
> run:-1, 860270832 
> (com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded$$Lambda$9508)
> tryFire$$$capture:783, CompletableFuture$UniRun (java.util.concurrent)
> tryFire:-1, CompletableFuture$UniRun (java.util.concurrent)
>  - Async stack trace
> addTask:-1, SingleThreadEventExecutor (io.netty.util.concurrent)
> execute:836, SingleThreadEventExecutor (io.netty.util.concurrent)
> execute0:827, SingleThreadEventExecutor (io.netty.util.concurrent)
> execute:817, SingleThreadEventExecutor (io.netty.util.concurrent)
> claim:568, CompletableFuture$UniCompletion (java.util.concurrent)
> tryFire$$$capture:780, CompletableFuture$UniRun (java.util.concurrent)
> tryFire:-1, CompletableFuture$UniRun (java.util.concurrent)
>  - Async stack trace
> <init>:767, CompletableFuture$UniRun (java.util.concurrent)
> uniRunStage:801, CompletableFuture (java.util.concurrent)
> thenRunAsync:2136, CompletableFuture (java.util.concurrent)
> thenRunAsync:143, CompletableFuture (java.util.concurrent)
> whenAllDone:75, CompletableFutures 
> (com.datastax.oss.driver.internal.core.util.concurrent)
> close:551, DefaultSession$SingleThreaded 
> (com.datastax.oss.driver.internal.core.session)
> access$1000:300, DefaultSession$SingleThreaded 
> (com.datastax.oss.driver.internal.core.session)
> lambda$closeAsync$1:272, DefaultSession 
> (com.datastax.oss.driver.internal.core.session)
> runTask:98, PromiseTask (io.netty.util.concurrent)
> run:106, PromiseTask (io.netty.util.concurrent)
> runTask$$$capture:174, AbstractEventExecutor (io.netty.util.concurrent)
> runTask:-1, AbstractEventExecutor (io.netty.util.concurrent)
>  - Async stack trace
> addTask:-1, SingleThreadEventExecutor (io.netty.util.concurrent)
> execute:836, SingleThreadEventExecutor (io.netty.util.concurrent)
> execute0:827, SingleThreadEventExecutor (io.netty.util.concurrent)
> execute:817, SingleThreadEventExecutor (io.netty.util.concurrent)
> submit:118, AbstractExecutorService (java.util.concurrent)
> submit:118, AbstractEventExecutor (io.netty.util.concurrent)
> on:57, RunOrSchedule (com.datastax.oss.driver.internal.core.util.concurrent)
> closeSafely:286, DefaultSession 
> (com.datastax.oss.driver.internal.core.session)
> closeAsync:272, DefaultSession (com.datastax.oss.driver.internal.core.session)
> close:76, AsyncAutoCloseable (com.datastax.oss.driver.api.core)
> -- custom shutdown code
> run:829, Thread (java.lang)
> {noformat}
> The initial close here is called on 
> com.datastax.oss.driver.api.core.CqlSession.
> The Netty framework suggests calling 
> io.netty.util.concurrent.GlobalEventExecutor#awaitInactivity 
> during shutdown to wait for the event executor thread to stop 
> (slightly related Netty issue: 
> [https://github.com/netty/netty/issues/2084] ).
> Suggestion: add a GlobalEventExecutor.INSTANCE.awaitInactivity call with 
> some timeout during close, around here:
> [https://github.com/apache/cassandra-java-driver/blob/4.x/core/src/main/java/com/datastax/oss/driver/internal/core/context/DefaultNettyOptions.java#L199]
> Note that this might slow down closing by up to 2 seconds, if the Netty 
> issue comment is correct.
> This is on the latest DataStax Java driver version, 4.17.
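
A minimal sketch of the suggested change, assuming a bounded wait is added to
the driver's close path (the wrapper class, method name, and timeout below are
illustrative, not driver code; only GlobalEventExecutor#awaitInactivity is
Netty's actual API):

{code:java}
import java.util.concurrent.TimeUnit;

import io.netty.util.concurrent.GlobalEventExecutor;

// Sketch: a bounded wait that a close path such as DefaultNettyOptions#onClose
// could perform, giving the globalEventExecutor-* thread a chance to terminate
// before application-level lingering-thread checks run.
final class GlobalExecutorQuiescence {
    static void awaitQuiescence() {
        try {
            // Blocks until the global event executor thread has gone idle and
            // stopped, or the timeout elapses; per the Netty issue comment
            // this can add up to ~2 seconds to shutdown.
            GlobalEventExecutor.INSTANCE.awaitInactivity(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
    }
}
{code}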




[jira] [Updated] (CASSANDRA-19579) threads lingering after driver shutdown: session close starts thread and doesn't await its stop

2024-04-29 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19579:
-
 Bug Category: Parent values: Degradation(12984)
   Complexity: Normal
Discovered By: User Report
 Severity: Normal
   Status: Open  (was: Triage Needed)


[jira] [Assigned] (CASSANDRA-19579) threads lingering after driver shutdown: session close starts thread and doesn't await its stop

2024-04-29 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams reassigned CASSANDRA-19579:


Assignee: (was: Henry Hughes)




[jira] [Updated] (CASSANDRA-19590) Unexpected error deserializing mutation when upgrade from 2.2.19 to 3.11.17

2024-04-25 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19590:
-
Resolution: Not A Problem
Status: Resolved  (was: Triage Needed)

> Unexpected error deserializing mutation when upgrade from 2.2.19 to 3.11.17
> ---
>
> Key: CASSANDRA-19590
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19590
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Klay
>Priority: Normal
> Attachments: data.tar.gz, system.log
>
>
> I am trying to upgrade from 2.2.19 to 3.11.17. I encountered the following 
> exception during the upgrade process and the 3.11.17 node cannot start up.
> {code:java}
> ERROR [main] 2024-04-25 18:46:10,496 JVMStabilityInspector.java:124 - Exiting 
> due to error while processing commit log during initialization.
> org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>  Unexpected error deserializing mutation; saved to 
> /tmp/mutation8318204837345269856dat.  This may be caused by replaying a 
> mutation against a table with the same name but incompatible schema.  
> Exception follows: java.lang.AssertionError
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:471)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:404)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:251)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:132)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:137)
>         at 
> org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:189)
>         at 
> org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:170)
>         at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:331)
>         at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630)
>         at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:791) 
> {code}
> h1. Reproduce
> This can be reproduced deterministically as follows:
> 1. Start up cassandra-2.2.19; a single node is enough (using the default 
> configuration)
> 2. Execute the following commands in cqlsh
> {code:java}
> CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 1 };
> CREATE TABLE ks.tb (c1 INT,c2 TEXT, PRIMARY KEY (c1, c2));
> ALTER TABLE ks.tb ADD c0 INT ;
> INSERT INTO ks.tb (c0, c1, c2) VALUES (0,0,'RANDOM_STR');
> CREATE INDEX idx ON ks.tb (c2);
> ALTER TABLE ks.tb DROP c0 ;
> ALTER TABLE ks.tb ADD c0 set ; {code}
> 3. Stop the old version.
> {code:java}
> bin/nodetool -h :::127.0.0.1 flush
> bin/nodetool -h :::127.0.0.1 stopdaemon{code}
> 4. Copy the data and start up the new version.
> The upgrade crashes with the following error:
> {code:java}
> ERROR [main] 2024-04-25 18:46:10,496 JVMStabilityInspector.java:124 - Exiting 
> due to error while processing commit log during initialization.
> org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>  Unexpected error deserializing mutation; saved to 
> /tmp/mutation8318204837345269856dat.  This may be caused by replaying a 
> mutation against a table with the same name but incompatible schema.  
> Exception follows: java.lang.AssertionError
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:471)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:404)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:251)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:132)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:137)
>         at 
> org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:189)
>         at 
> org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:170)
>         at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:331)
>         at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630)
>         at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:791){code}
> I have attached the system.log from starting up the 3.11.17 node.
> I also attached the data folder generated by 2.2.19; starting up 3.0.30 or 
> 3.11.17 with this data folder directly exposes the error.
> h2. Upgrade from 2.2.19 to 3.0.30 will 

[jira] [Updated] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures

2024-04-25 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15439:
-
Status: Needs Committer  (was: Patch Available)

> Token metadata for bootstrapping nodes is lost under temporary failures
> ---
>
> Key: CASSANDRA-15439
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15439
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Josh Snyder
>Assignee: Raymond Huffman
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In CASSANDRA-8838, [~pauloricardomg] asked "hints will not be stored to the 
> bootstrapping node after RING_DELAY, since it will evicted from the TMD 
> pending ranges. Should we create a ticket to address this?"
> CASSANDRA-15264 relates to the most likely cause of such situations, where 
> the Cassandra daemon on the bootstrapping node completely crashes. Based on 
> testing with {{kill -STOP}} on a bootstrapping Cassandra JVM, I believe it 
> also is possible to remove token metadata (and thus pending ranges, and thus 
> hints) for a bootstrapping node, simply by affecting its status in the 
> failure detector. 
> A node in the cluster sees the bootstrapping node this way:
> {noformat}
> INFO  [GossipStage:1] 2019-11-27 20:41:41,101 Gossiper.java: - Node 
> /PUBLIC-IP is now part of the cluster
> INFO  [GossipStage:1] 2019-11-27 20:41:41,199 Gossiper.java:1073 - 
> InetAddress /PUBLIC-IP is now UP
> INFO  [HANDSHAKE-/PRIVATE-IP] 2019-11-27 20:41:41,412 
> OutboundTcpConnection.java:565 - Handshaking version with /PRIVATE-IP
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,019 
> StreamResultFuture.java:112 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 
> ID#0] Creating new streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,020 
> StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, 
> ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:56003] 2019-11-27 20:42:10,112 
> StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, 
> ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-IN-/PUBLIC-IP] 2019-11-27 20:42:10,179 
> StreamResultFuture.java:169 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 
> ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 833 
> files(139744616815 bytes)
> INFO  [GossipStage:1] 2019-11-27 20:54:47,547 Gossiper.java:1089 - 
> InetAddress /PUBLIC-IP is now DOWN
> INFO  [GossipTasks:1] 2019-11-27 20:54:57,551 Gossiper.java:849 - FatClient 
> /PUBLIC-IP has been silent for 3ms, removing from gossip
> {noformat}
> Since the bootstrapping node has no tokens, it is treated like a fat client, 
> and it is removed from the ring. For correctness purposes, I believe we must 
> keep storing hints for the downed bootstrapping node until it is either 
> assassinated or until a replacement attempts to bootstrap for the same token.






[jira] [Updated] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures

2024-04-25 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15439:
-
Test and Documentation Plan: run CI
 Status: Patch Available  (was: Open)




[jira] [Updated] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures

2024-04-25 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15439:
-
Reviewers: Brandon Williams




[jira] [Commented] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures

2024-04-25 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840984#comment-17840984
 ] 

Brandon Williams commented on CASSANDRA-15439:
--

4.0 and 4.1 are seeing CASSANDRA-18447, 5.0 hit CASSANDRA-18098, so this looks 
good. +1 from me.




[jira] [Assigned] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures

2024-04-25 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams reassigned CASSANDRA-15439:


Assignee: Raymond Huffman




[jira] [Commented] (CASSANDRA-19590) Unexpected error deserializing mutation when upgrade from 2.2.19 to 3.11.17

2024-04-25 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840981#comment-17840981
 ] 

Brandon Williams commented on CASSANDRA-19590:
--

You should drain a node before upgrading so there are no commitlogs; they are 
not compatible between major versions.
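
For reference, a sketch of that drain-first sequence (reusing the host flag
from the reproduction steps above; exact paths depend on the installation):

{code:java}
# Drain before stopping, so no mutations are left in the commitlog
# (drain flushes all memtables and stops the node accepting writes):
bin/nodetool -h :::127.0.0.1 drain
bin/nodetool -h :::127.0.0.1 stopdaemon
# Then upgrade the binaries and start the new version: there is no
# cross-major commitlog replay left to fail on.
{code}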


[jira] [Updated] (CASSANDRA-19591) MarshalException when migrate data from 2.2.19 to 3.11.17

2024-04-25 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19591:
-
Resolution: Duplicate
Status: Resolved  (was: Triage Needed)

> MarshalException when migrate data from 2.2.19 to 3.11.17
> -
>
> Key: CASSANDRA-19591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19591
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Klay
>Priority: Normal
> Attachments: data.tar.gz, system.log
>
>
> When migrating data from 2.2.19 to 3.11.17, I encountered the following 
> exception and the migration fails.
> {code:java}
> ERROR [main] 2024-04-25 19:41:22,996 JVMStabilityInspector.java:124 - Exiting 
> due to error while processing commit log during initialization.
> org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>  Unexpected error deserializing mutation; saved to 
> /tmp/mutation3085092904780349005dat.  This may be caused by replaying a 
> mutation against a table with the same name but incompatible schema.  
> Exception follows: org.apache.cassandra.serializers.MarshalException: 
> Expected 4 or 0 byte int (2)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:471)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:404)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:251)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:132)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:137)
>         at 
> org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:189)
>         at 
> org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:170)
>         at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:331)
>         at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630)
>         at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:791) 
> {code}
> h1. Reproduce
> 1. Start up single node cassandra-2.2.19 with default configuration and 
> execute the following commands
> {code:java}
> CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 1 };
> CREATE TABLE ks.tb (c0 INT,c2 TEXT, PRIMARY KEY (c0));
> INSERT INTO ks.tb (c0, c2) VALUES (1,'BB');
> ALTER TABLE ks.tb DROP c2 ;
> ALTER TABLE ks.tb ADD c2 INT ; {code}
> 2. Stop the 2.2 node
> {code:java}
> bin/nodetool -h :::127.0.0.1 flush
> bin/nodetool -h :::127.0.0.1 stopdaemon; {code}
> 3. Copy the data to the 3.11.17 folder and start up; it will expose the 
> following exception during startup. The node cannot start up.
> {code:java}
> ERROR [main] 2024-04-25 19:41:22,996 JVMStabilityInspector.java:124 - Exiting 
> due to error while processing commit log during initialization.
> org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>  Unexpected error deserializing mutation; saved to 
> /tmp/mutation3085092904780349005dat.  This may be caused by replaying a 
> mutation against a table with the same name but incompatible schema.  
> Exception follows: org.apache.cassandra.serializers.MarshalException: 
> Expected 4 or 0 byte int (2)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:471)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:404)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:251)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:132)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:137)
>         at 
> org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:189)
>         at 
> org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:170)
>         at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:331)
>         at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630)
>         at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:791) 
> {code}
> I have attached the system.log and data.tar.gz (starting up 3.11.17 with 
> this data directly exposes the error).




[jira] [Commented] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures

2024-04-25 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840912#comment-17840912
 ] 

Brandon Williams commented on CASSANDRA-15439:
--

This applied with little effort to 4.0, and for 5.0 I changed the delay to 5 
minutes.

||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-15439-4.0]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1603/workflows/17ff9ed1-4268-4946-b4a6-59d3b98282eb], [j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1603/workflows/e7a1de3b-1469-4d4e-97a0-2e675ed6a76c]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-15439-4.1]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/1604/workflows/a3085f33-0110-4a06-b1f5-af073149c387], [j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1604/workflows/5f011a3a-608d-4885-8f13-05c514efca26]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-15439-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1605/workflows/4bb67e4f-4dfe-4bbe-b596-2586bb9416f9], [j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1605/workflows/f08455aa-2d9c-4dac-8ae5-b6d69318228e]|





[jira] [Commented] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures

2024-04-25 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840851#comment-17840851
 ] 

Brandon Williams commented on CASSANDRA-15439:
--

I think this is fairly safe to do, but 5 minutes is quite a large increase from 
where it sits now.  I think that's okay for 5.0 (and larger values make sense 
for resumable bootstrap), but for minor releases I think we should be more 
cautious; maybe 2x RING_DELAY?
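
A rough sketch of the two windows under discussion (illustrative only, not the
patch; it assumes the fat-client eviction window derives from the
cassandra.ring_delay_ms system property, which defaults to 30 seconds):

{code:java}
import java.util.concurrent.TimeUnit;

// Illustrative comparison of the proposed fat-client eviction windows.
final class FatClientWindowSketch {
    public static void main(String[] args) {
        // RING_DELAY is driven by -Dcassandra.ring_delay_ms (default 30s).
        long ringDelayMs = Long.getLong("cassandra.ring_delay_ms", 30_000L);
        long minorReleaseWindowMs = 2 * ringDelayMs;        // suggested for 4.0/4.1
        long trunkWindowMs = TimeUnit.MINUTES.toMillis(5);  // used on the 5.0 branch
        System.out.printf("minor releases: %d ms, 5.0: %d ms%n",
                          minorReleaseWindowMs, trunkWindowMs);
    }
}
{code}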




[jira] [Commented] (CASSANDRA-19558) Standalone jenkinsfile first round bug fixes

2024-04-25 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840776#comment-17840776
 ] 

Brandon Williams commented on CASSANDRA-19558:
--

Looks good, +1.

> Standalone jenkinsfile first round bug fixes
> 
>
> Key: CASSANDRA-19558
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19558
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
> Attachments:  CASSANDRA-19558_50_#5_ci_summary.html,  
> CASSANDRA-19558_50_#5_results_details.tar.xz, 
> CASSANDRA-19558-5.0_#13_ci_summary.html, 
> CASSANDRA-19558-5.0_#13_results_details.tar.xz, 
> CASSANDRA-19558-5.0_#16_ci_summary.html, 
> CASSANDRA-19558-5.0_#16_results_details.tar.xz, 
> CASSANDRA-19558-50-33-ci_summary.html, 
> CASSANDRA-19558-50-33-results_details.tar.xz, 
> CASSANDRA-19558-trunk-11-ci_summary.html, CASSANDRA-19558_#8_ci_summary.html, 
> CASSANDRA-19558_#8_results_details.tar.xz
>
>
> A few follow up improvements and bug fixes for the standalone jenkinsfile.
> - add at top a list of test failures in ci_summary.html
> - docker scripts always try to login (as base images need to be pulled too)
> - move simulator-dtests to large containers (they need 8g of heap alone)
> - in ubuntu2004_test.docker make sure /home/cassandra exists and has correct 
> perms (from marcuse)
> - persist the jenkinsfile parameters from run to run (important for the 
> post-commit jobs to keep their non-default branch and profile values) (was 
> CASSANDRA-19536)
> - increase jvm-dtest splits from 8 to 12
> - when on ci-cassandra, replace use of copyArtifacts in Jenkinsfile 
> generateTestReports() with manual wget of test files (see the sketch after 
> this list), allowing the summary phase to be run on any agent (copyArtifact 
> would take >4hrs otherwise) (was INFRA-25694)
> - copy ci_summary.html and results_details.tar.xz to nightlies
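
A sketch of the copyArtifacts replacement mentioned above (the job name, build
number, and artifact layout are placeholders; only the standard Jenkins
artifact URL scheme and wget flags are assumed, not the actual Jenkinsfile):

{noformat}
# Fetch the test result files directly instead of using copyArtifacts:
wget --recursive --no-parent --accept 'ci_summary.html,results_details.tar.xz' \
  "https://ci-cassandra.apache.org/job/${JOB_NAME}/${BUILD_NUMBER}/artifact/"
{noformat}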






[jira] [Commented] (CASSANDRA-19583) setting compaction throughput to 0 throws a startup error

2024-04-24 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840506#comment-17840506
 ] 

Brandon Williams commented on CASSANDRA-19583:
--

[~edimitrova] WDYT?

> setting compaction throughput to 0 throws a startup error
> -
>
> Key: CASSANDRA-19583
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19583
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jon Haddad
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>
> The inline docs say:
> {noformat}
> Setting this to 0 disables throttling.
> {noformat}
> However, on startup, we throw this error:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Invalid data rate: 0 Accepted 
> units: MiB/s, KiB/s, B/s where case matters and only non-negative values a>
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec.<init>(DataRateSpec.java:52)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec.<init>(DataRateSpec.java:61)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec$LongBytesPerSecondBound.<init>(DataRateSpec.java:232)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: ... 27 common frames 
> omitted
> {noformat}
> We should allow 0 as per the inline doc.
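
For illustration, the behaviour the inline doc promises is easy to special-case 
before the bound is validated. A minimal sketch (the helper and its wiring are 
ours, assuming Guava's RateLimiter as the underlying throttle):

{code:java}
import com.google.common.util.concurrent.RateLimiter;

public class CompactionThroughputSketch
{
    /**
     * Hypothetical helper, not Cassandra code: map a configured rate of 0 to
     * an effectively unlimited limiter, matching the inline doc's "0 disables
     * throttling", instead of letting a validating constructor reject it.
     */
    public static RateLimiter compactionLimiter(double mibPerSec)
    {
        if (mibPerSec <= 0)
            return RateLimiter.create(Double.MAX_VALUE); // effectively unthrottled
        return RateLimiter.create(mibPerSec * 1024 * 1024); // bytes per second
    }
}
{code}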






[jira] [Updated] (CASSANDRA-19583) setting compaction throughput to 0 throws a startup error

2024-04-24 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19583:
-
 Bug Category: Parent values: Correctness(12982)
   Complexity: Low Hanging Fruit
Discovered By: User Report
Fix Version/s: 4.1.x
   5.0.x
   5.x
 Severity: Low
   Status: Open  (was: Triage Needed)

> setting compaction throughput to 0 throws a startup error
> -
>
> Key: CASSANDRA-19583
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19583
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jon Haddad
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>
> The inline docs say:
> {noformat}
> Setting this to 0 disables throttling.
> {noformat}
> However, on startup, we throw this error:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Invalid data rate: 0 Accepted 
> units: MiB/s, KiB/s, B/s where case matters and only non-negative values a>
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec.<init>(DataRateSpec.java:52)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec.<init>(DataRateSpec.java:61)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec$LongBytesPerSecondBound.<init>(DataRateSpec.java:232)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: ... 27 common frames 
> omitted
> {noformat}
> We should allow 0 as per the inline doc.






[jira] [Commented] (CASSANDRA-19583) setting compaction throughput to 0 throws a startup error

2024-04-24 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840503#comment-17840503
 ] 

Brandon Williams commented on CASSANDRA-19583:
--

Ah, I see.  In that case I think we should probably just update the comment to 
include a unit.

> setting compaction throughput to 0 throws a startup error
> -
>
> Key: CASSANDRA-19583
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19583
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jon Haddad
>Priority: Normal
>
> The inline docs say:
> {noformat}
> Setting this to 0 disables throttling.
> {noformat}
> However, on startup, we throw this error:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Invalid data rate: 0 Accepted 
> units: MiB/s, KiB/s, B/s where case matters and only non-negative values a>
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec.<init>(DataRateSpec.java:52)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec.<init>(DataRateSpec.java:61)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec$LongBytesPerSecondBound.<init>(DataRateSpec.java:232)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: ... 27 common frames 
> omitted
> {noformat}
> We should allow 0 as per the inline doc.






[jira] [Commented] (CASSANDRA-19583) setting compaction throughput to 0 throws a startup error

2024-04-24 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840494#comment-17840494
 ] 

Brandon Williams commented on CASSANDRA-19583:
--

Hmm, I tried 4.1, 5.0 and trunk, and wasn't able to reproduce on any of them. 
'nodetool setcompactionthroughput 0' was accepted and correctly respected in 
all versions.

> setting compaction throughput to 0 throws a startup error
> -
>
> Key: CASSANDRA-19583
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19583
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jon Haddad
>Priority: Normal
>
> The inline docs say:
> {noformat}
> Setting this to 0 disables throttling.
> {noformat}
> However, on startup, we throw this error:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Invalid data rate: 0 Accepted 
> units: MiB/s, KiB/s, B/s where case matters and only non-negative values a>
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec.<init>(DataRateSpec.java:52)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec.<init>(DataRateSpec.java:61)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec$LongBytesPerSecondBound.<init>(DataRateSpec.java:232)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: ... 27 common frames 
> omitted
> {noformat}
> We should allow 0 as per the inline doc.






[jira] [Commented] (CASSANDRA-19580) Unable to contact any seeds with node in hibernate status

2024-04-23 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840267#comment-17840267
 ] 

Brandon Williams commented on CASSANDRA-19580:
--

Set compression to all so there are no special cases and test again.

> Unable to contact any seeds with node in hibernate status
> -
>
> Key: CASSANDRA-19580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19580
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Cameron Zemek
>Priority: Normal
>
> We have a customer running into the error 'Unable to contact any seeds!'. I 
> have been able to reproduce this issue if I kill Cassandra as it's joining, 
> which will put the node into hibernate status. Once a node is in hibernate it 
> will no longer receive any SYN messages from other nodes during startup, and 
> since it sends only itself as a digest in outbound SYN messages, it never 
> receives any states in any of the ACK replies. So once it gets to the 
> `seenAnySeed` check, it fails because the endpointStateMap is empty.
>  
> A workaround is copying the system.peers table from another node, but this is 
> less than ideal. I tested modifying maybeGossipToSeed as follows:
> {code:java}
>     /* Possibly gossip to a seed for facilitating partition healing */
>     private void maybeGossipToSeed(MessageOut<GossipDigestSyn> prod)
>     {
>         int size = seeds.size();
>         if (size > 0)
>         {
>             if (size == 1 && seeds.contains(FBUtilities.getBroadcastAddress()))
>             {
>                 return;
>             }
>             if (liveEndpoints.size() == 0)
>             {
>                 List<GossipDigest> gDigests = prod.payload.gDigests;
>                 if (gDigests.size() == 1 && gDigests.get(0).endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 {
>                     gDigests = new ArrayList<GossipDigest>();
>                     GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(),
>                                                                            DatabaseDescriptor.getPartitionerName(),
>                                                                            gDigests);
>                     MessageOut<GossipDigestSyn> message = new MessageOut<GossipDigestSyn>(MessagingService.Verb.GOSSIP_DIGEST_SYN,
>                                                                                           digestSynMessage,
>                                                                                           GossipDigestSyn.serializer);
>                     sendGossip(message, seeds);
>                 }
>                 else
>                 {
>                     sendGossip(prod, seeds);
>                 }
>             }
>             else
>             {
>                 /* Gossip with the seed with some probability. */
>                 double probability = seeds.size() / (double) (liveEndpoints.size() + unreachableEndpoints.size());
>                 double randDbl = random.nextDouble();
>                 if (randDbl <= probability)
>                     sendGossip(prod, seeds);
>             }
>         }
>     }
>  {code}
> The only problem is that this is the same as the SYN from a shadow round. It 
> does resolve the issue, however, as the node then receives an ACK with all 
> the states.
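
For context on why that works: a SYN carrying no digests is exactly what a 
shadow round sends, so the receiver replies with its full endpoint state. A 
compilable sketch of that receiver-side behaviour (a simplified paraphrase 
with made-up helper names, not the actual GossipDigestSynVerbHandler source):

{code:java}
import java.util.List;

class SynHandlerSketch
{
    // Made-up stand-ins so the sketch compiles; the real handler builds a
    // GossipDigestAck from its endpointStateMap.
    void replyWithAllKnownStates() { /* full endpoint state in the ACK */ }
    void replyWithDeltasFor(List<String> digests) { /* only requested deltas */ }

    // An empty digest list is indistinguishable from a shadow-round SYN, so
    // the receiver answers with everything it knows, which is why the
    // modified maybeGossipToSeed above finally gets states in the ACK replies.
    void handleSyn(List<String> digests)
    {
        if (digests.isEmpty())
            replyWithAllKnownStates();
        else
            replyWithDeltasFor(digests);
    }
}
{code}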






[jira] [Updated] (CASSANDRA-19585) syntax formatting on CQL doc is garbled

2024-04-23 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19585:
-
 Bug Category: Parent values: Correctness(12982)
   Complexity: Normal
Discovered By: User Report
 Severity: Normal
   Status: Open  (was: Triage Needed)

> syntax formatting on CQL doc is garbled
> ---
>
> Key: CASSANDRA-19585
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19585
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation/Website
>Reporter: Jon Haddad
>Priority: Normal
> Attachments: image-2024-04-23-17-37-54-438.png
>
>
> It looks like the build process for the 4.1 docs isn't running correctly.  
> Screenshot attached.
> https://cassandra.apache.org/doc/4.1/cassandra/cql/cql_singlefile.html#alterTableStmt
>  !image-2024-04-23-17-37-54-438.png! 






[jira] [Commented] (CASSANDRA-19583) setting compaction throughput to 0 throws a startup error

2024-04-23 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840251#comment-17840251
 ] 

Brandon Williams commented on CASSANDRA-19583:
--

Which version was this?

> setting compaction throughput to 0 throws a startup error
> -
>
> Key: CASSANDRA-19583
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19583
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jon Haddad
>Priority: Normal
>
> The inline docs say:
> {noformat}
> Setting this to 0 disables throttling.
> {noformat}
> However, on startup, we throw this error:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Invalid data rate: 0 Accepted 
> units: MiB/s, KiB/s, B/s where case matters and only non-negative values a>
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec.<init>(DataRateSpec.java:52)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec.<init>(DataRateSpec.java:61)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: at 
> org.apache.cassandra.config.DataRateSpec$LongBytesPerSecondBound.<init>(DataRateSpec.java:232)
> Apr 23 23:12:01 cassandra0 cassandra[3424]: ... 27 common frames 
> omitted
> {noformat}
> We should allow 0 as per the inline doc.






[jira] [Commented] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures

2024-04-23 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840228#comment-17840228
 ] 

Brandon Williams commented on CASSANDRA-15439:
--

With the failed bootstrap timeout separated out, we could take this opportunity 
to also increase it by default, giving users some protection from the scenario 
you ran into, and also aiding resumable bootstrap.  WDYT? /cc [~paulo]
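
As a sketch of the shape being proposed (the bootstrap-specific system property 
and its default below are hypothetical illustrations, not existing flags; only 
cassandra.ring_delay_ms exists today):

{code:java}
public class BootstrapEvictionSketch
{
    // The existing knob: how long a tokenless fat client may stay silent.
    static final long RING_DELAY_MS =
        Long.getLong("cassandra.ring_delay_ms", 30_000L);

    // Hypothetical new knob: a larger, -D-overridable window for
    // bootstrapping nodes, so a stalled bootstrap is not evicted as quickly.
    static final long FAILED_BOOTSTRAP_TIMEOUT_MS =
        Long.getLong("cassandra.failed_bootstrap_timeout_ms", 24L * 60 * 60 * 1000);

    static long evictionWindowMs(boolean bootstrapping)
    {
        return bootstrapping ? FAILED_BOOTSTRAP_TIMEOUT_MS : RING_DELAY_MS;
    }
}
{code}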

> Token metadata for bootstrapping nodes is lost under temporary failures
> ---
>
> Key: CASSANDRA-15439
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15439
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Josh Snyder
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In CASSANDRA-8838, [~pauloricardomg] asked "hints will not be stored to the 
> bootstrapping node after RING_DELAY, since it will be evicted from the TMD 
> pending ranges. Should we create a ticket to address this?"
> CASSANDRA-15264 relates to the most likely cause of such situations, where 
> the Cassandra daemon on the bootstrapping node completely crashes. Based on 
> testing with {{kill -STOP}} on a bootstrapping Cassandra JVM, I believe it 
> also is possible to remove token metadata (and thus pending ranges, and thus 
> hints) for a bootstrapping node, simply by affecting its status in the 
> failure detector. 
> A node in the cluster sees the bootstrapping node this way:
> {noformat}
> INFO  [GossipStage:1] 2019-11-27 20:41:41,101 Gossiper.java: - Node 
> /PUBLIC-IP is now part of the cluster
> INFO  [GossipStage:1] 2019-11-27 20:41:41,199 Gossiper.java:1073 - 
> InetAddress /PUBLIC-IP is now UP
> INFO  [HANDSHAKE-/PRIVATE-IP] 2019-11-27 20:41:41,412 
> OutboundTcpConnection.java:565 - Handshaking version with /PRIVATE-IP
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,019 
> StreamResultFuture.java:112 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 
> ID#0] Creating new streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,020 
> StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, 
> ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:56003] 2019-11-27 20:42:10,112 
> StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, 
> ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-IN-/PUBLIC-IP] 2019-11-27 20:42:10,179 
> StreamResultFuture.java:169 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 
> ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 833 
> files(139744616815 bytes)
> INFO  [GossipStage:1] 2019-11-27 20:54:47,547 Gossiper.java:1089 - 
> InetAddress /PUBLIC-IP is now DOWN
> INFO  [GossipTasks:1] 2019-11-27 20:54:57,551 Gossiper.java:849 - FatClient 
> /PUBLIC-IP has been silent for 3ms, removing from gossip
> {noformat}
> Since the bootstrapping node has no tokens, it is treated like a fat client, 
> and it is removed from the ring. For correctness purposes, I believe we must 
> keep storing hints for the downed bootstrapping node until it is either 
> assassinated or until a replacement attempts to bootstrap for the same token.






[jira] [Updated] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures

2024-04-23 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15439:
-
Fix Version/s: (was: 3.0.x)
   (was: 3.11.x)
   (was: 5.x)

> Token metadata for bootstrapping nodes is lost under temporary failures
> ---
>
> Key: CASSANDRA-15439
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15439
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Josh Snyder
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In CASSANDRA-8838, [~pauloricardomg] asked "hints will not be stored to the 
> bootstrapping node after RING_DELAY, since it will be evicted from the TMD 
> pending ranges. Should we create a ticket to address this?"
> CASSANDRA-15264 relates to the most likely cause of such situations, where 
> the Cassandra daemon on the bootstrapping node completely crashes. Based on 
> testing with {{kill -STOP}} on a bootstrapping Cassandra JVM, I believe it 
> also is possible to remove token metadata (and thus pending ranges, and thus 
> hints) for a bootstrapping node, simply by affecting its status in the 
> failure detector. 
> A node in the cluster sees the bootstrapping node this way:
> {noformat}
> INFO  [GossipStage:1] 2019-11-27 20:41:41,101 Gossiper.java: - Node 
> /PUBLIC-IP is now part of the cluster
> INFO  [GossipStage:1] 2019-11-27 20:41:41,199 Gossiper.java:1073 - 
> InetAddress /PUBLIC-IP is now UP
> INFO  [HANDSHAKE-/PRIVATE-IP] 2019-11-27 20:41:41,412 
> OutboundTcpConnection.java:565 - Handshaking version with /PRIVATE-IP
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,019 
> StreamResultFuture.java:112 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 
> ID#0] Creating new streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,020 
> StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, 
> ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:56003] 2019-11-27 20:42:10,112 
> StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, 
> ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-IN-/PUBLIC-IP] 2019-11-27 20:42:10,179 
> StreamResultFuture.java:169 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 
> ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 833 
> files(139744616815 bytes)
> INFO  [GossipStage:1] 2019-11-27 20:54:47,547 Gossiper.java:1089 - 
> InetAddress /PUBLIC-IP is now DOWN
> INFO  [GossipTasks:1] 2019-11-27 20:54:57,551 Gossiper.java:849 - FatClient 
> /PUBLIC-IP has been silent for 3ms, removing from gossip
> {noformat}
> Since the bootstrapping node has no tokens, it is treated like a fat client, 
> and it is removed from the ring. For correctness purposes, I believe we must 
> keep storing hints for the downed bootstrapping node until it is either 
> assassinated or until a replacement attempts to bootstrap for the same token.






[jira] [Comment Edited] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-04-23 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840171#comment-17840171
 ] 

Brandon Williams edited comment on CASSANDRA-19534 at 4/23/24 5:53 PM:
---

I think this all sounds good, though there may be a bit of a learning curve for 
users. Native request deadline is easy enough to understand, but things get a 
bit nuanced past that.

Regarding native_transport_timeout_in_ms:
bq. Default is 100 seconds, which is unreasonably high, but not unbounded. In 
practice, we should use at most 12 seconds.

Do you mean this currently exists at 100? If not, what is the rationale for 
that default?


was (Author: brandon.williams):
I think this all sounds good, though there may be a bit of a learning curve for 
users. Native request deadline is easy enough to understand, but things get a 
bit nuanced past that.

bq. Default is 100 seconds, which is unreasonably high, but not unbounded. In 
practice, we should use at most 12 seconds.

Do you mean this currently exists at 100? If not, what is the rationale for 
that default?

> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg
>
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take far longer to 
> time out than is configured.  We should be shedding load much more 
> aggressively and using a bounded queue for incoming work.  This is extremely 
> evident when we combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                                 Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 |       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 |       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 |       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 |       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 |       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.
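
The bounded-queue-plus-shedding idea reduces to a few lines. A sketch (the 
capacity and names are ours, not the eventual patch):

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedRequestQueueSketch
{
    // A fixed-capacity queue: offer() fails fast instead of letting work pile
    // up without bound, so overload surfaces immediately as shed requests
    // rather than as minutes of queued latency.
    private final BlockingQueue<Runnable> inbound = new ArrayBlockingQueue<>(1024);

    /** Returns false when saturated; the server would then reply Overloaded. */
    boolean submit(Runnable request)
    {
        return inbound.offer(request);
    }
}
{code}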






[jira] [Commented] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-04-23 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840171#comment-17840171
 ] 

Brandon Williams commented on CASSANDRA-19534:
--

I think this all sounds good, though there may be a bit of a learning curve for 
users. Native request deadline is easy enough to understand, but things get a 
bit nuanced past that.

bq. Default is 100 seconds, which is unreasonably high, but not unbounded. In 
practice, we should use at most 12 seconds.

Do you mean this currently exists at 100? If not, what is the rationale for 
that default?

> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg
>
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take far longer to 
> time out than is configured.  We should be shedding load much more 
> aggressively and using a bounded queue for incoming work.  This is extremely 
> evident when we combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                                 Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 |       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 |       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 |       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 |       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 |       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.






[jira] [Commented] (CASSANDRA-19580) Unable to contact any seeds with node in hibernate status

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839924#comment-17839924
 ] 

Brandon Williams commented on CASSANDRA-19580:
--

The purpose of hibernate is to have other nodes ignore the dead state; 
otherwise they will see the old node as alive and just mark it back UP.

If you have internode_compression=dc, then replacement with the same IP will 
not work; you need to use a different IP, because the compression has already 
been negotiated on the other nodes.



> Unable to contact any seeds with node in hibernate status
> -
>
> Key: CASSANDRA-19580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19580
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Cameron Zemek
>Priority: Normal
>
> We have a customer running into the error 'Unable to contact any seeds!'. I 
> have been able to reproduce this issue if I kill Cassandra as it's joining, 
> which will put the node into hibernate status. Once a node is in hibernate it 
> will no longer receive any SYN messages from other nodes during startup, and 
> since it sends only itself as a digest in outbound SYN messages, it never 
> receives any states in any of the ACK replies. So once it gets to the 
> `seenAnySeed` check, it fails because the endpointStateMap is empty.
>  
> A workaround is copying the system.peers table from another node, but this is 
> less than ideal. I tested modifying maybeGossipToSeed as follows:
> {code:java}
>     /* Possibly gossip to a seed for facilitating partition healing */
>     private void maybeGossipToSeed(MessageOut<GossipDigestSyn> prod)
>     {
>         int size = seeds.size();
>         if (size > 0)
>         {
>             if (size == 1 && seeds.contains(FBUtilities.getBroadcastAddress()))
>             {
>                 return;
>             }
>             if (liveEndpoints.size() == 0)
>             {
>                 List<GossipDigest> gDigests = prod.payload.gDigests;
>                 if (gDigests.size() == 1 && gDigests.get(0).endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 {
>                     gDigests = new ArrayList<GossipDigest>();
>                     GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(),
>                                                                            DatabaseDescriptor.getPartitionerName(),
>                                                                            gDigests);
>                     MessageOut<GossipDigestSyn> message = new MessageOut<GossipDigestSyn>(MessagingService.Verb.GOSSIP_DIGEST_SYN,
>                                                                                           digestSynMessage,
>                                                                                           GossipDigestSyn.serializer);
>                     sendGossip(message, seeds);
>                 }
>                 else
>                 {
>                     sendGossip(prod, seeds);
>                 }
>             }
>             else
>             {
>                 /* Gossip with the seed with some probability. */
>                 double probability = seeds.size() / (double) (liveEndpoints.size() + unreachableEndpoints.size());
>                 double randDbl = random.nextDouble();
>                 if (randDbl <= probability)
>                     sendGossip(prod, seeds);
>             }
>         }
>     }
>  {code}
> The only problem is that this is the same as the SYN from a shadow round. It 
> does resolve the issue, however, as the node then receives an ACK with all 
> the states.






[jira] [Commented] (CASSANDRA-19580) Unable to contact any seeds with node in hibernate status

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839885#comment-17839885
 ] 

Brandon Williams commented on CASSANDRA-19580:
--

If internode compression is enabled, replacing with the same address won't work 
because the negotiated compression is cached. This is a limitation that we need 
to document.

> Unable to contact any seeds with node in hibernate status
> -
>
> Key: CASSANDRA-19580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19580
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Cameron Zemek
>Priority: Normal
>
> We have a customer running into the error 'Unable to contact any seeds!'. I 
> have been able to reproduce this issue if I kill Cassandra as it's joining, 
> which will put the node into hibernate status. Once a node is in hibernate it 
> will no longer receive any SYN messages from other nodes during startup, and 
> since it sends only itself as a digest in outbound SYN messages, it never 
> receives any states in any of the ACK replies. So once it gets to the 
> `seenAnySeed` check, it fails because the endpointStateMap is empty.
>  
> A workaround is copying the system.peers table from another node, but this is 
> less than ideal. I tested modifying maybeGossipToSeed as follows:
> {code:java}
>     /* Possibly gossip to a seed for facilitating partition healing */
>     private void maybeGossipToSeed(MessageOut<GossipDigestSyn> prod)
>     {
>         int size = seeds.size();
>         if (size > 0)
>         {
>             if (size == 1 && seeds.contains(FBUtilities.getBroadcastAddress()))
>             {
>                 return;
>             }
>             if (liveEndpoints.size() == 0)
>             {
>                 List<GossipDigest> gDigests = prod.payload.gDigests;
>                 if (gDigests.size() == 1 && gDigests.get(0).endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 {
>                     gDigests = new ArrayList<GossipDigest>();
>                     GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(),
>                                                                            DatabaseDescriptor.getPartitionerName(),
>                                                                            gDigests);
>                     MessageOut<GossipDigestSyn> message = new MessageOut<GossipDigestSyn>(MessagingService.Verb.GOSSIP_DIGEST_SYN,
>                                                                                           digestSynMessage,
>                                                                                           GossipDigestSyn.serializer);
>                     sendGossip(message, seeds);
>                 }
>                 else
>                 {
>                     sendGossip(prod, seeds);
>                 }
>             }
>             else
>             {
>                 /* Gossip with the seed with some probability. */
>                 double probability = seeds.size() / (double) (liveEndpoints.size() + unreachableEndpoints.size());
>                 double randDbl = random.nextDouble();
>                 if (randDbl <= probability)
>                     sendGossip(prod, seeds);
>             }
>         }
>     }
>  {code}
> The only problem is that this is the same as the SYN from a shadow round. It 
> does resolve the issue, however, as the node then receives an ACK with all 
> the states.






[jira] [Comment Edited] (CASSANDRA-19580) Unable to contact any seeds with node in hibernate status

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839884#comment-17839884
 ] 

Brandon Williams edited comment on CASSANDRA-19580 at 4/22/24 11:05 PM:


Thanks, that was answered better than I asked. If you killed the replacement, 
which node is complaining about seeds?


was (Author: brandon.williams):
Thanks, that was answered better than I asked. If you killed the replacement, 
which nice is complaining about seeds?

> Unable to contact any seeds with node in hibernate status
> -
>
> Key: CASSANDRA-19580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19580
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Cameron Zemek
>Priority: Normal
>
> We have a customer running into the error 'Unable to contact any seeds!'. I 
> have been able to reproduce this issue if I kill Cassandra as it's joining, 
> which will put the node into hibernate status. Once a node is in hibernate it 
> will no longer receive any SYN messages from other nodes during startup, and 
> since it sends only itself as a digest in outbound SYN messages, it never 
> receives any states in any of the ACK replies. So once it gets to the 
> `seenAnySeed` check, it fails because the endpointStateMap is empty.
>  
> A workaround is copying the system.peers table from another node, but this is 
> less than ideal. I tested modifying maybeGossipToSeed as follows:
> {code:java}
>     /* Possibly gossip to a seed for facilitating partition healing */
>     private void maybeGossipToSeed(MessageOut<GossipDigestSyn> prod)
>     {
>         int size = seeds.size();
>         if (size > 0)
>         {
>             if (size == 1 && seeds.contains(FBUtilities.getBroadcastAddress()))
>             {
>                 return;
>             }
>             if (liveEndpoints.size() == 0)
>             {
>                 List<GossipDigest> gDigests = prod.payload.gDigests;
>                 if (gDigests.size() == 1 && gDigests.get(0).endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 {
>                     gDigests = new ArrayList<GossipDigest>();
>                     GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(),
>                                                                            DatabaseDescriptor.getPartitionerName(),
>                                                                            gDigests);
>                     MessageOut<GossipDigestSyn> message = new MessageOut<GossipDigestSyn>(MessagingService.Verb.GOSSIP_DIGEST_SYN,
>                                                                                           digestSynMessage,
>                                                                                           GossipDigestSyn.serializer);
>                     sendGossip(message, seeds);
>                 }
>                 else
>                 {
>                     sendGossip(prod, seeds);
>                 }
>             }
>             else
>             {
>                 /* Gossip with the seed with some probability. */
>                 double probability = seeds.size() / (double) (liveEndpoints.size() + unreachableEndpoints.size());
>                 double randDbl = random.nextDouble();
>                 if (randDbl <= probability)
>                     sendGossip(prod, seeds);
>             }
>         }
>     }
>  {code}
> The only problem is that this is the same as the SYN from a shadow round. It 
> does resolve the issue, however, as the node then receives an ACK with all 
> the states.






[jira] [Commented] (CASSANDRA-19580) Unable to contact any seeds with node in hibernate status

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839884#comment-17839884
 ] 

Brandon Williams commented on CASSANDRA-19580:
--

Thanks, that was answered better than I asked. If you killed the replacement, 
which nice is complaining about seeds?

> Unable to contact any seeds with node in hibernate status
> -
>
> Key: CASSANDRA-19580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19580
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Cameron Zemek
>Priority: Normal
>
> We have a customer running into the error 'Unable to contact any seeds!'. I 
> have been able to reproduce this issue if I kill Cassandra as it's joining, 
> which will put the node into hibernate status. Once a node is in hibernate it 
> will no longer receive any SYN messages from other nodes during startup, and 
> since it sends only itself as a digest in outbound SYN messages, it never 
> receives any states in any of the ACK replies. So once it gets to the 
> `seenAnySeed` check, it fails because the endpointStateMap is empty.
>  
> A workaround is copying the system.peers table from another node, but this is 
> less than ideal. I tested modifying maybeGossipToSeed as follows:
> {code:java}
>     /* Possibly gossip to a seed for facilitating partition healing */
>     private void maybeGossipToSeed(MessageOut<GossipDigestSyn> prod)
>     {
>         int size = seeds.size();
>         if (size > 0)
>         {
>             if (size == 1 && seeds.contains(FBUtilities.getBroadcastAddress()))
>             {
>                 return;
>             }
>             if (liveEndpoints.size() == 0)
>             {
>                 List<GossipDigest> gDigests = prod.payload.gDigests;
>                 if (gDigests.size() == 1 && gDigests.get(0).endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 {
>                     gDigests = new ArrayList<GossipDigest>();
>                     GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(),
>                                                                            DatabaseDescriptor.getPartitionerName(),
>                                                                            gDigests);
>                     MessageOut<GossipDigestSyn> message = new MessageOut<GossipDigestSyn>(MessagingService.Verb.GOSSIP_DIGEST_SYN,
>                                                                                           digestSynMessage,
>                                                                                           GossipDigestSyn.serializer);
>                     sendGossip(message, seeds);
>                 }
>                 else
>                 {
>                     sendGossip(prod, seeds);
>                 }
>             }
>             else
>             {
>                 /* Gossip with the seed with some probability. */
>                 double probability = seeds.size() / (double) (liveEndpoints.size() + unreachableEndpoints.size());
>                 double randDbl = random.nextDouble();
>                 if (randDbl <= probability)
>                     sendGossip(prod, seeds);
>             }
>         }
>     }
>  {code}
> The only problem is that this is the same as the SYN from a shadow round. It 
> does resolve the issue, however, as the node then receives an ACK with all 
> the states.






[jira] [Commented] (CASSANDRA-19580) Unable to contact any seeds with node in hibernate status

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839876#comment-17839876
 ] 

Brandon Williams commented on CASSANDRA-19580:
--

Is compression enabled on this cluster?

> Unable to contact any seeds with node in hibernate status
> -
>
> Key: CASSANDRA-19580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19580
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Cameron Zemek
>Priority: Normal
>
> We have a customer running into the error 'Unable to contact any seeds!'. I 
> have been able to reproduce this issue if I kill Cassandra as it's joining, 
> which will put the node into hibernate status. Once a node is in hibernate it 
> will no longer receive any SYN messages from other nodes during startup, and 
> since it sends only itself as a digest in outbound SYN messages, it never 
> receives any states in any of the ACK replies. So once it gets to the 
> `seenAnySeed` check, it fails because the endpointStateMap is empty.
>  
> A workaround is copying the system.peers table from another node, but this is 
> less than ideal. I tested modifying maybeGossipToSeed as follows:
> {code:java}
>     /* Possibly gossip to a seed for facilitating partition healing */
>     private void maybeGossipToSeed(MessageOut<GossipDigestSyn> prod)
>     {
>         int size = seeds.size();
>         if (size > 0)
>         {
>             if (size == 1 && seeds.contains(FBUtilities.getBroadcastAddress()))
>             {
>                 return;
>             }
>             if (liveEndpoints.size() == 0)
>             {
>                 List<GossipDigest> gDigests = prod.payload.gDigests;
>                 if (gDigests.size() == 1 && gDigests.get(0).endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 {
>                     gDigests = new ArrayList<GossipDigest>();
>                     GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(),
>                                                                            DatabaseDescriptor.getPartitionerName(),
>                                                                            gDigests);
>                     MessageOut<GossipDigestSyn> message = new MessageOut<GossipDigestSyn>(MessagingService.Verb.GOSSIP_DIGEST_SYN,
>                                                                                           digestSynMessage,
>                                                                                           GossipDigestSyn.serializer);
>                     sendGossip(message, seeds);
>                 }
>                 else
>                 {
>                     sendGossip(prod, seeds);
>                 }
>             }
>             else
>             {
>                 /* Gossip with the seed with some probability. */
>                 double probability = seeds.size() / (double) (liveEndpoints.size() + unreachableEndpoints.size());
>                 double randDbl = random.nextDouble();
>                 if (randDbl <= probability)
>                     sendGossip(prod, seeds);
>             }
>         }
>     }
>  {code}
> The only problem is that this is the same as the SYN from a shadow round. It 
> does resolve the issue, however, as the node then receives an ACK with all 
> the states.






[jira] [Comment Edited] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839866#comment-17839866
 ] 

Brandon Williams edited comment on CASSANDRA-15439 at 4/22/24 9:54 PM:
---

Determining if the node is bootstrapping is only part of the problem; we still 
have to evict a bootstrapping node that never comes back at some point.  What 
that point is for fat clients has always been equivalent to RING_DELAY, which 
is fine.  For bootstrapping nodes we can choose a new limit with a new 
parameter and allow an override with -D to accommodate those who need it 
longer, without the other drawbacks of increasing RING_DELAY.


was (Author: brandon.williams):
Determining if the node is bootstrapping is only part of the problem; we still 
have to evict a bootstrapping node that never comes back at some point.  What 
that point is for fat clients has always been equivalent to RING_DELAY, which 
is fine.  For bootstrapping nodes we can choose a new limit with a new 
parameter and allow an override with -D to accommodate those who need it longer.

> Token metadata for bootstrapping nodes is lost under temporary failures
> ---
>
> Key: CASSANDRA-15439
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15439
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Josh Snyder
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> In CASSANDRA-8838, [~pauloricardomg] asked "hints will not be stored to the 
> bootstrapping node after RING_DELAY, since it will be evicted from the TMD 
> pending ranges. Should we create a ticket to address this?"
> CASSANDRA-15264 relates to the most likely cause of such situations, where 
> the Cassandra daemon on the bootstrapping node completely crashes. Based on 
> testing with {{kill -STOP}} on a bootstrapping Cassandra JVM, I believe it 
> also is possible to remove token metadata (and thus pending ranges, and thus 
> hints) for a bootstrapping node, simply by affecting its status in the 
> failure detector. 
> A node in the cluster sees the bootstrapping node this way:
> {noformat}
> INFO  [GossipStage:1] 2019-11-27 20:41:41,101 Gossiper.java: - Node 
> /PUBLIC-IP is now part of the cluster
> INFO  [GossipStage:1] 2019-11-27 20:41:41,199 Gossiper.java:1073 - 
> InetAddress /PUBLIC-IP is now UP
> INFO  [HANDSHAKE-/PRIVATE-IP] 2019-11-27 20:41:41,412 
> OutboundTcpConnection.java:565 - Handshaking version with /PRIVATE-IP
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,019 
> StreamResultFuture.java:112 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 
> ID#0] Creating new streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,020 
> StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, 
> ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:56003] 2019-11-27 20:42:10,112 
> StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, 
> ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-IN-/PUBLIC-IP] 2019-11-27 20:42:10,179 
> StreamResultFuture.java:169 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 
> ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 833 
> files(139744616815 bytes)
> INFO  [GossipStage:1] 2019-11-27 20:54:47,547 Gossiper.java:1089 - 
> InetAddress /PUBLIC-IP is now DOWN
> INFO  [GossipTasks:1] 2019-11-27 20:54:57,551 Gossiper.java:849 - FatClient 
> /PUBLIC-IP has been silent for 3ms, removing from gossip
> {noformat}
> Since the bootstrapping node has no tokens, it is treated like a fat client, 
> and it is removed from the ring. For correctness purposes, I believe we must 
> keep storing hints for the downed bootstrapping node until it is either 
> assassinated or until a replacement attempts to bootstrap for the same token.






[jira] [Commented] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839866#comment-17839866
 ] 

Brandon Williams commented on CASSANDRA-15439:
--

Determining if the node is bootstrapping is only part of the problem; we still 
have to evict a bootstrapping node that never comes back at some point.  What 
that point is for fat clients has always been equivalent to RING_DELAY, which 
is fine.  For bootstrapping nodes we can choose a new limit with a new 
parameter and allow an override with -D to accommodate those who need it longer.

> Token metadata for bootstrapping nodes is lost under temporary failures
> ---
>
> Key: CASSANDRA-15439
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15439
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Josh Snyder
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> In CASSANDRA-8838, [~pauloricardomg] asked "hints will not be stored to the 
> bootstrapping node after RING_DELAY, since it will be evicted from the TMD 
> pending ranges. Should we create a ticket to address this?"
> CASSANDRA-15264 relates to the most likely cause of such situations, where 
> the Cassandra daemon on the bootstrapping node completely crashes. Based on 
> testing with {{kill -STOP}} on a bootstrapping Cassandra JVM, I believe it 
> also is possible to remove token metadata (and thus pending ranges, and thus 
> hints) for a bootstrapping node, simply by affecting its status in the 
> failure detector. 
> A node in the cluster sees the bootstrapping node this way:
> {noformat}
> INFO  [GossipStage:1] 2019-11-27 20:41:41,101 Gossiper.java: - Node 
> /PUBLIC-IP is now part of the cluster
> INFO  [GossipStage:1] 2019-11-27 20:41:41,199 Gossiper.java:1073 - 
> InetAddress /PUBLIC-IP is now UP
> INFO  [HANDSHAKE-/PRIVATE-IP] 2019-11-27 20:41:41,412 
> OutboundTcpConnection.java:565 - Handshaking version with /PRIVATE-IP
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,019 
> StreamResultFuture.java:112 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 
> ID#0] Creating new streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,020 
> StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, 
> ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:56003] 2019-11-27 20:42:10,112 
> StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, 
> ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-IN-/PUBLIC-IP] 2019-11-27 20:42:10,179 
> StreamResultFuture.java:169 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 
> ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 833 
> files(139744616815 bytes)
> INFO  [GossipStage:1] 2019-11-27 20:54:47,547 Gossiper.java:1089 - 
> InetAddress /PUBLIC-IP is now DOWN
> INFO  [GossipTasks:1] 2019-11-27 20:54:57,551 Gossiper.java:849 - FatClient 
> /PUBLIC-IP has been silent for 3ms, removing from gossip
> {noformat}
> Since the bootstrapping node has no tokens, it is treated like a fat client, 
> and it is removed from the ring. For correctness purposes, I believe we must 
> keep storing hints for the downed bootstrapping node until it is either 
> assassinated or until a replacement attempts to bootstrap for the same token.






[jira] [Commented] (CASSANDRA-19566) JSON encoded timestamp value does not always match non-JSON encoded value

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839859#comment-17839859
 ] 

Brandon Williams commented on CASSANDRA-19566:
--

If you dig into the 5.0 j11 failure it's a red herring; it's clean.  There 
seems to be an unrelated problem in test_parallel_upgrade in the upgrade tests, 
but the rest have passed. I am +1 and will open a ticket for 
test_change_durable_writes (again).

> JSON encoded timestamp value does not always match non-JSON encoded value
> -
>
> Key: CASSANDRA-19566
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19566
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core, Legacy/CQL
>Reporter: Bowen Song
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Description:
> "SELECT JSON ..." and "toJson(...)" on Cassandra 4.1.4 produces different 
> date than "SELECT ..."  for some timestamp type values.
>  
> Steps to reproduce:
> {code:java}
> $ sudo docker pull cassandra:4.1.4
> $ sudo docker create --name cass cassandra:4.1.4
> $ sudo docker start cass
> $ # wait for the Cassandra instance to become ready
> $ sudo docker exec -ti cass cqlsh
> Connected to Test Cluster at 127.0.0.1:9042
> [cqlsh 6.1.0 | Cassandra 4.1.4 | CQL spec 3.4.6 | Native protocol v5]
> Use HELP for help.
> cqlsh> create keyspace test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> cqlsh> use test;
> cqlsh:test> create table tbl (id int, ts timestamp, primary key (id));
> cqlsh:test> insert into tbl (id, ts) values (1, -1376701920);
> cqlsh:test> select tounixtimestamp(ts), ts, tojson(ts) from tbl where id=1;
>  system.tounixtimestamp(ts) | ts                              | system.tojson(ts)
> ----------------------------+---------------------------------+----------------------------
>                 -1376701920 | 1533-09-28 12:00:00.000000+0000 | "1533-09-18 12:00:00.000Z"
> (1 rows)
> cqlsh:test> select json * from tbl where id=1;
>  [json]
> -
>  {"id": 1, "ts": "1533-09-18 12:00:00.000Z"}
> (1 rows)
> {code}
>  
> Expected behaviour:
> The "select ts", "select tojson(ts)" and "select json *" should all produce 
> the same date.
>  
> Actual behaviour:
> The "select ts" produced the "1533-09-28" date but the "select tojson(ts)" 
> and "select json *" produced the "1533-09-18" date.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19566) JSON encoded timestamp value does not always match non-JSON encoded value

2024-04-22 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19566:
-
Status: Ready to Commit  (was: Review In Progress)

> JSON encoded timestamp value does not always match non-JSON encoded value
> -
>
> Key: CASSANDRA-19566
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19566
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core, Legacy/CQL
>Reporter: Bowen Song
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Description:
> "SELECT JSON ..." and "toJson(...)" on Cassandra 4.1.4 produces different 
> date than "SELECT ..."  for some timestamp type values.
>  
> Steps to reproduce:
> {code:java}
> $ sudo docker pull cassandra:4.1.4
> $ sudo docker create --name cass cassandra:4.1.4
> $ sudo docker start cass
> $ # wait for the Cassandra instance to become ready
> $ sudo docker exec -ti cass cqlsh
> Connected to Test Cluster at 127.0.0.1:9042
> [cqlsh 6.1.0 | Cassandra 4.1.4 | CQL spec 3.4.6 | Native protocol v5]
> Use HELP for help.
> cqlsh> create keyspace test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> cqlsh> use test;
> cqlsh:test> create table tbl (id int, ts timestamp, primary key (id));
> cqlsh:test> insert into tbl (id, ts) values (1, -1376701920);
> cqlsh:test> select tounixtimestamp(ts), ts, tojson(ts) from tbl where id=1;
>  system.tounixtimestamp(ts) | ts                              | system.tojson(ts)
> -----------------------------+---------------------------------+----------------------------
>                  -1376701920 | 1533-09-28 12:00:00.000000+0000 | "1533-09-18 12:00:00.000Z"
> (1 rows)
> cqlsh:test> select json * from tbl where id=1;
>  [json]
> ----------------------------------------------
>  {"id": 1, "ts": "1533-09-18 12:00:00.000Z"}
> (1 rows)
> {code}
>  
> Expected behaviour:
> The "select ts", "select tojson(ts)" and "select json *" should all produce 
> the same date.
>  
> Actual behaviour:
> The "select ts" produced the "1533-09-28" date but the "select tojson(ts)" 
> and "select json *" produced the "1533-09-18" date.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures

2024-04-22 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15439:
-
 Bug Category: Parent values: Correctness(12982)Level 1 values: Recoverable 
Corruption / Loss(12986)
   Complexity: Normal
  Component/s: Cluster/Membership
Discovered By: User Report
Fix Version/s: 3.0.x
   3.11.x
   4.0.x
   4.1.x
   5.0.x
   5.x
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Token metadata for bootstrapping nodes is lost under temporary failures
> ---
>
> Key: CASSANDRA-15439
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15439
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Josh Snyder
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> In CASSANDRA-8838, [~pauloricardomg] asked "hints will not be stored to the 
> bootstrapping node after RING_DELAY, since it will evicted from the TMD 
> pending ranges. Should we create a ticket to address this?"
> CASSANDRA-15264 relates to the most likely cause of such situations, where 
> the Cassandra daemon on the bootstrapping node completely crashes. Based on 
> testing with {{kill -STOP}} on a bootstrapping Cassandra JVM, I believe it 
> also is possible to remove token metadata (and thus pending ranges, and thus 
> hints) for a bootstrapping node, simply by affecting its status in the 
> failure detector. 
> A node in the cluster sees the bootstrapping node this way:
> {noformat}
> INFO  [GossipStage:1] 2019-11-27 20:41:41,101 Gossiper.java: - Node 
> /PUBLIC-IP is now part of the cluster
> INFO  [GossipStage:1] 2019-11-27 20:41:41,199 Gossiper.java:1073 - 
> InetAddress /PUBLIC-IP is now UP
> INFO  [HANDSHAKE-/PRIVATE-IP] 2019-11-27 20:41:41,412 
> OutboundTcpConnection.java:565 - Handshaking version with /PRIVATE-IP
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,019 
> StreamResultFuture.java:112 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 
> ID#0] Creating new streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,020 
> StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, 
> ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-INIT-/PRIVATE-IP:56003] 2019-11-27 20:42:10,112 
> StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, 
> ID#0] Received streaming plan for Bootstrap
> INFO  [STREAM-IN-/PUBLIC-IP] 2019-11-27 20:42:10,179 
> StreamResultFuture.java:169 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 
> ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 833 
> files(139744616815 bytes)
> INFO  [GossipStage:1] 2019-11-27 20:54:47,547 Gossiper.java:1089 - 
> InetAddress /PUBLIC-IP is now DOWN
> INFO  [GossipTasks:1] 2019-11-27 20:54:57,551 Gossiper.java:849 - FatClient 
> /PUBLIC-IP has been silent for 3ms, removing from gossip
> {noformat}
> Since the bootstrapping node has no tokens, it is treated like a fat client, 
> and it is removed from the ring. For correctness purposes, I believe we must 
> keep storing hints for the downed bootstrapping node until it is either 
> assassinated or until a replacement attempts to bootstrap for the same token.
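> 
> A method-level sketch of that rule (names here are invented for illustration; 
> this is not the actual Gossiper/TokenMetadata API): fat-client eviction would 
> first check for pending bootstrap state, so the pending ranges - and the hints 
> written for them - survive a transient failure.
> {code:java}
> import java.net.InetAddress;
> 
> public class FatClientEviction
> {
>     // assumed ring-state queries, stand-ins for TokenMetadata lookups
>     interface RingState
>     {
>         boolean isMember(InetAddress endpoint);
>         boolean hasPendingBootstrapTokens(InetAddress endpoint);
>     }
> 
>     static boolean mayEvict(RingState ring, InetAddress endpoint,
>                             long silentMillis, long quarantineMillis)
>     {
>         if (ring.isMember(endpoint))
>             return false; // real members are never treated as fat clients
>         if (ring.hasPendingBootstrapTokens(endpoint))
>             return false; // keep gossip state, so hints keep being stored
>         return silentMillis > quarantineMillis; // today's fat-client rule
>     }
> }
> {code}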



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19566) JSON encoded timestamp value does not always match non-JSON encoded value

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839813#comment-17839813
 ] 

Brandon Williams commented on CASSANDRA-19566:
--

Running CI for 5.0, I typoed the branch name but it was already running so I've 
left it:

||Branch||CI||
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19556-5.0]|[j11|https://app.circleci.com/pipelines/github/driftx/cassandra/1593/workflows/cc50210b-cb42-4529-be00-d017b41d9328],
 
[j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1593/workflows/2ac03a31-b5e8-4cd0-8209-b1b89779fc4e]|

[Here|https://app.circleci.com/pipelines/github/driftx/cassandra/1594/workflows/a1f7f945-a491-48b6-b166-4fd0917b2e96]
 are upgrade tests for 4.0

> JSON encoded timestamp value does not always match non-JSON encoded value
> -
>
> Key: CASSANDRA-19566
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19566
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core, Legacy/CQL
>Reporter: Bowen Song
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Description:
> "SELECT JSON ..." and "toJson(...)" on Cassandra 4.1.4 produces different 
> date than "SELECT ..."  for some timestamp type values.
>  
> Steps to reproduce:
> {code:java}
> $ sudo docker pull cassandra:4.1.4
> $ sudo docker create --name cass cassandra:4.1.4
> $ sudo docker start cass
> $ # wait for the Cassandra instance to become ready
> $ sudo docker exec -ti cass cqlsh
> Connected to Test Cluster at 127.0.0.1:9042
> [cqlsh 6.1.0 | Cassandra 4.1.4 | CQL spec 3.4.6 | Native protocol v5]
> Use HELP for help.
> cqlsh> create keyspace test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> cqlsh> use test;
> cqlsh:test> create table tbl (id int, ts timestamp, primary key (id));
> cqlsh:test> insert into tbl (id, ts) values (1, -1376701920);
> cqlsh:test> select tounixtimestamp(ts), ts, tojson(ts) from tbl where id=1;
>  system.tounixtimestamp(ts) | ts                              | system.tojson(ts)
> -----------------------------+---------------------------------+----------------------------
>                  -1376701920 | 1533-09-28 12:00:00.000000+0000 | "1533-09-18 12:00:00.000Z"
> (1 rows)
> cqlsh:test> select json * from tbl where id=1;
>  [json]
> ----------------------------------------------
>  {"id": 1, "ts": "1533-09-18 12:00:00.000Z"}
> (1 rows)
> {code}
>  
> Expected behaviour:
> The "select ts", "select tojson(ts)" and "select json *" should all produce 
> the same date.
>  
> Actual behaviour:
> The "select ts" produced the "1533-09-28" date but the "select tojson(ts)" 
> and "select json *" produced the "1533-09-18" date.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19566) JSON encoded timestamp value does not always match non-JSON encoded value

2024-04-22 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19566:
-
Reviewers: Brandon Williams
   Status: Review In Progress  (was: Needs Committer)

> JSON encoded timestamp value does not always match non-JSON encoded value
> -
>
> Key: CASSANDRA-19566
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19566
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core, Legacy/CQL
>Reporter: Bowen Song
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Description:
> "SELECT JSON ..." and "toJson(...)" on Cassandra 4.1.4 produces different 
> date than "SELECT ..."  for some timestamp type values.
>  
> Steps to reproduce:
> {code:java}
> $ sudo docker pull cassandra:4.1.4
> $ sudo docker create --name cass cassandra:4.1.4
> $ sudo docker start cass
> $ # wait for the Cassandra instance to become ready
> $ sudo docker exec -ti cass cqlsh
> Connected to Test Cluster at 127.0.0.1:9042
> [cqlsh 6.1.0 | Cassandra 4.1.4 | CQL spec 3.4.6 | Native protocol v5]
> Use HELP for help.
> cqlsh> create keyspace test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> cqlsh> use test;
> cqlsh:test> create table tbl (id int, ts timestamp, primary key (id));
> cqlsh:test> insert into tbl (id, ts) values (1, -1376701920);
> cqlsh:test> select tounixtimestamp(ts), ts, tojson(ts) from tbl where id=1;
>  system.tounixtimestamp(ts) | ts                              | system.tojson(ts)
> -----------------------------+---------------------------------+----------------------------
>                  -1376701920 | 1533-09-28 12:00:00.000000+0000 | "1533-09-18 12:00:00.000Z"
> (1 rows)
> cqlsh:test> select json * from tbl where id=1;
>  [json]
> ----------------------------------------------
>  {"id": 1, "ts": "1533-09-18 12:00:00.000Z"}
> (1 rows)
> {code}
>  
> Expected behaviour:
> The "select ts", "select tojson(ts)" and "select json *" should all produce 
> the same date.
>  
> Actual behaviour:
> The "select ts" produced the "1533-09-28" date but the "select tojson(ts)" 
> and "select json *" produced the "1533-09-18" date.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-17667) Text value containing "/*" interpreted as multiline comment in cqlsh

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839796#comment-17839796
 ] 

Brandon Williams commented on CASSANDRA-17667:
--

Kind ping in case this has been forgotten.

> Text value containing "/*" interpreted as multiline comment in cqlsh
> 
>
> Key: CASSANDRA-17667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17667
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter
>Reporter: ANOOP THOMAS
>Assignee: Brad Schoening
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> I use the CQLSH command line utility to load some DDLs. The version of the 
> utility I use is this:
> {noformat}
> [cqlsh 6.0.0 | Cassandra 4.0.0.47 | CQL spec 3.4.5 | Native protocol 
> v5]{noformat}
> Command that loads DDL.cql:
> {noformat}
> cqlsh -u username -p password cassandra.example.com 65503 --ssl -f DDL.cql
> {noformat}
> I have a line in CQL script that breaks the syntax.
> {noformat}
> INSERT into tablename (key,columnname1,columnname2) VALUES 
> ('keyName','value1','/value2/*/value3');{noformat}
> {{/*}} here is interpreted as the start of a multi-line comment. It used to 
> work on older versions of cqlsh. The error I see looks like this:
> {noformat}
> SyntaxException: line 4:2 mismatched input 'Update' expecting ')' 
> (...,'value1','/value2INSERT into tablename(INSERT into tablename 
> (key,columnname1,columnname2)) VALUES ('[Update]-...) SyntaxException: line 
> 1:0 no viable alternative at input '(' ([(]...)
> {noformat}
> Same behavior while running in interactive mode too. {{/*}} inside a CQL 
> statement should not be interpreted as the start of a multi-line comment.
> With schema:
> {code:java}
> CREATE TABLE tablename ( key text primary key, columnname1 text, columnname2 
> text);{code}
>  
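> A minimal illustration of the parsing rule that avoids this (a sketch only - 
> cqlsh's real lexer is Python and is not shown here): a scanner must track 
> whether it is inside a single-quoted CQL string literal before treating /* as 
> a comment opener.
> {code:java}
> public class CommentScan
> {
>     // Returns true only for /* that appears outside string literals.
>     static boolean opensComment(String cql)
>     {
>         boolean inString = false;
>         for (int i = 0; i < cql.length(); i++)
>         {
>             char c = cql.charAt(i);
>             if (c == '\'')
>                 inString = !inString; // a doubled '' toggles twice, staying inside
>             else if (!inString && c == '/' && i + 1 < cql.length() && cql.charAt(i + 1) == '*')
>                 return true;
>         }
>         return false;
>     }
> 
>     public static void main(String[] args)
>     {
>         System.out.println(opensComment("VALUES ('/value2/*/value3')")); // false
>         System.out.println(opensComment("SELECT 1 /* comment */"));      // true
>     }
> }
> {code}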



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time

2024-04-22 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19448:
-
Status: Open  (was: Patch Available)

> CommitlogArchiver only has granularity to seconds for restore_point_in_time
> ---
>
> Key: CASSANDRA-19448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19448
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Jeremy Hanna
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commitlog archiver allows users to backup commitlog files for the purpose of 
> doing point in time restores.  The [configuration 
> file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties]
>  gives an example down to seconds granularity but then asks whether the 
> timestamps are microseconds or milliseconds - defaulting to microseconds.  
> Because the [CommitLogArchiver uses a second-based date 
> format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52],
>  if a user specifies a restore point at a finer granularity like milliseconds 
> or microseconds, it will truncate everything after the second and restore to 
> that second.  So say you specify a 
> restore_point_in_time like this:
> restore_point_in_time=2024:01:18 17:01:01.623392
> it will silently truncate everything after the 01 seconds.  So effectively to 
> the user, it is missing updates between 01 and 01.623392.
> This appears to be a bug in the intent.  We should allow users to specify 
> down to the millisecond or even microsecond level. If we allow them to 
> specify down to microseconds for the restore point in time, then it may 
> internally need to change from a long.
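> 
> A small demonstration of the silent truncation (a sketch - the pattern below 
> is the second-granularity one from the archiver's configuration example, and 
> the class name is invented):
> {code:java}
> import java.text.ParseException;
> import java.text.SimpleDateFormat;
> import java.util.Date;
> 
> public class RestorePointTruncation
> {
>     public static void main(String[] args) throws ParseException
>     {
>         SimpleDateFormat format = new SimpleDateFormat("yyyy:MM:dd HH:mm:ss");
>         // parse() matches up to the seconds field and silently ignores ".623392",
>         // so the restore point lands on 17:01:01.000 and any update between
>         // .000 and .623392 falls past it.
>         Date restorePoint = format.parse("2024:01:18 17:01:01.623392");
>         System.out.println(restorePoint);
>     }
> }
> {code}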



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19572) Test failure: org.apache.cassandra.db.ImportTest flakiness

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839672#comment-17839672
 ] 

Brandon Williams edited comment on CASSANDRA-19572 at 4/22/24 4:26 PM:
---

Yes, it is.  Those comments are to aid whoever takes this on; if the situation 
changes with regard to blocking something, I will explicitly say so.


was (Author: brandon.williams):
Yes, it is.

> Test failure: org.apache.cassandra.db.ImportTest flakiness
> --
>
> Key: CASSANDRA-19572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19572
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/bulk load
>Reporter: Brandon Williams
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> As discovered on CASSANDRA-19401, the tests in this class are flaky, at least 
> the following:
>  * testImportCorruptWithoutValidationWithCopying
>  * testImportInvalidateCache
>  * testImportCorruptWithCopying
>  * testImportCacheEnabledWithoutSrcDir
>  * testImportInvalidateCache
> [https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19580) Unable to contact any seeds with node in hibernate status

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839772#comment-17839772
 ] 

Brandon Williams commented on CASSANDRA-19580:
--

Hibernate was added for node replacement, but it is also used if the node is 
told not to join the ring at startup.

bq.  I have been able to reproduce this issue if I kill Cassandra as it's 
joining, which will put the node into hibernate status. 

Can you expound upon this since it doesn't seem to meet either condition?

> Unable to contact any seeds with node in hibernate status
> -
>
> Key: CASSANDRA-19580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19580
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Cameron Zemek
>Priority: Normal
>
> We have a customer running into the error 'Unable to contact any seeds!'. I 
> have been able to reproduce this issue if I kill Cassandra as it's joining, 
> which will put the node into hibernate status. Once a node is in hibernate it 
> will no longer receive any SYN messages from other nodes during startup, and 
> since it sends only itself as a digest in outbound SYN messages, it never 
> receives any states in any of the ACK replies. So once it gets to the 
> `seenAnySeed` check, it fails as the endpointStateMap is empty.
>  
> A workaround is copying the system.peers table from another node, but this is 
> less than ideal. I tested modifying maybeGossipToSeed as follows:
> {code:java}
>     /* Possibly gossip to a seed for facilitating partition healing */
>     private void maybeGossipToSeed(MessageOut<GossipDigestSyn> prod)
>     {
>         int size = seeds.size();
>         if (size > 0)
>         {
>             if (size == 1 && seeds.contains(FBUtilities.getBroadcastAddress()))
>             {
>                 return;
>             }
>             if (liveEndpoints.size() == 0)
>             {
>                 List<GossipDigest> gDigests = prod.payload.gDigests;
>                 if (gDigests.size() == 1 && gDigests.get(0).endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 {
>                     // send an empty digest list, as in the shadow round, so seeds answer with full states
>                     gDigests = new ArrayList<GossipDigest>();
>                     GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(),
>                                                                            DatabaseDescriptor.getPartitionerName(),
>                                                                            gDigests);
>                     MessageOut<GossipDigestSyn> message = new MessageOut<>(MessagingService.Verb.GOSSIP_DIGEST_SYN,
>                                                                            digestSynMessage,
>                                                                            GossipDigestSyn.serializer);
>                     sendGossip(message, seeds);
>                 }
>                 else
>                 {
>                     sendGossip(prod, seeds);
>                 }
>             }
>             else
>             {
>                 /* Gossip with the seed with some probability. */
>                 double probability = seeds.size() / (double) (liveEndpoints.size() + unreachableEndpoints.size());
>                 double randDbl = random.nextDouble();
>                 if (randDbl <= probability)
>                     sendGossip(prod, seeds);
>             }
>         }
>     }
> {code}
> The only problem is that this is the same as the SYN from the shadow round. It 
> does resolve the issue, however, as the node then receives an ACK with all the states.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19578) Concurrent equivalent schema updates lead to unresolved disagreement

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839767#comment-17839767
 ] 

Brandon Williams commented on CASSANDRA-19578:
--

Unsurprisingly testTransKsMigration also failed in the CI run, but that is the 
only one that needs to be addressed there.

> Concurrent equivalent schema updates lead to unresolved disagreement
> 
>
> Key: CASSANDRA-19578
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19578
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Chris Lohfink
>Priority: Normal
> Fix For: 4.1.5, 5.0-beta2
>
>
> As part of CASSANDRA-17819 a check for empty schema changes was added to the 
> updateSchema. This only looks at the _logical_ schema difference of the 
> schemas, but the changes made to the system_schema keyspace are the ones that 
> actually are involved in the digest.
> If two nodes issue the same CREATE statement the difference from the 
> keyspace.diff would be empty but the timestamps on the mutations would be 
> different, leading to a pseudo schema disagreement which will never resolve 
> until resetlocalschema or nodes being bounced.
> Only impacts 4.1
> test and fix : 
> https://github.com/clohfink/cassandra/commit/ba915f839089006ac6d08494ef19dc010bcd6411
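> 
> A toy illustration of why the digests diverge (not Cassandra's actual schema 
> version code; names and hashing are simplified): the version covers the 
> serialized mutations, timestamps included, so the same logical CREATE applied 
> at two different times yields two different versions.
> {code:java}
> import java.math.BigInteger;
> import java.nio.charset.StandardCharsets;
> import java.security.MessageDigest;
> 
> public class SchemaDigestDemo
> {
>     static String version(String mutation, long writeTimestamp) throws Exception
>     {
>         MessageDigest md5 = MessageDigest.getInstance("MD5");
>         md5.update(mutation.getBytes(StandardCharsets.UTF_8));
>         md5.update(Long.toString(writeTimestamp).getBytes(StandardCharsets.UTF_8));
>         return new BigInteger(1, md5.digest()).toString(16);
>     }
> 
>     public static void main(String[] args) throws Exception
>     {
>         String create = "CREATE TABLE ks.t (id int PRIMARY KEY)";
>         // identical logical schema, different mutation timestamps -> different digests
>         System.out.println(version(create, 1713500000000000L));
>         System.out.println(version(create, 1713500000100000L));
>     }
> }
> {code}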



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19572) Test failure: org.apache.cassandra.db.ImportTest flakiness

2024-04-22 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839672#comment-17839672
 ] 

Brandon Williams commented on CASSANDRA-19572:
--

Yes, it is.

> Test failure: org.apache.cassandra.db.ImportTest flakiness
> --
>
> Key: CASSANDRA-19572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19572
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/bulk load
>Reporter: Brandon Williams
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> As discovered on CASSANDRA-19401, the tests in this class are flaky, at least 
> the following:
>  * testImportCorruptWithoutValidationWithCopying
>  * testImportInvalidateCache
>  * testImportCorruptWithCopying
>  * testImportCacheEnabledWithoutSrcDir
>  * testImportInvalidateCache
> [https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19578) Concurrent equivalent schema updates lead to unresolved disagreement

2024-04-20 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839280#comment-17839280
 ] 

Brandon Williams edited comment on CASSANDRA-19578 at 4/20/24 8:00 PM:
---

We'll also need to multiplex SchemaTest for flakiness, which I ran 
[here|https://app.circleci.com/pipelines/github/driftx/cassandra/1592/workflows/a84707b9-9ff8-4d49-af26-015a5e99d2b4/jobs/85318/tests]
 and is failing, which I verified, and learned that it is only failing when the 
suite is run, but testTransKsMigration passes in isolation.


was (Author: brandon.williams):
We'll also need to multiplex SchemaTest for flakiness, which I ram 
[here|https://app.circleci.com/pipelines/github/driftx/cassandra/1592/workflows/a84707b9-9ff8-4d49-af26-015a5e99d2b4/jobs/85318/tests]
 and is failing, which I verified, and learned that it is only failing when the 
suite is run, but testTransKsMigration passes in isolation.

> Concurrent equivalent schema updates lead to unresolved disagreement
> 
>
> Key: CASSANDRA-19578
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19578
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Chris Lohfink
>Priority: Normal
> Fix For: 4.1.5, 5.0-beta2
>
>
> As part of CASSANDRA-17819 a check for empty schema changes was added to the 
> updateSchema. This only looks at the _logical_ schema difference of the 
> schemas, but the changes made to the system_schema keyspace are the ones that 
> actually are involved in the digest.
> If two nodes issue the same CREATE statement the difference from the 
> keyspace.diff would be empty but the timestamps on the mutations would be 
> different, leading to a pseudo schema disagreement which will never resolve 
> until resetlocalschema or nodes being bounced.
> Only impacts 4.1
> test and fix : 
> https://github.com/clohfink/cassandra/commit/ba915f839089006ac6d08494ef19dc010bcd6411



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19578) Concurrent equivalent schema updates lead to unresolved disagreement

2024-04-20 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839280#comment-17839280
 ] 

Brandon Williams commented on CASSANDRA-19578:
--

We'll also need to multiplex SchemaTest for flakiness, which I ran 
[here|https://app.circleci.com/pipelines/github/driftx/cassandra/1592/workflows/a84707b9-9ff8-4d49-af26-015a5e99d2b4/jobs/85318/tests]
 and is failing, which I verified, and learned that it is only failing when the 
suite is run, but testTransKsMigration passes in isolation.

> Concurrent equivalent schema updates lead to unresolved disagreement
> 
>
> Key: CASSANDRA-19578
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19578
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Chris Lohfink
>Priority: Normal
> Fix For: 4.1.5, 5.0-beta2
>
>
> As part of CASSANDRA-17819 a check for empty schema changes was added to the 
> updateSchema. This only looks at the _logical_ schema difference of the 
> schemas, but the changes made to the system_schema keyspace are the ones that 
> actually are involved in the digest.
> If two nodes issue the same CREATE statement the difference from the 
> keyspace.diff would be empty but the timestamps on the mutations would be 
> different, leading to a pseudo schema disagreement which will never resolve 
> until resetlocalschema or nodes being bounced.
> Only impacts 4.1
> test and fix : 
> https://github.com/clohfink/cassandra/commit/ba915f839089006ac6d08494ef19dc010bcd6411



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19578) Concurrent equivalent schema updates lead to unresolved disagreement

2024-04-20 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839278#comment-17839278
 ] 

Brandon Williams commented on CASSANDRA-19578:
--

I've started a run: 
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-before-5/2678/

> Concurrent equivalent schema updates lead to unresolved disagreement
> 
>
> Key: CASSANDRA-19578
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19578
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Chris Lohfink
>Priority: Normal
> Fix For: 4.1.5, 5.0-beta2
>
>
> As part of CASSANDRA-17819 a check for empty schema changes was added to the 
> updateSchema. This only looks at the _logical_ schema difference of the 
> schemas, but the changes made to the system_schema keyspace are the ones that 
> actually are involved in the digest.
> If two nodes issue the same CREATE statement the difference from the 
> keyspace.diff would be empty but the timestamps on the mutations would be 
> different, leading to a pseudo schema disagreement which will never resolve 
> until resetlocalschema or nodes being bounced.
> Only impacts 4.1
> test and fix : 
> https://github.com/clohfink/cassandra/commit/ba915f839089006ac6d08494ef19dc010bcd6411



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-20 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839262#comment-17839262
 ] 

Brandon Williams commented on CASSANDRA-19565:
--

[Here|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-before-5/2677/]
 is CI for 4.1

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Thomas De Keulenaer
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: cassandra_57_debian_jdk11_amd64_attempt1.log.xz, 
> cassandra_57_redhat_jdk11_amd64_attempt1.log.xz, hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1 we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19575) StartupCheck error for read_ahead_kb

2024-04-19 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839022#comment-17839022
 ] 

Brandon Williams edited comment on CASSANDRA-19575 at 4/19/24 2:46 PM:
---

I think instead of trying to decompose the specified device name to a block 
device via regex, it would be better to list all the block devices and match 
against device names.


was (Author: brandon.williams):
I think instead of trying to decompose the specified device name to a block 
device, it would be better to list all the block devices and compare against 
device names.

> StartupCheck error for read_ahead_kb
> 
>
> Key: CASSANDRA-19575
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19575
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Startup and Shutdown
>Reporter: Morten Joenby
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>
> We believe the StartupChecks.java has a minor bug here:
> [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/service/StartupChecks.java#L737]
> {code:java}
> String deviceName = blockDirComponents[2].replaceAll("[0-9]*$", "");{code}
> We are using a RAID setup with two disks, so removing the "[0-9]" makes the 
> check fail:
> cat /sys/block/md/queue/read_ahead_kb
> cat: /sys/block/md/queue/read_ahead_kb: No such file or directory
> It should be "md0" in our case, so removing the '0' won't work.
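> 
> A sketch of the direction suggested above (illustrative only - not the actual 
> patch): enumerate /sys/block and pick the longest device name that prefixes 
> the partition name, so "md0" resolves to "md0" while "sda1" still resolves to "sda".
> {code:java}
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.util.Comparator;
> import java.util.Optional;
> import java.util.stream.Stream;
> 
> public class BlockDeviceLookup
> {
>     static Optional<String> blockDeviceFor(String partitionName) throws IOException
>     {
>         try (Stream<Path> devices = Files.list(Path.of("/sys/block")))
>         {
>             return devices.map(p -> p.getFileName().toString())
>                           .filter(partitionName::startsWith)
>                           .max(Comparator.comparingInt(String::length));
>         }
>     }
> }
> {code}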



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19575) StartupCheck error for read_ahead_kb

2024-04-19 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839022#comment-17839022
 ] 

Brandon Williams commented on CASSANDRA-19575:
--

I think instead of trying to decompose the specified device name to a block 
device, it would be better to list all the block devices and compare against 
device names.

> StartupCheck error for read_ahead_kb
> 
>
> Key: CASSANDRA-19575
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19575
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Startup and Shutdown
>Reporter: Morten Joenby
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>
> We believe the StartupChecks.java has a minor bug here:
> [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/service/StartupChecks.java#L737]
> {code:java}
> String deviceName = blockDirComponents[2].replaceAll("[0-9]*$", "");{code}
> We are using a RAID setup with two disks, so removing the "[0-9]" makes the 
> check fail:
> cat /sys/block/md/queue/read_ahead_kb
> cat: /sys/block/md/queue/read_ahead_kb: No such file or directory
> It should be "md0" in our case, so removing the '0' won't work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19575) StartupCheck error for read_ahead_kb

2024-04-19 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19575:
-
 Bug Category: Parent values: Degradation(12984)Level 1 values: Resource 
Management(12995)
   Complexity: Normal
Discovered By: User Report
Fix Version/s: 4.1.x
   5.0.x
   5.x
 Severity: Normal
   Status: Open  (was: Triage Needed)

> StartupCheck error for read_ahead_kb
> 
>
> Key: CASSANDRA-19575
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19575
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Startup and Shutdown
>Reporter: Morten Joenby
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>
> We believe the StartupChecks.java has a minor bug here:
> [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/service/StartupChecks.java#L737]
> {code:java}
> String deviceName = blockDirComponents[2].replaceAll("[0-9]*$", "");{code}
> We are using a RAID setup with two disks, so removing the "[0-9]" makes the 
> check fail:
> cat /sys/block/md/queue/read_ahead_kb
> cat: /sys/block/md/queue/read_ahead_kb: No such file or directory
> It should be "md0" in our case, so removing the '0' won't work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-19 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19565:
-
Test and Documentation Plan: run packaging CI
 Status: Patch Available  (was: In Progress)

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Thomas De Keulenaer
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: cassandra_57_debian_jdk11_amd64_attempt1.log.xz, 
> cassandra_57_redhat_jdk11_amd64_attempt1.log.xz, hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1 we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-19 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838977#comment-17838977
 ] 

Brandon Williams commented on CASSANDRA-19565:
--

bq.  Now I just need to figure out CI for packaging on 4.1.

This turned out to be problematic, but I merged into 5.0 
[here|https://github.com/driftx/cassandra/tree/CASSANDRA-19565-5.0] and ran the 
packaging build in private CI, then attached the logs from the package building 
here.  The package changes will be the same in all branches so this should be 
equivalent.

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Thomas De Keulenaer
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: cassandra_57_debian_jdk11_amd64_attempt1.log.xz, 
> cassandra_57_redhat_jdk11_amd64_attempt1.log.xz, hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1 we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-19 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19565:
-
Attachment: cassandra_57_redhat_jdk11_amd64_attempt1.log.xz
cassandra_57_debian_jdk11_amd64_attempt1.log.xz

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Thomas De Keulenaer
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: cassandra_57_debian_jdk11_amd64_attempt1.log.xz, 
> cassandra_57_redhat_jdk11_amd64_attempt1.log.xz, hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1 we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19566) JSON encoded timestamp value does not always match non-JSON encoded value

2024-04-19 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19566:
-
Fix Version/s: 4.0.x
   4.1.x

> JSON encoded timestamp value does not always match non-JSON encoded value
> -
>
> Key: CASSANDRA-19566
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19566
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core, Legacy/CQL
>Reporter: Bowen Song
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Description:
> "SELECT JSON ..." and "toJson(...)" on Cassandra 4.1.4 produces different 
> date than "SELECT ..."  for some timestamp type values.
>  
> Steps to reproduce:
> {code:java}
> $ sudo docker pull cassandra:4.1.4
> $ sudo docker create --name cass cassandra:4.1.4
> $ sudo docker start cass
> $ # wait for the Cassandra instance to become ready
> $ sudo docker exec -ti cass cqlsh
> Connected to Test Cluster at 127.0.0.1:9042
> [cqlsh 6.1.0 | Cassandra 4.1.4 | CQL spec 3.4.6 | Native protocol v5]
> Use HELP for help.
> cqlsh> create keyspace test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> cqlsh> use test;
> cqlsh:test> create table tbl (id int, ts timestamp, primary key (id));
> cqlsh:test> insert into tbl (id, ts) values (1, -1376701920);
> cqlsh:test> select tounixtimestamp(ts), ts, tojson(ts) from tbl where id=1;
>  system.tounixtimestamp(ts) | ts                              | system.tojson(ts)
> -----------------------------+---------------------------------+----------------------------
>                  -1376701920 | 1533-09-28 12:00:00.000000+0000 | "1533-09-18 12:00:00.000Z"
> (1 rows)
> cqlsh:test> select json * from tbl where id=1;
>  [json]
> ----------------------------------------------
>  {"id": 1, "ts": "1533-09-18 12:00:00.000Z"}
> (1 rows)
> {code}
>  
> Expected behaviour:
> The "select ts", "select tojson(ts)" and "select json *" should all produce 
> the same date.
>  
> Actual behaviour:
> The "select ts" produced the "1533-09-28" date but the "select tojson(ts)" 
> and "select json *" produced the "1533-09-18" date.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19566) JSON encoded timestamp value does not always match non-JSON encoded value

2024-04-19 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838964#comment-17838964
 ] 

Brandon Williams commented on CASSANDRA-19566:
--

Jenkins won't do < 5.0 well right now, but I can facilitate upgrade tests if 
you make the branches.

> JSON encoded timestamp value does not always match non-JSON encoded value
> -
>
> Key: CASSANDRA-19566
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19566
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core, Legacy/CQL
>Reporter: Bowen Song
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Description:
> "SELECT JSON ..." and "toJson(...)" on Cassandra 4.1.4 produces different 
> date than "SELECT ..."  for some timestamp type values.
>  
> Steps to reproduce:
> {code:java}
> $ sudo docker pull cassandra:4.1.4
> $ sudo docker create --name cass cassandra:4.1.4
> $ sudo docker start cass
> $ # wait for the Cassandra instance to become ready
> $ sudo docker exec -ti cass cqlsh
> Connected to Test Cluster at 127.0.0.1:9042
> [cqlsh 6.1.0 | Cassandra 4.1.4 | CQL spec 3.4.6 | Native protocol v5]
> Use HELP for help.
> cqlsh> create keyspace test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> cqlsh> use test;
> cqlsh:test> create table tbl (id int, ts timestamp, primary key (id));
> cqlsh:test> insert into tbl (id, ts) values (1, -1376701920);
> cqlsh:test> select tounixtimestamp(ts), ts, tojson(ts) from tbl where id=1;
>  system.tounixtimestamp(ts) | ts                              | system.tojson(ts)
> -----------------------------+---------------------------------+----------------------------
>                  -1376701920 | 1533-09-28 12:00:00.000000+0000 | "1533-09-18 12:00:00.000Z"
> (1 rows)
> cqlsh:test> select json * from tbl where id=1;
>  [json]
> ----------------------------------------------
>  {"id": 1, "ts": "1533-09-18 12:00:00.000Z"}
> (1 rows)
> {code}
>  
> Expected behaviour:
> The "select ts", "select tojson(ts)" and "select json *" should all produce 
> the same date.
>  
> Actual behaviour:
> The "select ts" produced the "1533-09-28" date but the "select tojson(ts)" 
> and "select json *" produced the "1533-09-18" date.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19573) incorrect queries showing up in queries virtual table

2024-04-19 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19573:
-
 Bug Category: Parent values: Correctness(12982)Level 1 values: Transient 
Incorrect Response(12987)
   Complexity: Normal
Discovered By: User Report
Fix Version/s: 4.1.x
   5.0.x
   5.x
 Severity: Normal
   Status: Open  (was: Triage Needed)

> incorrect queries showing up in queries virtual table
> -
>
> Key: CASSANDRA-19573
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19573
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Virtual Tables
>Reporter: Jon Haddad
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>
> While running an easy-cass-stress workload I queried system_views.queries and 
> got this (edited for sanity):
>  
> {noformat}
> thread_id       | queued_micros | running_micros | task
> -+---++
>  MutationStage-2 |             0 |              0 | 
> Mutation(keyspace='easy_cass_stress', key='3030312e302e3937363535', 
> modifications=[\n  [easy_cass_stress.random_access] key=001.0.97655 
> partition_deletion=deletedAt=-9223372036854775808, localDeletion=2147483647 
> columns=[[] | [value]]\n    Row[info=[ts=1713501354789139] ]: row_id=291 | 
> [value=SXFNPWZDYFHTUSBWMUQCTTRAHQWXMGYOHASTGDFYLILWMOSFQWZGKUAIPUUGCLTADKFFXZRQGKIJJLXNOQKMAIOVSSVMVSFSFAVPABIIHGQSGRPACFWCKYMZMSNZZARSBFVDASTMCRHAVAYHKQDZWFCHRUPDWZJVTEVIWKPMKLAOZGBUDFJVOPSAHLAIWOGNXZHCBVK
>  ts=1713501354789139]\n])
>      ReadStage-4 |             0 |           6216 |  SELECT * FROM 
> easy_cass_stress.random_access WHERE partition_id = '001.0.18474' AND row_id 
> = 746 LIMIT 5000 ALLOW FILTERING
> {noformat}
> What's interesting is that I supplied neither a LIMIT nor ALLOW FILTERING when I 
> prepared the query.  I assume the limit is coming from the driver, and while 
> it's technically correct from the standpoint of what it does, it's not what I 
> prepared so it's a little weird to see it there.
> The ALLOW FILTERING, on the other hand, was definitely not prepared.
> {noformat}
>  session.prepare("SELECT * from random_access WHERE partition_id = ? and 
> row_id = ?")
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-04-19 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838957#comment-17838957
 ] 

Brandon Williams commented on CASSANDRA-19534:
--

FWIW, that was a single node.

> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take way longer to 
> timeout than is configured.  We should be shedding load much more 
> aggressively and use a bounded queue for incoming work.  This is extremely 
> evident when we combine a resource consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                                   Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 |       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 |       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 |       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 |       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 |       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.
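> 
> A toy sketch of the bounded-queue idea (not the native transport code; sizes 
> and policy are placeholders): once the queue is full, new work is rejected 
> immediately and can be shed, instead of piling up far past the configured timeout.
> {code:java}
> import java.util.concurrent.ArrayBlockingQueue;
> import java.util.concurrent.RejectedExecutionException;
> import java.util.concurrent.ThreadPoolExecutor;
> import java.util.concurrent.TimeUnit;
> 
> public class BoundedTransportPool
> {
>     public static void main(String[] args)
>     {
>         ThreadPoolExecutor pool = new ThreadPoolExecutor(
>                 8, 8, 0L, TimeUnit.MILLISECONDS,
>                 new ArrayBlockingQueue<>(1024),        // hard cap on queued requests
>                 new ThreadPoolExecutor.AbortPolicy()); // reject instead of queueing unboundedly
> 
>         for (int i = 0; i < 10000; i++)
>         {
>             try
>             {
>                 pool.execute(() -> { /* handle request */ });
>             }
>             catch (RejectedExecutionException e)
>             {
>                 // shed load here, e.g. reply Overloaded to the client
>             }
>         }
>         pool.shutdown();
>     }
> }
> {code}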



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19572) Test failure: org.apache.cassandra.db.ImportTest flakiness

2024-04-18 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838805#comment-17838805
 ] 

Brandon Williams commented on CASSANDRA-19572:
--

I haven't been able to repro these - I have a suspicion they may only occur in 
circle, and only when run in the suite since I didn't repro a specific test in 
5k attempts 
[here|https://app.circleci.com/pipelines/github/driftx/cassandra/1586/workflows/2f172253-80a8-4c5f-b299-174067f0c4c6/jobs/84768/tests]
 or 
[here|https://app.circleci.com/pipelines/github/driftx/cassandra/1589/workflows/868187b9-32e5-49ce-98a4-db4b9c48d148/jobs/85046/tests].

> Test failure: org.apache.cassandra.db.ImportTest flakiness
> --
>
> Key: CASSANDRA-19572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19572
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/bulk load
>Reporter: Brandon Williams
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> As discovered on CASSANDRA-19401, the tests in this class are flaky, at least 
> the following:
>  * testImportCorruptWithoutValidationWithCopying
>  * testImportInvalidateCache
>  * testImportCorruptWithCopying
>  * testImportCacheEnabledWithoutSrcDir
>  * testImportInvalidateCache
> [https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19557) ShallowIndexedEntry scenario: the same IndexInfo is read multiple times, per every read row

2024-04-18 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838781#comment-17838781
 ] 

Brandon Williams commented on CASSANDRA-19557:
--

I agree, none of the failures look related to this patch (and if it did break 
something, I'd expect to see a lot more.)  Do you know if this is a problem in 
5.0 after CASSANDRA-17056? This patch won't apply to that branch after that 
ticket.

> ShallowIndexedEntry scenario: the same IndexInfo is read multiple times, per 
> every read row
> ---
>
> Key: CASSANDRA-19557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dmitry Konstantinov
>Priority: Normal
> Attachments: 19557-4.1.patch
>
>
> When we read rows from a large partition stored in an SSTable and 
> ShallowIndexedEntry is used, the same IndexInfo entity is read from disk 
> multiple times - once per every row read.
> The following stacktrace shows the execution path:
> {code:java}
> at 
> org.apache.cassandra.db.RowIndexEntry$ShallowInfoRetriever.fetchIndex(RowIndexEntry.java:742)
> at 
> org.apache.cassandra.db.RowIndexEntry$FileIndexInfoRetriever.columnsIndex(RowIndexEntry.java:792)
> at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator$IndexState.index(AbstractSSTableIterator.java:528)
>  
> at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator$IndexState.currentIndex(AbstractSSTableIterator.java:523)
> at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator$IndexState.isPastCurrentBlock(AbstractSSTableIterator.java:513)
> at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator$IndexState.updateBlock(AbstractSSTableIterator.java:487)
>  <=== here we retrieve the current index entry
> at 
> org.apache.cassandra.db.columniterator.SSTableIterator$ForwardIndexedReader.computeNext(SSTableIterator.java:290)
>  <== here we iterate over rows
> at 
> org.apache.cassandra.db.columniterator.SSTableIterator$ForwardReader.hasNextInternal(SSTableIterator.java:182)
> at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator$Reader.hasNext(AbstractSSTableIterator.java:342)
> at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:224)
> at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
> at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
> at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:110)
> at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:48)
> at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
> at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
> at 
> org.apache.cassandra.db.transform.UnfilteredRows.isEmpty(UnfilteredRows.java:74)
> at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:76)
> at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:27)
> at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:97)
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:303)
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:191)
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:181)
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:177)
> at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
> at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:308)
> at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1991)
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2277)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:829)
> {code}
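> A minimal sketch of the obvious fix, caching the last IndexInfo read so that
> repeated lookups of the same index block skip the disk read (names here are
> illustrative assumptions, not the attached 19557-4.1.patch):
> {code:java}
> // Illustrative sketch only: memoize the last index block read from disk.
> public class CachingIndexReader
> {
>     public interface Retriever
>     {
>         Object columnsIndex(int idx) throws java.io.IOException;
>     }
>
>     private final Retriever retriever;
>     private int cachedIdx = -1;
>     private Object cachedInfo;
>
>     public CachingIndexReader(Retriever retriever)
>     {
>         this.retriever = retriever;
>     }
>
>     public Object index(int idx) throws java.io.IOException
>     {
>         if (idx != cachedIdx) // only touch disk when the block changes
>         {
>             cachedInfo = retriever.columnsIndex(idx);
>             cachedIdx = idx;
>         }
>         return cachedInfo;
>     }
> }
> {code}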
> This Cassandra logic was originally written for the case when there is a 
> 

[jira] [Commented] (CASSANDRA-19401) Nodetool import expects directory structure

2024-04-18 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838772#comment-17838772
 ] 

Brandon Williams commented on CASSANDRA-19401:
--

No, we have to fix those tests first.

> Nodetool import expects directory structure
> ---
>
> Key: CASSANDRA-19401
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19401
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Norbert Schultz
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> According to the 
> [documentation|https://cassandra.apache.org/doc/4.1/cassandra/operating/bulk_loading.html]
> the nodetool import should not rely on the folder structure of the imported 
> SSTable files:
> {quote}
> Because the keyspace and table are specified on the command line for nodetool 
> import, there is not the same requirement as with sstableloader, to have the 
> SSTables in a specific directory path. When importing snapshots or 
> incremental backups with nodetool import, the SSTables don’t need to be 
> copied to another directory.
> {quote}
> However, when importing old Cassandra snapshots, we found that SSTables 
> still need to be in a directory named $KEYSPACE/$TABLENAME, even when the 
> keyspace and table name are already given as parameters to the 
> nodetool import call.
> Call we used:
> {code}
> nodetool import --copy-data mykeyspace mytable /full_path_to/test1
> {code}
> Log:
> {code}
> INFO  [RMI TCP Connection(21)-127.0.0.1] 2024-02-15 10:41:06,565 
> SSTableImporter.java:72 - Loading new SSTables for mykeyspace/mytable: 
> Options{srcPaths='[/full_path_to/test1]', resetLevel=true, 
> clearRepaired=true, verifySSTables=true, verifyTokens=true, 
> invalidateCaches=true, extendedVerify=false, copyData= true}
> INFO  [RMI TCP Connection(21)-127.0.0.1] 2024-02-15 10:41:06,566 
> SSTableImporter.java:173 - No new SSTables were found for mykeyspace/mytable
> {code}
> However, when we move the SSTables (.db files) to 
> {{alternative/mykeyspace/mytable}}
> and import with
> {code}
> nodetool import --copy-data mykeyspace mytable 
> /fullpath/alternative/mykeyspace/mytable
> {code}
> the import works
> {code}
> INFO  [RMI TCP Connection(23)-127.0.0.1] 2024-02-15 10:43:36,093 
> SSTableImporter.java:177 - Loading new SSTables and building secondary 
> indexes for mykeyspace/mytable: 
> [BigTableReader(path='/mnt/ramdisk/cassandra4/data/mykeyspace/mytable-561a12d0cbe611eead78fbfd293cee40/me-2-big-Data.db'),
>  
> BigTableReader(path='/mnt/ramdisk/cassandra4/data/mykeyspace/mytable-561a12d0cbe611eead78fbfd293cee40/me-1-big-Data.db')]
> INFO  [RMI TCP Connection(23)-127.0.0.1] 2024-02-15 10:43:36,093 
> SSTableImporter.java:190 - Done loading load new SSTables for 
> mykeyspace/mytable
> {code}
> We experienced this in Cassandra 4.1.3 on Java 11 (Linux)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19572) Test failure: org.apache.cassandra.db.ImportTest flakiness

2024-04-18 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19572:
-
Description: 
As discovered on CASSANDRA-19401, the tests in this class are flaky, at least 
the following:
 * testImportCorruptWithoutValidationWithCopying
 * testImportInvalidateCache
 * testImportCorruptWithCopying
 * testImportCacheEnabledWithoutSrcDir
 * testImportInvalidateCache

https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests

  was:
As discovered on CASSANDRA-1940, the tests in this class are flaky, at least 
the following:
 * testImportCorruptWithoutValidationWithCopying
 * testImportInvalidateCache
 * testImportCorruptWithCopying
 * testImportCacheEnabledWithoutSrcDir
 * testImportInvalidateCache

https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests


> Test failure: org.apache.cassandra.db.ImportTest flakiness
> --
>
> Key: CASSANDRA-19572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19572
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/bulk load
>Reporter: Brandon Williams
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> As discovered on CASSANDRA-19401, the tests in this class are flaky, at least 
> the following:
>  * testImportCorruptWithoutValidationWithCopying
>  * testImportInvalidateCache
>  * testImportCorruptWithCopying
>  * testImportCacheEnabledWithoutSrcDir
>  * testImportInvalidateCache
> https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19572) Test failure: org.apache.cassandra.db.ImportTest flakiness

2024-04-18 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19572:
-
 Bug Category: Parent values: Correctness(12982), Level 1 values: Test 
Failure(12990)
   Complexity: Normal
  Component/s: Tool/bulk load
Discovered By: User Report
Fix Version/s: 4.0.x
   4.1.x
   5.0.x
   5.x
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Test failure: org.apache.cassandra.db.ImportTest flakiness
> --
>
> Key: CASSANDRA-19572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19572
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/bulk load
>Reporter: Brandon Williams
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> As discovered on CASSANDRA-19401, the tests in this class are flaky, at least 
> the following:
>  * testImportCorruptWithoutValidationWithCopying
>  * testImportInvalidateCache
>  * testImportCorruptWithCopying
>  * testImportCacheEnabledWithoutSrcDir
>  * testImportInvalidateCache
> https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-19572) Test failure: org.apache.cassandra.db.ImportTest flakiness

2024-04-18 Thread Brandon Williams (Jira)
Brandon Williams created CASSANDRA-19572:


 Summary: Test failure: org.apache.cassandra.db.ImportTest flakiness
 Key: CASSANDRA-19572
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19572
 Project: Cassandra
  Issue Type: Bug
Reporter: Brandon Williams


As discovered on CASSANDRA-19401, the tests in this class are flaky, at least 
the following:
 * testImportCorruptWithoutValidationWithCopying
 * testImportInvalidateCache
 * testImportCorruptWithCopying
 * testImportCacheEnabledWithoutSrcDir
 * testImportInvalidateCache

https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19401) Nodetool import expects directory structure

2024-04-18 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19401:
-
Status: Changes Suggested  (was: Review In Progress)

> Nodetool import expects directory structure
> ---
>
> Key: CASSANDRA-19401
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19401
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Norbert Schultz
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> According to the 
> [documentation|https://cassandra.apache.org/doc/4.1/cassandra/operating/bulk_loading.html]
> the nodetool import should not rely on the folder structure of the imported 
> SSTable files:
> {quote}
> Because the keyspace and table are specified on the command line for nodetool 
> import, there is not the same requirement as with sstableloader, to have the 
> SSTables in a specific directory path. When importing snapshots or 
> incremental backups with nodetool import, the SSTables don’t need to be 
> copied to another directory.
> {quote}
> However, when importing old Cassandra snapshots, we found that SSTables 
> still need to be in a directory named $KEYSPACE/$TABLENAME, even when the 
> keyspace and table name are already given as parameters to the 
> nodetool import call.
> Call we used:
> {code}
> nodetool import --copy-data mykeyspace mytable /full_path_to/test1
> {code}
> Log:
> {code}
> INFO  [RMI TCP Connection(21)-127.0.0.1] 2024-02-15 10:41:06,565 
> SSTableImporter.java:72 - Loading new SSTables for mykeyspace/mytable: 
> Options{srcPaths='[/full_path_to/test1]', resetLevel=true, 
> clearRepaired=true, verifySSTables=true, verifyTokens=true, 
> invalidateCaches=true, extendedVerify=false, copyData= true}
> INFO  [RMI TCP Connection(21)-127.0.0.1] 2024-02-15 10:41:06,566 
> SSTableImporter.java:173 - No new SSTables were found for mykeyspace/mytable
> {code}
> However, when we move the SSTables (.db files) to 
> {{alternative/mykeyspace/mytable}}
> and import with
> {code}
> nodetool import --copy-data mykeyspace mytable 
> /fullpath/alternative/mykeyspace/mytable
> {code}
> the import works
> {code}
> INFO  [RMI TCP Connection(23)-127.0.0.1] 2024-02-15 10:43:36,093 
> SSTableImporter.java:177 - Loading new SSTables and building secondary 
> indexes for mykeyspace/mytable: 
> [BigTableReader(path='/mnt/ramdisk/cassandra4/data/mykeyspace/mytable-561a12d0cbe611eead78fbfd293cee40/me-2-big-Data.db'),
>  
> BigTableReader(path='/mnt/ramdisk/cassandra4/data/mykeyspace/mytable-561a12d0cbe611eead78fbfd293cee40/me-1-big-Data.db')]
> INFO  [RMI TCP Connection(23)-127.0.0.1] 2024-02-15 10:43:36,093 
> SSTableImporter.java:190 - Done loading load new SSTables for 
> mykeyspace/mytable
> {code}
> We experienced this in Cassandra 4.1.3 on Java 11 (Linux)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19401) Nodetool import expects directory structure

2024-04-18 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838627#comment-17838627
 ] 

Brandon Williams commented on CASSANDRA-19401:
--

The flakiness in the ImportTests seems to be caused by this patch, as without 
it 5.0 can pass 5k times 
[here|https://app.circleci.com/pipelines/github/driftx/cassandra/1586/workflows/2f172253-80a8-4c5f-b299-174067f0c4c6/jobs/84768/tests]
 and 
[here|https://app.circleci.com/pipelines/github/driftx/cassandra/1589/workflows/868187b9-32e5-49ce-98a4-db4b9c48d148/jobs/85046/tests].

> Nodetool import expects directory structure
> ---
>
> Key: CASSANDRA-19401
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19401
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Norbert Schultz
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> According to the 
> [documentation|https://cassandra.apache.org/doc/4.1/cassandra/operating/bulk_loading.html]
> the nodetool import should not rely on the folder structure of the imported 
> SSTable files:
> {quote}
> Because the keyspace and table are specified on the command line for nodetool 
> import, there is not the same requirement as with sstableloader, to have the 
> SSTables in a specific directory path. When importing snapshots or 
> incremental backups with nodetool import, the SSTables don’t need to be 
> copied to another directory.
> {quote}
> However, when importing old Cassandra snapshots, we found that SSTables 
> still need to be in a directory named $KEYSPACE/$TABLENAME, even when the 
> keyspace and table name are already given as parameters to the 
> nodetool import call.
> Call we used:
> {code}
> nodetool import --copy-data mykeyspace mytable /full_path_to/test1
> {code}
> Log:
> {code}
> INFO  [RMI TCP Connection(21)-127.0.0.1] 2024-02-15 10:41:06,565 
> SSTableImporter.java:72 - Loading new SSTables for mykeyspace/mytable: 
> Options{srcPaths='[/full_path_to/test1]', resetLevel=true, 
> clearRepaired=true, verifySSTables=true, verifyTokens=true, 
> invalidateCaches=true, extendedVerify=false, copyData= true}
> INFO  [RMI TCP Connection(21)-127.0.0.1] 2024-02-15 10:41:06,566 
> SSTableImporter.java:173 - No new SSTables were found for mykeyspace/mytable
> {code}
> However, when we move the SSTables (.db files) to 
> {{alternative/mykeyspace/mytable}}
> and import with
> {code}
> nodetool import --copy-data mykeyspace mytable 
> /fullpath/alternative/mykeyspace/mytable
> {code}
> the import works
> {code}
> INFO  [RMI TCP Connection(23)-127.0.0.1] 2024-02-15 10:43:36,093 
> SSTableImporter.java:177 - Loading new SSTables and building secondary 
> indexes for mykeyspace/mytable: 
> [BigTableReader(path='/mnt/ramdisk/cassandra4/data/mykeyspace/mytable-561a12d0cbe611eead78fbfd293cee40/me-2-big-Data.db'),
>  
> BigTableReader(path='/mnt/ramdisk/cassandra4/data/mykeyspace/mytable-561a12d0cbe611eead78fbfd293cee40/me-1-big-Data.db')]
> INFO  [RMI TCP Connection(23)-127.0.0.1] 2024-02-15 10:43:36,093 
> SSTableImporter.java:190 - Done loading load new SSTables for 
> mykeyspace/mytable
> {code}
> We experienced this in Cassandra 4.1.3 on Java 11 (Linux)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19401) Nodetool import expects directory structure

2024-04-18 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19401:
-
Status: Review In Progress  (was: Needs Committer)

> Nodetool import expects directory structure
> ---
>
> Key: CASSANDRA-19401
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19401
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Norbert Schultz
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> According to the 
> [documentation|https://cassandra.apache.org/doc/4.1/cassandra/operating/bulk_loading.html]
> the nodetool import should not rely on the folder structure of the imported 
> SSTable files:
> {quote}
> Because the keyspace and table are specified on the command line for nodetool 
> import, there is not the same requirement as with sstableloader, to have the 
> SSTables in a specific directory path. When importing snapshots or 
> incremental backups with nodetool import, the SSTables don’t need to be 
> copied to another directory.
> {quote}
> However, when importing old Cassandra snapshots, we found that SSTables 
> still need to be in a directory named $KEYSPACE/$TABLENAME, even when the 
> keyspace and table name are already given as parameters to the 
> nodetool import call.
> Call we used:
> {code}
> nodetool import --copy-data mykeyspace mytable /full_path_to/test1
> {code}
> Log:
> {code}
> INFO  [RMI TCP Connection(21)-127.0.0.1] 2024-02-15 10:41:06,565 
> SSTableImporter.java:72 - Loading new SSTables for mykeyspace/mytable: 
> Options{srcPaths='[/full_path_to/test1]', resetLevel=true, 
> clearRepaired=true, verifySSTables=true, verifyTokens=true, 
> invalidateCaches=true, extendedVerify=false, copyData= true}
> INFO  [RMI TCP Connection(21)-127.0.0.1] 2024-02-15 10:41:06,566 
> SSTableImporter.java:173 - No new SSTables were found for mykeyspace/mytable
> {code}
> However, when we move the SSTables (.db files) to 
> {{alternative/mykeyspace/mytable}}
> and import with
> {code}
> nodetool import --copy-data mykeyspace mytable 
> /fullpath/alternative/mykeyspace/mytable
> {code}
> the import works
> {code}
> INFO  [RMI TCP Connection(23)-127.0.0.1] 2024-02-15 10:43:36,093 
> SSTableImporter.java:177 - Loading new SSTables and building secondary 
> indexes for mykeyspace/mytable: 
> [BigTableReader(path='/mnt/ramdisk/cassandra4/data/mykeyspace/mytable-561a12d0cbe611eead78fbfd293cee40/me-2-big-Data.db'),
>  
> BigTableReader(path='/mnt/ramdisk/cassandra4/data/mykeyspace/mytable-561a12d0cbe611eead78fbfd293cee40/me-1-big-Data.db')]
> INFO  [RMI TCP Connection(23)-127.0.0.1] 2024-02-15 10:43:36,093 
> SSTableImporter.java:190 - Done loading load new SSTables for 
> mykeyspace/mytable
> {code}
> We experienced this in Cassandra 4.1.3 on Java 11 (Linux)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19570) Allow setting LIBFFI_TMPDIR

2024-04-18 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19570:
-
Change Category: Operability
 Complexity: Low Hanging Fruit
Component/s: Build
  Fix Version/s: 5.0.x
 5.x
 Status: Open  (was: Triage Needed)

> Allow setting LIBFFI_TMPDIR
> ---
>
> Key: CASSANDRA-19570
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19570
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build
>Reporter: Thomas De Keulenaer
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> Since JNA >= 5.10.0 you can specify LIBFFI_TMPDIR because the vendored libffi 
> was upgraded.
> I think it is best to set this to the same value as jna.tmpdir. Perhaps in 
> cassandra-env.sh?
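> A minimal sketch of that suggestion (illustrative only; the exact
> cassandra-env.sh plumbing and the tmp directory location are assumptions):
> {code}
> # Point both JNA and its vendored libffi at the same writable,
> # exec-permitted scratch directory.
> JNA_TMPDIR="${CASSANDRA_HOME:-/var/lib/cassandra}/tmp"
> mkdir -p "$JNA_TMPDIR"
> JVM_OPTS="$JVM_OPTS -Djna.tmpdir=$JNA_TMPDIR"
> export LIBFFI_TMPDIR="$JNA_TMPDIR"
> {code}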



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-18 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838617#comment-17838617
 ] 

Brandon Williams commented on CASSANDRA-19565:
--

bq. Also, on Cassandra 5.x with JNA >= 5.10.0 you can specify LIBFFI_TMPDIR.

Can you open a new ticket for this?

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Thomas De Keulenaer
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1, we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19569) sstableupgrade is very slow

2024-04-18 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19569:
-
 Bug Category: Parent values: Degradation(12984), Level 1 values: Performance 
Bug/Regression(12997)
   Complexity: Normal
  Component/s: Local/Compaction
Discovered By: User Report
Fix Version/s: 4.1.x
 Severity: Normal
   Status: Open  (was: Triage Needed)

> sstableupgrade is very slow
> ---
>
> Key: CASSANDRA-19569
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19569
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction
>Reporter: Norbert Schultz
>Priority: Normal
> Fix For: 4.1.x
>
> Attachments: flamegraph_ok.png, flamegraph_sstableupgrade.png
>
>
> We are in the process of migrating Cassandra from 3.11.x to 4.1.4 and 
> upgrading the SSTables from the `me-` to the `nb-` format using 
> sstableupgrade from Cassandra v4.1.4.
> Unfortunately, the process is very slow (less than 0.5 MB/s).
> Some observations:
> - The process is only slow on (fast) SSDs, not on RAM disks.
> - The SSTables consist of many partitions (this may be unrelated).
> - The upgrade process is fast if we use `automatic_sstable_upgrade` instead 
> of the sstableupgrade tool.
> - We give it enough RAM (export MAX_HEAP_SIZE=8g).
> On profiling, we found that sstableupgrade burns most of its CPU time in 
> {{posix_fadvise}} (see flamegraph_sstableupgrade.png).
> My naive interpretation of the whole {{maybeReopenEarly}} to 
> {{posix_fadvise}} chain is that the process just informs the Linux kernel 
> that the written data should not be cached. If we comment out the call to 
> {{NativeLibrary.trySkipCache}}, the conversion runs at the expected 10 MB/s 
> (see flamegraph_ok.png).
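> For illustration, the effect of that chain boils down to a single libc call; 
> a standalone sketch using JNA (the binding and names below are assumptions, 
> not Cassandra source):
> {code:java}
> import com.sun.jna.Library;
> import com.sun.jna.Native;
>
> // Illustrative sketch: what the trySkipCache path ultimately does is ask
> // the kernel to drop cached pages for a file region via
> // posix_fadvise(POSIX_FADV_DONTNEED).
> public class FadviseDemo
> {
>     public interface Posix extends Library
>     {
>         int posix_fadvise(int fd, long offset, long len, int advice);
>     }
>
>     static final int POSIX_FADV_DONTNEED = 4; // Linux constant
>
>     public static void main(String[] args)
>     {
>         Posix posix = Native.load("c", Posix.class);
>         // fd 0 only demonstrates the call shape; real code passes the
>         // sstable's file descriptor and the just-written region.
>         int rc = posix.posix_fadvise(0, 0, 0, POSIX_FADV_DONTNEED);
>         System.out.println("posix_fadvise returned " + rc);
>     }
> }
> {code}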



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838391#comment-17838391
 ] 

Brandon Williams commented on CASSANDRA-19565:
--

I have a patch 
[here|https://github.com/driftx/cassandra/tree/CASSANDRA-19565-4.1] that will 
set the ownership as follows:

{noformat}
drwxr-xr-x 6 cassandra cassandra  68 Apr 17 22:03 .
drwxr-xr-x 1 root  root  131 Apr 17 22:03 ..
drwxr-xr-x 2 cassandra cassandra   6 Apr 17 21:48 commitlog
drwxr-xr-x 2 cassandra cassandra   6 Apr 17 21:48 data
drwxr-xr-x 2 cassandra cassandra   6 Apr 17 21:48 hints
drwxr-xr-x 2 cassandra cassandra   6 Apr 17 21:48 saved_caches
{noformat}

Debian packaging already creates /var/lib/cassandra with the correct ownership. 
 Now I just need to figure out CI for packaging on 4.1.

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Thomas De Keulenaer
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1, we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-04-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838359#comment-17838359
 ] 

Brandon Williams commented on CASSANDRA-19534:
--

The p99 from easy-cass-stress does creep up on 4.1 as well, but at a much 
slower rate, so it's not as easily observable as on 5.0.

> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take far longer to 
> time out than is configured.  We should shed load much more aggressively and 
> use a bounded queue for incoming work.  This is extremely evident when we 
> combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                
>                   Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) | 
>   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 | 
>       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 | 
>       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 | 
>       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 | 
>       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 | 
>       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.
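> A minimal sketch of the bounded-queue idea above (names are illustrative
> assumptions, not actual Cassandra code): reject work once the queue fills,
> so the server sheds load instead of queueing past the timeout:
> {code:java}
> import java.util.concurrent.ArrayBlockingQueue;
> import java.util.concurrent.RejectedExecutionException;
> import java.util.concurrent.ThreadPoolExecutor;
> import java.util.concurrent.TimeUnit;
>
> public class BoundedRequestPool
> {
>     // Bounded queue + AbortPolicy: submissions beyond maxQueued are
>     // rejected immediately rather than piling up in an unbounded queue.
>     public static ThreadPoolExecutor create(int threads, int maxQueued)
>     {
>         return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
>                                       new ArrayBlockingQueue<>(maxQueued),
>                                       new ThreadPoolExecutor.AbortPolicy());
>     }
>
>     public static void main(String[] args)
>     {
>         ThreadPoolExecutor pool = create(2, 10);
>         for (int i = 0; i < 1000; i++)
>         {
>             try
>             {
>                 pool.execute(() -> { try { Thread.sleep(100); } catch (InterruptedException e) {} });
>             }
>             catch (RejectedExecutionException e)
>             {
>                 // here the server would return an Overloaded error to the
>                 // client instead of letting requests queue past their timeout
>             }
>         }
>         pool.shutdown();
>     }
> }
> {code}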



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-17 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19565:
-
 Bug Category: Parent values: Correctness(12982)
   Complexity: Normal
  Component/s: Packaging
Discovered By: User Report
Fix Version/s: 5.0.x
   5.x
 Severity: Normal
 Assignee: Brandon Williams
   Status: Open  (was: Triage Needed)

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Thomas De Keulenaer
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1, we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838298#comment-17838298
 ] 

Brandon Williams edited comment on CASSANDRA-19565 at 4/17/24 5:01 PM:
---

If we change the '-M' flag to useradd to '-m' to create the home directory, 
that leaves us with:
{noformat}
drwx------ 6 cassandra cassandra 124 Apr 17 16:58 .
drwxr-xr-x 1 root  root  131 Apr 17 16:58 ..
-rw-r--r-- 1 cassandra cassandra  18 Aug  3  2022 .bash_logout
-rw-r--r-- 1 cassandra cassandra 141 Aug  3  2022 .bash_profile
-rw-r--r-- 1 cassandra cassandra 376 Aug  3  2022 .bashrc
drwxr-xr-x 2 cassandra cassandra   6 Apr 17 16:55 commitlog
drwxr-xr-x 2 cassandra cassandra   6 Apr 17 16:55 data
drwxr-xr-x 2 cassandra cassandra   6 Apr 17 16:55 hints
drwxr-xr-x 2 cassandra cassandra   6 Apr 17 16:55 saved_caches
{noformat}

Even though nothing else _should_ need to read or list this directory, we had 
more permissive perms before and perhaps should continue that here.  WDYT?


was (Author: brandon.williams):
If we change the '-M' flag to useradd to '-m' to create the home directory, 
that leaves us with:
{noformat}
drwx------ 6 cassandra cassandra 124 Apr 17 16:58 .
drwxr-xr-x 1 root  root  131 Apr 17 16:58 ..
-rw-r--r-- 1 cassandra cassandra  18 Aug  3  2022 .bash_logout
-rw-r--r-- 1 cassandra cassandra 141 Aug  3  2022 .bash_profile
-rw-r--r-- 1 cassandra cassandra 376 Aug  3  2022 .bashrc
drwxr-xr-x 2 cassandra cassandra   6 Apr 17 16:55 commitlog
drwxr-xr-x 2 cassandra cassandra   6 Apr 17 16:55 data
drwxr-xr-x 2 cassandra cassandra   6 Apr 17 16:55 hints
drwxr-xr-x 2 cassandra cassandra   6 Apr 17 16:55 saved_caches
{noformat}

Even though nothing else _should_ need to read or list this directory, we had 
more perms before and perhaps should continue that here.  WDYT?

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas De Keulenaer
>Priority: Normal
> Fix For: 4.1.x
>
> Attachments: hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1, we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838273#comment-17838273
 ] 

Brandon Williams commented on CASSANDRA-19565:
--

I think updating something as far-reaching as JNA in a minor release is a 
tough sell, especially if we are only gaining cosmetics, which we can obviate 
the need for by changing the ownership in the packaging.  

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas De Keulenaer
>Priority: Normal
> Fix For: 4.1.x
>
> Attachments: hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1, we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19401) Nodetool import expects directory structure

2024-04-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838244#comment-17838244
 ] 

Brandon Williams commented on CASSANDRA-19401:
--

bq. ImportTest.testImportCorruptWithoutValidationWithCopying is just flaky and 
we discovered it here.

If you are saying that failure is unrelated to this ticket, I don't think 
that's accurate: 
https://app.circleci.com/pipelines/github/driftx/cassandra/1584/workflows/7797df46-e331-46de-ba5e-885814d8f9a8

> Nodetool import expects directory structure
> ---
>
> Key: CASSANDRA-19401
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19401
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Norbert Schultz
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> According to the 
> [documentation|https://cassandra.apache.org/doc/4.1/cassandra/operating/bulk_loading.html]
> the nodetool import should not rely on the folder structure of the imported 
> SSTable files:
> {quote}
> Because the keyspace and table are specified on the command line for nodetool 
> import, there is not the same requirement as with sstableloader, to have the 
> SSTables in a specific directory path. When importing snapshots or 
> incremental backups with nodetool import, the SSTables don’t need to be 
> copied to another directory.
> {quote}
> However, when importing old Cassandra snapshots, we found that SSTables 
> still need to be in a directory named $KEYSPACE/$TABLENAME, even when the 
> keyspace and table name are already given as parameters to the 
> nodetool import call.
> Call we used:
> {code}
> nodetool import --copy-data mykeyspace mytable /full_path_to/test1
> {code}
> Log:
> {code}
> INFO  [RMI TCP Connection(21)-127.0.0.1] 2024-02-15 10:41:06,565 
> SSTableImporter.java:72 - Loading new SSTables for mykeyspace/mytable: 
> Options{srcPaths='[/full_path_to/test1]', resetLevel=true, 
> clearRepaired=true, verifySSTables=true, verifyTokens=true, 
> invalidateCaches=true, extendedVerify=false, copyData= true}
> INFO  [RMI TCP Connection(21)-127.0.0.1] 2024-02-15 10:41:06,566 
> SSTableImporter.java:173 - No new SSTables were found for mykeyspace/mytable
> {code}
> However, when we move the SSTables (.db files) to 
> {{alternative/mykeyspace/mytable}}
> and import with
> {code}
> nodetool import --copy-data mykeyspace mytable 
> /fullpath/alternative/mykeyspace/mytable
> {code}
> the import works
> {code}
> INFO  [RMI TCP Connection(23)-127.0.0.1] 2024-02-15 10:43:36,093 
> SSTableImporter.java:177 - Loading new SSTables and building secondary 
> indexes for mykeyspace/mytable: 
> [BigTableReader(path='/mnt/ramdisk/cassandra4/data/mykeyspace/mytable-561a12d0cbe611eead78fbfd293cee40/me-2-big-Data.db'),
>  
> BigTableReader(path='/mnt/ramdisk/cassandra4/data/mykeyspace/mytable-561a12d0cbe611eead78fbfd293cee40/me-1-big-Data.db')]
> INFO  [RMI TCP Connection(23)-127.0.0.1] 2024-02-15 10:43:36,093 
> SSTableImporter.java:190 - Done loading load new SSTables for 
> mykeyspace/mytable
> {code}
> We experienced this in Cassandra 4.1.3 on Java 11 (Linux)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14572) Expose all table metrics in virtual table

2024-04-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838242#comment-17838242
 ] 

Brandon Williams commented on CASSANDRA-14572:
--

I too see the OOM if I loop 'ant test-jvm-dtest-some 
-Dtest.name=org.apache.cassandra.distributed.test.ReadRepairTest' a few times.

> Expose all table metrics in virtual table
> -
>
> Key: CASSANDRA-14572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14572
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Legacy/Observability, Observability/Metrics
>Reporter: Chris Lohfink
>Assignee: Maxim Muzafarov
>Priority: Low
>  Labels: virtual-tables
> Fix For: 5.1
>
> Attachments: flight_recording_1270017199_13.jfr, keyspayces_group 
> responses times.png, keyspayces_group summary.png, select keyspaces_group by 
> string prefix.png, select keyspaces_group compare with wo.png, select 
> keyspaces_group without value.png, systemv_views.metrics_dropped_message.png, 
> thread_pools benchmark.png
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> While we want a number of virtual tables to display data in a way that's great 
> and intuitive, like in nodetool, there is also much value in being able to 
> expose the metrics we have for tooling via CQL instead of JMX. This is more 
> for tooling and ad hoc advanced users who know exactly what they are looking 
> for.
> *Schema:*
> The initial idea is to expose data via {{((keyspace, table), metric)}} with a 
> column for each metric value; a sketch follows the list below. Could also use 
> a Map or UDT instead of the column-based approach, which can be a bit more 
> specific to each metric type. To that end there can be a {{metric_type}} 
> column and then a UDT for each metric type filled in, or a single value with 
> more of a Map style. I am proposing the column type though, as with 
> {{ALLOW FILTERING}} it does allow more extensive query capabilities.
> *Implementations:*
> * Use reflection to grab all the metrics from TableMetrics (see the 
> CASSANDRA-7622 impl). This is easiest and least abrasive towards new metric 
> implementors... but it's reflection, and that's kind of a bad idea.
> * Add a hook in TableMetrics to register with this virtual table when 
> registering.
> * Pull from the CassandraMetrics registry (either a reporter, or iterate 
> through the metrics on read of the virtual table).
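> As a sketch of the column-per-metric idea above (the table and column names 
> are assumptions, not a committed schema):
> {code}
> CREATE TABLE system_views.table_metrics (
>     keyspace_name text,
>     table_name text,
>     metric text,
>     value double,
>     PRIMARY KEY ((keyspace_name, table_name), metric)
> );
>
> -- cross-partition queries then lean on ALLOW FILTERING, e.g.:
> SELECT keyspace_name, table_name, value
> FROM system_views.table_metrics
> WHERE metric = 'read_latency_p99' ALLOW FILTERING;
> {code}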



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19401) Nodetool import expects directory structure

2024-04-17 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19401:
-
Reviewers: Brandon Williams

> Nodetool import expects directory structure
> ---
>
> Key: CASSANDRA-19401
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19401
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Norbert Schultz
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> According to the 
> [documentation|https://cassandra.apache.org/doc/4.1/cassandra/operating/bulk_loading.html]
> the nodetool import should not rely on the folder structure of the imported 
> SSTable files:
> {quote}
> Because the keyspace and table are specified on the command line for nodetool 
> import, there is not the same requirement as with sstableloader, to have the 
> SSTables in a specific directory path. When importing snapshots or 
> incremental backups with nodetool import, the SSTables don’t need to be 
> copied to another directory.
> {quote}
> However, when importing old Cassandra snapshots, we found that SSTables 
> still need to be in a directory named $KEYSPACE/$TABLENAME, even when the 
> keyspace and table name are already given as parameters to the 
> nodetool import call.
> Call we used:
> {code}
> nodetool import --copy-data mykeyspace mytable /full_path_to/test1
> {code}
> Log:
> {code}
> INFO  [RMI TCP Connection(21)-127.0.0.1] 2024-02-15 10:41:06,565 
> SSTableImporter.java:72 - Loading new SSTables for mykeyspace/mytable: 
> Options{srcPaths='[/full_path_to/test1]', resetLevel=true, 
> clearRepaired=true, verifySSTables=true, verifyTokens=true, 
> invalidateCaches=true, extendedVerify=false, copyData= true}
> INFO  [RMI TCP Connection(21)-127.0.0.1] 2024-02-15 10:41:06,566 
> SSTableImporter.java:173 - No new SSTables were found for mykeyspace/mytable
> {code}
> However, when we move the SSTables (.db files) to 
> {{alternative/mykeyspace/mytable}}
> and import with
> {code}
> nodetool import --copy-data mykeyspace mytable 
> /fullpath/alternative/mykeyspace/mytable
> {code}
> the import works
> {code}
> INFO  [RMI TCP Connection(23)-127.0.0.1] 2024-02-15 10:43:36,093 
> SSTableImporter.java:177 - Loading new SSTables and building secondary 
> indexes for mykeyspace/mytable: 
> [BigTableReader(path='/mnt/ramdisk/cassandra4/data/mykeyspace/mytable-561a12d0cbe611eead78fbfd293cee40/me-2-big-Data.db'),
>  
> BigTableReader(path='/mnt/ramdisk/cassandra4/data/mykeyspace/mytable-561a12d0cbe611eead78fbfd293cee40/me-1-big-Data.db')]
> INFO  [RMI TCP Connection(23)-127.0.0.1] 2024-02-15 10:43:36,093 
> SSTableImporter.java:190 - Done loading load new SSTables for 
> mykeyspace/mytable
> {code}
> We experienced this in Cassandra 4.1.3 on Java 11 (Linux)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838156#comment-17838156
 ] 

Brandon Williams edited comment on CASSANDRA-19565 at 4/17/24 12:26 PM:


bq. But I admit, it is also unclear to me why this is only an issue in 
Cassandra 4.1.4 and not on 4.0.12.

I think the difference must be in the JNA versions.  I've seen noexec on /tmp 
many times, and the solution has always been to redefine java.io.tmpdir to a 
more forgiving path, but it seems newer JNA will try to write to other 
locations too (and then segfault if they don't work).

In any case, if fixing the home dir perms solves this I'm not opposed to that; 
even though we haven't needed it in the past it's the most correct thing to do.


was (Author: brandon.williams):
bq. But I admit, it is also unclear to me why this is only an issue in 
Cassandra 4.1.4 and not on 4.0.12.

I think the difference must be in the JNA versions.  I've seen noexec on /tmp 
many times, and the solution has always been to redefine java.io.tmpdir to a 
more forgiving path, but it seems newer JNA will try to write to other 
locations too (and then segfault if they don't work).

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas De Keulenaer
>Priority: Normal
> Fix For: 4.1.x
>
> Attachments: hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1, we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838156#comment-17838156
 ] 

Brandon Williams commented on CASSANDRA-19565:
--

bq. But I admit, it is also unclear to me why this is only an issue in 
Cassandra 4.1.4 and not on 4.0.12.

I think the difference must be in the JNA versions.  I've seen noexec on /tmp 
many times, and the solution has always been to redefine java.io.tmpdir to a 
more forgiving path, but it seems newer JNA will try to write to other 
locations too (and then segfault if they don't work).

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas De Keulenaer
>Priority: Normal
> Fix For: 4.1.x
>
> Attachments: hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1, we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838136#comment-17838136
 ] 

Brandon Williams commented on CASSANDRA-19565:
--

I'm not opposed to changing the home dir owner/perms, but

bq. Setting the cassandra user's home directory (/var/lib/cassandra) to 
read+write+executable and owned by cassandra, solves the issue.

do we know why this is?  The owner/perms have been this way a long time; is 
your TMPDIR set to the home dir?

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas De Keulenaer
>Priority: Normal
> Fix For: 4.1.x
>
> Attachments: hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1, we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838133#comment-17838133
 ] 

Brandon Williams commented on CASSANDRA-19565:
--

Hmm, that was added as part of CASSANDRA-17470.  I suspect the config/noreplace 
is what is preventing it, and I think that was chosen to not modify any 
existing installs that may have changed it, but it's still curious how there's 
no effect on new installations.  Let me run some tests and see what's going on.

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas De Keulenaer
>Priority: Normal
> Fix For: 4.1.x
>
> Attachments: hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1, we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838134#comment-17838134
 ] 

Brandon Williams commented on CASSANDRA-19565:
--

bq.  at the end of line 166 is the culprit

Ah, that is because we were only changing the subdirs to not be 
world-writable; we only changed the umask of what was there before.

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas De Keulenaer
>Priority: Normal
> Fix For: 4.1.x
>
> Attachments: hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1, we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19565) SIGSEGV on Cassandra v4.1.4

2024-04-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838116#comment-17838116
 ] 

Brandon Williams commented on CASSANDRA-19565:
--

The directory is created by useradd 
[here|https://github.com/apache/cassandra/blob/trunk/redhat/cassandra.spec#L140],
 and this is what the directory listing looks like on a fresh install:

{noformat}
]# ls -al /var/lib/cassandra/
total 0
drwxr-xr-x 6 root  root   68 Apr 17 11:13 .
drwxr-xr-x 1 root  root  131 Apr 17 11:13 ..
drwxr-xr-x 2 cassandra cassandra   6 Jan 23 19:52 commitlog
drwxr-xr-x 2 cassandra cassandra   6 Jan 23 19:52 data
drwxr-xr-x 2 cassandra cassandra   6 Jan 23 19:52 hints
drwxr-xr-x 2 cassandra cassandra   6 Jan 23 19:52 saved_caches
{noformat}

Maybe you have a modified /etc/login.defs with a different umask or similar?
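For reference, a quick way to check that (illustrative command, not from the 
report):

{noformat}
grep -E '^(UMASK|HOME_MODE)' /etc/login.defs
{noformat}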

> SIGSEGV on Cassandra v4.1.4
> ---
>
> Key: CASSANDRA-19565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19565
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas De Keulenaer
>Priority: Normal
> Fix For: 4.1.x
>
> Attachments: hs_err_pid1116450.log
>
>
> Hello,
> Since upgrading to v4.1, we cannot run Cassandra any more. Each start 
> immediately crashes:
> {{Apr 17 08:58:34 SVALD108 cassandra[1116450]: # A fatal error has been 
> detected by the Java Runtime Environment:
> Apr 17 08:58:34 SVALD108 cassandra[1116450]: #  SIGSEGV (0xb) at 
> pc=0x7fccaab4d152, pid=1116450, tid=1116451}}
> I have added the log from the core dump.
> This issue is perhaps related to 
> https://davecturner.github.io/2021/08/30/seven-year-old-segfault.html ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19564) MemtablePostFlush deadlock leads to stuck nodes and crashes

2024-04-16 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837889#comment-17837889
 ] 

Brandon Williams commented on CASSANDRA-19564:
--

Can you note which memtable_allocation_type was used and how many 
memtable_flush_writers?

> MemtablePostFlush deadlock leads to stuck nodes and crashes
> ---
>
> Key: CASSANDRA-19564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19564
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Local/Memtable
>Reporter: Jon Haddad
>Priority: Urgent
> Fix For: 4.1.x
>
> Attachments: image-2024-04-16-11-55-54-750.png, 
> image-2024-04-16-12-29-15-386.png, image-2024-04-16-13-43-11-064.png, 
> image-2024-04-16-13-53-24-455.png
>
>
> I've run into an issue on a 4.1.4 cluster where an entire node has locked up 
> due to what I believe is a deadlock in memtable flushing. Here's what I know 
> so far.  I've stitched together what happened based on conversations, logs, 
> and some flame graphs.
> *Log reports memtable flushing*
> The last successful flush happens at 12:19. 
> {noformat}
> INFO  [NativePoolCleaner] 2024-04-16 12:19:53,634 
> AbstractAllocatorMemtable.java:286 - Flushing largest CFS(Keyspace='ks', 
> ColumnFamily='version') to free up room. Used total: 0.24/0.33, live: 
> 0.16/0.20, flushing: 0.09/0.13, this: 0.13/0.15
> INFO  [NativePoolCleaner] 2024-04-16 12:19:53,634 ColumnFamilyStore.java:1012 
> - Enqueuing flush of ks.version, Reason: MEMTABLE_LIMIT, Usage: 660.521MiB 
> (13%) on-heap, 790.606MiB (15%) off-heap
> {noformat}
> *MemtablePostFlush appears to be blocked*
> At this point, MemtablePostFlush completed tasks stops incrementing, active 
> stays at 1 and pending starts to rise.
> {noformat}
> MemtablePostFlush   1    1   3446   0   0
> {noformat}
>  
> The flame graph reveals that PostFlush.call is stuck.  I don't have the line 
> number, but I know we're stuck in 
> {{org.apache.cassandra.db.ColumnFamilyStore.PostFlush#call}} given the visual 
> below:
> *!image-2024-04-16-13-43-11-064.png!*
> *Memtable flushing is now blocked.*
> All MemtableFlushWriter threads are parked waiting on 
> {{OpOrder.Barrier.await}}. A wall-clock profile of 30s reveals all time 
> is spent here.  Presumably we're waiting on the single-threaded Post Flush.
> !image-2024-04-16-12-29-15-386.png!
> *Memtable allocations start to block*
> Eventually it looks like the NativeAllocator stops successfully allocating 
> memory. I assume it's waiting on memory to be freed, but since memtable 
> flushes are blocked, we wait indefinitely.
> Looking at a wall-clock flame graph, all writer threads have reached the 
> allocation failure path of {{MemtableAllocator.allocate()}}.  I believe we're 
> waiting on {{signal.awaitThrowUncheckedOnInterrupt()}}.
> {noformat}
>  MutationStage    48    828425      980253369      0    0{noformat}
> !image-2024-04-16-11-55-54-750.png!
>  
> *Compaction Stops*
> Since we write to the compaction history table, and that requires memtables, 
> compactions are now blocked as well.
>  
> !image-2024-04-16-13-53-24-455.png!
>  
> The node is now doing basically nothing and must be restarted.
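To make the chain of blocking concrete, here is a toy model in Python with
hypothetical names, not Cassandra code: the single post-flush thread is stuck,
flush writers are parked behind the barrier it should release, and mutations
wait for memory that only a completed flush can free.

{noformat}
import threading

post_flush_done = threading.Event()  # would be set by the post-flush stage
memory_freed = threading.Event()     # would be set when a flush completes

def post_flush():
    # Stand-in for PostFlush#call blocking indefinitely.
    threading.Event().wait(timeout=2)  # this event is never set

def flush_writer():
    # MemtableFlushWriter: parked until post-flush releases the barrier.
    if post_flush_done.wait(timeout=2):
        memory_freed.set()

def mutation():
    # MutationStage: blocked until a flush frees memtable memory.
    if not memory_freed.wait(timeout=3):
        print("mutations stuck: nothing frees memory, node appears hung")

for target in (post_flush, flush_writer, mutation):
    threading.Thread(target=target).start()
{noformat}

With the timeouts removed, every thread above waits forever, which matches the
stuck-node behavior described in the report.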






[jira] [Commented] (CASSANDRA-19561) finish_release.sh does not handle jfrog errors

2024-04-16 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837886#comment-17837886
 ] 

Brandon Williams commented on CASSANDRA-19561:
--

I don't really want to pull in jq just to use a non-standard error mechanism, 
but even so with the execution passed through the 'execute()' function, we 
can't pass the output to it or even grep.  Even if we could, if the script 
aborts at that point you still can't rerun it to try again and finish things 
without some hackery.

> finish_release.sh does not handle jfrog errors
> --
>
> Key: CASSANDRA-19561
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19561
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Brandon Williams
>Priority: Normal
>
> If there is any "soft" problem (not network) uploading to jfrog, 
> finish_release.sh will keep on going and then delete the packages regardless 
> of whether they could be uploaded or not.






[jira] [Comment Edited] (CASSANDRA-19561) finish_release.sh does not handle jfrog errors

2024-04-16 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837886#comment-17837886
 ] 

Brandon Williams edited comment on CASSANDRA-19561 at 4/16/24 9:16 PM:
---

I don't really want to pull in jq just to use a non-standard error mechanism, 
but even so with the execution passed through the 'execute()' function, we 
can't pass the output to it or even grep.  Even if we could, if the script 
aborts at that point you still can't rerun it to try again and finish things 
without some hackery.


was (Author: brandon.williams):
I don't really want to pull in jq just to use a non-standard error mechanism, 
but even so with the execution passed through the 'execute()' function, we 
can't pass the output to it or even grep.  Even if we could, if the script 
aborts at that you still can't rerun it to try again and finish things without 
some hackery.

> finish_release.sh does not handle jfrog errors
> --
>
> Key: CASSANDRA-19561
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19561
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Brandon Williams
>Priority: Normal
>
> If there is any "soft" problem (not network) uploading to jfrog, 
> finish_release.sh will keep on going and then delete the packages regardless 
> of whether they could be uploaded or not.






[jira] [Updated] (CASSANDRA-19564) MemtablePostFlush deadlock leads to stuck nodes

2024-04-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19564:
-
   Complexity: Normal
  Component/s: Local/Memtable
Discovered By: User Report
Fix Version/s: 4.1.x
   Status: Open  (was: Triage Needed)

> MemtablePostFlush deadlock leads to stuck nodes
> ---
>
> Key: CASSANDRA-19564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19564
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Local/Memtable
>Reporter: Jon Haddad
>Priority: Urgent
> Fix For: 4.1.x
>
> Attachments: image-2024-04-16-11-55-54-750.png, 
> image-2024-04-16-12-29-15-386.png, image-2024-04-16-13-43-11-064.png, 
> image-2024-04-16-13-53-24-455.png
>
>
> I've run into an issue on a 4.1.4 cluster where an entire node has locked up 
> due to what I believe is a deadlock in memtable flushing. Here's what I know 
> so far.  I've stitched together what happened based on conversations, logs, 
> and some flame graphs.
> *Log reports memtable flushing*
> The last successful flush happens at 12:19. 
> {noformat}
> INFO  [NativePoolCleaner] 2024-04-16 12:19:53,634 
> AbstractAllocatorMemtable.java:286 - Flushing largest CFS(Keyspace='ks', 
> ColumnFamily='version') to free up room. Used total: 0.24/0.33, live: 
> 0.16/0.20, flushing: 0.09/0.13, this: 0.13/0.15
> INFO  [NativePoolCleaner] 2024-04-16 12:19:53,634 ColumnFamilyStore.java:1012 
> - Enqueuing flush of ks.version, Reason: MEMTABLE_LIMIT, Usage: 660.521MiB 
> (13%) on-heap, 790.606MiB (15%) off-heap
> {noformat}
> *MemtablePostFlush appears to be blocked*
> At this point, MemtablePostFlush completed tasks stops incrementing, active 
> stays at 1 and pending starts to rise.
> {noformat}
> MemtablePostFlush   1    1   3446   0   0
> {noformat}
>  
> The flame graph reveals that PostFlush.call is stuck.  I don't have the line 
> number, but I know we're stuck in 
> {{org.apache.cassandra.db.ColumnFamilyStore.PostFlush#call}} given the visual 
> below:
> *!image-2024-04-16-13-43-11-064.png!*
> *Memtable flushing is now blocked.*
> All MemtableFlushWriter threads are parked waiting on 
> {{OpOrder.Barrier.await}}. A wall-clock profile of 30s reveals all time 
> is spent here.  Presumably we're waiting on the single-threaded Post Flush.
> !image-2024-04-16-12-29-15-386.png!
> *Memtable allocations start to block*
> Eventually it looks like the NativeAllocator stops successfully allocating 
> memory. I assume it's waiting on memory to be freed, but since memtable 
> flushes are blocked, we wait indefinitely.
> Looking at a wall-clock flame graph, all writer threads have reached the 
> allocation failure path of {{MemtableAllocator.allocate()}}.  I believe we're 
> waiting on {{signal.awaitThrowUncheckedOnInterrupt()}}.
> {noformat}
>  MutationStage    48    828425      980253369      0    0{noformat}
> !image-2024-04-16-11-55-54-750.png!
>  
> *Compaction Stops*
> Since we write to the compaction history table, and that requires memtables, 
> compactions are now blocked as well.
>  
> !image-2024-04-16-13-53-24-455.png!
>  
> The node is now doing basically nothing and must be restarted.






[jira] [Updated] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests

2024-04-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19551:
-
Fix Version/s: (was: 5.x)

> CCM nodes share the same environment variable map breaking upgrade tests
> 
>
> Key: CASSANDRA-19551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 3.0.31, 3.11.17, 4.0.13, 5.0-beta2, 5.1
>
> Attachments: ci_summary.html
>
>
> In {{node.py}} {{__environment_variables}} is generally always set with a map 
> that is passed in from {{cluster.py}} so it is [shared between 
> nodes|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151]
>  and if nodes modify the map, such as in {{start}} when [updating the Java 
> version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860]
>  then when {{get_env}} runs it will [overwrite the Java 
> version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244]
>  that is selected by {{update_java_version}}.
> This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
> some of the upgrade tests, because after the first node upgrades to 4.0 it's 
> no longer possible for the subsequent nodes to select a Java version that 
> isn't 11, because it's overridden by {{__environment_variables}}.
> I'm not even 100% clear on why the code in {{start}} should update 
> {{__environment_variables}} at all if we calculate the correct java version 
> on every invocation of other tools.
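The aliasing pitfall is easy to reproduce in isolation. A minimal sketch with
hypothetical names (not the actual ccm code); one straightforward fix is for
each node to take a private copy of the map:

{noformat}
class Node:
    def __init__(self, name, environment_variables):
        self.name = name
        # Buggy: aliases the caller's dict, shared across all nodes.
        self.env = environment_variables
        # Fix: take a private per-node copy instead:
        # self.env = dict(environment_variables)

shared = {"JAVA_HOME": "/usr/lib/jvm/java-8"}
n1, n2 = Node("node1", shared), Node("node2", shared)

n1.env["JAVA_HOME"] = "/usr/lib/jvm/java-11"  # node1 upgrades to 4.0
print(n2.env["JAVA_HOME"])                    # node2 now sees java-11 too
{noformat}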






[jira] [Updated] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests

2024-04-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19551:
-
  Fix Version/s: 3.0.31
 3.11.17
 4.0.13
 5.0-beta2
 5.1
  Since Version: NA
Source Control Link: 
https://github.com/riptano/ccm/commit/ce69e9df94f9e44f3f5b36f8ed4e2d07f734195f
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

I committed this and updated the cassandra-test tag.

> CCM nodes share the same environment variable map breaking upgrade tests
> 
>
> Key: CASSANDRA-19551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 3.0.31, 3.11.17, 4.0.13, 5.0-beta2, 5.x, 5.1
>
> Attachments: ci_summary.html
>
>
> In {{node.py}} {{__environment_variables}} is generally always set with a map 
> that is passed in from {{cluster.py}} so it is [shared between 
> nodes|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151]
>  and if nodes modify the map, such as in {{start}} when [updating the Java 
> version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860]
>  then when {{get_env}} runs it will [overwrite the Java 
> version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244]
>  that is selected by {{update_java_version}}.
> This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
> some of the upgrade tests, because after the first node upgrades to 4.0 it's 
> no longer possible for the subsequent nodes to select a Java version that 
> isn't 11, because it's overridden by {{__environment_variables}}.
> I'm not even 100% clear on why the code in {{start}} should update 
> {{__environment_variables}} at all if we calculate the correct java version 
> on every invocation of other tools.






[jira] [Updated] (CASSANDRA-19561) finish_release.sh does not handle jfrog errors

2024-04-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19561:
-
Description: If there is any "soft" problem (not network) uploading to 
jfrog, finish_release.sh will keep on going and then delete the packages 
regardless of whether they could be uploaded or not.  (was: If there is any 
problem uploading to jfrog, finish_release.sh will keep on going and then 
delete the packages regardless of whether they could be uploaded or not.)

> finish_release.sh does not handle jfrog errors
> --
>
> Key: CASSANDRA-19561
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19561
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Brandon Williams
>Priority: Normal
>
> If there is any "soft" problem (not network) uploading to jfrog, 
> finish_release.sh will keep on going and then delete the packages regardless 
> of whether they could be uploaded or not.






[jira] [Commented] (CASSANDRA-19561) finish_release.sh does not handle jfrog errors

2024-04-16 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837751#comment-17837751
 ] 

Brandon Williams commented on CASSANDRA-19561:
--

The root cause seems to be that jfrog returns its errors in JSON instead of 
HTTP return codes, so curl will always exit with zero:

{noformat}
drift@phoenix:/tmp$ curl -X PUT -T foo -ubad:password 
https://apache.jfrog.io/artifactory/cassandra/whatever?override=1
{
  "errors" : [ {
"status" : 401,
"message" : "Bad credentials"
  } ]
}drift@phoenix:/tmp$ echo $?
0
{noformat}
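One way to catch these soft failures is to inspect the response body and HTTP
status rather than curl's exit code. A hedged Python sketch, assuming a
successful deploy returns 200/201 and failures carry an "errors" array as
above; the real script is bash, so this only illustrates the shape of the
check:

{noformat}
import subprocess

def upload(path, url, creds):
    # -w appends the HTTP status after the body; -s keeps curl quiet.
    proc = subprocess.run(
        ["curl", "-s", "-X", "PUT", "-T", path, "-u", creds,
         "-w", "\n%{http_code}", url],
        capture_output=True, text=True, check=True)  # raises on network errors
    body, _, status = proc.stdout.rpartition("\n")
    if status not in ("200", "201") or '"errors"' in body:
        raise RuntimeError(f"jfrog upload failed ({status}): {body}")
{noformat}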

> finish_release.sh does not handle jfrog errors
> --
>
> Key: CASSANDRA-19561
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19561
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Brandon Williams
>Priority: Normal
>
> If there is any problem uploading to jfrog, finish_release.sh will keep on 
> going and then delete the packages regardless of whether they could be 
> uploaded or not.






[jira] [Updated] (CASSANDRA-19561) finish_release.sh does not handle jfrog errors

2024-04-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19561:
-
 Bug Category: Parent values: Degradation(12984)Level 1 values: Other 
Exception(12998)
   Complexity: Normal
  Component/s: Build
Discovered By: User Report
 Severity: Normal
   Status: Open  (was: Triage Needed)

> finish_release.sh does not handle jfrog errors
> --
>
> Key: CASSANDRA-19561
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19561
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Brandon Williams
>Priority: Normal
>
> If there is any problem uploading to jfrog, finish_release.sh will keep on 
> going and then delete the packages regardless of whether they could be 
> uploaded or not.






[jira] [Created] (CASSANDRA-19561) finish_release.sh does not handle jfrog errors

2024-04-16 Thread Brandon Williams (Jira)
Brandon Williams created CASSANDRA-19561:


 Summary: finish_release.sh does not handle jfrog errors
 Key: CASSANDRA-19561
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19561
 Project: Cassandra
  Issue Type: Bug
Reporter: Brandon Williams


If there is any problem uploading to jfrog, finish_release.sh will keep on 
going and then delete the packages regardless of whether they could be uploaded 
or not.






[jira] [Updated] (CASSANDRA-19559) prepare_release.sh should check for mvn

2024-04-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19559:
-
Status: Review In Progress  (was: Needs Committer)

> prepare_release.sh should check for mvn
> ---
>
> Key: CASSANDRA-19559
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19559
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: NA
>
>
> Part of the 'prepare' phase of releasing includes publishing Maven artifacts, 
> which requires that Maven be installed.  The script should check for this 
> since it's quite easy to miss.
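The script itself is bash, but the preflight check amounts to something like
this sketch (the error message is illustrative):

{noformat}
import shutil
import sys

if shutil.which("mvn") is None:
    sys.exit("mvn not found on PATH; install Maven before releasing")
{noformat}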






[jira] [Updated] (CASSANDRA-19559) prepare_release.sh should check for mvn

2024-04-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19559:
-
Reviewers: Berenguer Blasi

> prepare_release.sh should check for mvn
> ---
>
> Key: CASSANDRA-19559
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19559
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: NA
>
>
> Part of the 'prepare' phase of releasing includes publishing Maven artifacts, 
> which requires that Maven be installed.  The script should check for this 
> since it's quite easy to miss.






[jira] [Updated] (CASSANDRA-19559) prepare_release.sh should check for mvn

2024-04-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19559:
-
Status: Ready to Commit  (was: Review In Progress)

> prepare_release.sh should check for mvn
> ---
>
> Key: CASSANDRA-19559
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19559
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: NA
>
>
> Part of the 'prepare' phase of releasing includes publishing Maven artifacts, 
> which requires that Maven be installed.  The script should check for this 
> since it's quite easy to miss.






[jira] [Updated] (CASSANDRA-19559) prepare_release.sh should check for mvn

2024-04-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19559:
-
  Fix Version/s: 3.0.31
 3.11.17
 4.0.13
 4.1.5
 5.0-beta2
 5.1
 (was: NA)
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra-builds/commit/cdf0ec532635ce069c3406a176846209c9be8af5
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Thanks for the review! Committed.

> prepare_release.sh should check for mvn
> ---
>
> Key: CASSANDRA-19559
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19559
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.0.31, 3.11.17, 4.0.13, 4.1.5, 5.0-beta2, 5.1
>
>
> Part of the 'prepare' phase of releasing includes publishing Maven artifacts, 
> which requires that Maven be installed.  The script should check for this 
> since it's quite easy to miss.






[jira] [Updated] (CASSANDRA-19559) prepare_release.sh should check for mvn

2024-04-16 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19559:
-
Status: Needs Committer  (was: Patch Available)

> prepare_release.sh should check for mvn
> ---
>
> Key: CASSANDRA-19559
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19559
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: NA
>
>
> Part of the 'prepare' phase of releasing includes publishing Maven artifacts, 
> which requires that Maven be installed.  The script should check for this 
> since it's quite easy to miss.






[jira] [Commented] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-04-15 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837458#comment-17837458
 ] 

Brandon Williams commented on CASSANDRA-19534:
--

I do have to use a higher rate, but running both workloads concurrently 
reproduces the runaway buildup and latency.  On 4.1 this seems to stabilize 
once errors begin, though at latencies higher than any configured timeouts, so 
this is new behavior.

> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take far longer to 
> time out than is configured.  We should shed load much more aggressively and 
> use a bounded queue for incoming work.  This is extremely evident when we 
> combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our requests don't time out at the requested server-side timeout 
> either:
>  
> {noformat}
>                  Writes                                  Reads                
>                   Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) | 
>   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 | 
>       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 | 
>       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 | 
>       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 | 
>       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 | 
>       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.
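For illustration, bounded-queue load shedding in miniature; this is a sketch
of the idea, not the server implementation: when the queue is full, reject
immediately rather than letting pending work grow without bound.

{noformat}
import queue

requests = queue.Queue(maxsize=4)  # bounded: excess load is shed

def accept(request):
    try:
        requests.put_nowait(request)  # fail fast when full
        return True
    except queue.Full:
        return False  # caller translates this into an overload error

print([accept(i) for i in range(6)])  # [True, True, True, True, False, False]
{noformat}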






[jira] [Commented] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests

2024-04-15 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837451#comment-17837451
 ] 

Brandon Williams commented on CASSANDRA-19551:
--

The bootstrap failure is CASSANDRA-18098.

> CCM nodes share the same environment variable map breaking upgrade tests
> 
>
> Key: CASSANDRA-19551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> In {{node.py}} {{__environment_variables}} is generally always set with a map 
> that is passed in from {{cluster.py}} so it is [shared between 
> nodes|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151]
>  and if nodes modify the map, such as in {{start}} when [updating the Java 
> version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860]
>  then when {{get_env}} runs it will [overwrite the Java 
> version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244]
>  that is selected by {{update_java_version}}.
> This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
> some of the upgrade tests, because after the first node upgrades to 4.0 it's 
> no longer possible for the subsequent nodes to select a Java version that 
> isn't 11, because it's overridden by {{__environment_variables}}.
> I'm not even 100% clear on why the code in {{start}} should update 
> {{__environment_variables}} at all if we calculate the correct java version 
> on every invocation of other tools.





