[jira] [Updated] (CASSANDRASC-112) ClosedChannelException when downloading from S3

2024-03-05 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-112:
--
Authors: Yifan Cai
Test and Documentation Plan: unit; ci
 Status: Patch Available  (was: Open)

> ClosedChannelException when downloading from S3
> ---
>
> Key: CASSANDRASC-112
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-112
> Project: Sidecar for Apache Cassandra
>  Issue Type: Bug
>  Components: Rest API
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
>
> {code:java}
> org.apache.cassandra.sidecar.exceptions.RestoreJobFatalException: 
> Unrecoverable error when downloading object.
> Caused by: java.nio.channels.ClosedChannelException
>   at 
> java.base/sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:150)
>   at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:266)
>   at 
> org.apache.cassandra.sidecar.restore.StorageClient.lambda$subscribeRateLimitedWrite$6(StorageClient.java:271)
>   ... 22 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRASC-112) ClosedChannelException when downloading from S3

2024-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRASC-112:
---
Labels: pull-request-available  (was: )

> ClosedChannelException when downloading from S3
> ---
>
> Key: CASSANDRASC-112
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-112
> Project: Sidecar for Apache Cassandra
>  Issue Type: Bug
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
>
> {code:java}
> org.apache.cassandra.sidecar.exceptions.RestoreJobFatalException: 
> Unrecoverable error when downloading object.
> Caused by: java.nio.channels.ClosedChannelException
>   at 
> java.base/sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:150)
>   at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:266)
>   at 
> org.apache.cassandra.sidecar.restore.StorageClient.lambda$subscribeRateLimitedWrite$6(StorageClient.java:271)
>   ... 22 more
> {code}






[jira] [Updated] (CASSANDRASC-112) ClosedChannelException when downloading from S3

2024-03-05 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRASC-112:
--
 Bug Category: Parent values: Availability(12983) Level 1 values: Response Crash(12991)
   Complexity: Normal
  Component/s: Rest API
Discovered By: User Report
 Severity: Normal
   Status: Open  (was: Triage Needed)

PR: https://github.com/apache/cassandra-sidecar/pull/103
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-sidecar?branch=CASSANDRASC-112%2Ftrunk

> ClosedChannelException when downloading from S3
> ---
>
> Key: CASSANDRASC-112
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-112
> Project: Sidecar for Apache Cassandra
>  Issue Type: Bug
>  Components: Rest API
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
>
> {code:java}
> org.apache.cassandra.sidecar.exceptions.RestoreJobFatalException: 
> Unrecoverable error when downloading object.
> Caused by: java.nio.channels.ClosedChannelException
>   at 
> java.base/sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:150)
>   at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:266)
>   at 
> org.apache.cassandra.sidecar.restore.StorageClient.lambda$subscribeRateLimitedWrite$6(StorageClient.java:271)
>   ... 22 more
> {code}






Re: [PR] CASSANDRA-19418 - Changes to report additional bulk analytics job stats for instrumentation [cassandra-analytics]

2024-03-05 Thread via GitHub


arjunashok commented on code in PR #41:
URL: 
https://github.com/apache/cassandra-analytics/pull/41#discussion_r1513798245


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/CassandraBulkSourceRelation.java:
##
@@ -107,17 +112,25 @@ private void persist(@NotNull JavaPairRDD sortedRDD, Str
 {
 try
 {
-sortedRDD.foreachPartition(writeRowsInPartition(broadcastContext, columnNames));
+List results = sortedRDD
+ .mapPartitions(partitionsFlatMapFunc(broadcastContext, columnNames))
+ .collect();
+long rowCount = results.stream().mapToLong(res -> res.rowCount).sum();
+long totalBytesWritten = results.stream().mapToLong(res -> res.bytesWritten).sum();
+LOGGER.info("Bulk writer has written {} rows and {} bytes", rowCount, totalBytesWritten);
+recordSuccessfulJobStats(rowCount, totalBytesWritten);
 }
 catch (Throwable throwable)
 {
+recordFailureStats(throwable.getMessage());
 LOGGER.error("Bulk Write Failed", throwable);
 throw new RuntimeException("Bulk Write to Cassandra has failed", throwable);
 }
 finally
 {
 try
 {
+writerContext.publishJobStats();

Review Comment:
   So, the change is not propagating data back to the driver, but publishes 
stats from the executors. I am assuming here that the context is made available 
to the executors so what you are saying does not need to happen. Let me know if 
that makes sense.
   
   I have been able to validate this using the in-jvm-dtests.
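
   As a rough illustration of the aggregation in the hunk above, here is a 
minimal sketch that sums per-partition counts without Spark. `PartitionResult` 
and `JobStatsSketch` are hypothetical names used only for this example; only 
the `rowCount` and `bytesWritten` fields come from the diff.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the per-partition result type in the PR;
// only the two fields referenced in the diff are modelled.
final class PartitionResult
{
    final long rowCount;
    final long bytesWritten;

    PartitionResult(long rowCount, long bytesWritten)
    {
        this.rowCount = rowCount;
        this.bytesWritten = bytesWritten;
    }
}

final class JobStatsSketch
{
    // Sums the row counts reported by all partitions
    static long totalRows(List<PartitionResult> results)
    {
        return results.stream().mapToLong(res -> res.rowCount).sum();
    }

    // Sums the bytes written reported by all partitions
    static long totalBytes(List<PartitionResult> results)
    {
        return results.stream().mapToLong(res -> res.bytesWritten).sum();
    }

    public static void main(String[] args)
    {
        List<PartitionResult> results = Arrays.asList(new PartitionResult(10, 1024),
                                                      new PartitionResult(5, 512));
        System.out.println("rows=" + totalRows(results) + " bytes=" + totalBytes(results));
    }
}
```

   In the real job the list would come from the collected RDD; here it is 
built by hand so the summation can be checked in isolation.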



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





Re: [PR] CASSANDRA-19418 - Changes to report additional bulk analytics job stats for instrumentation [cassandra-analytics]

2024-03-05 Thread via GitHub


arjunashok commented on code in PR #41:
URL: 
https://github.com/apache/cassandra-analytics/pull/41#discussion_r1513793354


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/BulkWriterContext.java:
##
@@ -21,7 +21,9 @@
 
 import java.io.Serializable;
 
-public interface BulkWriterContext extends Serializable
+import org.apache.cassandra.spark.common.Reportable;
+
+public interface BulkWriterContext extends Serializable, Reportable

Review Comment:
   So, the functionality provided by the new interface replaces the 
existing `dialHome` method in `CassandraBulkWriterContext`.
   
   The thinking is that this is tied to the "context" that is shared across 
executors, as we "record" initial stats and job status stats at the executor 
level and the "inflight" stats at the task level.
   






Re: [PR] CASSANDRA-19418 - Changes to report additional bulk analytics job stats for instrumentation [cassandra-analytics]

2024-03-05 Thread via GitHub


arjunashok commented on code in PR #41:
URL: 
https://github.com/apache/cassandra-analytics/pull/41#discussion_r1513793217


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/RingInstance.java:
##
@@ -49,6 +49,7 @@ public RingInstance(ReplicaMetadata replica)
  .datacenter(replica.datacenter())
  .state(replica.state())
  .status(replica.status())
+ .token("")

Review Comment:
   Answered in the response below






Re: [PR] CASSANDRA-19418 - Changes to report additional bulk analytics job stats for instrumentation [cassandra-analytics]

2024-03-05 Thread via GitHub


arjunashok commented on code in PR #41:
URL: 
https://github.com/apache/cassandra-analytics/pull/41#discussion_r1513793094


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/RingInstance.java:
##
@@ -125,40 +126,28 @@ private void writeObject(ObjectOutputStream out) throws IOException
 out.writeUTF(ringEntry.address());
 out.writeInt(ringEntry.port());
 out.writeUTF(ringEntry.datacenter());
-out.writeUTF(ringEntry.load());

Review Comment:
   Since we are now returning the `StreamResult` from the tasks, the existing 
implementation results in NPEs while serializing the contained `RingInstance`, 
because many of its fields are not defined when we create the `RingInstance` 
from `ReplicaMetadata`. The change removes the unused `RingInstance` fields 
from serialization.
   
   Likewise, the change explicitly sets the `token` field to a default for the 
same reason, since `token` is part of the equals/hashCode checks.
   
   Stack trace:
   
   ```
   Caused by: org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 
in stage 1.0 (TID 40) (172.20.10.7 executor driver): 
com.esotericsoftware.kryo.KryoException: Error during Java serialization.
   Serialization trace:
   instance (org.apache.cassandra.spark.bulkwriter.CommitResult)
   commitResults (org.apache.cassandra.spark.bulkwriter.StreamResult)
at 
org.apache.cassandra.spark.bulkwriter.util.SbwJavaSerializer.write(SbwJavaSerializer.java:58)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   ```
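
   To illustrate why an unset field breaks a hand-written `writeObject`, here 
is a minimal sketch showing that `ObjectOutputStream.writeUTF` throws a 
`NullPointerException` for a null string, while an empty-string default is 
written without error. `WriteUtfNullSketch` is a hypothetical class for this 
example and is not part of the PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

final class WriteUtfNullSketch
{
    // Returns true if writing the value with writeUTF throws a NullPointerException
    static boolean failsOnWrite(String value)
    {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream()))
        {
            out.writeUTF(value);
            return false;
        }
        catch (NullPointerException npe)
        {
            return true;
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println("null field fails:   " + failsOnWrite(null));
        System.out.println("empty default fails: " + failsOnWrite(""));
    }
}
```

   This mirrors the two options discussed in the comment: either stop writing 
fields that may be unset, or give them a non-null default before serialization.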
   
   






Re: [PR] CASSANDRA-19418 - Changes to report additional bulk analytics job stats for instrumentation [cassandra-analytics]

2024-03-05 Thread via GitHub


arjunashok commented on code in PR #41:
URL: 
https://github.com/apache/cassandra-analytics/pull/41#discussion_r1513792433


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/common/Reportable.java:
##
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.cassandra.spark.common;
+
+import java.util.Map;
+
+/**
+ * Interface to provide functionality to report Spark Job Statistics and/or 
properties
+ * that can optionally be instrumented. The default implementation merely logs 
these
+ * stats at the end of the job.
+ */
+public interface Reportable

Review Comment:
   This is not meant to be specific to the writer, so maybe it can be renamed 
to `JobStats`?






Re: [PR] CASSANDRA-19418 - Changes to report additional bulk analytics job stats for instrumentation [cassandra-analytics]

2024-03-05 Thread via GitHub


arjunashok commented on code in PR #41:
URL: 
https://github.com/apache/cassandra-analytics/pull/41#discussion_r1513792138


##
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/common/Reportable.java:
##
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.cassandra.spark.common;

Review Comment:
   Makes sense






[jira] [Updated] (CASSANDRA-18879) Modernize CQLSH datetime conversions

2024-03-05 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-18879:
-
Reviewers: Brandon Williams

> Modernize CQLSH datetime conversions
> 
>
> Key: CASSANDRA-18879
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18879
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Interpreter
>Reporter: Brad Schoening
>Assignee: Arun Ganesh
>Priority: Low
> Attachments: cassandra-cqlsh-stdout
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Python 3.x introduced many updates to datetime conversion that allow 
> simpler conversions.
> 1. For example, tracing.py defines a function datetime_from_utc_to_local(), 
> but datetime now has a native method astimezone() that converts UTC to 
> local time.
> Review the following users of datetime that apply conversions:
>  * cqlshmain.py
>  * formatting.py 
>  * tracing.py
> Example: 
> {code:python}
> >>> from dateutil import tz
> >>> import datetime
> >>> a = datetime.datetime.now().astimezone(tz.tzutc())
> >>> a
> datetime.datetime(2023, 9, 25, 11, 22, 36, 251705, tzinfo=tzutc())
> >>> b = a.astimezone()
> >>> b
> datetime.datetime(2023, 9, 25, 14, 22, 36, 251705, tzinfo=datetime.timezone(datetime.timedelta(seconds=10800), 'EST'))
> {code}
> See [PEP 495|http://example.com]
>  






[jira] [Created] (CASSANDRASC-112) ClosedChannelException when downloading from S3

2024-03-05 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRASC-112:
-

 Summary: ClosedChannelException when downloading from S3
 Key: CASSANDRASC-112
 URL: https://issues.apache.org/jira/browse/CASSANDRASC-112
 Project: Sidecar for Apache Cassandra
  Issue Type: Bug
Reporter: Yifan Cai
Assignee: Yifan Cai



{code:java}
org.apache.cassandra.sidecar.exceptions.RestoreJobFatalException: Unrecoverable 
error when downloading object.
Caused by: java.nio.channels.ClosedChannelException
at 
java.base/sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:150)
at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:266)
at 
org.apache.cassandra.sidecar.restore.StorageClient.lambda$subscribeRateLimitedWrite$6(StorageClient.java:271)
... 22 more
{code}
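
The issue does not state the root cause, so the following is only a minimal 
sketch of how the exception type in the trace arises in general: any write to a 
FileChannel that has already been closed fails in ensureOpen() with 
ClosedChannelException. ClosedChannelSketch is a hypothetical class for 
illustration and does not reproduce the Sidecar code path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ClosedChannelSketch
{
    // Closes a FileChannel, then writes to it, and reports whether
    // ClosedChannelException was thrown by the write
    static boolean writeAfterCloseFails()
    {
        try
        {
            Path file = Files.createTempFile("closed-channel-sketch", ".bin");
            try
            {
                FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE);
                channel.close();
                channel.write(ByteBuffer.wrap(new byte[]{ 1, 2, 3 }));
                return false;
            }
            catch (ClosedChannelException e)
            {
                return true;
            }
            finally
            {
                Files.deleteIfExists(file);
            }
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println("write after close fails: " + writeAfterCloseFails());
    }
}
```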







[jira] [Commented] (CASSANDRA-19454) Revert switch to approximate time in Dispatcher to avoid mixing with nanoTime() in downstream timeout calculations

2024-03-05 Thread Arun Ganesh (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823818#comment-17823818
 ] 

Arun Ganesh commented on CASSANDRA-19454:
-

Thanks!

Meanwhile, if you have any issues lying around that can help me understand the 
project better, I'd like to work on them. 

> Revert switch to approximate time in Dispatcher to avoid mixing with 
> nanoTime() in downstream timeout calculations
> --
>
> Key: CASSANDRA-19454
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19454
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Caleb Rackliffe
>Assignee: Arun Ganesh
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CASSANDRA-15241 changed {{Dispatcher}} to use the {{approxTime}} 
> implementation of {{MonotonicClock}} rather than {{nanoTime()}}, but clock 
> drift between the two can potentially cause queries to time out more 
> quickly. We should be able to revert the {{Dispatcher}} to use {{nanoTime()}} 
> again and similarly change {{QueriesTable}} to {{nanoTime()}} as well for 
> consistency.
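
A minimal sketch of the effect described in the issue, using hand-picked 
numbers rather than real clocks: if the start time is read from a clock that 
lags the precise one (the 2 ms lag here is an assumption for illustration) and 
the current time is read from the precise clock, the computed elapsed time is 
too large and the timeout fires early. ClockMixSketch is a hypothetical class 
and is not Cassandra code.

```java
final class ClockMixSketch
{
    // True when the elapsed time between the two readings reaches the timeout.
    // The result is only meaningful if both readings come from the same clock.
    static boolean timedOut(long startNanos, long nowNanos, long timeoutNanos)
    {
        return nowNanos - startNanos >= timeoutNanos;
    }

    public static void main(String[] args)
    {
        long timeoutNanos = 10_000_000L;     // 10 ms timeout
        long preciseStart = 1_000_000_000L;  // start as seen by the precise clock
        long approxStart = 998_000_000L;     // same instant seen by a clock lagging 2 ms (assumed)
        long preciseNow = 1_009_000_000L;    // 9 ms later on the precise clock

        // Same clock for both readings: 9 ms elapsed, no timeout yet
        System.out.println("same clock:   " + timedOut(preciseStart, preciseNow, timeoutNanos));
        // Mixed clocks: 11 ms appears to have elapsed, so the query times out early
        System.out.println("mixed clocks: " + timedOut(approxStart, preciseNow, timeoutNanos));
    }
}
```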






(cassandra-website) branch asf-staging updated (e86c7f87c -> 5dca216ee)

2024-03-05 Thread git-site-role
This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a change to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/cassandra-website.git


 discard e86c7f87c generate docs for fd550e9c
 new 5dca216ee generate docs for fd550e9c

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (e86c7f87c)
\
 N -- N -- N   refs/heads/asf-staging (5dca216ee)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 content/search-index.js |   2 +-
 site-ui/build/ui-bundle.zip | Bin 4883646 -> 4883646 bytes
 2 files changed, 1 insertion(+), 1 deletion(-)





[jira] [Commented] (CASSANDRA-19454) Revert switch to approximate time in Dispatcher to avoid mixing with nanoTime() in downstream timeout calculations

2024-03-05 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823788#comment-17823788
 ] 

Caleb Rackliffe commented on CASSANDRA-19454:
-

The draft PR looks good, +1

I'm running the tests, and I'll post a summary for you here soon...

> Revert switch to approximate time in Dispatcher to avoid mixing with 
> nanoTime() in downstream timeout calculations
> --
>
> Key: CASSANDRA-19454
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19454
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Caleb Rackliffe
>Assignee: Arun Ganesh
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CASSANDRA-15241 changed {{Dispatcher}} to use the {{approxTime}} 
> implementation of {{MonotonicClock}} rather than {{nanoTime()}}, but clock 
> drift between the two can potentially cause queries to time out more 
> quickly. We should be able to revert the {{Dispatcher}} to use {{nanoTime()}} 
> again and similarly change {{QueriesTable}} to {{nanoTime()}} as well for 
> consistency.






(cassandra-website) branch asf-staging updated (b178e036c -> e86c7f87c)

2024-03-05 Thread git-site-role
This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a change to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/cassandra-website.git


 discard b178e036c generate docs for fd550e9c
 new e86c7f87c generate docs for fd550e9c

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (b178e036c)
\
 N -- N -- N   refs/heads/asf-staging (e86c7f87c)



Summary of changes:
 content/search-index.js |   2 +-
 site-ui/build/ui-bundle.zip | Bin 4883646 -> 4883646 bytes
 2 files changed, 1 insertion(+), 1 deletion(-)





(cassandra-website) branch asf-staging updated (4edcc0e8c -> b178e036c)

2024-03-05 Thread git-site-role
This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a change to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/cassandra-website.git


 discard 4edcc0e8c generate docs for fd550e9c
 new b178e036c generate docs for fd550e9c

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (4edcc0e8c)
\
 N -- N -- N   refs/heads/asf-staging (b178e036c)



Summary of changes:
 content/search-index.js |   2 +-
 site-ui/build/ui-bundle.zip | Bin 4883646 -> 4883646 bytes
 2 files changed, 1 insertion(+), 1 deletion(-)





[jira] [Updated] (CASSANDRA-19452) [Analytics] Use constant reference time during bulk read process

2024-03-05 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19452:
--
  Fix Version/s: NA
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra-analytics/commit/a13532272051d4e4608f92d53bdd997103e8ea19
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> [Analytics] Use constant reference time during bulk read process
> 
>
> Key: CASSANDRA-19452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19452
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Bulk reader leverages a time provider that returns the current time during 
> read to guide compaction and validation.
> As the current time value varies across Spark executors, there is a chance 
> that rows/cells expire inconsistently. Another issue is that the validation 
> of non-expired rows/cells after compaction might fail, since they could 
> expire during the read. The read can take minutes or even hours.
> This could lead to false data omission and job failure.
> The fix is to use a constant reference time that is decided by the Spark 
> driver and distributed to all executors. The reference time is then used for 
> compaction and validation.
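
A minimal sketch of the idea, assuming a simplified expiry check: with a 
provider that returns the current time, the same cell can be live during 
compaction and expired during a later validation, whereas a fixed reference 
time gives the same answer in both phases. ReferenceTimeSketch and 
EpochSecondsProvider are hypothetical names for this example and do not 
reflect the project's actual TimeProvider API.

```java
final class ReferenceTimeSketch
{
    // Hypothetical provider interface used only in this sketch
    interface EpochSecondsProvider
    {
        int referenceEpochInSeconds();
    }

    // A cell counts as expired when its expiration time is at or before the reference time
    static boolean isExpired(int expiresAtEpochSeconds, EpochSecondsProvider provider)
    {
        return expiresAtEpochSeconds <= provider.referenceEpochInSeconds();
    }

    // Provider whose value is chosen once (for example on the driver) and never changes
    static EpochSecondsProvider fixed(int epochSeconds)
    {
        return () -> epochSeconds;
    }

    public static void main(String[] args)
    {
        int cellExpiresAt = 1000;

        // "Current time" semantics: the answer changes while the read is in progress
        System.out.println("compaction at 995:  " + isExpired(cellExpiresAt, () -> 995));
        System.out.println("validation at 1005: " + isExpired(cellExpiresAt, () -> 1005));

        // Constant reference time: compaction and validation agree
        EpochSecondsProvider reference = fixed(990);
        System.out.println("compaction (fixed): " + isExpired(cellExpiresAt, reference));
        System.out.println("validation (fixed): " + isExpired(cellExpiresAt, reference));
    }
}
```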






(cassandra-analytics) branch trunk updated: CASSANDRA-19452 Use constant reference time during bulk read process (#44)

2024-03-05 Thread ycai
This is an automated email from the ASF dual-hosted git repository.

ycai pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-analytics.git


The following commit(s) were added to refs/heads/trunk by this push:
 new a135322  CASSANDRA-19452 Use constant reference time during bulk read 
process (#44)
a135322 is described below

commit a13532272051d4e4608f92d53bdd997103e8ea19
Author: Yifan Cai <52585731+yifa...@users.noreply.github.com>
AuthorDate: Tue Mar 5 11:06:36 2024 -0800

CASSANDRA-19452 Use constant reference time during bulk read process (#44)

patch by Yifan Cai; reviewed by Francisco Guerrero, James Berragan for 
CASSANDRA-19452
---
 CHANGES.txt|   1 +
 .../cassandra/spark/data/CassandraDataLayer.java   |  28 -
 .../cassandra/spark/data/LocalDataLayer.java   |   7 ++
 .../org/apache/cassandra/spark/TestDataLayer.java  |   7 ++
 .../data/partitioner/JDKSerializationTests.java|   7 ++
 .../apache/cassandra/bridge/CassandraBridge.java   |   2 -
 .../org/apache/cassandra/spark/data/DataLayer.java |  13 +--
 .../{TimeProvider.java => ReaderTimeProvider.java} |  28 +++--
 .../apache/cassandra/spark/utils/TimeProvider.java |  41 ++-
 .../cassandra/spark/utils/test/TestSchema.java |  29 -
 .../bridge/CassandraBridgeImplementation.java  |   7 --
 .../spark/reader/AbstractStreamScanner.java|  50 ++---
 .../spark/reader/CompactionStreamScanner.java  |  32 --
 .../cassandra/spark/reader/SSTableReaderTests.java | 120 +
 14 files changed, 311 insertions(+), 61 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 8215822..92620a9 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 1.0.0
+ * Use constant reference time during bulk read process (CASSANDRA-19452)
  * Update access of ClearSnapshotStrategy (CASSANDRA-19442)
  * Bulk reader fails to produce a row when regular column values are null 
(CASSANDRA-19411)
  * Use XXHash32 for digest calculation of SSTables (CASSANDRA-19369)
diff --git 
a/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java
 
b/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java
index 40e0436..8ab1dd6 100644
--- 
a/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java
+++ 
b/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java
@@ -87,8 +87,10 @@ import 
org.apache.cassandra.spark.sparksql.LastModifiedTimestampDecorator;
 import org.apache.cassandra.spark.sparksql.RowBuilder;
 import org.apache.cassandra.spark.stats.Stats;
 import org.apache.cassandra.spark.utils.CqlUtils;
+import org.apache.cassandra.spark.utils.ReaderTimeProvider;
 import org.apache.cassandra.spark.utils.ScalaFunctions;
 import org.apache.cassandra.spark.utils.ThrowableUtils;
+import org.apache.cassandra.spark.utils.TimeProvider;
 import org.apache.cassandra.spark.validation.CassandraValidation;
 import org.apache.cassandra.spark.validation.SidecarValidation;
 import org.apache.cassandra.spark.validation.StartupValidatable;
@@ -122,7 +124,6 @@ public class CassandraDataLayer extends 
PartitionedDataLayer implements StartupV
 protected TokenPartitioner tokenPartitioner;
 protected Map availabilityHints;
 protected Sidecar.ClientConfig sidecarClientConfig;
-private SslConfig sslConfig;
 protected Map bigNumberConfigMap;
 protected boolean enableStats;
 protected boolean readIndexOffset;
@@ -133,7 +134,11 @@ public class CassandraDataLayer extends 
PartitionedDataLayer implements StartupV
 protected String lastModifiedTimestampField;
 // volatile in order to publish the reference for visibility
 protected volatile CqlTable cqlTable;
+protected transient TimeProvider timeProvider;
 protected transient SidecarClient sidecar;
+
+private SslConfig sslConfig;
+
 @VisibleForTesting
 transient Map instanceMap;
 
@@ -178,7 +183,8 @@ public class CassandraDataLayer extends 
PartitionedDataLayer implements StartupV
  boolean useIncrementalRepair,
  @Nullable String lastModifiedTimestampField,
  List requestedFeatures,
- @NotNull Map rfMap)
+ @NotNull Map rfMap,
+ TimeProvider timeProvider)
 {
 super(consistencyLevel, datacenter);
 this.snapshotName = snapshotName;
@@ -203,6 +209,7 @@ public class CassandraDataLayer extends PartitionedDataLayer implements StartupV
 aliasLastModifiedTimestamp(this.requestedFeatures, this.lastModifiedTimestampField);
 }
 this.rfMap = rfMap;
+this.timeProvider = timeProvider;
 this.maybeQuoteKeyspaceAndTable();
 this.initInstanceMap();
 t

Re: [PR] CASSANDRA-19452 Use constant reference time during bulk read process [cassandra-analytics]

2024-03-05 Thread via GitHub


yifan-c merged PR #44:
URL: https://github.com/apache/cassandra-analytics/pull/44


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





Re: [PR] CASSANDRA-19452 Use constant reference time during bulk read process [cassandra-analytics]

2024-03-05 Thread via GitHub


yifan-c commented on code in PR #44:
URL: 
https://github.com/apache/cassandra-analytics/pull/44#discussion_r1513261310


##
cassandra-bridge/src/main/java/org/apache/cassandra/spark/data/DataLayer.java:
##
@@ -164,6 +164,11 @@ public CassandraVersion version()
 
 public abstract boolean isInPartition(int partitionId, BigInteger token, ByteBuffer key);
 
+/**
+ * @return a TimeProvider
+ */
+public abstract TimeProvider timeProvider();

Review Comment:
   Do not return the DEFAULT time provider. The concrete implementation should 
return the correct value. 
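
One way to read the review comment above is that the abstract base class offers no fallback, so each concrete data layer has to supply a provider whose reference time was captured once. The following is only a minimal sketch under that reading. The single-method shape of `TimeProvider`, the method name `referenceEpochInSeconds`, and the classes `FixedTimeProvider`, `SketchDataLayer` and `SketchCassandraDataLayer` are all hypothetical and are not taken from the PR.

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified stand-in for the TimeProvider type named in the diff.
interface TimeProvider
{
    int referenceEpochInSeconds();
}

// Samples the clock once at construction so every later call returns the same value.
final class FixedTimeProvider implements TimeProvider
{
    private final int referenceEpochInSeconds;

    FixedTimeProvider(LongSupplier clockMillis)
    {
        this.referenceEpochInSeconds = (int) (clockMillis.getAsLong() / 1000L);
    }

    @Override
    public int referenceEpochInSeconds()
    {
        return referenceEpochInSeconds;
    }
}

// Hypothetical abstract layer: no default is provided, so subclasses must decide.
abstract class SketchDataLayer
{
    public abstract TimeProvider timeProvider();
}

// Hypothetical concrete layer that returns the provider it was constructed with.
final class SketchCassandraDataLayer extends SketchDataLayer
{
    private final TimeProvider timeProvider;

    SketchCassandraDataLayer(TimeProvider timeProvider)
    {
        this.timeProvider = timeProvider;
    }

    @Override
    public TimeProvider timeProvider()
    {
        return timeProvider;
    }
}
```

Passing the clock in as a `LongSupplier` is a choice made for this sketch so that the "captured once" behaviour can be checked without sleeping; production code could just as well sample `System.currentTimeMillis()` directly.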






[jira] [Updated] (CASSANDRA-15452) Improve disk access patterns during compaction and streaming

2024-03-05 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-15452:
---
Description: 
On read heavy workloads Cassandra performs much better when using a low read 
ahead setting.   In my tests I've seen a 5x improvement in throughput and more 
than a 50% reduction in latency.  However, I've also observed that it can have 
a negative impact on compaction and streaming throughput. It especially 
negatively impacts cloud environments where small reads incur high costs in 
IOPS due to tiny requests.
 # We should investigate using POSIX_FADV_DONTNEED on files we're compacting to 
see if we can improve performance and reduce page faults. 
 # This should be combined with an internal read ahead style buffer that 
Cassandra manages, similar to a BufferedInputStream but with our own machinery. 
 This buffer should read fairly large blocks of data off disk at a time.  EBS, 
for example, allows 1 IOP to be up to 256KB.  A considerable amount of time is 
spent in blocking I/O during compaction and streaming. Reducing the frequency 
we read from disk should speed up all sequential I/O operations.
 # We can reduce system calls by buffering writes as well, but I think it will 
have less of an impact than the reads

  was:
On read heavy workloads Cassandra performs much better when using a low read 
ahead setting.   In my tests I've seen an 5x improvement in throughput and more 
than a 50% reduction in latency.  However, I've also observed that it can have 
a negative impact on compaction and streaming throughput. It especially 
negatively impacts cloud environments where small reads incur high costs in 
IOPS due to tiny requests.
 # We should investigate using POSIX_FADV_DONTNEED on files we're compacting to 
see if we can improve performance and reduce page faults. 
 # This should be combined with an internal read ahead style buffer that 
Cassandra manages, similar to a BufferedInputStream but with our own machinery. 
 This buffer should read fairly large blocks of data off disk at at time.  EBS, 
for example, allows 1 IOP to be up to 256KB.  A considerable amount of time is 
spent in blocking I/O during compaction and streaming. Reducing the frequency 
we read from disk should speed up all sequential I/O operations.


> Improve disk access patterns during compaction and streaming
> 
>
> Key: CASSANDRA-15452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths, Local/Compaction
>Reporter: Jon Haddad
>Assignee: Jordan West
>Priority: Normal
> Attachments: everyfs.txt, results.txt, sequential.fio
>
>
> On read heavy workloads Cassandra performs much better when using a low read 
> ahead setting.   In my tests I've seen a 5x improvement in throughput and 
> more than a 50% reduction in latency.  However, I've also observed that it 
> can have a negative impact on compaction and streaming throughput. It 
> especially negatively impacts cloud environments where small reads incur high 
> costs in IOPS due to tiny requests.
>  # We should investigate using POSIX_FADV_DONTNEED on files we're compacting 
> to see if we can improve performance and reduce page faults. 
>  # This should be combined with an internal read ahead style buffer that 
> Cassandra manages, similar to a BufferedInputStream but with our own 
> machinery.  This buffer should read fairly large blocks of data off disk at 
> a time.  EBS, for example, allows 1 IOP to be up to 256KB.  A considerable 
> amount of time is spent in blocking I/O during compaction and streaming. 
> Reducing the frequency we read from disk should speed up all sequential I/O 
> operations.
>  # We can reduce system calls by buffering writes as well, but I think it 
> will have less of an impact than the reads
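
The second item in the description, a Cassandra-managed buffer that pulls large blocks off disk, can be illustrated with plain NIO. This is a rough sketch rather than the approach taken in the ticket: the class `LargeBlockReader` and its `scan` method are hypothetical, the 256 KiB block size is simply the EBS figure quoted above, and `POSIX_FADV_DONTNEED` is not covered because the JDK does not expose it without native bindings.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: scans a file sequentially using one reusable block-sized buffer.
final class LargeBlockReader
{
    // 256 KiB mirrors the per-IOP ceiling quoted for EBS; treat it as a tunable assumption.
    static final int BLOCK_SIZE = 256 * 1024;

    // Returns {bytesRead, readCalls} so callers can compare different block sizes.
    static long[] scan(Path file, int blockSize) throws IOException
    {
        long bytes = 0;
        long calls = 0;
        ByteBuffer buffer = ByteBuffer.allocateDirect(blockSize);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ))
        {
            int n;
            while ((n = channel.read(buffer)) != -1)
            {
                bytes += n;
                calls++;
                buffer.clear(); // a real consumer would flip() and drain the buffer here
            }
        }
        return new long[]{ bytes, calls };
    }
}
```

Calling `scan` on the same file with a 4 KiB block and with `BLOCK_SIZE` gives a quick feel for how many fewer read calls the larger buffer needs; whether that translates into fewer IOPS depends on the kernel read-ahead setting and the storage layer.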






[jira] [Assigned] (CASSANDRA-15452) Improve disk access patterns during compaction and streaming

2024-03-05 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad reassigned CASSANDRA-15452:
--

Assignee: Jordan West

> Improve disk access patterns during compaction and streaming
> 
>
> Key: CASSANDRA-15452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths, Local/Compaction
>Reporter: Jon Haddad
>Assignee: Jordan West
>Priority: Normal
> Attachments: everyfs.txt, results.txt, sequential.fio
>
>
> On read heavy workloads Cassandra performs much better when using a low read 
> ahead setting.   In my tests I've seen a 5x improvement in throughput and 
> more than a 50% reduction in latency.  However, I've also observed that it 
> can have a negative impact on compaction and streaming throughput. It 
> especially negatively impacts cloud environments where small reads incur high 
> costs in IOPS due to tiny requests.
>  # We should investigate using POSIX_FADV_DONTNEED on files we're compacting 
> to see if we can improve performance and reduce page faults. 
>  # This should be combined with an internal read ahead style buffer that 
> Cassandra manages, similar to a BufferedInputStream but with our own 
> machinery.  This buffer should read fairly large blocks of data off disk at 
> a time.  EBS, for example, allows 1 IOP to be up to 256KB.  A considerable 
> amount of time is spent in blocking I/O during compaction and streaming. 
> Reducing the frequency we read from disk should speed up all sequential I/O 
> operations.






[jira] [Commented] (CASSANDRA-19454) Revert switch to approximate time in Dispatcher to avoid mixing with nanoTime() in downstream timeout calculations

2024-03-05 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823701#comment-17823701
 ] 

Caleb Rackliffe commented on CASSANDRA-19454:
-

[~arkn98] I think a "bar" of not causing regressions in the existing test suite 
(which obviously includes tests from CASSANDRA-15241) is acceptable here. Once 
I have a chance to look at the PR, I can kick off a CI run myself (unless you 
have a paid CircleCI account and want to go that route).

> Revert switch to approximate time in Dispatcher to avoid mixing with 
> nanoTime() in downstream timeout calculations
> --
>
> Key: CASSANDRA-19454
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19454
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Caleb Rackliffe
>Assignee: Arun Ganesh
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CASSANDRA-15241 changed {{Dispatcher}} to use the {{approxTime}} 
> implementation of {{MonotonicClock}} rather than {{nanoTime()}}, but clock 
> drift between the two can potentially cause queries to time out more 
> quickly. We should be able to revert the {{Dispatcher}} to use {{nanoTime()}} 
> again and similarly change {{QueriesTable}} to {{nanoTime()}} as well for 
> consistency.
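
The effect described in the ticket can be reproduced with two simulated clocks. The sketch below is illustrative only: `TimeoutSketch`, `expired` and `simulate` are hypothetical names, and the approximate clock is modelled as lagging the precise one by a fixed amount, which is an assumption about the kind of drift involved rather than a measurement.

```java
import java.util.function.LongSupplier;

// Hypothetical demo of why start and end timestamps should come from the same clock.
final class TimeoutSketch
{
    // Elapsed time is only meaningful when both timestamps share a clock.
    static boolean expired(long startNanos, long nowNanos, long timeoutNanos)
    {
        return nowNanos - startNanos >= timeoutNanos;
    }

    // Returns {expiredWithMixedClocks, expiredWithSingleClock}.
    static boolean[] simulate(long lagNanos, long elapsedNanos, long timeoutNanos)
    {
        long[] precise = { 1_000_000_000L };                    // simulated nanoTime()
        LongSupplier nanoTime = () -> precise[0];
        LongSupplier approxTime = () -> precise[0] - lagNanos;  // assumed stale by lagNanos

        long startApprox = approxTime.getAsLong();   // start stamped with the approximate clock
        long startPrecise = nanoTime.getAsLong();    // start stamped with the precise clock

        precise[0] += elapsedNanos;                  // real time passes

        boolean mixed = expired(startApprox, nanoTime.getAsLong(), timeoutNanos);
        boolean single = expired(startPrecise, nanoTime.getAsLong(), timeoutNanos);
        return new boolean[]{ mixed, single };
    }
}
```

With a 2 ms lag, 9 ms of real elapsed time and a 10 ms budget, the mixed-clock check sees 11 ms and reports the request as expired, while the single-clock check still has 1 ms left.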






[jira] [Commented] (CASSANDRA-18661) Update cassandra-stress to use Apache Commons CLI

2024-03-05 Thread Brad Schoening (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823687#comment-17823687
 ] 

Brad Schoening commented on CASSANDRA-18661:


[~claude] [~smiklosovic] it's great to hear your update on the success you've 
had with this.  Stefan raises an important point about how to make this 
unifying change. There is so much legacy baggage in cassandra-stress that I think 
the change is very much warranted, but we may need to keep a 
cassandra-stress-old, create a cassandra-stress-new, or something similar, and 
that should be discussed on the ML.

> Update cassandra-stress to use Apache Commons CLI
> -
>
> Key: CASSANDRA-18661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18661
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/stress
>Reporter: Brad Schoening
>Assignee: Claude Warren
>Priority: Normal
>  Labels: lhf
>
> The Apache Commons CLI library provides an API for parsing command line 
> options with the package org.apache.commons.cli and this is already used by a 
> dozen existing Cassandra utilities including:
> {quote}SSTableMetadataViewer, StandaloneScrubber, StandaloneSplitter, 
> SSTableExport, BulkLoader, and others.
> {quote}
> However, cassandra-stress is an outlier which uses its own custom classes to 
> parse command line options with classes such as OptionsSimple.  In addition, 
> the options syntax for username, password, and others are not aligned with 
> the format used by CQLSH.
> Currently, there are > 5K lines of code in 'settings' which appears to just 
> process command line args.
> This suggestion is to:
>  
> a) Upgrade cassandra-stress to use Apache Commons CLI (no new dependencies 
> are required as this library is already used by the project)
>  
> b) Align the cassandra-stress CLI options with those in CQLSH, 
>  
> {quote}For example, using the new syntax like CQLSH:
> {quote}
>  
> cassandra-stress -username foo -password bar
> {quote}and replacing the old syntax:
> {quote}
> cassandra-stress -mode username=foo and password=bar
>  
> This will simplify and unify the code base, eliminate code, and reduce the 
> confusion between similarly named classes such as 
> org.apache.cassandra.stress.settings.\{Option, OptionsMulti, OptionsSimple} 
> and org.apache.commons.cli.\{Option, OptionGroup, Options}
>  
> Note: documentation will need to be updated as well






[jira] [Comment Edited] (CASSANDRA-18661) Update cassandra-stress to use Apache Commons CLI

2024-03-05 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823683#comment-17823683
 ] 

Stefan Miklosovic edited comment on CASSANDRA-18661 at 3/5/24 3:46 PM:
---

While I definitely appreciate the effort in this ticket to make it on par with 
other CLI tools, I would bring this to ML to see what broader audience thinks 
about this. There is a ton of legacy online with all options, all the docs etc 
so I wonder if we are not making more harm than good (even with very good 
intentions). Maybe supporting the old and the new way at the same time would be 
nice to have? Not sure how that would look like, I am just trying to figure out 
how to minimize the disruption.


was (Author: smiklosovic):
While I definitely appreciate the effort in this ticket to make it on par with 
other CLI tools, I would bring this to ML to see what broader audience thinks 
about this. There is a ton of legacy online with all options, all the docs etc 
so I wonder if we are not making more harm than good (even with very good 
intentions). Maybe supporting the old and the new way at the same time would be 
nice to have? Not sure how that would look like, I am just trying to figure out 
how to be at least disruptive towards users as possible. 

> Update cassandra-stress to use Apache Commons CLI
> -
>
> Key: CASSANDRA-18661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18661
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/stress
>Reporter: Brad Schoening
>Assignee: Claude Warren
>Priority: Normal
>  Labels: lhf
>
> The Apache Commons CLI library provides an API for parsing command line 
> options with the package org.apache.commons.cli and this is already used by a 
> dozen existing Cassandra utilities including:
> {quote}SSTableMetadataViewer, StandaloneScrubber, StandaloneSplitter, 
> SSTableExport, BulkLoader, and others.
> {quote}
> However, cassandra-stress is an outlier which uses its own custom classes to 
> parse command line options with classes such as OptionsSimple.  In addition, 
> the options syntax for username, password, and others are not aligned with 
> the format used by CQLSH.
> Currently, there are > 5K lines of code in 'settings' which appears to just 
> process command line args.
> This suggestion is to:
>  
> a) Upgrade cassandra-stress to use Apache Commons CLI (no new dependencies 
> are required as this library is already used by the project)
>  
> b) Align the cassandra-stress CLI options with those in CQLSH, 
>  
> {quote}For example, using the new syntax like CQLSH:
> {quote}
>  
> cassandra-stress -username foo -password bar
> {quote}and replacing the old syntax:
> {quote}
> cassandra-stress -mode username=foo and password=bar
>  
> This will simplify and unify the code base, eliminate code, and reduce the 
> confusion between similarly named classes such as 
> org.apache.cassandra.stress.settings.\{Option, OptionsMulti, OptionsSimple} 
> and org.apache.commons.cli.\{Option, OptionGroup, Options}
>  
> Note: documentation will need to be updated as well






[jira] [Commented] (CASSANDRA-18661) Update cassandra-stress to use Apache Commons CLI

2024-03-05 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823683#comment-17823683
 ] 

Stefan Miklosovic commented on CASSANDRA-18661:
---

While I definitely appreciate the effort in this ticket to make it on par with 
other CLI tools, I would bring this to ML to see what broader audience thinks 
about this. There is a ton of legacy online with all options, all the docs etc 
so I wonder if we are not making more harm than good (even with very good 
intentions). Maybe supporting the old and the new way at the same time would be 
nice to have? Not sure how that would look like, I am just trying to figure out 
how to be at least disruptive towards users as possible. 
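
As a purely hypothetical illustration of supporting both forms at once, a small pre-processing step could rewrite the legacy `key=value` tokens into the new flag style before any parser sees them. The class `LegacyArgsShim` and its behaviour are assumptions made for this sketch; it handles only the `-mode username=foo password=bar` shape quoted in the ticket and does not use or depend on Apache Commons CLI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shim: normalizes legacy "-mode key=value" arguments to "-key value".
final class LegacyArgsShim
{
    // Tokens that follow "-mode" and look like key=value become "-key value" pairs.
    // Every other token, including new-style flags, passes through unchanged.
    static List<String> normalize(String[] args)
    {
        List<String> out = new ArrayList<>();
        boolean inMode = false;
        for (String arg : args)
        {
            if (arg.equals("-mode"))
            {
                inMode = true;      // the legacy group marker itself is dropped
                continue;
            }
            if (arg.startsWith("-"))
                inMode = false;     // any other flag ends the legacy group

            int eq = arg.indexOf('=');
            if (inMode && eq > 0)
            {
                out.add("-" + arg.substring(0, eq));
                out.add(arg.substring(eq + 1));
            }
            else
            {
                out.add(arg);
            }
        }
        return out;
    }
}
```

Both `-mode username=foo password=bar` and `-username foo -password bar` normalize to the same token list, so a single parser definition could serve old and new invocations during a transition period.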

> Update cassandra-stress to use Apache Commons CLI
> -
>
> Key: CASSANDRA-18661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18661
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/stress
>Reporter: Brad Schoening
>Assignee: Claude Warren
>Priority: Normal
>  Labels: lhf
>
> The Apache Commons CLI library provides an API for parsing command line 
> options with the package org.apache.commons.cli and this is already used by a 
> dozen existing Cassandra utilities including:
> {quote}SSTableMetadataViewer, StandaloneScrubber, StandaloneSplitter, 
> SSTableExport, BulkLoader, and others.
> {quote}
> However, cassandra-stress is an outlier which uses its own custom classes to 
> parse command line options with classes such as OptionsSimple.  In addition, 
> the options syntax for username, password, and others are not aligned with 
> the format used by CQLSH.
> Currently, there are > 5K lines of code in 'settings' which appears to just 
> process command line args.
> This suggestion is to:
>  
> a) Upgrade cassandra-stress to use Apache Commons CLI (no new dependencies 
> are required as this library is already used by the project)
>  
> b) Align the cassandra-stress CLI options with those in CQLSH, 
>  
> {quote}For example, using the new syntax like CQLSH:
> {quote}
>  
> cassandra-stress -username foo -password bar
> {quote}and replacing the old syntax:
> {quote}
> cassandra-stress -mode username=foo and password=bar
>  
> This will simplify and unify the code base, eliminate code, and reduce the 
> confusion between similarly named classes such as 
> org.apache.cassandra.stress.settings.\{Option, OptionsMulti, OptionsSimple} 
> and org.apache.commons.cli.\{Option, OptionGroup, Options}
>  
> Note: documentation will need to be updated as well






[jira] [Commented] (CASSANDRA-18661) Update cassandra-stress to use Apache Commons CLI

2024-03-05 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823674#comment-17823674
 ] 

Brandon Williams commented on CASSANDRA-18661:
--

I *think* that's a vestige of the old daemon mode for stress that was removed 
for security concerns in CASSANDRA-17535.

> Update cassandra-stress to use Apache Commons CLI
> -
>
> Key: CASSANDRA-18661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18661
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/stress
>Reporter: Brad Schoening
>Assignee: Claude Warren
>Priority: Normal
>  Labels: lhf
>
> The Apache Commons CLI library provides an API for parsing command line 
> options with the package org.apache.commons.cli and this is already used by a 
> dozen existing Cassandra utilities including:
> {quote}SSTableMetadataViewer, StandaloneScrubber, StandaloneSplitter, 
> SSTableExport, BulkLoader, and others.
> {quote}
> However, cassandra-stress is an outlier which uses its own custom classes to 
> parse command line options with classes such as OptionsSimple.  In addition, 
> the options syntax for username, password, and others are not aligned with 
> the format used by CQLSH.
> Currently, there are > 5K lines of code in 'settings' which appears to just 
> process command line args.
> This suggestion is to:
>  
> a) Upgrade cassandra-stress to use Apache Commons CLI (no new dependencies 
> are required as this library is already used by the project)
>  
> b) Align the cassandra-stress CLI options with those in CQLSH, 
>  
> {quote}For example, using the new syntax like CQLSH:
> {quote}
>  
> cassandra-stress -username foo -password bar
> {quote}and replacing the old syntax:
> {quote}
> cassandra-stress -mode username=foo and password=bar
>  
> This will simplify and unify the code base, eliminate code, and reduce the 
> confusion between similarly named classes such as 
> org.apache.cassandra.stress.settings.\{Option, OptionsMulti, OptionsSimple} 
> and org.apache.commons.cli.\{Option, OptionGroup, Options}
>  
> Note: documentation will need to be updated as well






[jira] [Commented] (CASSANDRA-18661) Update cassandra-stress to use Apache Commons CLI

2024-03-05 Thread Claude Warren (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823670#comment-17823670
 ] 

Claude Warren commented on CASSANDRA-18661:
---

[~bschoeni]

I have managed to get Stress to use the commons-cli code (after adding some 
more functionality to commons-cli).  So we will have to wait for commons-cli 
1.7.0 to be released.

However, there is a requirement in the code for the StressSettings to be 
serializable.  Is this an old Thrift requirement and can it be removed?  As I 
recall serialization is fraught with security issues, though this is only a 
test tool.  I just don't see how it would be used in the tool.  Any suggestions?

> Update cassandra-stress to use Apache Commons CLI
> -
>
> Key: CASSANDRA-18661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18661
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/stress
>Reporter: Brad Schoening
>Assignee: Claude Warren
>Priority: Normal
>  Labels: lhf
>
> The Apache Commons CLI library provides an API for parsing command line 
> options with the package org.apache.commons.cli and this is already used by a 
> dozen existing Cassandra utilities including:
> {quote}SSTableMetadataViewer, StandaloneScrubber, StandaloneSplitter, 
> SSTableExport, BulkLoader, and others.
> {quote}
> However, cassandra-stress is an outlier which uses its own custom classes to 
> parse command line options with classes such as OptionsSimple.  In addition, 
> the options syntax for username, password, and others are not aligned with 
> the format used by CQLSH.
> Currently, there are > 5K lines of code in 'settings' which appears to just 
> process command line args.
> This suggestion is to:
>  
> a) Upgrade cassandra-stress to use Apache Commons CLI (no new dependencies 
> are required as this library is already used by the project)
>  
> b) Align the cassandra-stress CLI options with those in CQLSH, 
>  
> {quote}For example, using the new syntax like CQLSH:
> {quote}
>  
> cassandra-stress -username foo -password bar
> {quote}and replacing the old syntax:
> {quote}
> cassandra-stress -mode username=foo and password=bar
>  
> This will simplify and unify the code base, eliminate code, and reduce the 
> confusion between similarly named classes such as 
> org.apache.cassandra.stress.settings.\{Option, OptionsMulti, OptionsSimple} 
> and org.apache.commons.cli.\{Option, OptionGroup, Options}
>  
> Note: documentation will need to be updated as well






[jira] [Updated] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading

2024-03-05 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19398:
-
Status: Ready to Commit  (was: Review In Progress)

Great job, +1

> Test Failure: 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
> --
>
> Key: CASSANDRA-19398
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19398
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0]
> {code:java}
> junit.framework.AssertionFailedError at 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}






[jira] [Updated] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading

2024-03-05 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19398:
-
Reviewers: Brandon Williams, Brandon Williams  (was: Brandon Williams)
   Status: Review In Progress  (was: Patch Available)

> Test Failure: 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
> --
>
> Key: CASSANDRA-19398
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19398
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0]
> {code:java}
> junit.framework.AssertionFailedError at 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}






[jira] [Commented] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading

2024-03-05 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823622#comment-17823622
 ] 

Berenguer Blasi commented on CASSANDRA-19398:
-

Added trunk which turned out green as well.

> Test Failure: 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
> --
>
> Key: CASSANDRA-19398
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19398
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0]
> {code:java}
> junit.framework.AssertionFailedError at 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}






[jira] [Comment Edited] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading

2024-03-05 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823587#comment-17823587
 ] 

Brandon Williams edited comment on CASSANDRA-19398 at 3/5/24 1:23 PM:
--

This looks good to me if you want to start on trunk.  As you say you've only 
made it more deterministic so I don't think there should be any problem with 
the approach.


was (Author: brandon.williams):
This looks good to me if you want to start on trunk.  As you say you've only 
made it more deterministic so I don't there should be any problem with the 
approach.

> Test Failure: 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
> --
>
> Key: CASSANDRA-19398
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19398
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0]
> {code:java}
> junit.framework.AssertionFailedError at 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}






[jira] [Commented] (CASSANDRA-19391) Flush metadata snapshot table on every write

2024-03-05 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823600#comment-17823600
 ] 

Marcus Eriksson commented on CASSANDRA-19391:
-

attaching ci results for this + 19390, rebased on fairly current trunk

> Flush metadata snapshot table on every write
> 
>
> Key: CASSANDRA-19391
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19391
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> We depend on the latest snapshot when starting up, flushing avoids gaps 
> between latest snapshot and the most recent local log entry






[jira] [Commented] (CASSANDRA-19390) Transformation.Kind should contain an explicit integer id

2024-03-05 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823599#comment-17823599
 ] 

Marcus Eriksson commented on CASSANDRA-19390:
-

attaching ci results for this + 19391, rebased on fairly current trunk

> Transformation.Kind should contain an explicit integer id
> -
>
> Key: CASSANDRA-19390
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19390
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>







[jira] [Updated] (CASSANDRA-19391) Flush metadata snapshot table on every write

2024-03-05 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-19391:

Attachment: (was: ci_summary.html)

> Flush metadata snapshot table on every write
> 
>
> Key: CASSANDRA-19391
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19391
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> We depend on the latest snapshot when starting up, flushing avoids gaps 
> between latest snapshot and the most recent local log entry






[jira] [Updated] (CASSANDRA-19391) Flush metadata snapshot table on every write

2024-03-05 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-19391:

Attachment: ci_summary.html
result_details.tar.gz

> Flush metadata snapshot table on every write
> 
>
> Key: CASSANDRA-19391
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19391
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> We depend on the latest snapshot when starting up, flushing avoids gaps 
> between latest snapshot and the most recent local log entry






[jira] [Updated] (CASSANDRA-19391) Flush metadata snapshot table on every write

2024-03-05 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-19391:

Attachment: (was: result_details.tar.gz)

> Flush metadata snapshot table on every write
> 
>
> Key: CASSANDRA-19391
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19391
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> We depend on the latest snapshot when starting up, flushing avoids gaps 
> between latest snapshot and the most recent local log entry






[jira] [Updated] (CASSANDRA-19390) Transformation.Kind should contain an explicit integer id

2024-03-05 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-19390:

Attachment: (was: ci_summary.html)

> Transformation.Kind should contain an explicit integer id
> -
>
> Key: CASSANDRA-19390
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19390
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>







[jira] [Updated] (CASSANDRA-19390) Transformation.Kind should contain an explicit integer id

2024-03-05 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-19390:

Attachment: ci_summary.html
result_details.tar.gz

> Transformation.Kind should contain an explicit integer id
> -
>
> Key: CASSANDRA-19390
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19390
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>







[jira] [Updated] (CASSANDRA-19390) Transformation.Kind should contain an explicit integer id

2024-03-05 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-19390:

Attachment: (was: result_details.tar.gz)

> Transformation.Kind should contain an explicit integer id
> -
>
> Key: CASSANDRA-19390
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19390
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>







[jira] [Comment Edited] (CASSANDRA-15452) Improve disk access patterns during compaction and streaming

2024-03-05 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823391#comment-17823391
 ] 

Jon Haddad edited comment on CASSANDRA-15452 at 3/5/24 12:43 PM:
-

 

I took another look at this.  This lets us extract every read operation against 
a single data file:
{noformat}
awk '$4 == "R" { print $0 }' everyfs.txt | grep '30-bti-Data.db' > 
30-bti-data.txt{noformat}
If you glance at the end of the data, the last entry is this:
{noformat}
23:47:12 CompactionExec 44651  R 2699    12483       0.00 
da-30-bti-Data.db{noformat}
{-}The data file is only 15KB{-}.  But we're doing over 6 thousand reads
{noformat}
wc -l ../research/30-bti-data.txt
    6420 ../research/30-bti-data.txt{noformat}
The 5th column is the number of bytes read.  Summing this:
{noformat}
awk '{ sum += $5; } END {print sum}' ../research/30-bti-data.txt
25571844{noformat}
= 25MB

-which is a lot to pull through the filesystem when in an optimal situation we 
would have done a single 16KB read.-

Since these numbers are really, really weird, I'm going back through and 
verifying there's not a bug in the tools, or my understanding of them.

Edit: I just realized the offset is expressed in KB, not bytes, so my math was 
off.  I'm going to redo this test as I lost the instance.  I'm now trying to 
figure out if we're double reading. The last offset is at 12MB, and each read 
is recorded 2x.
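
For anyone who wants to repeat the filter-and-sum steps above without awk, here 
is a minimal Java sketch.  It assumes the trace layout seen in the sample line 
(time, command, pid, type, bytes, offset, latency, file name).  The class and 
method names are hypothetical, and the second sample line is made up purely 
for illustration.

```java
import java.util.List;

public class ReadTraceSummary {
    /**
     * Sums the bytes column of read ("R") entries whose file name contains
     * the given fragment. Lines with fewer than 8 fields are skipped.
     */
    public static long sumReadBytes(List<String> lines, String fileFragment) {
        long total = 0;
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // Assumed layout: time, command, pid, type, bytes, offset, latency, file name.
            if (f.length < 8 || !f[3].equals("R") || !f[7].contains(fileFragment)) {
                continue;
            }
            total += Long.parseLong(f[4]);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "23:47:12 CompactionExec 44651  R 2699    12483       0.00 da-30-bti-Data.db",
            // Hypothetical write entry against another file; it should be ignored.
            "23:47:12 CompactionExec 44651  W 4096    0           0.00 da-31-bti-Data.db");
        System.out.println(sumReadBytes(sample, "30-bti-Data.db"));
    }
}
```

Whether the fifth column really is bytes for every line is exactly the kind of 
assumption worth checking against the tool's documentation before trusting the 
total.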


was (Author: rustyrazorblade):
 

I took another look at this.  This lets us extract every read operation against 
a single data file:
{noformat}
awk '$4 == "R" { print $0 }' everyfs.txt | grep '30-bti-Data.db' > 
30-bti-data.txt{noformat}
If you glance at the end of the data, the last entry is this:
{noformat}
23:47:12 CompactionExec 44651  R 2699    12483       0.00 
da-30-bti-Data.db{noformat}
The data file is only 15KB.  But we're doing over 6 thousand reads
{noformat}
wc -l ../research/30-bti-data.txt
    6420 ../research/30-bti-data.txt{noformat}
The 5th column is the number of bytes read.  Summing this:
{noformat}
awk '{ sum += $5; } END {print sum}' ../research/30-bti-data.txt
25571844{noformat}
= 25MB

which is a lot to pull through the filesystem when in an optimal situation we 
would have done a single 16KB read.

Since these numbers are really, really weird, I'm going back through and 
verifying there's not a bug in the tools, or my understanding of them.

> Improve disk access patterns during compaction and streaming
> 
>
> Key: CASSANDRA-15452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15452
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths, Local/Compaction
>Reporter: Jon Haddad
>Priority: Normal
> Attachments: everyfs.txt, results.txt, sequential.fio
>
>
> On read heavy workloads Cassandra performs much better when using a low read 
> ahead setting.   In my tests I've seen a 5x improvement in throughput and 
> more than a 50% reduction in latency.  However, I've also observed that it 
> can have a negative impact on compaction and streaming throughput. It 
> especially negatively impacts cloud environments where small reads incur high 
> costs in IOPS due to tiny requests.
>  # We should investigate using POSIX_FADV_DONTNEED on files we're compacting 
> to see if we can improve performance and reduce page faults. 
>  # This should be combined with an internal read ahead style buffer that 
> Cassandra manages, similar to a BufferedInputStream but with our own 
> machinery.  This buffer should read fairly large blocks of data off disk at 
> a time.  EBS, for example, allows 1 IOP to be up to 256KB.  A considerable 
> amount of time is spent in blocking I/O during compaction and streaming. 
> Reducing the frequency we read from disk should speed up all sequential I/O 
> operations.
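
As a rough illustration of the second point, the sketch below reads a file 
sequentially with a caller-chosen block size and counts how many read calls 
were issued.  It is not Cassandra's implementation: LargeBlockReader and 
countReads are hypothetical names, and the 256KB constant is simply the EBS 
figure quoted above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LargeBlockReader {
    /** Block size taken from the EBS limit mentioned in the ticket (an assumption). */
    public static final int BLOCK_SIZE = 256 * 1024;

    /**
     * Reads the whole file sequentially using a buffer of blockSize bytes and
     * returns the number of read calls that returned data.
     */
    public static long countReads(Path file, int blockSize) throws IOException {
        long reads = 0;
        ByteBuffer buffer = ByteBuffer.allocate(blockSize);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                reads++;
                buffer.clear(); // discard the data; only the call count matters here
            }
        }
        return reads;
    }
}
```

A read may return fewer bytes than the buffer holds, so the count is only a 
lower bound on how few calls a given block size allows, but comparing a 4KB 
block against BLOCK_SIZE on the same file shows the general effect.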






[jira] [Updated] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading

2024-03-05 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19398:
-
Reviewers: Brandon Williams

> Test Failure: 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
> --
>
> Key: CASSANDRA-19398
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19398
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0]
> {code:java}
> junit.framework.AssertionFailedError at 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}






[jira] [Commented] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading

2024-03-05 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823587#comment-17823587
 ] 

Brandon Williams commented on CASSANDRA-19398:
--

This looks good to me if you want to start on trunk.  As you say you've only 
made it more deterministic so I don't think there should be any problem with the 
approach.

> Test Failure: 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
> --
>
> Key: CASSANDRA-19398
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19398
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0]
> {code:java}
> junit.framework.AssertionFailedError at 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}






[jira] [Commented] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading

2024-03-05 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823568#comment-17823568
 ] 

Berenguer Blasi commented on CASSANDRA-19398:
-

[~brandon.williams] I stole this one from you as agreed. I have submitted 
byteman latches to make the test's behavior more deterministic. I _think_ that 
is correct unless the original authors say otherwise.
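
The general idea of using latches to pin down ordering can be sketched with 
plain java.util.concurrent.CountDownLatch.  This is not the Byteman rule set 
from the PR; the thread roles and event names below are made up for 
illustration only.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

public class LatchOrdering {
    /** Runs two cooperating steps and returns the order in which events occurred. */
    public static List<String> run() throws InterruptedException {
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch upgradeStarted = new CountDownLatch(1);
        CountDownLatch truncateDone = new CountDownLatch(1);

        Thread upgrade = new Thread(() -> {
            events.add("upgrade-started");
            upgradeStarted.countDown();          // let the main thread proceed
            try {
                truncateDone.await();            // block until the truncate step has run
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            events.add("upgrade-finished");
        });
        upgrade.start();

        upgradeStarted.await();                  // wait until the upgrade is in flight
        events.add("truncate");
        truncateDone.countDown();                // release the upgrade thread
        upgrade.join();
        return events;
    }
}
```

With both latches in place the interleaving no longer depends on thread 
scheduling, which is the property a flaky concurrency test usually lacks.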

> Test Failure: 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
> --
>
> Key: CASSANDRA-19398
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19398
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0]
> {code:java}
> junit.framework.AssertionFailedError at 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}






[jira] [Assigned] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading

2024-03-05 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi reassigned CASSANDRA-19398:
---

Assignee: Berenguer Blasi  (was: Brandon Williams)

> Test Failure: 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
> --
>
> Key: CASSANDRA-19398
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19398
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0]
> {code:java}
> junit.framework.AssertionFailedError at 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}






[jira] [Updated] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading

2024-03-05 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi updated CASSANDRA-19398:

Test and Documentation Plan: See PR
 Status: Patch Available  (was: Open)

> Test Failure: 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
> --
>
> Key: CASSANDRA-19398
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19398
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0]
> {code:java}
> junit.framework.AssertionFailedError at 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}






(cassandra-builds) branch trunk updated: ninja-fix – temporarily disabled arm building in cassandra-builds/jenkins-dsl/cassandra_job_dsl_seed.groovy

2024-03-05 Thread mck
This is an automated email from the ASF dual-hosted git repository.

mck pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git


The following commit(s) were added to refs/heads/trunk by this push:
 new f253806  ninja-fix – temporarily disabled arm building in 
cassandra-builds/jenkins-dsl/cassandra_job_dsl_seed.groovy
f253806 is described below

commit f2538069436c0e2a35c087671a5b11d85fecef70
Author: Mick Semb Wever 
AuthorDate: Tue Mar 5 09:42:42 2024 +0100

ninja-fix – temporarily disabled arm building in 
cassandra-builds/jenkins-dsl/cassandra_job_dsl_seed.groovy

 ref: CASSANDRA-19241 – Upgrade ci-cassandra.a.o agents to Ubuntu 22.04.3
---
 jenkins-dsl/cassandra_job_dsl_seed.groovy | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/jenkins-dsl/cassandra_job_dsl_seed.groovy 
b/jenkins-dsl/cassandra_job_dsl_seed.groovy
index fb46360..9251db9 100755
--- a/jenkins-dsl/cassandra_job_dsl_seed.groovy
+++ b/jenkins-dsl/cassandra_job_dsl_seed.groovy
@@ -19,7 +19,7 @@ def jobDescription = '''
 
 // architectures. blank is amd64
 def archs = ['', '-arm64']
-arm64_enabled = true
+arm64_enabled = false // TODO waiting on CASSANDRA-19241
 arm64_test_label_enabled = false
 def use_arm64_test_label() { return arm64_enabled && arm64_test_label_enabled }
 





[jira] [Commented] (CASSANDRA-18934) Downgrade to 4.1 fails due to schema changes

2024-03-05 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823490#comment-17823490
 ] 

Maxwell Guo commented on CASSANDRA-18934:
-

Hi [~claude], what about through Slack?

> Downgrade to 4.1 fails due to schema changes
> 
>
> Key: CASSANDRA-18934
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18934
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Startup and Shutdown
>Reporter: David Capwell
>Assignee: Maxwell Guo
>Priority: Normal
> Fix For: 5.x
>
>
> We are required to support 5.0 downgrading to 4.1 as a migration step, but we 
> don’t have tests to show this is working… I wrote a quick test to make sure a 
> change we needed in Accord wouldn’t block the downgrade and see that we fail 
> right now.
> {code}
> ERROR 20:56:39 Exiting due to error while processing commit log during 
> initialization.
> org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>  Unexpected error deserializing mutation; saved to 
> /var/folders/h1/s_3p1x3s3hl0hltbpck67m0hgn/T/mutation418421767150092dat.
>   This may be caused by replaying a mutation against a table with the same 
> name but incompatible schema.  Exception follows: java.lang.RuntimeException: 
> Unknown column compaction_properties during deserialization
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:464)
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:397)
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:244)
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:147)
>   at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:191)
>   at 
> org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:223)
>   at 
> org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:204)
> {code}
> This was caused by a schema change in CASSANDRA-18061
> {code}
> /*
>  * Licensed to the Apache Software Foundation (ASF) under one
>  * or more contributor license agreements.  See the NOTICE file
>  * distributed with this work for additional information
>  * regarding copyright ownership.  The ASF licenses this file
>  * to you under the Apache License, Version 2.0 (the
>  * "License"); you may not use this file except in compliance
>  * with the License.  You may obtain a copy of the License at
>  *
>  * http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package org.apache.cassandra.distributed.upgrade;
> import java.io.IOException;
> import java.io.File;
> import java.util.concurrent.atomic.AtomicBoolean;
> import org.junit.Test;
> import org.apache.cassandra.distributed.api.IUpgradeableInstance;
> public class DowngradeTest extends UpgradeTestBase
> {
>     @Test
>     public void test() throws Throwable
>     {
>         AtomicBoolean first = new AtomicBoolean(true);
>         new TestCase()
>         .nodes(1)
>         .withConfig(c -> {
>             if (first.compareAndSet(true, false))
>                 c.set("storage_compatibility_mode", "CASSANDRA_4");
>         })
>         .downgradeTo(v41)
>         .setup(cluster -> {})
>         // Uncomment if you want to test what happens after reading the commit log, which fails right now
>         //.runBeforeNodeRestart((cluster, nodeId) -> {
>         //    IUpgradeableInstance inst = cluster.get(nodeId);
>         //    File f = new File((String) inst.config().get("commitlog_directory"));
>         //    deleteRecursive(f);
>         //})
>         .runAfterClusterUpgrade(cluster -> {})
>         .run();
>     }
>     private void deleteRecursive(File f)
>     {
>         if (f.isDirectory())
>         {
>             File[] children = f.listFiles();
>             if (children != null)
>             {
>                 for (File c : children)
>                     deleteRecursive(c);
>             }
>         }
>         f.delete();
>     }
> }
> {code}
> {code}
> diff --git 
> a/test/distributed/org/apache/cassandra/distributed/upgrade/UpgradeTestBase.java
>  
> b/test/distributed/org/apache/cassandra/distributed/upgrade/UpgradeTestBase.java
> index 5ee8780204..b4111e3b44 100644
> --- 
> a/test/distributed/org/apache/cassandra/distributed/upgrade/UpgradeTestBase.jav