Re: java driver with cassandra proxies (option: -Dcassandra.join_ring=false)

2023-10-12 Thread Jeff Jirsa
Just to be clear: - How many of the proxy nodes are you providing as contact points? One of them or all of them? It sounds like you're saying you're passing all of them, and only one is connecting, and the driver is declining to connect to the rest because they're not in system.peers. I'm not

Re: java driver with cassandra proxies (option: -Dcassandra.join_ring=false)

2023-10-12 Thread Regis Le Bretonnic
We have tested Stargate and were very disappointed... Originally our architecture was PHP microservices (with FPM) + cassandra proxies. But we were blocked because the PHP driver is no longer supported. We ran tests to keep PHP + stargate but there were many issues, the main one (but not the only

Re: java driver with cassandra proxies (option: -Dcassandra.join_ring=false)

2023-10-12 Thread Erick Ramirez
Those nodes are not in the peers table(s) because you told them NOT to join the ring with `join_ring=false` so it is working by design. I'm not really sure what you're trying to achieve but if you want to separate the coordinator functions from the storage then what you probably want is to deploy

Re: java driver with cassandra proxies (option: -Dcassandra.join_ring=false)

2023-10-12 Thread Bowen Song via user
I'm not 100% sure, but it's worth trying to disable the token metadata, because the driver needs to read the "system.peers_v2" table for populating the token metadata. On 11/10/2023 19:15,

java driver with cassandra proxies (option: -Dcassandra.join_ring=false)

2023-10-11 Thread Regis Le Bretonnic
Hi (also posted in dev mailing list but not sure I can publish on it), We use datastax cassandra java driver v4.15.0 and we want to limit connections to Cassandra proxy nodes only (nodes with no data, started with option: -Dcassandra.join_ring=false). For that: - we configured the driver to have

Re: Restricting data access at column and/or row level

2023-10-04 Thread Nitan Kainth
Thanks Stefan. What is tentative release time? Regards, Nitan Cell: 510 449 9629 On Oct 3, 2023, at 7:26 PM, Miklosovic, Stefan wrote: Hi, columns can be restricted per user by Dynamic Data Masking (will be in

Re: Restricting data access at column and/or row level

2023-10-03 Thread Miklosovic, Stefan
Hi, columns can be restricted per user by Dynamic Data Masking (will be in 5.0). https://cassandra.apache.org/doc/trunk/cassandra/developing/cql/dynamic_data_masking.html I am not sure about specific rows. To my knowledge I do not think that is possible.

Restricting data access at column and/or row level

2023-10-03 Thread Nitan Kainth
Hi Team, I have a requirement to grant select privileges on a table to some user restricting few columns and few rows. Something like this:
 c1 | c2 | c3 | c4
----+----+----+----
  5 |  1 |  5 |  1
 10 |  1 |  1 |  1
  1 |  1 |  1 |  1
  8 |  1 |  1 |  1
  2 |  1 |  1 |  1
  4 |  1 |  1 |  1
  7

[DISCUSS] disk_access_mode setting on cassandra.yaml

2023-09-30 Thread Paulo Motta
Hi, On the dev@ mailing list I proposed updating the default value of the advanced property "disk_access_mode" to a more stable default for typical workloads. See the discussion on [1] for details. I wanted to check if anyone had experiences (good or bad) with overriding this setting in the

Re: Upcoming Town Hall

2023-09-28 Thread Melissa Logan
Hi folks: The Cassandra 5.0 alpha preview was released a few weeks ago. In our virtual Town Hall today at 8:00 AM PT, Aaron Ploetz will walk us through new features including ACID transactions, SAI, Trie memtables, vector search, and others. See you soon! Details and how to join:

Cass- fqldump

2023-09-21 Thread Akshith Mull
Hi, I'm getting the following error while running fqltool dump to view the file generated from fqltool. Let me know if anyone has any input on this issue. fqltool dump /tmp/cassandrafullquerylog WARNING: package jdk.internal.util.jar not in java.base error: module java.base does not open

Upcoming Town Hall

2023-09-21 Thread Hugh Lashbrooke
Hi everyone, The Apache Cassandra September Town Hall is coming up next week on September 29 at 8 am PT. This month's meeting will feature a talk from a name that is sure to be very familiar to most of you! Aaron Ploetz will be going through the features that everyone is looking forward to in

Re: [HELP] Cassandra 4.1.1 Repeated Bootstrapping Failure

2023-09-11 Thread Bowen Song via user
Hi Scott, Thank you for pointing this out. I found it too, but I deemed it to be irrelevant for the following reasons: * it was fixed in 4.1.1, as you have correctly pointed out; and * the error message is slightly different, "writevAddresses" vs "writeAddress"; and * it actually

Re: [HELP] Cassandra 4.1.1 Repeated Bootstrapping Failure

2023-09-11 Thread C. Scott Andreas
Bowen, thanks for reaching out. My mind immediately jumped to a ticket which has very similar pathology: "CASSANDRA-18110: Streaming progress virtual table lock contention can trigger TCP_USER_TIMEOUT and fail streaming" -- but I see this was fixed in 4.1.1. On Sep 11, 2023, at 2:09 PM, Bowen

[HELP] Cassandra 4.1.1 Repeated Bootstrapping Failure

2023-09-11 Thread Bowen Song via user
*Description* When adding a new node to an existing cluster, the new node bootstrapping fails with the "io.netty.channel.unix.Errors$NativeIoException: writeAddress(..) failed: Connection timed out" error from the streaming source node. Resuming the bootstrap with "nodetool bootstrap

[RELEASE] Apache Cassandra 5.0-alpha1 released

2023-09-08 Thread Mick Semb Wever
The Cassandra team is pleased to announce the release of Apache Cassandra version 5.0-alpha1. DISCLAIMER, this alpha release does not contain the expected 5.0 features: Vector Search (CEP-30), Transactional Cluster Metadata (CEP-21) and Accord Transactions (CEP-15). These features will land in a

Issue with pagination using paging state

2023-09-08 Thread Ritesh Kumar
Hello Team, I am trying to achieve pagination in Cassandra using paging state mechanism. I am trying to paginate through records with LIMIT set to 250 and fetch size set to 50. I get the paging state for the next set of 50 records up until the 250 records are retrieved but how can I paginate further

Re: How the write path finds the N nodes to write to?

2023-08-30 Thread Abe Ratnofsky
> if the replication factor is 3 it just picks the other two nodes following > the ring clockwise. The coordinator for a given mutation is not necessarily a replica (depending on whether token-aware routing is used by the client) so it may have to forward to RF remote nodes, then wait for the

How the write path finds the N nodes to write to?

2023-08-30 Thread Gabriel Giussi
I know cassandra uses consistent hashing for choosing the node where a key should go to, and if I understand correctly from this image https://cassandra.apache.org/doc/latest/cassandra/_images/ring.svg if the replication factor is 3 it just picks the other two nodes following the ring clockwise. I
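
As an illustration of the clockwise walk described in the question above, here is a minimal Python sketch. It assumes a single-datacenter, SimpleStrategy-like placement with one token per node; the ring tokens and node names are made up for the example, and it ignores vnodes and rack/datacenter-aware placement, so it should not be read as Cassandra's actual implementation.

```python
import bisect


def replicas_for_token(ring, key_token, rf):
    """Return the nodes that would hold a key under a simplified ring model.

    ring: list of (token, node_name) pairs sorted by token (hypothetical values).
    key_token: the token the partition key hashes to.
    rf: replication factor.
    """
    tokens = [token for token, _ in ring]
    # The primary replica is the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if there is none.
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    distinct_nodes = len({node for _, node in ring})
    replicas = []
    i = start
    # Walk the ring clockwise, collecting distinct nodes until we have rf of them.
    while len(replicas) < min(rf, distinct_nodes):
        node = ring[i % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        i += 1
    return replicas


# Hypothetical four-node ring with evenly spaced tokens.
ring = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]
print(replicas_for_token(ring, 30, 3))
```

Note that this only covers replica selection. As the reply in this thread points out, the coordinator that receives the write is not necessarily one of these replicas.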

Re: Startup errors - 4.1.3

2023-08-30 Thread Jeff Jirsa
There are at least two bugs in the compaction lifecycle transaction log - one that can drop an ABORT / ADD in the wrong order (and prevent startup), and one that allows for invalid timestamps in the log file (and again, prevent startups). I believe it's safe to work around the former by removing

Startup errors - 4.1.3

2023-08-30 Thread Joe Obernberger
Hi all - I replaced a node in a 14 node cluster, and it rebuilt OK.  I started to see a lot of timeout errors, and discovered one of the nodes had this message constantly repeated: "waiting to acquire a permit to begin streaming" - so perhaps I hit this bug:

Re: Registration open for Community Over Code North America

2023-08-29 Thread Arya Goudarzi
unsubscribe On Mon, Aug 28, 2023 at 12:54 PM Rich Bowen wrote: > Hello! Registration is still open for the upcoming Community Over Code > NA event in Halifax, NS! We invite you to register for the event > https://communityovercode.org/registration/ > > Apache Committers, note that you have a

Registration open for Community Over Code North America

2023-08-28 Thread Rich Bowen
Hello! Registration is still open for the upcoming Community Over Code NA event in Halifax, NS! We invite you to register for the event https://communityovercode.org/registration/ Apache Committers, note that you have a special discounted rate for the conference at US$250. To take advantage of

[DISCUSSION] Removal of metrics-reporter-config in the next 5.0 release. Feedback needed

2023-08-28 Thread Maxim Muzafarov
Dear Cassandra users, Some feedback is needed from your side, as the change itself is related to the way the internal metrics can be exported from the Cassandra node. If you are not using the metrics-reporter-config [1] and exporting metrics to the physical file provided by it, no action is

Apache Cassandra Contributor Meeting this week

2023-08-27 Thread Hugh Lashbrooke
This month's Apache Cassandra Contributor Meeting is coming up on Tuesday, 29 August @ 10am PT. You'll find more details about the Contributor Meetings and how to join the Zoom call here: https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Contributor+Meeting This week we'll be

Re: Cassandra p95 latencies

2023-08-25 Thread Andrew Weaver
Do you have the SSTables per read metric for before and after you increased the key cache size? If it was high before, that may have been the culprit meaning compaction tuning is in order. On Fri, Aug 25, 2023, 12:35 PM Shaurya Gupta wrote: > Thanks everyone. > Updating this thread - > We

Re: Testing Cassandra connectivity at application startup

2023-08-25 Thread Andrew Weaver
For a readiness probe and for ongoing ECV checks, just making sure the driver is initialized is enough. I've seen problems recently with applications running "select cluster_name from system.local" for ECV checks. We haven't dug into it in detail yet but with a large number of clients it puts a

Re: Testing Cassandra connectivity at application startup

2023-08-25 Thread Raphael Mazelier
That's a good way to do it! On 25/08/2023 20:10, Shaurya Gupta wrote: > We don't plan to open a new connection. It should use the same connection(s) > which the application will use. > > On Fri, Aug 25, 2023 at 10:59 AM Raphael Mazelier wrote: > >> Mind that a new connection is really costly for

Re: Testing Cassandra connectivity at application startup

2023-08-25 Thread Shaurya Gupta
We don't plan to open a new connection. It should use the same connection(s) which the application will use. On Fri, Aug 25, 2023 at 10:59 AM Raphael Mazelier wrote: > Mind that a new connection is really costly for C*. > So at startup it's fine. but not in a liveness or readiness check imo. >

Re: Testing Cassandra connectivity at application startup

2023-08-25 Thread C. Scott Andreas
“select * from …” without a predicate from a user table would be very expensive, yes. A query from a small, node-local system table such as “select * from system.peers” would make a better health check.  - Scott > On Aug 25, 2023, at 10:58 AM, Raphael Mazelier wrote: > >  > Mind that a
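
A minimal Python sketch of the kind of startup check discussed in this thread, using the system-table query suggested above. It assumes `session` is the application's already-connected driver session (anything exposing `execute(query, timeout=...)`, as the DataStax Python driver's `Session` does); the function name and the 2-second timeout are illustrative choices, not something taken from the thread.

```python
def cassandra_is_reachable(session, query="SELECT * FROM system.peers", timeout=2.0):
    """Return True if a cheap query succeeds on the application's existing session.

    session: assumed to be an already-connected driver session object.
             Reusing it avoids opening new connections just for the check.
    query:   a small, node-local system table query (assumption: adjust as needed).
    timeout: seconds to wait before treating the cluster as unreachable.
    """
    try:
        session.execute(query, timeout=timeout)
        return True
    except Exception:
        # Any driver error (timeout, no host available, ...) counts as "not ready".
        return False
```

Another reply in this thread cautions that even system-table queries can add load when many clients run them repeatedly, so this is probably better suited to a one-off startup check than to a frequent probe.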

Re: Testing Cassandra connectivity at application startup

2023-08-25 Thread Raphael Mazelier
Mind that a new connection is really costly for C*. So at startup it's fine. but not in a liveness or readiness check imo. For the query why not select 1; ? -- Raphael Mazelier On 25/08/2023 19:38, Shaurya Gupta wrote: > Hi community > > We want to validate cassandra connectivity from the

Testing Cassandra connectivity at application startup

2023-08-25 Thread Shaurya Gupta
Hi community We want to validate cassandra connectivity from the application container when it starts up and before it reports as healthy to K8s. Is doing > select * from our_keyspace.table limit 1 fine Or is it an inefficient query and should not be fired on a prod cluster ? Any other

Re: Cassandra p95 latencies

2023-08-25 Thread Shaurya Gupta
Thanks everyone. Updating this thread - We increased the key cache size from 100 MB to 200 MB and we believe that has brought down the latency from 40 ms p95 to 6 ms p95. I think there is still scope for improvement as both writes and reads are presently at p95 6 ms. I would expect writes to be

Town Hall update

2023-08-22 Thread Hugh Lashbrooke
Due to a last-minute change, we have had to cancel this month's Apache Cassandra Town Hall which was going to feature Josh McKenzie speaking about the state of the Cassandra development community. We are working on finalising the speaker for September's Town Hall and will have the details up on

Re: Cassandra 5 & Support for JDK 17

2023-08-22 Thread Aaron Ploetz
Hello Sean, Cassandra 5.0 is indeed intended to run on Java 17. In fact, I just tried running 5.0-alpha, and it does indeed run on Java 17.0.8. There is a Jira ticket out there for tracking Java 17 issues with 5.0: CASSANDRA-16895

Cassandra 5 & Support for JDK 17

2023-08-22 Thread McIntyre, Sean
Hello. I have several applications that I am looking to move to JDK 17 and want to understand the plans for Cassandra to support this version of Java. I’ve read that version 5 may well support JDK17 and that it may be available by the end of this year. Are these firm plans? Are there

Re: Big Data Question

2023-08-21 Thread Jeff Jirsa
(Yes, just somewhat less likely to be the same order of speed-up in STCS where sstables are more likely to cross token boundaries, modulo some stuff around sstable splitting at token ranges a la 6696) On Mon, Aug 21, 2023 at 11:35 AM Dinesh Joshi wrote: > Minor correction, zero copy streaming

Re: Big Data Question

2023-08-21 Thread Dinesh Joshi
Minor correction, zero copy streaming aka faster streaming also works for STCS. Dinesh On Aug 21, 2023, at 8:01 AM, Jeff Jirsa wrote: There's a lot of questionable advice scattered in this thread. Set aside most of the guidance like 2TB/node, it's old and super nuanced. If you're bare metal, do what

Re: Big Data Question

2023-08-21 Thread daemeon reiydelle
- k8s 1. depending on the version and networking, number of containers per node, nodepooling, etc. you can expect to see 1-2% additional storage IO latency (depends on whether all are on the same network vs. separate storage IO TCP network) 2. System overhead may be 3-15% depending

Re: Big Data Question

2023-08-21 Thread Patrick McFadin
...and a shameless plug for the Cassandra Summit in December. We have a talk from somebody that is doing 70TB per node and will be digging into all the aspects that make that work for them. I hope everyone in this thread is at that talk! I can't wait to hear all the questions. Patrick On Mon,

Re: Big Data Question

2023-08-21 Thread Jeff Jirsa
There's a lot of questionable advice scattered in this thread. Set aside most of the guidance like 2TB/node, it's old and super nuanced. If you're bare metal, do what your organization is good at. If you have millions of dollars in SAN equipment and you know how SANs work and fail and get backed

Re: Big Data Question

2023-08-21 Thread Joe Obernberger
For our scenario, the goal is to minimize down-time for a single (at least initially) data center system.  Data-loss is basically unacceptable.  I wouldn't say we have a "rusty slow data center" - we can certainly use SSDs and have servers connected via 10G copper to a fast back-plane.  For

[RELEASE] Apache Cassandra 3.11.16 released

2023-08-20 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra version 3.11.16. Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance. https://cassandra.apache.org/ Downloads of

RE: Big Data Question

2023-08-18 Thread Durity, Sean R via user
Cost of availability is a fair question at some level of the discussion. In my experience, high availability is one of the top 2 or 3 reasons why Cassandra is chosen as the data solution. So, if I am given a Cassandra use case to build out, I would normally assume high availability is needed,

Re: Materialized View inconsistency issue

2023-08-18 Thread Miklosovic, Stefan
Well you could always do it like this cqlsh> CREATE TABLE dating.visits2 (user_id int, visitor_id int, visit_month int, visit_date int, primary key (user_id, visitor_id, visit_month)) WITH CLUSTERING ORDER BY (visitor_id ASC, visit_month DESC ); This means that if you have, clearly, 6 months,

Re: Materialized View inconsistency issue

2023-08-18 Thread Regis Le Bretonnic
What you propose is another debate. Most of the time there are a product department and a tech department (I'm sure it is your case at netapp)... I'd like to have a voice loud enough to influence product requirements but it is not the way it works. I'm paid to make miracles and not to explain to

Re: Materialized View inconsistency issue

2023-08-18 Thread Miklosovic, Stefan
The 2 tables you propose Stefan can not natively order rows by time (they will be ordered by visitor_id), except if you sort rows after the select. So what? I think this is way better than dealing with MV which you will get inconsistent eventually. Do you want to have broken MV or you want to

Re: Materialized View inconsistency issue

2023-08-18 Thread Regis Le Bretonnic
Hi Stefan, Happy to see that our use case interests you :-) I'm not sure that I explained well what we want. Imagine this sequence of events: - Julia visits Joe at t1 - Julia visits Joe at t2 - Karen visits Joe at t3 - Silvia visits Joe at t4 - Karen visits Joe at t5 - Karen visits Joe at t6 -

Re: Big Data Question

2023-08-17 Thread daemeon reiydelle
I started to respond, then realized I and the other OP posters are not thinking the same: What is the business case for availability, data loss/reload/recoverability? You all argue for higher availability and damn the cost. But no one asked "can you lose access, for 20 minutes, to a portion of the

Re: Big Data Question

2023-08-17 Thread Joe Obernberger
Was assuming reaper did incremental?  That was probably a bad assumption. nodetool repair -pr I know it well now! :) -Joe On 8/17/2023 4:47 PM, Bowen Song via user wrote: I don't have experience with Cassandra on Kubernetes, so I can't comment on that. For repairs, may I interest you with

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
I don't have experience with Cassandra on Kubernetes, so I can't comment on that. For repairs, may I interest you with incremental repairs? It will make repairs hell of a lot faster. Of course, occasional full repair is still needed, but that's another story. On 17/08/2023 21:36, Joe

Re: Big Data Question

2023-08-17 Thread Joe Obernberger
Thank you.  Enjoying this conversation. Agree on blade servers, where each blade has a small number of SSDs.  Yeh/Nah to a kubernetes approach assuming fast persistent storage?  I think that might be easier to manage. In my current benchmarks, the performance is excellent, but the repairs

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
From my experience, that's not entirely true. For large nodes, the bottleneck is usually the JVM garbage collector. The GC pauses can easily get out of control on very large heaps, and long STW pauses may also result in nodes flipping up and down from other nodes' perspective, which often

Re: Big Data Question

2023-08-17 Thread daemeon reiydelle
A lot of (actually all) seem to be based on local nodes with 1gb networks of spinning rust. Much of what is mentioned below is TOTALLY wrong for cloud. So clarify whether you are "real world" or rusty slow data center world (definitely not modern DC either). E.g. should not handle more than 2tb

Re: Materialized View inconsistency issue

2023-08-17 Thread Miklosovic, Stefan
Why can't you do it like this? You would have two tables: create table visits (user_id bigint, visitor_id bigint, visit_date timestamp, primary key ((user_id, visitor_id), visit_date)) order by visit_date desc create table visitors_by_user_id (user_id bigint, visitor_id bigint, primary key

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
The optimal node size largely depends on the table schema and read/write pattern. In some cases 500 GB per node is too large, but in some other cases 10TB per node works totally fine. It's hard to estimate that without benchmarking. Again, just pointing out the obvious, you did not count the

RE: Big Data Question

2023-08-17 Thread Durity, Sean R via user
For a variety of reasons, we have clusters with 5 TB of disk per host as a “standard.” In our larger data clusters, it does take longer to add/remove nodes or do things like upgradesstables after an upgrade. These nodes have 3+TB of actual data on the drive. But, we were able to shrink the node

Re: Big Data Question

2023-08-17 Thread C. Scott Andreas
A few thoughts on this: – 80TB per machine is pretty dense. Consider the amount of data you'd need to re-replicate in the event of a hardware failure that takes down all 80TB (DIMM failure requiring replacement, non-redundant PSU failure, NIC, etc). – 24GB of heap is also pretty generous.

Re: Big Data Question

2023-08-17 Thread Joe Obernberger
Thanks for this - yeah - duh - forgot about replication in my example! So - is 2TBytes per Cassandra instance advisable?  Better to use more/less?  Modern 2u servers can be had with 24 3.8TByte SSDs; so assume 80Tbytes per server, you could do: (1024*3)/80 = 39 servers, but you'd have to run

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
Just pointing out the obvious, for 1PB of data on nodes with 2TB disk each, you will need far more than 500 nodes. 1, it is unwise to run Cassandra with replication factor 1. It usually makes sense to use RF=3, so 1PB data will cost 3PB of storage space, a minimum of 1500 such nodes. 2,
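
The sizing arithmetic in this thread can be reproduced with a few lines of Python. This is a rough sketch only: it counts raw capacity times replication factor and deliberately leaves out compaction headroom, growth, and the other overheads the thread goes on to discuss. The function name is illustrative.

```python
import math


def nodes_needed(data_tb, replication_factor, usable_tb_per_node):
    """Minimum node count from raw capacity alone (no compaction or growth headroom)."""
    total_tb = data_tb * replication_factor
    return math.ceil(total_tb / usable_tb_per_node)


# 1 PB taken as 1000 TB, RF=3, 2 TB per node (the figures used in the message above).
print(nodes_needed(1000, 3, 2))
# 1 PB taken as 1024 TB, RF=3, 80 TB per server (the dense-server estimate in this thread).
print(nodes_needed(1024, 3, 80))
```

The first call gives the 1500-node lower bound quoted above, and the second gives the 39-server figure from the dense-server estimate.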

Re: 2 nodes marked as '?N' in 5 node cluster

2023-08-17 Thread Bowen Song via user
The first thing to look at is the logs, specifically, the /var/log/cassandra/system.log file on each node. 5 seconds time drift is enough to cause Cassandra to fail. You should ensure the time difference between Cassandra nodes is very low by ensuring time sync is working correctly, otherwise

Re: Open File Descriptors not cleared post upgrade from 3.11.9 to 4.0.5.

2023-08-16 Thread vaibhav khedkar
Thank you patrick. We have plans for upgrades anyway so keep this issue in mind and probably expedite it. I have updated and created a bug https://issues.apache.org/jira/browse/CASSANDRA-18770 in case you are interested. Thanks vaibhav On Wed, Aug 16, 2023 at 1:34 PM Patrick Lee wrote: > I

Re: Open File Descriptors not cleared post upgrade from 3.11.9 to 4.0.5.

2023-08-16 Thread Patrick Lee
I don’t have a ticket.  What I saw in a scenario was a cluster that was upgraded from 3.11 to 4.0.X, we added another ring that was running Java 11.  Nodes on the ring with Java 8 saw this issue you described while the other ring running Java 11 did not.  Then if I updated from Java 8 to Java 11 I

Re: Unsubscribe

2023-08-16 Thread C. Scott Andreas
Hi Mark, You can unsubscribe from this mailing list by sending a blank email to "user-unsubscr...@cassandra.apache.org" from the address that is subscribed to the list. Other members of the list are not able to take this action on someone's behalf. Details on how to join and leave lists are

Unsubscribe

2023-08-16 Thread Mark Furlong
Please unsubscribe from this list. Thanks, Mark Furlong

Re: Open File Descriptors not cleared post upgrade from 3.11.9 to 4.0.5.

2023-08-16 Thread vaibhav khedkar
Thanks Patrick, We do have plans to upgrade to *java 11* eventually but we will go through internal testing and would also need some time given the size of our infrastructure. Is it safe to assume that the issue exists in the combination of upgrades from 3.11.x to 4.0.x *and* running on JAVA 8

Re: Open File Descriptors not cleared post upgrade from 3.11.9 to 4.0.5.

2023-08-16 Thread Patrick Lee
I've actually noticed this as well on a few clusters I deal with but after upgrading Cassandra from 3.11 to 4 we also changed to use Java 11 shortly after the cluster upgrade. After I moved to Java 11 I have not experienced a problem. On Wed, Aug 16, 2023 at 12:12 PM vaibhav khedkar wrote: >

Re: Open File Descriptors not cleared post upgrade from 3.11.9 to 4.0.5.

2023-08-16 Thread vaibhav khedkar
Thank you Scott We are seeing it for all the tables (Filter, Data ..etc ) /nb-1-big-Statistics.db.tmp (deleted) /nb-3-big-Statistics.db.tmp (deleted) /nb-2-big-Data.db (deleted) /nb-2-big-Statistics.db.tmp (deleted) /nb-2-big-Index.db (deleted) /nb-2-big-Statistics.db.tmp (deleted)

Re: Open File Descriptors not cleared post upgrade from 3.11.9 to 4.0.5.

2023-08-16 Thread C. Scott Andreas
Vaibhav, thank you for reaching out and sharing this issue report. Could you run an `lsof` and share which SSTable files you see open (e.g., all SSTable components or a subset of them); and also share the value of the `disk_access_mode` property from your cassandra.yaml? Opening a Jira ticket

Open File Descriptors not cleared post upgrade from 3.11.9 to 4.0.5.

2023-08-16 Thread vaibhav khedkar
Hi everyone, We recently upgraded our fleet of ~2500 Cassandra instances from 3.11.9 to 4.0.5. After the upgrade, we are seeing a unique issue where the compacted SSTables' file descriptors are still present and are never cleared. This is causing false disk alerts. We have to restart nodes very

Re: Big Data Question

2023-08-16 Thread Jeff Jirsa
A lot of things depend on actual cluster config - compaction settings (LCS vs STCS vs TWCS) and token allocation (single token, vnodes, etc) matter a ton. With 4.0 and LCS, streaming for replacement is MUCH faster, so much so that most people should be fine with 4-8TB/node, because the rebuild

Big Data Question

2023-08-16 Thread Joe Obernberger
General question on how to configure Cassandra.  Say I have 1PByte of data to store.  The general rule of thumb is that each node (or at least instance of Cassandra) shouldn't handle more than 2TBytes of disk.  That means 500 instances of Cassandra. Assuming you have very fast persistent

Re: Materialized View inconsistency issue

2023-08-15 Thread Regis Le Bretonnic
Hi Josh... A long (and almost private) message to explain how we fix materialized views. Let me first explain our use case... I work for a European dating website. Users can receive visits from other users (typically when someone looks at a member profile page), and we want to inform them for

Re: Running mixed 4.0 and 4.1 clusters

2023-08-14 Thread scott
Running mixed versions of Cassandra 3.x and 4.x is supported in the same cluster for the purpose of live upgrades, though certain features (such as repair) are not supported while in a mixed-version state. All of 3.0, 3.x, 4.0, and 4.1 can coexist for the purpose of upgrades. 4.0 and 4.1 are

Re: Cassandra p95 latencies

2023-08-14 Thread Elliott Sims via user
1. Check for Nagle/delayed-ack, but probably nodelay is getting set by the driver so it shouldn't be a problem. 2. Check for network latency (just regular old ping among hosts, during traffic) 3. Check your GC metrics and see if garbage collections line up with outliers. Some tuning can help

Running mixed 4.0 and 4.1 clusters

2023-08-14 Thread Doug Whitfield via user
Hi folks, I know it is impossible to run 3.x and 4.x nodes in the same cluster. Is it possible to run 4.0 and 4.1 nodes together? Is it a good idea? Doug Whitfield This e-mail may contain information that is privileged or confidential. If you are not the intended recipient, please delete the

Re: Cassandra p95 latencies

2023-08-14 Thread Josh McKenzie
> The queries are rightly designed Data modeling in Cassandra is 100% gray space; there unfortunately is no right or wrong design. You'll need to share basic shapes / contours of your data model for other folks to help you; seemingly innocuous things in a data model can cause unexpected issues

Re: Materialized View inconsistency issue

2023-08-14 Thread Josh McKenzie
When it comes to denormalization in Cassandra today your options are to either do it yourself in your application layer or rely on Materialized Views to do it for you at the server layer. Neither are production-ready approaches out of the box (which is one of the biggest flaws in the "provide

Re: Cassandra FQL question-

2023-08-11 Thread Akshith Mull
And also we have Cassandra 4.0.5. [cqlsh 6.0.0 | Cassandra 4.0.5 | CQL spec 3.4.5 | Native protocol v5] FQL and fqltool dump should work on 4.0.x as well, right? > On Aug 9, 2023, at 9:18 AM, Akshith Mull wrote: > > > These are the class path settings , do we need to add anything here

Re: Cassandra p95 latencies

2023-08-11 Thread Jeff Jirsa
You’re going to have to help us help you. 4.0 is pretty widely deployed. I’m not aware of a perf regression. Can you give us a schema (anonymized) and queries and show us a trace? On Aug 10, 2023, at 10:18 PM, Shaurya Gupta wrote: The queries are rightly designed as I already explained. 40 ms is

Re: Repair errors

2023-08-11 Thread Surbhi Gupta
Try sstablescrub on the node where it is showing corrupted data. On Fri, Aug 11, 2023 at 8:38 AM Joe Obernberger < joseph.obernber...@gmail.com> wrote: > Finally found a message on another node that seem relevant: > > INFO [CompactionExecutor:7413] 2023-08-11 11:36:22,397 >

Re: Repair errors

2023-08-11 Thread Joe Obernberger
Finally found a message on another node that seems relevant: INFO  [CompactionExecutor:7413] 2023-08-11 11:36:22,397 CompactionTask.java:164 - Compacting (d30b64ba-385c-11ee-8e74-edf5512ad115)

RE: Cassandra p95 latencies

2023-08-11 Thread Durity, Sean R via user
I would expect single digit ms latency on reads and writes. However, we have not done any performance testing on Apache Cassandra 4.x. Sean R. Durity INTERNAL USE From: Shaurya Gupta Sent: Friday, August 11, 2023 1:16 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Cassandra p95

Re: Cassandra p95 latencies

2023-08-10 Thread Shaurya Gupta
The queries are rightly designed as I already explained. 40 ms is way too high compared to what I have seen with other DBs and many times with Cassandra 3.x versions. CPU consumed as I mentioned is not high, it is around 20%. On Thu, Aug 10, 2023 at 5:14 PM MyWorld wrote: > Hi, > P95 should not

Re: Cassandra p95 latencies

2023-08-10 Thread Abe Ratnofsky
40ms is definitely higher than expected. Have you run your queries with TRACING enabled to see where the latency is coming from? https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlshTracing.html 40ms is also a fairly specific duration: https://eklitzke.org/the-caveats-of-tcp-nodelay > On

Re: Upgrade from 3.11.5 to 4.1.x

2023-08-10 Thread MyWorld
You can check in your lower environment. On Fri, 11 Aug, 2023, 06:25 Surbhi Gupta, wrote: > Thanks, > > I am looking to to upgrade to 4.1.x . > Please advise. > > Thanks > Surbhi > > On Thu, Aug 10, 2023 at 5:39 PM MyWorld wrote: > >> Though it's recommended to upgrade to latest version of

Re: Upgrade from 3.11.5 to 4.1.x

2023-08-10 Thread Surbhi Gupta
Thanks, I am looking to upgrade to 4.1.x. Please advise. Thanks Surbhi On Thu, Aug 10, 2023 at 5:39 PM MyWorld wrote: > Though it's recommended to upgrade to latest version of 3.11.x and then to > ver 4 but even upgrading directly won't be a problem. Just check the > release notes. > >

Re: Upgrade from 3.11.5 to 4.1.x

2023-08-10 Thread MyWorld
Though it's recommended to upgrade to latest version of 3.11.x and then to ver 4 but even upgrading directly won't be a problem. Just check the release notes. However for production, I would recommend to go for 4.0.x latest stable version. Regards Ashish On Sat, 8 Jul, 2023, 05:44 Surbhi

Re: Materialized View inconsistency issue

2023-08-10 Thread MyWorld
Hi Surbhi, There are 2 drawbacks associated with MV. 1. Inconsistent view 2. The lock it takes on the base table. This gets worse when you have a huge number of clustering keys in a specific partition. It's better to design a separate table and let your API do a parallel write on both.

Re: Cassandra p95 latencies

2023-08-10 Thread MyWorld
Hi, P95 should not be a problem if the tables are correctly designed. Leveled compaction strategy reduces it further; however, it consumes some resources. For reads, caching is also helpful. Can you check your CPU iowait, as it could be the reason for the delay? Regards, Ashish On Fri, 11 Aug, 2023, 04:58 Shaurya

Cassandra p95 latencies

2023-08-10 Thread Shaurya Gupta
Hi community, What is the expected P95 latency for Cassandra read and write queries executed with LOCAL_QUORUM over a table with 3 replicas? The queries use the partition + clustering key, and the row size is not large, maybe 1-2 KB maximum. Assuming CPU is not a constraint? We
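For readers comparing numbers, it helps to pin down what "P95" means. The sketch below uses the nearest-rank definition of a percentile on a list of client-side latency samples; this is one common definition, and it is an assumption here, since the thread does not say how the latencies were measured or aggregated. The function name is made up for illustration.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value such that at least pct percent
    # of the samples are less than or equal to it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Illustrative, made-up latencies in milliseconds.
latencies_ms = [2, 3, 3, 4, 5, 5, 6, 7, 8, 40]
p95 = percentile_nearest_rank(latencies_ms, 95)
```

With only ten samples, a single slow request determines the P95, so a tail figure is meaningful only over a reasonably large sample.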

Re: Materialized View inconsistency issue

2023-08-10 Thread Surbhi Gupta
Thanks everyone. On Wed, 9 Aug 2023 at 01:00, Regis Le Bretonnic wrote: > > Hi Surbhi > > We do use Cassandra materialized views even if not recommended. > There are known issues you have to deal with. Despite them, we still use > MVs. > What we observe is: > * there are inconsistency

Re: Cassandra FQL question-

2023-08-09 Thread Akshith Mull
These are the classpath settings. Do we need to add anything here, or are they set wrongly? Any solution to fix this would be greatly appreciated. cat /usr/bin/fqltool | grep CLASSPATH if [ -z "$CASSANDRA_CONF" -o -z "$CLASSPATH" ]; then echo "You must set the CASSANDRA_CONF and CLASSPATH

Re: Cassandra FQL question-

2023-08-09 Thread Akshith Mull
Okay, thanks, let me try. It seems the tool exists and is owned by root, but I am not able to run it. :/usr/bin$ ls -ltra | grep fqltool -rwxr-xr-x 1 root root 2388 May 6 2022 fqltool /tmp/cassandrafullquerylog$ fqltool Error: Could not find or load main class

Re: Materialized View inconsistency issue

2023-08-09 Thread Regis Le Bretonnic
Hi Surbhi, We do use Cassandra materialized views even if not recommended. There are known issues you have to deal with. Despite them, we still use MVs. What we observe is: * there are inconsistency issues, but few. Most of them are rows that should not exist in the MV... * we made a spark

Re: Materialized View inconsistency issue

2023-08-08 Thread C. Scott Andreas
That’s correct, yes. There is no current or upcoming version of Apache Cassandra in which materialized views are expected to be considered production-ready and maintain full consistency with their base table at this time. The feature is classified as “experimental” to indicate that this behavior is

Re: Materialized View inconsistency issue

2023-08-08 Thread manish khandelwal
From 4.0.11's cassandra.yaml:
## EXPERIMENTAL FEATURES ##
# Enables materialized view creation on this node.
# Materialized views are considered experimental and are not recommended for production use.
enable_materialized_views: false
So I

Materialized View inconsistency issue

2023-08-08 Thread Surbhi Gupta
Hi, We get complaints about Materialized View inconsistency issues. We are on 3.11.5, and in 3.11.5 Materialized Views were not production ready. We are OK to upgrade. In which version of Cassandra do MVs not have inconsistency issues? Thanks Surbhi

Re: Cassandra FQL question-

2023-08-08 Thread Miklosovic, Stefan
Hey, I ran the same steps from an extracted 4.1.3 tarball and it just works. It also works when I install the Debian package. The error you get suggests the class path is not set correctly, so the class cannot be loaded. You can probably debug this by wrapping the last line in fqltool in "echo" to see what is

Cassandra FQL question-

2023-08-08 Thread Akshith Mull
Hi All, We run Cassandra 4.x, and I have enabled FQL as per the documentation: https://cassandra.apache.org/doc/latest/cassandra/operating/fqllogging.html nodetool enablefullquerylog --path /tmp/cassandrafullquerylog Files generated. -rw-r--r-- 1131072 Aug 8 16:22 metadata.cq4t
