Re: Materialized View's additional PrimaryKey column
Thank you again for the clear information, Jon! I give up.

Sent from Yahoo Mail on Android

On Fri, 26 Jul 2019 at 0:53, Jon Haddad wrote:

The issues I have with MVs aren't related to how they aren't correctly synchronized, although I'm not happy about that either. My issue with them is that every cluster I've seen that uses them has been unstable, and I've put a lot of time into helping teams undo them. You will almost certainly have several hours or days of downtime as a result of using them. There's a good reason they're marked as experimental (and disabled by default). You should maintain the other tables yourself.

Jon

On Thu, Jul 25, 2019 at 12:22 AM mehmet bursali wrote:

Hi Jon, thanks for your suggestion (or warning :) ).

Yes, I've read something about your point, and I know that there really are several issues open in JIRA on bootstrapping, compaction and incremental repair caused just by using MVs. But after reading almost all JIRA tickets (with comments and history) related to using MVs, as far as I understand all those issues come from either losing synchronization between the base table and the MV by deleting column or row values on the base table, or from having a huge system with a large and dynamic number of nodes/data/workloads. We use version 3.11.3 and most of the critical issues were fixed in 3.10, but of course I might be missing something, so I'll be glad if you point me to some specific JIRA ticket.

We have a certain use case that requires updates on filtering (clustering) columns. Our motivation for using an MV was avoiding updates (delete + create) on primary key columns, because we suppose that the Cassandra developers can manage this unpreferred operation better than us. I'm really confused now.

On Wednesday, July 24, 2019, 11:30:15 PM GMT+3, Jon Haddad wrote:

I really, really advise against using MVs. I've had to help a number of teams move off them. Not sure what list of bugs you read, but if the list didn't include "will destabilize your cluster to the point of constant downtime" then the list was incomplete.

Jon

On Wed, Jul 24, 2019 at 6:32 AM mehmet bursali wrote:

+ Additional info: our production environment is a multi-DC cluster that consists of 6 nodes in 2 datacenters.

On Wednesday, July 24, 2019, 3:35:11 PM GMT+3, mehmet bursali wrote:

Hi Cassandra folks,

I'm planning to use a Materialized View (MV) in production for some specific cases. I've read a lot of blogs and technical documents about the risks of using it, and everything seems OK for our use case.

My question is about the consistency (and durability) implications of using an MV with an additional primary key column. In one of our cases, we select a UDT column of the base table as the additional primary key column on the MV. (The UDT's possible values are non-nullable and restricted to a domain.) After inserting a record in the base table, this additional column (the MV's primary key column) value will also be updated once or twice. So in our case, for each update operation that occurs on the base table there are going to be delete and create operations inside the MV.

Does it matter, from a consistency (and durability) perspective, whether the additional primary key column is used as a partition column or as a clustering column?
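To make the schema shape being discussed concrete, a minimal CQL sketch; all keyspace, table, type, and column names here are hypothetical, not from the thread:

```
-- Hypothetical names; a minimal sketch of an MV that promotes a UDT
-- column from the base table into its primary key.
CREATE TYPE IF NOT EXISTS app.state_udt (code text);

CREATE TABLE IF NOT EXISTS app.events (
    id         uuid,
    created_at timestamp,
    state      frozen<state_udt>,  -- non-null, domain-restricted values
    payload    text,
    PRIMARY KEY ((id), created_at)
);

-- The view keys on "state" as its partition column; every update to
-- "state" on the base table becomes a delete + insert inside the view.
CREATE MATERIALIZED VIEW IF NOT EXISTS app.events_by_state AS
    SELECT * FROM app.events
    WHERE id IS NOT NULL AND created_at IS NOT NULL AND state IS NOT NULL
    PRIMARY KEY ((state), id, created_at);
```

With state as the partition column (as above), each state change moves the row between view partitions; declared as a clustering column instead, the delete + insert pair stays within one view partition. That difference is the heart of the consistency question being asked.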
Re: Performance impact with ALLOW FILTERING clause.
If you're thinking about rewriting your data to be more performant when doing analytics, you might as well go the distance and put it in an analytics-friendly format like Parquet. My 2 cents.

On Thu, Jul 25, 2019 at 11:01 AM ZAIDI, ASAD A wrote:

> Thank you all for your insights.
>
> When the spark-connector adds ALLOW FILTERING to a query, it makes the query just 'run', no matter whether it is expensive for a larger table or not so expensive for a table with fewer rows.
>
> In my particular case, nodes are reaching a 2TB-per-node load in a 50-node cluster. When a bunch of such queries run, they impact server resources.
>
> Since ALLOW FILTERING is an expensive operation, I'm trying to find knobs which, if I turn them, mitigate the impact.
>
> What I think (correct me if I am wrong) is that it is the query design itself, not optimized for the table design, that in turn causes the connector to add ALLOW FILTERING implicitly. I'm not thinking of adding secondary indexes on tables because they have their own overheads. Kindly share if there are other means we can use to influence the connector not to use ALLOW FILTERING.
>
> Thanks again.
> Asad
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com]
> *Sent:* Thursday, July 25, 2019 10:24 AM
> *To:* cassandra
> *Subject:* Re: Performance impact with ALLOW FILTERING clause.
>
> "Unpredictable" is such a loaded word. It's quite predictable, but it's often mispredicted by users.
>
> "ALLOW FILTERING" basically tells the database you're going to do a query that will require scanning a bunch of data to return some subset of it, and you're not able to provide a WHERE clause that's sufficiently fine-grained to avoid the scan. It's a loose equivalent of doing a full table scan in SQL databases - sometimes it's a valid use case, but it's expensive, you're ignoring all of the indexes, and you're going to do a lot more work.
>
> It's predictable, though - you're probably going to walk over some range of data. Spark is grabbing all of the data to load into RDDs, and it probably does it by slicing up the range, doing a bunch of range scans.
>
> It's doing that so it can get ALL of the data and do the filtering / joining / searching in-memory in Spark, rather than relying on Cassandra to do the scanning/searching on disk.
>
> On Thu, Jul 25, 2019 at 6:49 AM ZAIDI, ASAD A wrote:
>
> Hello Folks,
>
> I was going through the documentation and saw in many places that ALLOW FILTERING causes performance unpredictability. Our developers say the ALLOW FILTERING clause is implicitly added to a bunch of queries by the spark-Cassandra connector and they cannot control it; however, at the same time we see unpredictability in application performance, just as the documentation says.
>
> I'm trying to understand why a connector would add a clause to a query when it can negatively impact database/application performance. Is it the data model that drives the connector's decision to add ALLOW FILTERING to a query automatically, or are there other reasons this clause is added? I'm not a developer, but I want to know why developers don't have any control over this.
>
> I'll appreciate your guidance here.
>
> Thanks
> Asad
Re: Materialized View's additional PrimaryKey column
The issues I have with MVs aren't related to how they aren't correctly synchronized, although I'm not happy about that either. My issue with them is that every cluster I've seen that uses them has been unstable, and I've put a lot of time into helping teams undo them. You will almost certainly have several hours or days of downtime as a result of using them. There's a good reason they're marked as experimental (and disabled by default). You should maintain the other tables yourself.

Jon

On Thu, Jul 25, 2019 at 12:22 AM mehmet bursali wrote:

> Hi Jon, thanks for your suggestion (or warning :) ).
> Yes, I've read something about your point, and I know that there really are several issues open in JIRA on bootstrapping, compaction and incremental repair caused just by using MVs. But after reading almost all JIRA tickets (with comments and history) related to using MVs, as far as I understand all those issues come from either losing synchronization between the base table and the MV by deleting column or row values on the base table, or from having a huge system with a large and dynamic number of nodes/data/workloads. We use version 3.11.3 and most of the critical issues were fixed in 3.10, but of course I might be missing something, so I'll be glad if you point me to some specific JIRA ticket.
> We have a certain use case that requires updates on filtering (clustering) columns. Our motivation for using an MV was avoiding updates (delete + create) on primary key columns, because we suppose that the Cassandra developers can manage this unpreferred operation better than us. I'm really confused now.
>
> On Wednesday, July 24, 2019, 11:30:15 PM GMT+3, Jon Haddad <j...@jonhaddad.com> wrote:
>
> I really, really advise against using MVs. I've had to help a number of teams move off them. Not sure what list of bugs you read, but if the list didn't include "will destabilize your cluster to the point of constant downtime" then the list was incomplete.
>
> Jon
>
> On Wed, Jul 24, 2019 at 6:32 AM mehmet bursali wrote:
>
> + Additional info: our production environment is a multi-DC cluster that consists of 6 nodes in 2 datacenters.
>
> On Wednesday, July 24, 2019, 3:35:11 PM GMT+3, mehmet bursali wrote:
>
> Hi Cassandra folks,
> I'm planning to use a Materialized View (MV) in production for some specific cases. I've read a lot of blogs and technical documents about the risks of using it, and everything seems OK for our use case.
> My question is about the consistency (and durability) implications of using an MV with an additional primary key column. In one of our cases, we select a UDT column of the base table as the additional primary key column on the MV. (The UDT's possible values are non-nullable and restricted to a domain.) After inserting a record in the base table, this additional column (the MV's primary key column) value will also be updated once or twice. So in our case, for each update operation that occurs on the base table there are going to be delete and create operations inside the MV.
> Does it matter, from a consistency (and durability) perspective, whether the additional primary key column is used as a partition column or as a clustering column?
RE: Performance impact with ALLOW FILTERING clause.
Thank you all for your insights.

When the spark-connector adds ALLOW FILTERING to a query, it makes the query just 'run', no matter whether it is expensive for a larger table or not so expensive for a table with fewer rows.

In my particular case, nodes are reaching a 2TB-per-node load in a 50-node cluster. When a bunch of such queries run, they impact server resources.

Since ALLOW FILTERING is an expensive operation, I'm trying to find knobs which, if I turn them, mitigate the impact.

What I think (correct me if I am wrong) is that it is the query design itself, not optimized for the table design, that in turn causes the connector to add ALLOW FILTERING implicitly. I'm not thinking of adding secondary indexes on tables because they have their own overheads. Kindly share if there are other means we can use to influence the connector not to use ALLOW FILTERING.

Thanks again.
Asad

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Thursday, July 25, 2019 10:24 AM
To: cassandra
Subject: Re: Performance impact with ALLOW FILTERING clause.

"Unpredictable" is such a loaded word. It's quite predictable, but it's often mispredicted by users.

"ALLOW FILTERING" basically tells the database you're going to do a query that will require scanning a bunch of data to return some subset of it, and you're not able to provide a WHERE clause that's sufficiently fine-grained to avoid the scan. It's a loose equivalent of doing a full table scan in SQL databases - sometimes it's a valid use case, but it's expensive, you're ignoring all of the indexes, and you're going to do a lot more work.

It's predictable, though - you're probably going to walk over some range of data. Spark is grabbing all of the data to load into RDDs, and it probably does it by slicing up the range, doing a bunch of range scans.

It's doing that so it can get ALL of the data and do the filtering / joining / searching in-memory in Spark, rather than relying on Cassandra to do the scanning/searching on disk.

On Thu, Jul 25, 2019 at 6:49 AM ZAIDI, ASAD A <az1...@att.com> wrote:

Hello Folks,

I was going through the documentation and saw in many places that ALLOW FILTERING causes performance unpredictability. Our developers say the ALLOW FILTERING clause is implicitly added to a bunch of queries by the spark-Cassandra connector and they cannot control it; however, at the same time we see unpredictability in application performance, just as the documentation says.

I'm trying to understand why a connector would add a clause to a query when it can negatively impact database/application performance. Is it the data model that drives the connector's decision to add ALLOW FILTERING to a query automatically, or are there other reasons this clause is added? I'm not a developer, but I want to know why developers don't have any control over this.

I'll appreciate your guidance here.

Thanks
Asad
Re: Dropped mutations
Thanks Jeff. Does "internal" mean local-node operations (in this case, the mutation response from the local node), and does "cross node" mean the time it took to get a response back from other nodes, depending on the consistency level chosen?

On Thu, Jul 25, 2019 at 11:51 AM Jeff Jirsa wrote:

> This means your database is seeing commands that have already timed out by the time it goes to execute them, so it ignores them and gives up instead of working on work items that have already expired.
>
> The first log line shows 5-second latencies, the second line 6s and 8s latencies, which sounds like either really bad disks or really bad JVM GC pauses.
>
> On Thu, Jul 25, 2019 at 8:45 AM Ayub M wrote:
>
>> Hello, how do I read dropped mutations error messages? What's "internal" and "cross node"? For mutations it fails on cross-node, and for read_repair/read it fails on internal. What does it mean?
>>
>> INFO [ScheduledTasks:1] 2019-07-21 11:44:46,150 MessagingService.java:1281 - MUTATION messages were dropped in last 5000 ms: 0 internal and 65 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 4966 ms
>> INFO [ScheduledTasks:1] 2019-07-19 05:01:10,620 MessagingService.java:1281 - READ_REPAIR messages were dropped in last 5000 ms: 9 internal and 8 cross node. Mean internal dropped latency: 6013 ms and Mean cross-node dropped latency: 8164 ms
>>
>> --
>> Regards,
>> Ayub

--
Regards,
Ayub
Differing snitches in different datacenters
Quick and hopefully easy question for the list. Background: an existing cluster (1 DC) will be migrated to an AWS-hosted cluster by standing up a second datacenter; the existing cluster will subsequently be decommissioned.

We currently use GossipingPropertyFileSnitch and are thinking about using Ec2MultiRegionSnitch in the new AWS DC; that'd position us nicely if in the future we want to run a multi-DC cluster in AWS.

My question is: are there any issues with one DC using GossipingPropertyFileSnitch and the other using Ec2MultiRegionSnitch? This setup would be temporary, existing until the new DC nodes have rebuilt and the old DC is decommissioned.

Thanks,
Voytek Jarnot
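For context, one way to keep a single snitch across both DCs is to stay on GossipingPropertyFileSnitch everywhere and name the AWS DC and racks to mirror EC2 regions and availability zones. A sketch of cassandra-rackdc.properties on a new AWS node; the values below are hypothetical, not from this thread:

```
# cassandra-rackdc.properties on a new AWS node (hypothetical values),
# staying on GossipingPropertyFileSnitch while mirroring EC2 naming:
dc=us-east-1
rack=us-east-1a
# prefer_local=true   # route intra-DC traffic over private IPs
```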
Re: Unable to integrate jmx_prometheus_javaagent
Nevermind: after hours of investigation, I found the solution myself just after the mail was sent to the list ...

Even though some resources on the web highlight the importance of wrapping what follows "-javaagent:" in double quotes, those quotes turn out to be exactly the issue. Note that the log complains it could not find:

"/etc/cassandra/prometheus/jmx_prometheus_javaagent-0.12.0.jar

with a leading double quote. Removing the quotes makes it work like a charm.

Sorry for bothering!

BR,
Marc

On 25.07.19 18:02, Marc Richter wrote:

Hi everyone,

I have an existing Cassandra node (3.7). Now, I'd like to be able to grab metrics from it for my Prometheus + Grafana based monitoring.

I downloaded "jmx_prometheus_javaagent-0.12.0.jar" from [1] and copied it to "/etc/cassandra/prometheus/jmx_prometheus_javaagent-0.12.0.jar". I also downloaded "cassandra.yml" from [2] and saved it to "/etc/cassandra/prometheus/jmx_prometheus_javaagent_cassandra.yml".

Next, I appended the following to my cassandra-env.sh:

```
PROMETHEUS_AGENT='-javaagent:"/etc/cassandra/prometheus/jmx_prometheus_javaagent-0.12.0.jar=7070:/etc/cassandra/prometheus/jmx_prometheus_javaagent_cassandra.yml"'
JVM_OPTS="$JVM_OPTS $PROMETHEUS_AGENT"
```

When I now try to start my Cassandra node, it fails and writes this to my logfile:

```
Error opening zip file or JAR manifest missing : "/etc/cassandra/prometheus/jmx_prometheus_javaagent-0.12.0.jar
Error occurred during initialization of VM
agent library failed to init: instrument
```

I'm using the official Cassandra Docker image [3], tag 3.7. I found the steps I followed in many online resources, and could not find any reported issue matching what I'm facing.

Does anybody have an idea?

BR,
Marc

[1] https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.12.0/jmx_prometheus_javaagent-0.12.0.jar
[2] https://raw.githubusercontent.com/prometheus/jmx_exporter/master/example_configs/cassandra.yml
[3] https://hub.docker.com/_/cassandra
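For anyone hitting the same error, a minimal sketch of the working cassandra-env.sh fragment with the inner quotes removed; paths are as in the original post:

```
# cassandra-env.sh -- note: no quotes inside the -javaagent value itself.
# The outer quotes are plain shell syntax and never reach the JVM.
PROMETHEUS_AGENT="-javaagent:/etc/cassandra/prometheus/jmx_prometheus_javaagent-0.12.0.jar=7070:/etc/cassandra/prometheus/jmx_prometheus_javaagent_cassandra.yml"
JVM_OPTS="$JVM_OPTS $PROMETHEUS_AGENT"
```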
Unable to integrate jmx_prometheus_javaagent
Hi everyone,

I have an existing Cassandra node (3.7). Now, I'd like to be able to grab metrics from it for my Prometheus + Grafana based monitoring.

I downloaded "jmx_prometheus_javaagent-0.12.0.jar" from [1] and copied it to "/etc/cassandra/prometheus/jmx_prometheus_javaagent-0.12.0.jar". I also downloaded "cassandra.yml" from [2] and saved it to "/etc/cassandra/prometheus/jmx_prometheus_javaagent_cassandra.yml".

Next, I appended the following to my cassandra-env.sh:

```
PROMETHEUS_AGENT='-javaagent:"/etc/cassandra/prometheus/jmx_prometheus_javaagent-0.12.0.jar=7070:/etc/cassandra/prometheus/jmx_prometheus_javaagent_cassandra.yml"'
JVM_OPTS="$JVM_OPTS $PROMETHEUS_AGENT"
```

When I now try to start my Cassandra node, it fails and writes this to my logfile:

```
Error opening zip file or JAR manifest missing : "/etc/cassandra/prometheus/jmx_prometheus_javaagent-0.12.0.jar
Error occurred during initialization of VM
agent library failed to init: instrument
```

I'm using the official Cassandra Docker image [3], tag 3.7. I found the steps I followed in many online resources, and could not find any reported issue matching what I'm facing.

Does anybody have an idea?

BR,
Marc

[1] https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.12.0/jmx_prometheus_javaagent-0.12.0.jar
[2] https://raw.githubusercontent.com/prometheus/jmx_exporter/master/example_configs/cassandra.yml
[3] https://hub.docker.com/_/cassandra
Re: Dropped mutations
Hello Jeff,

Could you also help with how to interpret these terms:

1. Internal mutations
2. Cross-node mutations
3. Mean internal dropped latency
4. Mean cross-node dropped latency

Thanks,
Rajsekhar

On Thu, 25 Jul, 2019, 9:21 PM Jeff Jirsa wrote:

> This means your database is seeing commands that have already timed out by the time it goes to execute them, so it ignores them and gives up instead of working on work items that have already expired.
>
> The first log line shows 5-second latencies, the second line 6s and 8s latencies, which sounds like either really bad disks or really bad JVM GC pauses.
>
> On Thu, Jul 25, 2019 at 8:45 AM Ayub M wrote:
>
>> Hello, how do I read dropped mutations error messages? What's "internal" and "cross node"? For mutations it fails on cross-node, and for read_repair/read it fails on internal. What does it mean?
>>
>> INFO [ScheduledTasks:1] 2019-07-21 11:44:46,150 MessagingService.java:1281 - MUTATION messages were dropped in last 5000 ms: 0 internal and 65 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 4966 ms
>> INFO [ScheduledTasks:1] 2019-07-19 05:01:10,620 MessagingService.java:1281 - READ_REPAIR messages were dropped in last 5000 ms: 9 internal and 8 cross node. Mean internal dropped latency: 6013 ms and Mean cross-node dropped latency: 8164 ms
>>
>> --
>> Regards,
>> Ayub
Re: Dropped mutations
This means your database is seeing commands that have already timed out by the time it goes to execute them, so it ignores them and gives up instead of working on work items that have already expired.

The first log line shows 5-second latencies, the second line 6s and 8s latencies, which sounds like either really bad disks or really bad JVM GC pauses.

On Thu, Jul 25, 2019 at 8:45 AM Ayub M wrote:

> Hello, how do I read dropped mutations error messages? What's "internal" and "cross node"? For mutations it fails on cross-node, and for read_repair/read it fails on internal. What does it mean?
>
> INFO [ScheduledTasks:1] 2019-07-21 11:44:46,150 MessagingService.java:1281 - MUTATION messages were dropped in last 5000 ms: 0 internal and 65 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 4966 ms
> INFO [ScheduledTasks:1] 2019-07-19 05:01:10,620 MessagingService.java:1281 - READ_REPAIR messages were dropped in last 5000 ms: 9 internal and 8 cross node. Mean internal dropped latency: 6013 ms and Mean cross-node dropped latency: 8164 ms
>
> --
> Regards,
> Ayub
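As a side note for anyone chasing these counters: the same dropped-message statistics are also exposed outside the log. A quick sketch (exact output layout varies by Cassandra version):

```
# Dropped message counts per verb (MUTATION, READ_REPAIR, ...) appear in
# the "Message type / Dropped" section at the end of tpstats output:
nodetool tpstats

# Cross-node dropped latency is computed from timestamps set on other
# nodes, so large values can also indicate clock skew; verify NTP:
ntpq -p
```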
Dropped mutations
Hello, how do I read dropped mutations error messages? What's "internal" and "cross node"? For mutations it fails on cross-node, and for read_repair/read it fails on internal. What does it mean?

INFO [ScheduledTasks:1] 2019-07-21 11:44:46,150 MessagingService.java:1281 - MUTATION messages were dropped in last 5000 ms: 0 internal and 65 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 4966 ms
INFO [ScheduledTasks:1] 2019-07-19 05:01:10,620 MessagingService.java:1281 - READ_REPAIR messages were dropped in last 5000 ms: 9 internal and 8 cross node. Mean internal dropped latency: 6013 ms and Mean cross-node dropped latency: 8164 ms

--
Regards,
Ayub
Re: Performance impact with ALLOW FILTERING clause.
"unpredictable" is such a loaded word. It's quite predictable, but it's often mispredicted by users. "ALLOW FILTERING" basically tells the database you're going to do a query that will require scanning a bunch of data to return some subset of it, and you're not able to provide a WHERE clause that's sufficiently fine grained to avoid the scan. It's a loose equivalent of doing a full table scan in SQL databases - sometimes it's a valid use case, but it's expensive, you're ignoring all of the indexes, and you're going to do a lot more work. It's predictable, though - you're probably going to walk over some range of data. Spark is grabbing all of the data to load into RDDs, and it probably does it by slicing up the range, doing a bunch of range scans. It's doing that so it can get ALL of the data and do the filtering / joining / searching in-memory in spark, rather than relying on cassandra to do the scanning/searching on disk. On Thu, Jul 25, 2019 at 6:49 AM ZAIDI, ASAD A wrote: > Hello Folks, > > > > I was going thru documentation and saw at many places saying ALLOW > FILTERING causes performance unpredictability. Our developers says ALLOW > FILTERING clause is implicitly added on bunch of queries by spark-Cassandra > connector and they cannot control it; however at the same time we see > unpredictability in application performance – just as documentation says. > > > > I’m trying to understand why would a connector add a clause in query when > this can cause negative impact on database/application performance. Is that > data model that is driving connector make its decision and add allow > filtering to query automatically or if there are other reason this clause > is added to the code. I’m not a developer though I want to know why > developer don’t have any control on this to happen. > > > > I’ll appreciate your guidance here. > > > > Thanks > > Asad > > > > >
Re: Performance impact with ALLOW FILTERING clause.
Hi Asad,

That's because of the way Spark works. Essentially, when you execute a Spark job, it pulls the full content of the datastore (Cassandra in your case) into its RDDs and works with it "in memory". While Spark uses "data locality" to read data from the nodes that have the required data on their local disks, it's still reading all data from the Cassandra tables. To do so it sends a 'select * from Table ALLOW FILTERING' query to Cassandra.

From Spark you don't have much control over the initial query that fills the RDDs; sometimes you'll read the whole table even if you only need one row.

Regards,
Jacques-Henri Berthemet

From: "ZAIDI, ASAD A"
Reply to: "user@cassandra.apache.org"
Date: Thursday 25 July 2019 at 15:49
To: "user@cassandra.apache.org"
Subject: Performance impact with ALLOW FILTERING clause.

Hello Folks,

I was going through the documentation and saw in many places that ALLOW FILTERING causes performance unpredictability. Our developers say the ALLOW FILTERING clause is implicitly added to a bunch of queries by the spark-Cassandra connector and they cannot control it; however, at the same time we see unpredictability in application performance, just as the documentation says.

I'm trying to understand why a connector would add a clause to a query when it can negatively impact database/application performance. Is it the data model that drives the connector's decision to add ALLOW FILTERING to a query automatically, or are there other reasons this clause is added? I'm not a developer, but I want to know why developers don't have any control over this.

I'll appreciate your guidance here.

Thanks
Asad
Performance impact with ALLOW FILTERING clause.
Hello Folks,

I was going through the documentation and saw in many places that ALLOW FILTERING causes performance unpredictability. Our developers say the ALLOW FILTERING clause is implicitly added to a bunch of queries by the spark-Cassandra connector and they cannot control it; however, at the same time we see unpredictability in application performance, just as the documentation says.

I'm trying to understand why a connector would add a clause to a query when it can negatively impact database/application performance. Is it the data model that drives the connector's decision to add ALLOW FILTERING to a query automatically, or are there other reasons this clause is added? I'm not a developer, but I want to know why developers don't have any control over this.

I'll appreciate your guidance here.

Thanks
Asad
Cassandra OutOfMemoryError
Hi,

I am using Apache Cassandra version: [cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]

I am running a 5-node cluster and recently added one node to the cluster. The cluster runs the G1 garbage collector with a 16GB -Xmx. The cluster also has one materialized view.

On the newly added node I got an OutOfMemoryError. Heap dump analysis shows the error below:

BatchlogTasks:1
  at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
  at java.util.HashMap.resize()[Ljava/util/HashMap$Node; (HashMap.java:704)
  at java.util.HashMap.putVal(ILjava/lang/Object;Ljava/lang/Object;ZZ)Ljava/lang/Object; (HashMap.java:663)
  at java.util.HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (HashMap.java:612)
  at java.util.HashSet.add(Ljava/lang/Object;)Z (HashSet.java:220)
  at org.apache.cassandra.batchlog.BatchlogManager.finishAndClearBatches(Ljava/util/ArrayList;Ljava/util/Set;Ljava/util/Set;)V (BatchlogManager.java:281)
  at org.apache.cassandra.batchlog.BatchlogManager.processBatchlogEntries(Lorg/apache/cassandra/cql3/UntypedResultSet;ILcom/google/common/util/concurrent/RateLimiter;)V (BatchlogManager.java:261)
  at org.apache.cassandra.batchlog.BatchlogManager.replayFailedBatches()V (BatchlogManager.java:210)
  at org.apache.cassandra.batchlog.BatchlogManager$$Lambda$269.run()V (Unknown Source)
  at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run()V (DebuggableScheduledThreadPoolExecutor.java:118)
  at java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; (Executors.java:511)
  at java.util.concurrent.FutureTask.runAndReset()Z (FutureTask.java:308)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)Z (ScheduledThreadPoolExecutor.java:180)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V (ScheduledThreadPoolExecutor.java:294)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:624)
  at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(Ljava/lang/Runnable;)V (NamedThreadFactory.java:81)
  at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$4.run()V (Unknown Source)
  at java.lang.Thread.run()V (Thread.java:748)

I have found that the system.batches table holds a huge amount of data on this node.
nodetool -u cassandra -pw cassandra tablestats system.batches -H
Total number of tables: 65
Keyspace : system
        Read Count: 3990928
        Read Latency: 0.07400208372589032 ms
        Write Count: 4898771
        Write Latency: 0.012194797838069997 ms
        Pending Flushes: 0
                Table: batches
                SSTable count: 5
                Space used (live): 50.89 GiB
                Space used (total): 50.89 GiB
                Space used by snapshots (total): 0 bytes
                Off heap memory used (total): 1.05 GiB
                SSTable Compression Ratio: 0.38778672943000886
                Number of partitions (estimate): 727971046
                Memtable cell count: 12
                Memtable data size: 918 bytes
                Memtable off heap memory used: 0 bytes
                Memtable switch count: 10
                Local read count: 0
                Local read latency: NaN ms
                Local write count: 618894
                Local write latency: 0.010 ms
                Pending flushes: 0
                Percent repaired: 0.0
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.0
                Bloom filter space used: 906.25 MiB
                Bloom filter off heap memory used: 906.25 MiB
                Index summary off heap memory used: 155.86 MiB
                Compression metadata off heap memory used: 10.6 MiB
                Compacted partition minimum bytes: 30
                Compacted partition maximum bytes: 258
                Compacted partition mean bytes: 136
                Average live cells per slice (last five minutes): 149.0
                Maximum live cells per slice (last five minutes): 149
                Average tombstones per slice (last five minutes): 1.0
                Maximum tombstones per slice (last five minutes): 1
                Dropped Mutations: 0 bytes

Can someone please help, what can be the issue?

--
Raman Gugnani
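One related knob worth knowing here: batchlog replay, the code path in the stack trace above, is rate-limited via cassandra.yaml on 3.11. Whether tuning it helps this particular workload is an assumption; a sketch with the 3.11 default value:

```
# cassandra.yaml -- caps batchlog replay throughput per node; the
# effective limit is reduced proportionally to the cluster's node count.
batchlog_replay_throttle_in_kb: 1024
```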
Re: high write latency on a single table
Awesome! So we can investigate further by using the Cassandra exporter at this link: https://github.com/criteo/cassandra_exporter

This exporter gives detailed information about read/write operations on each table, using the metrics below (it reads from the table metrics in Cassandra: https://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics):

org:apache:cassandra:metrics:columnfamily:.*

On Wednesday, July 24, 2019, 11:51:28 PM GMT+3, CPC wrote:

Hi Mehmet,

Yes, Prometheus and OpsCenter.

On Wed, 24 Jul 2019 at 17:09, mehmet bursali wrote:

Hi, do you use any performance monitoring tool like Prometheus?

On Monday, July 22, 2019, 1:16:58 PM GMT+3, CPC wrote:

Hi everybody,

The state column contains "R" or "D" values, just a single character. As Rajsekhar said, the only difference is that the table can contain a high cell count. In the meantime we ran a major compaction, and data per node was 5-6 GB.

On Mon, 22 Jul 2019, 10:56 AM Rajsekhar Mallick wrote:

Hello Team,

The difference in write latencies between the two tables, though significant, still leaves the higher latency at 11.353 ms, which is acceptable. Writes overall are not an issue, but the higher write latency for this particular table does point towards the data being written to it. One thing I noticed is that the cell-count column in the nodetool tablehistograms output for the message_history_state table is scattered: the partition size histogram for the tables is consistent, but the column count histogram for the impacted table isn't uniform. Maybe we can start thinking along these lines.

I would also wait for some expert advice here.

Thanks

On Mon, 22 Jul 2019, 12:31 PM Ben Slater wrote:

Is the size of the data in your "state" column variable? The higher write latencies at the 95%+ percentiles could line up with large volumes of data in particular rows in that column (the one column not in both tables)?

Cheers,
Ben Slater
Chief Product Officer, Instaclustr

On Mon, 22 Jul 2019 at 16:46, CPC wrote:

Hi guys,

Any idea? I thought it might be a bug but could not find anything related on JIRA.
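A quick way to check what the exporter actually exposes for the suspect table; the host and port below are hypothetical, so adjust them to the listen port configured in the exporter's config.yml:

```
# Hypothetical endpoint; adjust to the exporter's configured listen port.
curl -s http://localhost:8080/metrics \
  | grep 'columnfamily' \
  | grep -i 'message_history_state'
```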
On Fri, Jul 19, 2019, 12:45 PM CPC wrote:

Hi Rajsekhar,

Here are the details:

1)
[cassadm@bipcas00 ~]$ nodetool tablestats tims.MESSAGE_HISTORY
Total number of tables: 259
Keyspace : tims
        Read Count: 208256144
        Read Latency: 7.655146714749506 ms
        Write Count: 2218205275
        Write Latency: 1.7826005103175133 ms
        Pending Flushes: 0
                Table: MESSAGE_HISTORY
                SSTable count: 41
                Space used (live): 976964101899
                Space used (total): 976964101899
                Space used by snapshots (total): 3070598526780
                Off heap memory used (total): 185828820
                SSTable Compression Ratio: 0.8219217809913125
                Number of partitions (estimate): 8175715
                Memtable cell count: 73124
                Memtable data size: 26543733
                Memtable off heap memory used: 27829672
                Memtable switch count: 1607
                Local read count: 7871917
                Local read latency: 1.187 ms
                Local write count: 172220954
                Local write latency: 0.021 ms
                Pending flushes: 0
                Percent repaired: 0.0
                Bloom filter false positives: 130
                Bloom filter false ratio: 0.0
                Bloom filter space used: 10898488
                Bloom filter off heap memory used: 10898160
                Index summary off heap memory used: 2480140
                Compression metadata off heap memory used: 144620848
                Compacted partition minimum bytes: 36
                Compacted partition maximum bytes: 557074610
                Compacted partition mean bytes: 155311
                Average live cells per slice (last five minutes): 25.56639344262295
                Maximum live cells per slice (last five minutes): 5722
                Average tombstones per slice (last five minutes): 1.8681948424068768
                Maximum tombstones per slice (last five minutes): 770
                Dropped Mutations: 97812

[cassadm@bipcas00 ~]$ nodetool tablestats tims.MESSAGE_HISTORY_STATE
Total number of tables: 259
Keyspace : tims
        Read Count: 208257486
        Read Latency: 7.655137315414438 ms
        Write Count: 2218218966
        Write Latency: 1.7825896304427324 ms
Re: Materialized View's additional PrimaryKey column
Hi Jon, thanks for your suggestion (or warning :) ).

Yes, I've read something about your point, and I know that there really are several issues open in JIRA on bootstrapping, compaction and incremental repair caused just by using MVs. But after reading almost all JIRA tickets (with comments and history) related to using MVs, as far as I understand all those issues come from either losing synchronization between the base table and the MV by deleting column or row values on the base table, or from having a huge system with a large and dynamic number of nodes/data/workloads. We use version 3.11.3 and most of the critical issues were fixed in 3.10, but of course I might be missing something, so I'll be glad if you point me to some specific JIRA ticket.

We have a certain use case that requires updates on filtering (clustering) columns. Our motivation for using an MV was avoiding updates (delete + create) on primary key columns, because we suppose that the Cassandra developers can manage this unpreferred operation better than us. I'm really confused now.

On Wednesday, July 24, 2019, 11:30:15 PM GMT+3, Jon Haddad wrote:

I really, really advise against using MVs. I've had to help a number of teams move off them. Not sure what list of bugs you read, but if the list didn't include "will destabilize your cluster to the point of constant downtime" then the list was incomplete.

Jon

On Wed, Jul 24, 2019 at 6:32 AM mehmet bursali wrote:

+ Additional info: our production environment is a multi-DC cluster that consists of 6 nodes in 2 datacenters.

On Wednesday, July 24, 2019, 3:35:11 PM GMT+3, mehmet bursali wrote:

Hi Cassandra folks,

I'm planning to use a Materialized View (MV) in production for some specific cases. I've read a lot of blogs and technical documents about the risks of using it, and everything seems OK for our use case.

My question is about the consistency (and durability) implications of using an MV with an additional primary key column. In one of our cases, we select a UDT column of the base table as the additional primary key column on the MV. (The UDT's possible values are non-nullable and restricted to a domain.) After inserting a record in the base table, this additional column (the MV's primary key column) value will also be updated once or twice. So in our case, for each update operation that occurs on the base table there are going to be delete and create operations inside the MV.

Does it matter, from a consistency (and durability) perspective, whether the additional primary key column is used as a partition column or as a clustering column?