Re: Log4j vulnerability
Hi users, just to add to this: I recently contributed a dependency-check Ant target which scans our dependencies for CVEs. You can run it yourselves with "ant dependency-check" and it will automatically check the libraries we ship with Cassandra against the vulnerability database. Regards On Sat, 11 Dec 2021 at 18:44, Brandon Williams wrote: > > https://issues.apache.org/jira/browse/CASSANDRA-5883 > > As that ticket shows, Apache Cassandra has never used log4j2. > > On Sat, Dec 11, 2021 at 11:07 AM Abdul Patel wrote: > > > > Hi all, > > > > Any idea if any of open source Cassandra versions are impacted with log4j > > vulnerability which was reported on dec 9th
Re: Schema collision results in multiple data directories per table
Hi Tom, while I am not completely sure what might be causing your issue, I just want to highlight that schema agreement was overhauled significantly in 4.0 (1), so it may be related to what that ticket was trying to fix. Regards (1) https://issues.apache.org/jira/browse/CASSANDRA-15158 On Fri, 1 Oct 2021 at 18:43, Tom Offermann wrote: > > When adding a datacenter to a keyspace (following the Last Pickle [Data > Center Switch][lp] playbook), I ran into a "Configuration exception merging > remote schema" error. The nodes in one datacenter didn't converge to the new > schema version, and after restarting them, I saw the symptoms described in > this Datastax article on [Fixing a table schema collision][ds], where there > were two data directories for each table in the keyspace on the nodes that > didn't converge. I followed the recovery steps in the Datastax article to > move the data from the older directories to the new directories, ran > `nodetool refresh`, and that fixed the problem. > > [lp]: https://thelastpickle.com/blog/2019/02/26/data-center-switch.html > [ds]: > https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/useCreateTableCollisionFix.html > > While the Datastax article was super helpful for helping me recover, I'm left > wondering *why* this happened. If anyone can shed some light on that, or > offer advice on how I can avoid getting in this situation in the future, I > would be most appreciative. I'll describe the steps I took in more detail in > the thread. > > ## Steps > > 1. The day before, I had added the second datacenter ('dc2') to the > system_traces, system_distributed, and system_auth keyspaces and ran > `nodetool rebuild` for each of the 3 keyspaces. All of that went smoothly > with no issues. > > 2. For a large keyspace, I added the second datacenter ('dc2') with an `ALTER > KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': > '2', 'dc2': '3'};` statement. 
Immediately, I saw this error in the log: > ``` > "ERROR 16:45:47 Exception in thread Thread[MigrationStage:1,5,main]" > "org.apache.cassandra.exceptions.ConfigurationException: Column family ID > mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected > 20739eb0-d92e-11e6-b42f-e7eb6f21c481)" > "\tat > org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949) > ~[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) > ~[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat org.apache.cassandra.config.Schema.updateTable(Schema.java:687) > ~[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat > org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482) > ~[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat > org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438) > ~[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat > org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407) > ~[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat > org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384) > ~[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat > org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:594) > ~[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_232]" > "\tat java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_232]" > "\tat > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > ~[na:1.8.0_232]" > "\tat > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_232]" > "\tat > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) > 
[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]" > ``` > > I also saw this: > ``` > "ERROR 16:46:48 Configuration exception merging remote schema" > "org.apache.cassandra.exceptions.ConfigurationException: Column family ID > mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected > 20739eb0-d92e-11e6-b42f-e7eb6f21c481)" > "\tat > org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949) > ~[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903) > ~[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat org.apache.cassandra.config.Schema.updateTable(Schema.java:687) > ~[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat > org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482) > ~[apache-cassandra-3.11.5.jar:3.11.5]" > "\tat >
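As an aside, the two table IDs in the "Column family ID mismatch" error are version-1 (time-based) UUIDs, so you can decode when each incarnation of the table was created and confirm which data directory is the newer one before moving data around. A small Python sketch (the constant is the standard offset between the UUIDv1 epoch and the Unix epoch):

```python
import uuid
from datetime import datetime, timezone

# Cassandra table IDs are version-1 (time-based) UUIDs, so the embedded
# timestamp tells you which of two colliding data directories
# (named <table>-<id-without-dashes>) was created more recently.
def uuid1_to_datetime(u: uuid.UUID) -> datetime:
    # UUIDv1 timestamps count 100-ns intervals since 1582-10-15.
    offset = 0x01B21DD213814000  # 100-ns intervals from 1582-10-15 to 1970-01-01
    return datetime.fromtimestamp((u.time - offset) / 1e7, tz=timezone.utc)

found = uuid.UUID("8ad72660-f629-11eb-a217-e1a09d8bc60c")
expected = uuid.UUID("20739eb0-d92e-11e6-b42f-e7eb6f21c481")
for label, u in (("found", found), ("expected", expected)):
    print(label, uuid1_to_datetime(u).date())
```

Here the "expected" ID decodes to early 2017 and the "found" one to mid-2021, which tells you which directory belongs to the newly created (colliding) incarnation of the table.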
Some more tooling around Cassandra 4.0
Hi users, I would like to highlight some tooling we put together at Instaclustr, updated for the recent Cassandra 4.0 release. We wrote a short, descriptive blog post about it here (1). All these tools are completely free of charge and Apache 2.0 licensed. We hope you find them useful; if you have any questions, feel free to reach us via GitHub issues. (1) https://www.instaclustr.com/cassandra-tools-updated-cassandra-4-0/ Regards Stefan Miklosovic
Re: Change of Cassandra TTL
Hi Raman, we at Instaclustr have created a CLI tool (1) which can strip TTLs from your SSTables so you can import the data back into your node. Maybe that is something you will find handy. We had some customers whose data had expired and they wanted to resurrect it - so they took the SSTables with expired TTLs, stripped the TTLs, and voila, they had the data back. I can imagine you do the same and then re-apply a different TTL. (1) https://github.com/instaclustr/cassandra-ttl-remover Regards. On Tue, 14 Sept 2021 at 16:24, raman gugnani wrote: > > Thanks Eric for the update. > > On Tue, 14 Sept 2021 at 16:50, Erick Ramirez > wrote: >> >> You'll need to write an ETL app (most common case is with Spark) to scan >> through the existing data and update it with a new TTL. You'll need to make >> sure that the ETL job is throttled down so it doesn't overload your >> production cluster. Cheers! > > > > -- > Raman Gugnani
Re: Cassandra 4 alpha/alpha2
Hi, I have tested both alpha and alpha2 as well as 3.11.5 on CentOS 7.7.1908 and all went fine (I have some custom images for my own purposes). Updating from alpha to alpha2 was a mere version bump. Cheers On Thu, 31 Oct 2019 at 20:40, Abdul Patel wrote: > > Hey Everyone > > Did anyone was successfull to install either alpha or alpha2 version for > cassandra 4.0? > Found 2 issues : > 1> cassandra-env.sh: > JAVA_VERSION varianle is not defined. > Jvm-server.options file is not defined. > > This is fixable and after adding those , the error for cassandra-env.sh > errora went away. > > 2> second and major issue the cassandea binary when i try to start says > syntax error. > > /bin/cassandea: line 198:exec: : not found. > > Anyone has any idea on second issue? > - To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org
Re: Disk space utilization by from some Cassandra
Hi, compaction, for example, uses a lot of disk space. It is quite common, so it is not safe to have your disks utilised at, say, 85%, because compactions would not have room to run and that node would get stuck. This happens in production quite often. Hence, keeping utilisation around 50% and having a big buffer for compaction is a good idea. Once it is all compacted, usage should go back to normal, under 50% (or whatever figure you have). On Wed, 21 Aug 2019 at 14:33, wrote: > > Good day, > > > > I’m running the monitoring script for disk space utilization set the > benchmark to 50%. Currently am getting the alerts from some of the nodes > > About disk space greater than 50%. > > > > Is there a way, I can quickly figure out why the space has increased and how > I can maintain the disk space used by Cassandra to be below the benchmark at > all the times. > > > > Any ideas would be much appreciated. > > > > Sent from Mail for Windows 10
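To make the arithmetic behind the 50% rule of thumb concrete: compaction rewrites its input SSTables into a new file before removing the old ones, so in the worst case (one big compaction over most of a table) it temporarily needs free space roughly equal to the data being compacted. A minimal sketch (the 50% threshold is the conservative rule of thumb from this thread, not a hard Cassandra limit):

```python
# Worst-case headroom check for a compaction: the node needs enough free
# disk to hold a full rewritten copy of the SSTables being compacted,
# because the originals are only deleted after the new SSTable is complete.
def can_compact(disk_total_gb: float, used_gb: float, compacting_gb: float) -> bool:
    free_gb = disk_total_gb - used_gb
    return free_gb >= compacting_gb

# 1 TB disk at 50% utilisation: even rewriting all 500 GB still fits.
print(can_compact(1000, 500, 500))  # True
# The same disk at 85%: a 200 GB compaction already cannot run.
print(can_compact(1000, 850, 200))  # False
```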
Re: Cassandra copy command
Hi Rahul, how did you add dc3 to the cluster? The rule of thumb here is to run a rebuild from each node in the new DC, for example as described here: https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html On Wed, 21 Aug 2019 at 12:57, Rahul Reddy wrote: > > Hi sefan, > > I'm adding new DC3 to exiting cluster and see discripencies couple of > millions in Nodetool cfstats in new DC. > > My table size is 50gb > I'm trying to run copy entire table. > > Copy table to 'full_tablr.csv' with delimiter ','; > > If I run above command from dc3. Does it get the data only from dc3? > > > > On Wed, Aug 21, 2019, 6:46 AM Stefan Miklosovic > wrote: >> >> Hi Rahul, >> >> what is your motivation behind this? Why do you want to make sure the >> count is same? What is the purpose of that? All you should care about >> is that Cassandra will return you right results. It was designed from >> the very bottom to do that for you, you should not be bothered too >> much about such discrepancies, they will be always there in general. >> But the important fact is that once queried, you can rest assured it >> is returned (and consequentially repaired if data not match) as they >> should. >> >> What copy command you are talking about precisely, why you cant use just >> repair? >> >> On Wed, 21 Aug 2019 at 12:14, Rahul Reddy wrote: >> > >> > Hello, >> > >> > I have 3 datacenters . Want to make sure record count is same in all dc's >> > . If I run copy command node1 in dc1 does it get the data from only dc1? >> > Nodetool cfstats I'm seeing discrepancies in partitions count is it >> > because we didn't run cleanup after adding few nodes and remove them?. To >> > rule out any discripencies I want to run copy command from 3 DC's and >> > compare. Please let me know if copy command extracts data from the DC only >> > I ran it from? 
Re: Cassandra copy command
Hi Rahul, what is your motivation behind this? Why do you want to make sure the count is the same? What is the purpose of that? All you should care about is that Cassandra will return you the right results. It was designed from the very bottom to do that for you; you should not be bothered too much about such discrepancies, as in general they will always be there. The important fact is that once queried, you can rest assured the data is returned (and consequently repaired if replicas do not match) as it should be. Which copy command are you talking about, precisely, and why can't you just use repair? On Wed, 21 Aug 2019 at 12:14, Rahul Reddy wrote: > > Hello, > > I have 3 datacenters . Want to make sure record count is same in all dc's . > If I run copy command node1 in dc1 does it get the data from only dc1? > Nodetool cfstats I'm seeing discrepancies in partitions count is it because > we didn't run cleanup after adding few nodes and remove them?. To rule out > any discripencies I want to run copy command from 3 DC's and compare. Please > let me know if copy command extracts data from the DC only I ran it from?
Re: New column
You basically have to create a new table and include that column either as part of the partition key or as a clustering column. Avoid ALLOW FILTERING; it should not be used in production nor in any serious app. On Sun, 18 Aug 2019 at 21:57, Rahul Reddy wrote: > > Hello, > > We have a table and want to add column and select based on existing entire > primary key plus new column using allow filtering. Since my where clause has > all the primary key + new column does the allow filtering scan only the > partions which are listed or does it has to scan whole table? What is the > best approach add new column and query it based on existing primary key plus > new column?
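The data-modelling idea above can be sketched with plain Python dicts standing in for tables (all table and column names here are made up for illustration): instead of filtering the existing table on the new column, every write also goes to a second table whose key includes that column, so reads become direct lookups rather than scans.

```python
rows = [
    {"user_id": "u1", "day": "2019-08-18", "status": "ok"},
    {"user_id": "u1", "day": "2019-08-19", "status": "failed"},
]

# "Base" table keyed by (user_id, day) -- the existing primary key.
base = {(r["user_id"], r["day"]): r for r in rows}

# Query table keyed by (user_id, day, status) -- the new column is part of
# the key. Every write must go to both tables (in Cassandra, ideally
# written together so they stay consistent).
by_status = {(r["user_id"], r["day"], r["status"]): r for r in rows}

# Direct lookup by the full key, no filtering / full scan needed:
print(by_status[("u1", "2019-08-19", "failed")]["status"])  # failed
```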
Re: Cassandra DataStax Java Driver in combination with Java EE / EJBs
Hi Ralph, yes this is completely fine, even advisable. You can further extend this idea to have sessions per keyspace for example if you really insist, and it could be injectable based on some qualifier ... thats up to you. On Wed, 12 Jun 2019 at 11:31, John Sanda wrote: > > Hi Ralph, > > A session is intended to be a long-lived, i.e., application-scoped object. > You only need one session per cluster. I think what you are doing with the > @Singleton is fine. In my opinion though, EJB really does not offer much > value when working with Cassandra. I would be inclined to just use CDI. > > Cheers > > John > > On Tue, Jun 11, 2019 at 5:38 PM Ralph Soika wrote: >> >> Hi, >> >> I have a question concerning the Cassandra DataStax Java Driver in >> combination with Java EE and EJBs. >> >> I have implemented a Rest Service API based on Java EE8. In my application I >> have for example a jax-rs rest resource to write data into cassandra >> cluster. My first approach was to create in each method call >> >> a new Casssandra Cluster and Session object, >> write my data into cassandra >> and finally close the session and the cluster object. >> >> This works but it takes a lot of time (2-3 seconds) until the cluster object >> / session is opened for each request. >> >> So my second approach is now a @Singleton EJB providing the session object >> for my jax-rs resources. 
My service implementation to hold the Session >> object looks something like this: >> >> >> @Singleton >> public class ClusterService { >> private Cluster cluster; >> private Session session; >> >> @PostConstruct >> private void init() throws ArchiveException { >> cluster=initCluster(); >> session = initArchiveSession(); >> } >> >> @PreDestroy >> private void tearDown() throws ArchiveException { >> // close session and cluster object >> if (session != null) { >> session.close(); >> } >> if (cluster != null) { >> cluster.close(); >> } >> } >> >> public Session getSession() { >> if (session==null) { >> try { >> init(); >> } catch (ArchiveException e) { >> logger.warning("unable to get falid session: " + >> e.getMessage()); >> e.printStackTrace(); >> } >> } >> return session; >> } >> >>. >> >> } >> >> >> And my rest service calls now looking like this: >> >> >> @Path("/archive") >> @Stateless >> public class ArchiveRestService { >> >> @EJB >> ClusterService clusterService; >> >> @POST >> @Consumes({ MediaType.APPLICATION_XML, MediaType.TEXT_XML }) >> public Response postData(XMLDocument xmlDocument) { >> Session session = clusterService.getSession(); >> session.execute(); >> ... >> } >> ... >> } >> >> >> The result is now a super-fast behavior! Seems to be clear because my rest >> service no longer need to open a new session for each request. >> >> My question is: Is this approach with a @Singleton ClusterService EJB valid >> or is there something I should avoid? >> As far as I can see this works pretty fine and is really fast. I am running >> the application on a Wildfly 15 server which is Java EE8. >> >> Thanks for your comments >> >> Ralph >> >> >> >> >> -- >> >> Imixs Software Solutions GmbH >> Web: www.imixs.com Phone: +49 (0)89-452136 16 >> Office: Agnes-Pockels-Bogen 1, 80992 München >> Registergericht: Amtsgericht Muenchen, HRB 136045 >> Geschaeftsführer: Gaby Heinle u. 
Ralph Soika >> >> Imixs is an open source company, read more: www.imixs.org > > > > -- > > - John
Re: Cluster schema version choosing
My guess is that the "latest" schema would be chosen, but I am definitely interested in an in-depth explanation. On Tue, 21 May 2019 at 00:28, Alexey Korolkov wrote: > > Hello team, > In some circumstances, my cluster was split onto two schema versions > (half on one version, and rest on another) > In the process of resolving this issue, I restarted some nodes. > Eventually, nodes migrated to one schema, but it was not clear why they > choose exactly this version of schema? > I haven't found any explainings of the factor on which they picking schema > version, > please help me to find out the algorithm of choosing schema or classes in > source code responsible for this. > > > > > > -- > Sincerely yours, Korolkov Aleksey
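For what it's worth, my understanding is that the version is not really "chosen" at all: each node computes its schema version as a digest (a UUID) of its schema tables, and schema mutations merge with last-write-wins timestamp semantics, so once all mutations have propagated, every node derives the same version - effectively the schema carrying the newest timestamps. A simplified, purely illustrative Python model of that idea (none of these names are Cassandra's actual classes):

```python
import hashlib
import uuid

# Illustrative model only: schema state as {table_name: (mutation_timestamp,
# definition)}. The "schema version" is a digest of the definitions, and
# merging two states keeps, per table, the definition with the newest
# timestamp (last-write-wins).
def schema_version(schema: dict) -> uuid.UUID:
    digest = hashlib.md5()
    for name in sorted(schema):
        ts, definition = schema[name]
        digest.update(name.encode())
        digest.update(definition.encode())
    return uuid.UUID(bytes=digest.digest())

def merge(local: dict, remote: dict) -> dict:
    merged = dict(local)
    for name, (ts, definition) in remote.items():
        if name not in merged or ts > merged[name][0]:
            merged[name] = (ts, definition)
    return merged

node_a = {"ks.t1": (100, "CREATE TABLE t1 (id int PRIMARY KEY)"),
          "ks.t2": (200, "CREATE TABLE t2 (id int PRIMARY KEY)")}
node_b = {"ks.t1": (150, "CREATE TABLE t1 (id int PRIMARY KEY, v text)")}

# Regardless of merge direction, both nodes converge to the same state and
# therefore compute the same schema version.
print(schema_version(merge(node_a, node_b)) == schema_version(merge(node_b, node_a)))
```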
Re: Exception while running two CQL queries in Parallel
What are your replication factors for that keyspace? Why are you using EACH_QUORUM? This might be handy: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigSerialConsistency.html On Wed, 1 May 2019 at 17:57, Bhavesh Prajapati wrote: > > I had two queries run on same row in parallel (that’s a use-case). While > Batch Query 2 completed successfully, query 1 failed with exception. > > Following are driver logs and sequence of log events. > > > > QUERY 1: STARTED > > 2019-04-30T13:14:50.858+ CQL update "EACH_QUORUM" "UPDATE dir SET > bid='value' WHERE repoid='06A7490B5CBFA1DE0A494027' IF EXISTS;" > > > > QUERY 2: STARTED > > 2019-04-30T13:14:51.161+ CQL BEGIN BATCH > > 2019-04-30T13:14:51.161+ CQL batch-update "06A7490B5CBFA1DE0A494027" > > 2019-04-30T13:14:51.161+ CQL batch-delete "06A7490B5CBFA1DE0A494027" > > 2019-04-30T13:14:51.161+ CQL APPLY BATCH > > 2019-04-30T13:14:51.165+ Cassandra delete directory call completed > successfully for "06A7490B5CBFA1DE0A494027" > > QUERY 2: COMPLETED - WITH SUCCESS > > > > QUERY 1: FAILED > > 2019-04-30T13:14:52.311+ CQL > "org.springframework.cassandra.support.exception.CassandraWriteTimeoutException" > "Cassandra timeout during write query at consistency SERIAL (5 replica were > required but only 0 acknowledged the write); nested exception is > com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout > during write query at consistency SERIAL (5 replica were required but only 0 > acknowledged the write)" > > org.springframework.cassandra.support.exception.CassandraWriteTimeoutException: > Cassandra timeout during write query at consistency SERIAL (5 replica were > required but only 0 acknowledged the write); nested exception is > com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout > during write query at consistency SERIAL (5 replica were required but only 0 > acknowledged the write) > > at > org.springframework.cassandra.support.CassandraExceptionTranslator.translateExceptionIfPossible(CassandraExceptionTranslator.java:95) > ~[spring-cql-1.5.18.RELEASE.jar!/:?] > > at > org.springframework.cassandra.core.CqlTemplate.potentiallyConvertRuntimeException(CqlTemplate.java:946) > ~[spring-cql-1.5.18.RELEASE.jar!/:?] > > at > org.springframework.cassandra.core.CqlTemplate.translateExceptionIfPossible(CqlTemplate.java:930) > ~[spring-cql-1.5.18.RELEASE.jar!/:?] > > > > What could have caused this exception ? > > How to resolve or handle such situation ? > > > > Thanks, > > Bhavesh
Re: gc_grace config for time serie database
I am wrong in this paragraph: >> On the other hand, a node was down, it was TTLed on healthy nodes and >> tombstone was created, then you start the first one which was down and >> as it counts down you hit that node with update. It does not matter how long that dead node was dead. Once you start the DB it will compute the TTL value regardless; it does not suddenly stop taking the time it was dead into account. It would just mean data would be TTLed when it should not be, as other healthy nodes could receive updates after they stopped making hints. But you say you don't ever update, so that is not applicable here. It is an interesting question and I won't give you an ultimate answer; maybe somebody else will give their opinion on this. I am curious what consequences, if any, setting them equal has. On Wed, 17 Apr 2019 at 23:12, onmstester onmstester wrote: > > I do not use table default ttl (every row has its own TTL) and also no update > occurs to the rows. > I suppose that (because of immutable nature of everything in cassandra) > cassandra would keep only the insertion timestamp + the original ttl and > computes ttl of a row using these two and current timestamp of the system > whenever needed (when you select ttl or when the compaction occurs). > So there should be something like this attached to every row: "this row > inserted at 4/17/2019 12:20 PM and should be deleted in 2 months", so > whatever happens to the row replicas, my intention of removing it at 6/17 > should not be changed! > > Would you suggest that my idea of "gc_grace = max_hint = 3 hours" for a time > serie db is not reasonable? > > Sent using Zoho Mail > > > > On Wed, 17 Apr 2019 17:13:02 +0430 Stefan Miklosovic > wrote > > TTL value is decreasing every second and it is set to original TTL > value back after some update occurs on that row (see example below). 
> Does not it logically imply that if a node is down for some time and > updates are occurring on live nodes and handoffs are saved for three > hours and after three hours it stops to do them, your data on other > nodes would not be deleted as TTLS are reset upon every update and > countdown starts again, which is correct, but they would be deleted on > that node which was down because it didnt receive updates so if you > query that node, data will not be there but they should. > > On the other hand, a node was down, it was TTLed on healthy nodes and > tombstone was created, then you start the first one which was down and > as it counts down you hit that node with update. So there is not a > tombstone on the previously dead node but there are tombstones on > healthy ones and if you delete tombstones after 3 hours, previously > dead node will never get that info and it your data might actually end > up being resurrected as they would be replicated to always healthy > nodes as part of the repair. > > Do you see some flaw in my reasoning? 
> > cassandra@cqlsh> DESCRIBE TABLE test.test; > > CREATE TABLE test.test ( > id uuid PRIMARY KEY, > value text > ) WITH bloom_filter_fp_chance = 0.6 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32', 'min_threshold': '4'} > AND compression = {'chunk_length_in_kb': '64', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 60 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE'; > > > cassandra@cqlsh> select ttl(value) from test.test where id = > 4f860bf0-d793-4408-8330-a809c6cf6375; > > ttl(value) > > 25 > > (1 rows) > cassandra@cqlsh> UPDATE test.test SET value = 'c' WHERE id = > 4f860bf0-d793-4408-8330-a809c6cf6375; > cassandra@cqlsh> select ttl(value) from test.test where id = > 4f860bf0-d793-4408-8330-a809c6cf6375; > > ttl(value) > > 59 > > (1 rows) > cassandra@cqlsh> select * from test.test ; > > id | value > --+--- > 4f860bf0-d793-4408-8330-a809c6cf6375 | c > > > On Wed, 17 Apr 2019 at 19:18, fald 1970 wrote: > > > > > > > > Hi, > > > > According to these Facts: > > 1. If a node is down for longer than max_hint_window_in_ms (3 hours by > > default), the coordinator stops writing new hints. > > 2. The main purpose of gc_grace property is to prevent Zombie data and also > > it d
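The expiry behaviour discussed above can be shown in a few lines: the expiration point is derived from the write timestamp plus the TTL, both stored with the data itself, so a node being down does not pause the countdown - after restart, expiry is recomputed from wall-clock time, exactly as on the replicas that stayed up. A sketch (the constants are made up for illustration):

```python
# Sketch: a TTLed cell expires at (write time + TTL), both persisted with
# the data, so downtime of a replica does not shift the expiration point.
WRITE_TIME = 1_000_000          # seconds since epoch when the row was written
TTL = 60 * 24 * 3600            # two months, as in the quoted example

def is_expired(now: int) -> bool:
    return now >= WRITE_TIME + TTL

# Suppose the node was down from day 10 to day 70: when it comes back, the
# data is already past its expiry, the same as on the always-healthy replicas.
print(is_expired(WRITE_TIME + 59 * 24 * 3600))  # False, one day to go
print(is_expired(WRITE_TIME + 70 * 24 * 3600))  # True, expired while "down"
```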
Re: gc_grace config for time serie database
The TTL value is decreasing every second and it is set back to the original TTL value after some update occurs on that row (see example below). Does it not logically imply that if a node is down for some time while updates are occurring on the live nodes, and handoffs are saved for three hours and after three hours it stops doing them, your data on the other nodes would not be deleted, as TTLs are reset upon every update and the countdown starts again, which is correct, but it would be deleted on the node which was down, because it didn't receive the updates? So if you query that node, the data will not be there, but it should be. On the other hand: a node was down, the data was TTLed on the healthy nodes and a tombstone was created; then you start the node which was down and, as its TTL counts down, you hit that node with an update. So there is not a tombstone on the previously dead node but there are tombstones on the healthy ones, and if you delete tombstones after 3 hours, the previously dead node will never get that info and your data might actually end up being resurrected, as it would be replicated to the always-healthy nodes as part of the repair. Do you see some flaw in my reasoning? 
cassandra@cqlsh> DESCRIBE TABLE test.test; CREATE TABLE test.test ( id uuid PRIMARY KEY, value text ) WITH bloom_filter_fp_chance = 0.6 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND comment = '' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'} AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND dclocal_read_repair_chance = 0.1 AND default_time_to_live = 60 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair_chance = 0.0 AND speculative_retry = '99PERCENTILE'; cassandra@cqlsh> select ttl(value) from test.test where id = 4f860bf0-d793-4408-8330-a809c6cf6375; ttl(value) 25 (1 rows) cassandra@cqlsh> UPDATE test.test SET value = 'c' WHERE id = 4f860bf0-d793-4408-8330-a809c6cf6375; cassandra@cqlsh> select ttl(value) from test.test where id = 4f860bf0-d793-4408-8330-a809c6cf6375; ttl(value) 59 (1 rows) cassandra@cqlsh> select * from test.test ; id | value --+--- 4f860bf0-d793-4408-8330-a809c6cf6375 | c On Wed, 17 Apr 2019 at 19:18, fald 1970 wrote: > > > > Hi, > > According to these Facts: > 1. If a node is down for longer than max_hint_window_in_ms (3 hours by > default), the coordinator stops writing new hints. > 2. The main purpose of gc_grace property is to prevent Zombie data and also > it determines for how long the coordinator should keep hinted files > > When we use Cassandra for Time series data which: > A) Every row of data has TTL and there would be no explicit delete so not so > much worried about zombies > B) At every minute there should be hundredrs of write requets to each node, > so if one of the node was down for longer than max_hint_window_in_ms, we > should run manual repair on that node, so anyway stored hints on the > coordinator won't be necessary. 
> > So Finally the question, is this a good idea to set gc_grace equal to > max_hint_window_in_ms (/1000 to convert to seconds), > for example set them both to 3 hours (why should keep the tombstones for 10 > days when they won't be needed at all)? > > Best Regards > Federica Albertini
Re: Bloom filter false positives high
Lastly, I wonder if that number is the very same on every node you connect your nodetool to. Do all nodes see a very similar false-positive ratio / number? On Wed, 17 Apr 2019 at 21:41, Stefan Miklosovic wrote: > > One thing comes to my mind but my reasoning is questionable as I am > not an expert in this. > > If you think about this, the whole concept of Bloom filter is to check > if some record is in particular SSTable. False positive mean that, > obviously, filter thought it was there but in fact it is not. So > Cassandra did a look unnecessarily. Why does it think that it is there > in such number of cases? You either make a lot of same requests on > same partition key over time hence querying same data over and over > again (but would not that data be cached?) or there was a lot of data > written with same partition key so it thinks it is there but > clustering column is different. As ts is of type timeuuid, isnt it > true that you are doing a lot of queries with some date? It might be > true that hash is done only on partition keys and not on clustering > columns so filter gives you "yes" and it goes there, checks it > clustering column is equal what you queried and its not there. But as > I say I might be wrong ... > > More to it, your read_repair_chance is 0.0 so it will never do a > repair after successful read (e.g. you have rf 3 and cl quorum so one > node is somehow behind) so if you dont run repairs maybe it is just > somehow unsychronized but that is really just my guess. > > On Wed, 17 Apr 2019 at 21:39, Martin Mačura wrote: > > > > We cannot run any repairs on these tables. Whenever we tried it > > (incremental or full or partitioner range), it caused a node to run out of > > disk space during anticompaction. We'll try again once Cassandra 4.0 is > > released. 
> > > > On Wed, Apr 17, 2019 at 1:07 PM Stefan Miklosovic > > wrote: > >> > >> if you invoke nodetool it gets false positives number from this metric > >> > >> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L564-L578 > >> > >> You get high false positives so this accumulates them > >> > >> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L572 > >> > >> If you follow that, that number is computed here > >> > >> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/sstable/BloomFilterTracker.java#L44-L55 > >> > >> In order to have that number so high, the difference has to be so big > >> so lastFalsePositiveCount is imho significantly lower > >> > >> False positives are ever increased only in BigTableReader where it get > >> complicated very quickly and I am not sure why it is called to be > >> honest. > >> > >> Is all fine with db as such? Do you run repairs? Does that number > >> increses or decreases over time? Has repair or compaction some effect > >> on it? > >> > >> On Wed, 17 Apr 2019 at 20:48, Martin Mačura wrote: > >> > > >> > Both tables use the default bloom_filter_fp_chance of 0.01 ... > >> > > >> > CREATE TABLE ... ( > >> >a int, > >> >b int, > >> >bucket timestamp, > >> >ts timeuuid, > >> >c int, > >> > ... > >> >PRIMARY KEY ((a, b, bucket), ts, c) > >> > ) WITH CLUSTERING ORDER BY (ts DESC, monitor ASC) > >> >AND bloom_filter_fp_chance = 0.01 > >> >AND compaction = {'class': > >> > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', > >> > 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', > >> > 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction': > >> > 'false'} > >> >AND dclocal_read_repair_chance = 0.0 > >> >AND default_time_to_live = 63072000 > >> >AND gc_grace_seconds = 10800 > >> > ... 
> >> >AND read_repair_chance = 0.0 > >> >AND speculative_retry = 'NONE'; > >> > > >> > > >> > CREATE TABLE ... ( > >> >c int, > >> >b int, > >> >bucket timestamp, > >> >ts timeuuid, > >> > ... > >> >PRIMARY KEY ((c, b, bucket), ts) > >> > ) WITH CLUSTERING ORDER BY (ts DESC) > >> >AND bloom_filter_fp_chance = 0.01 > >> >AND compaction = {'class': > >> >
Re: Bloom filter false positives high
One thing comes to my mind, though my reasoning is questionable as I am not an expert in this. The whole point of a Bloom filter is to check whether some record is in a particular SSTable. A false positive means that the filter thought the record was there but in fact it was not, so Cassandra did a lookup unnecessarily. Why does the filter think the record is there in so many cases? Either you make a lot of requests for the same partition key over time, querying the same data over and over again (but would that data not be cached?), or a lot of data was written with the same partition key, so the filter thinks the record is there but the clustering column differs. As ts is of type timeuuid, isn't it true that you are doing a lot of queries keyed by some date? It might well be that the hash is computed only on partition keys and not on clustering columns, so the filter says "yes", Cassandra goes to the SSTable, checks whether the clustering column equals what you queried, and it is not there. But as I say, I might be wrong ... On top of that, your read_repair_chance is 0.0, so a repair will never run after a successful read (e.g. you have RF 3 and CL QUORUM and one node is somehow behind); if you do not run repairs, maybe the data is just somehow unsynchronized, but that is really just my guess. On Wed, 17 Apr 2019 at 21:39, Martin Mačura wrote: > > We cannot run any repairs on these tables. Whenever we tried it (incremental > or full or partitioner range), it caused a node to run out of disk space > during anticompaction. We'll try again once Cassandra 4.0 is released. 
> > On Wed, Apr 17, 2019 at 1:07 PM Stefan Miklosovic > wrote: >> >> if you invoke nodetool it gets false positives number from this metric >> >> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L564-L578 >> >> You get high false positives so this accumulates them >> >> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L572 >> >> If you follow that, that number is computed here >> >> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/sstable/BloomFilterTracker.java#L44-L55 >> >> In order to have that number so high, the difference has to be so big >> so lastFalsePositiveCount is imho significantly lower >> >> False positives are ever increased only in BigTableReader where it get >> complicated very quickly and I am not sure why it is called to be >> honest. >> >> Is all fine with db as such? Do you run repairs? Does that number >> increses or decreases over time? Has repair or compaction some effect >> on it? >> >> On Wed, 17 Apr 2019 at 20:48, Martin Mačura wrote: >> > >> > Both tables use the default bloom_filter_fp_chance of 0.01 ... >> > >> > CREATE TABLE ... ( >> >a int, >> >b int, >> >bucket timestamp, >> >ts timeuuid, >> >c int, >> > ... >> >PRIMARY KEY ((a, b, bucket), ts, c) >> > ) WITH CLUSTERING ORDER BY (ts DESC, monitor ASC) >> >AND bloom_filter_fp_chance = 0.01 >> >AND compaction = {'class': >> > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', >> > 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', >> > 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction': >> > 'false'} >> >AND dclocal_read_repair_chance = 0.0 >> >AND default_time_to_live = 63072000 >> >AND gc_grace_seconds = 10800 >> > ... >> >AND read_repair_chance = 0.0 >> >AND speculative_retry = 'NONE'; >> > >> > >> > CREATE TABLE ... 
( >> >c int, >> >b int, >> >bucket timestamp, >> >ts timeuuid, >> > ... >> >PRIMARY KEY ((c, b, bucket), ts) >> > ) WITH CLUSTERING ORDER BY (ts DESC) >> >AND bloom_filter_fp_chance = 0.01 >> >AND compaction = {'class': >> > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', >> > 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', >> > 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction': >> > 'false'} >> >AND dclocal_read_repair_chance = 0.0 >> >AND default_time_to_live = 63072000 >> >AND gc_grace_seconds = 10800 >> > ... >> >AND read_repair_chance = 0.0 >> >AND speculative_retry = 'NONE'; >> > >> > On Wed, Apr 17, 2019 at 12:25 PM Stefan Miklosovic >> > wrote: >> >> >> >> What is your bloom_filt
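The reasoning in this thread, that the filter hashes only the partition key, so a miss on the clustering columns can still register as a hit, is easy to see with a toy Bloom filter. This is a simplified sketch for illustration only, not Cassandra's implementation; the hashing scheme and sizing here are made up:

```python
import hashlib

class ToyBloomFilter:
    """Minimal Bloom filter: k hash functions setting bits in an m-bit field."""
    def __init__(self, m=64, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key):
        # Derive k bit positions from independent hashes of the key.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all(self.bits & (1 << p) for p in self._positions(key))

bf = ToyBloomFilter()
# Only partition-key material is added, never clustering columns.
for pk in ["a:1:2019-04-01", "a:1:2019-04-02"]:
    bf.add(pk)

# A key that was added always answers True (Bloom filters have no false
# negatives) ...
print(bf.might_contain("a:1:2019-04-01"))  # True
# ... while a key that was never added *may* also answer True: that is a
# false positive, and each one costs Cassandra a pointless SSTable read.
```

The point being: the filter can only ever say "maybe in this SSTable" at the partition level, so heavy traffic against a few wide partitions spread over many SSTables inflates the false-positive count.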
Re: Bloom filter false positives high
If you invoke nodetool, it gets the false-positive number from this metric: https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L564-L578 You get high false positives, so this is what accumulates them: https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L572 If you follow that, the number is computed here: https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/sstable/BloomFilterTracker.java#L44-L55 For that number to be so high, the difference has to be large, so lastFalsePositiveCount is, imho, significantly lower. False positives are only ever incremented in BigTableReader, where it gets complicated very quickly, and to be honest I am not sure why it is called. Is everything fine with the db as such? Do you run repairs? Does that number increase or decrease over time? Does repair or compaction have some effect on it? On Wed, 17 Apr 2019 at 20:48, Martin Mačura wrote: > > Both tables use the default bloom_filter_fp_chance of 0.01 ... > > CREATE TABLE ... ( >a int, >b int, >bucket timestamp, >ts timeuuid, >c int, > ... >PRIMARY KEY ((a, b, bucket), ts, c) > ) WITH CLUSTERING ORDER BY (ts DESC, monitor ASC) >AND bloom_filter_fp_chance = 0.01 >AND compaction = {'class': > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', > 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', > 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction': > 'false'} >AND dclocal_read_repair_chance = 0.0 >AND default_time_to_live = 63072000 >AND gc_grace_seconds = 10800 > ... >AND read_repair_chance = 0.0 >AND speculative_retry = 'NONE'; > > > CREATE TABLE ... ( >c int, >b int, >bucket timestamp, >ts timeuuid, > ... 
>PRIMARY KEY ((c, b, bucket), ts) > ) WITH CLUSTERING ORDER BY (ts DESC) >AND bloom_filter_fp_chance = 0.01 >AND compaction = {'class': > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', > 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', > 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction': > 'false'} >AND dclocal_read_repair_chance = 0.0 >AND default_time_to_live = 63072000 >AND gc_grace_seconds = 10800 > ... >AND read_repair_chance = 0.0 >AND speculative_retry = 'NONE'; > > On Wed, Apr 17, 2019 at 12:25 PM Stefan Miklosovic > wrote: >> >> What is your bloom_filter_fp_chance for either table? I guess it is >> bigger for the first one, bigger that number is between 0 and 1, less >> memory it will use (17 MiB against 54.9 Mib) which means more false >> positives you will get. >> >> On Wed, 17 Apr 2019 at 19:59, Martin Mačura wrote: >> > >> > Hi, >> > I have a table with poor bloom filter false ratio: >> >SSTable count: 1223 >> >Space used (live): 726.58 GiB >> >Number of partitions (estimate): 8592749 >> >Bloom filter false positives: 35796352 >> >Bloom filter false ratio: 0.68472 >> >Bloom filter space used: 17.82 MiB >> >Compacted partition maximum bytes: 386857368 >> > >> > It's a time series, TWCS compaction, window size 1 day, data partitioned >> > in daily buckets, TTL 2 years. 
>> > >> > I have another table with a similar schema, but it is not affected for >> > some reason: >> >SSTable count: 1114 >> >Space used (live): 329.87 GiB >> >Number of partitions (estimate): 25460768 >> >Bloom filter false positives: 156942 >> >Bloom filter false ratio: 0.00010 >> >Bloom filter space used: 54.9 MiB >> >Compacted partition maximum bytes: 20924300 >> > >> > Thanks for any advice, >> > >> > Martin >> >> - >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org >> For additional commands, e-mail: user-h...@cassandra.apache.org >> - To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org
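The BloomFilterTracker logic linked above boils down to keeping a running total and reporting the delta since the last read. A hedged sketch of that bookkeeping in Python (the real class is org.apache.cassandra.io.sstable.BloomFilterTracker; names here are illustrative, not the actual API):

```python
class FalsePositiveTracker:
    """Keeps a running total of false positives plus the delta since the
    last query, mirroring the total/recent split discussed above."""
    def __init__(self):
        self.false_positive_count = 0
        self.last_false_positive_count = 0

    def add_false_positive(self):
        self.false_positive_count += 1

    def recent_false_positive_count(self):
        # recent = total minus the total at the previous call, then move
        # the mark forward so the next call starts a fresh window.
        fp = self.false_positive_count - self.last_false_positive_count
        self.last_false_positive_count = self.false_positive_count
        return fp

t = FalsePositiveTracker()
for _ in range(5):
    t.add_false_positive()
print(t.recent_false_positive_count())  # 5
print(t.recent_false_positive_count())  # 0 until new false positives arrive
```

So a persistently high "recent" value really does mean the total keeps climbing between reads, not that an old spike is being re-reported.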
Re: Bloom filter false positives high
What is your bloom_filter_fp_chance for either table? I guess it is bigger for the first one: the bigger that number (between 0 and 1) is, the less memory the filter will use (17 MiB against 54.9 MiB), which means the more false positives you will get. On Wed, 17 Apr 2019 at 19:59, Martin Mačura wrote: > > Hi, > I have a table with poor bloom filter false ratio: >SSTable count: 1223 >Space used (live): 726.58 GiB >Number of partitions (estimate): 8592749 >Bloom filter false positives: 35796352 >Bloom filter false ratio: 0.68472 >Bloom filter space used: 17.82 MiB >Compacted partition maximum bytes: 386857368 > > It's a time series, TWCS compaction, window size 1 day, data partitioned in > daily buckets, TTL 2 years. > > I have another table with a similar schema, but it is not affected for some > reason: >SSTable count: 1114 >Space used (live): 329.87 GiB >Number of partitions (estimate): 25460768 >Bloom filter false positives: 156942 >Bloom filter false ratio: 0.00010 >Bloom filter space used: 54.9 MiB >Compacted partition maximum bytes: 20924300 > > Thanks for any advice, > > Martin 
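The memory/accuracy trade-off mentioned here follows from the standard Bloom filter sizing formula, bits ≈ -n · ln(p) / (ln 2)², where n is the number of keys and p the target false-positive chance: for fixed n, a larger p needs fewer bits. A quick back-of-the-envelope check against the partition counts from this thread (idealized textbook formula, not Cassandra's exact allocator, and real usage is larger because every SSTable carries its own filter over its own keys):

```python
import math

def bloom_filter_bits(n_keys: int, fp_chance: float) -> int:
    """Ideal Bloom filter size in bits for n keys at a target FP chance."""
    return math.ceil(-n_keys * math.log(fp_chance) / (math.log(2) ** 2))

def mib(bits: int) -> float:
    """Convert a bit count to MiB."""
    return bits / 8 / 1024 / 1024

# ~8.6M partitions (first table) vs ~25.5M (second table), both at p = 0.01.
print(round(mib(bloom_filter_bits(8_592_749, 0.01)), 1))   # ~9.8 MiB
print(round(mib(bloom_filter_bits(25_460_768, 0.01)), 1))  # ~29.1 MiB
```

Since both tables here use the same fp_chance of 0.01, the sizing alone does not explain the bad ratio; the per-SSTable key overlap under TWCS is the more likely culprit.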
Re: Issue while updating a record in 3 node cassandra cluster deployed using kubernetes
>> I have a 3 node cassandra cluster with Replication factor as 2 and >> read-write consistency set to QUORUM. I am not sure what you want to achieve with this. If you have three nodes and RF 2, each write has two replicas, right ... A quorum of two replicas is two, so if one of the two replicas for a record is down, you will never reach QUORUM: half of the replicas being up is not a quorum. In other words, whenever a node holding one of a record's two replicas fails, QUORUM queries for that record fail too, so your cluster is not protected against even a single failed node. On Tue, 9 Apr 2019 at 23:10, Mahesh Daksha wrote: > > Hello All, > > I have a 3 node cassandra cluster with Replication factor as 2 and read-write > consistency set to QUORUM. We are using Spring data cassandra. All > infrastructure is deployed using kubernetes. > > Now in normal use case many records gets inserted to cassandra table. Then we > try to modify/update one of the record using save method of repo, like below: > > ChunkMeta tmpRec = chunkMetaRepository.save(chunkMeta); > > After execution of above statement we never see any exception or error. But > still this update state goes silent/fail intermittently. That is at times the > record in the db gets updated successfully where as other time it fails. Also > in the above query when we print tmpRec it contains the updated and correct > value every time. Still in the db these updated values doesn't get reflected. > > We check the the cassandra transport TRACE logs on all nodes and found the > our queries are getting logged there and are being executed also with out any > error or exception. > > Now another weird observation is this all thing works erfectly fine if I am > using single cassandra node (in kubernetes) or if we deploy above infra using > ansible (even works for 3 nodes for Ansible). > > It looks some issue is specifically with the kubernetes 3 node deployment of > cassandra. Primarily looks like replication among nodes causing this. 
> > Please suggest. > > > > I have a 3 node cassandra cluster with Replication factor as 2 and read-write > consistency set to QUORUM. We are using Spring data cassandra. All > infrastructure is deployed using kubernetes. > > Now in normal use case many records gets inserted to cassandra table. Then we > try to modify/update one of the record using save method of repo, like below: > > ChunkMeta tmpRec = chunkMetaRepository.save(chunkMeta); > > After execution of above statement we never see any exception or error. But > still this update fail intermittently. That is when we check the record in > the db sometime it gets updated successfully where as other time it fails. > Also in the above query when we print tmpRec it contains the updated and > correct value. Still in the db these updated values doesnt get reflected. > > We check the the cassandra transport TRACE logs on all nodes and found the > our queries are getting logged there and are being executed also. > > Now another weird observation is this all thing works if I am using single > cassandra node (in kubernetes) or if we deploy above infra using ansible > (even works for 3 nodes for Ansible). > > It looks some issue is specifically with the kubernetes 3 node deployment of > cassandra. Primarily looks like replication among nodes causing this. > > Please suggest. 
> > Below are the contents of my cassandra Docker file: > > FROM ubuntu:16.04 > > RUN apt-get update && apt-get install -y python sudo lsof vim dnsutils > net-tools && apt-get clean && \ > addgroup testuser && useradd -g testuser testuser && usermod --password > testuser testuser; > > RUN mkdir -p /opt/test && \ > mkdir -p /opt/test/data; > > ADD jre8.tar.gz /opt/test/ > ADD apache-cassandra-3.11.0-bin.tar.gz /opt/test/ > > RUN chmod 755 -R /opt/test/jre && \ > ln -s /opt/test/jre/bin/java /usr/bin/java && \ > mv /opt/test/apache-cassandra* /opt/test/cassandra; > > RUN mkdir -p /opt/test/cassandra/logs; > > ENV JAVA_HOME /opt/test/jre > RUN export JAVA_HOME > > COPY version.txt /opt/test/cassandra/version.txt > > WORKDIR /opt/test/cassandra/bin/ > > RUN mkdir -p /opt/test/data/saved_caches && \ > mkdir -p /opt/test/data/commitlog && \ > mkdir -p /opt/test/data/hints && \ > chown -R testuser:testuser /opt/test/data && \ > chown -R testuser:testuser /opt/test; > > USER testuser > > CMD cp /etc/cassandra/cassandra.yml ../conf/conf.yml && perl -p -e > 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : $&/eg; s/\$\{([^}]+)\}//eg' > ../conf/conf.yml > ../conf/cassandra.yaml && rm ../conf/conf.yml && > ./cassandra -f > > Please note conf.yml is basically cassandra.yml file having properties > related to cassandra. > > > Thanks, > > Mahesh Daksha - To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org
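To make the quorum arithmetic in this thread concrete: QUORUM needs floor(RF/2) + 1 replica acks, so RF 2 gives a quorum of 2 and cannot tolerate any down replica, while RF 3 tolerates one. A small sketch of that arithmetic (plain illustration, not driver or server code):

```python
def quorum(rf: int) -> int:
    """Replica acks required for a QUORUM read or write: floor(RF/2) + 1."""
    return rf // 2 + 1

def tolerated_replica_failures(rf: int) -> int:
    """How many replicas of a partition can be down with QUORUM still met."""
    return rf - quorum(rf)

for rf in (2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"tolerates {tolerated_replica_failures(rf)} down replica(s)")
# RF=2 tolerates zero down replicas at QUORUM, which is why an odd RF
# (typically 3) is the usual choice when quorum operations are needed.
```

This is why RF 2 + QUORUM gives you the durability cost of two copies with the availability of one.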
Re: time tracking for down node for nodetool repair
Ah, I see, it is the default window for hinted handoff. I was somehow thinking it was a bigger figure, I do not know why :) I would say you should run repairs continuously / periodically, so you do not even have to think about this; it should run in the background in a scheduled manner if possible. Regards On Tue, 9 Apr 2019 at 04:19, Kunal wrote: > > Hello everyone.. > > > > I have a 6 node Cassandra datacenter, 3 nodes on each datacenter. If one of > the node goes down and remain down for more than 3 hr, I have to run nodetool > repair. Just wanted to ask if Cassandra automatically tracks the time when > one of the Cassandra node goes down or do I need to write code to track the > time and run repair when node comes back online after 3 hrs. > > > Thanks in anticipation. > > Regards, > Kunal Vaid 
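For context, the 3-hour figure is max_hint_window_in_ms in cassandra.yaml (default 10800000 ms): hints stop being collected for a node once it has been down longer than that window, after which only repair can restore consistency. If one did want to script the check instead of scheduling repairs, the decision is just a comparison; a sketch (assuming the stock default, so check your own yaml):

```python
# Default max_hint_window_in_ms in cassandra.yaml: 3 hours.
DEFAULT_MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000  # 10800000

def needs_repair(downtime_ms: int,
                 hint_window_ms: int = DEFAULT_MAX_HINT_WINDOW_MS) -> bool:
    """Past the hint window, hints were no longer stored for the node,
    so a (sub)range or full repair is required to catch it up."""
    return downtime_ms > hint_window_ms

print(needs_repair(2 * 60 * 60 * 1000))  # down 2h: hints should cover it
print(needs_repair(4 * 60 * 60 * 1000))  # down 4h: schedule a repair
```

Scheduled periodic repair within gc_grace_seconds makes this tracking unnecessary, which is the point of the reply above.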
Re: time tracking for down node for nodetool repair
Hi Kunal, where did you get that "more than 3 hours" figure from? Regards On Tue, 9 Apr 2019 at 04:19, Kunal wrote: > > Hello everyone.. > > > > I have a 6 node Cassandra datacenter, 3 nodes on each datacenter. If one of > the node goes down and remain down for more than 3 hr, I have to run nodetool > repair. Just wanted to ask if Cassandra automatically tracks the time when > one of the Cassandra node goes down or do I need to write code to track the > time and run repair when node comes back online after 3 hrs. > > > Thanks in anticipation. > > Regards, > Kunal Vaid 
Re: Procedures for moving part of a C* cluster to a different datacenter
On Wed, 3 Apr 2019 at 18:38, Oleksandr Shulgin wrote: > > On Wed, Apr 3, 2019 at 12:28 AM Saleil Bhat (BLOOMBERG/ 731 LEX) > wrote: >> >> >> The standard procedure for doing this seems to be add a 3rd datacenter to >> the cluster, stream data to the new datacenter via nodetool rebuild, then >> decommission the old datacenter. A more detailed review of this procedure >> can be found here: >> http://thelastpickle.com/blog/2019/02/26/data-center-switch.html >> >> However, I see two problems with the above protocol. First, it requires >>changes on the application layer because of the datacenter name change; e.g. >>all applications referring to the datacenter ‘Orlando’ will now have to be >>changed to refer to ‘Tampa’. > > > Alternatively, you may omit DC specification in the client and provide > internal network addresses as the contact points. I am afraid you are mixing two things together. I believe the OP means that he has to change the local DC in DCAwareRoundRobinPolicy; I am not sure what contact points have to do with that. As long as at least one contact point belongs to a DC nobody removes, all should be fine. The process in the article is right. Before transitioning to the new DC, one has to be sure that all writes and reads still target the old DC too after you alter the keyspace and add the new DC there, so you do not miss any writes if something goes south and you have to switch back. That is achieved by LOCAL_ONE / LOCAL_QUORUM and DCAwareRoundRobinPolicy with localDc pointing to the old one. Then you do the rebuild and restart your app in such a way that the new DC is in that policy, so new writes and reads go primarily to the new DC, and once all is fine you drop the old one (you can run an additional repair to be sure). I think the rolling restart of the app is inevitable, but if the services are in some kind of HA setup I do not see a problem with that. From the outside it would look like there is no downtime. 
The OP has a problem with repair on his nodes, and it is true that repair can be time-consuming, even not doable, but there are workarounds for that which I do not want to go into here. You can speed the process up significantly if you are smart about it and repair in smaller chunks, so you do not clog your cluster completely; it is called subrange repair. >> As such, I was wondering what peoples’ thoughts were on the following >> alternative procedure: >> 1) Kill one node in the old datacenter >> 2) Add a new node in the new datacenter but indicate that it is to REPLACE >> the one just shutdown; this node will bootstrap, and all the data which it >> is supposed to be responsible for will be streamed to it > > > I don't think this is going to work. First, I believe streaming for > bootstrap or for replacing a node is DC-local, so the first node won't have > any peers to stream from. Even if it would stream from the remote DC, this > single node will own 100% of the ring and will most likely die of the load > well before it finishes streaming. > > Regards, > -- > Alex > 
Re: Multi-DC replication and hinted handoff
Hi Jens, I am reading Cassandra: The Definitive Guide, and chapter 9 (Reading and Writing Data), in the section The Cassandra Write Path, contains this sentence: "If a replica does not respond within the timeout, it is presumed to be down and a hint is stored for the write." So your node might actually be fine; it just cannot cope with the load and replies too late, after the coordinator already has sufficient replies from other replicas, so the coordinator stores a hint for that write and that node. I am not sure how this is related to turning handoffs off completely. I can run some tests locally, if time allows, to investigate various scenarios; there might be some subtle differences. On Wed, 3 Apr 2019 at 07:19, Jens Fischer wrote: > Yes, Apache Cassandra 3.11.2 (no DSE). > > On 2. Apr 2019, at 19:40, sankalp kohli wrote: > > Are you using OSS C*? > > On Fri, Mar 29, 2019 at 1:49 AM Jens Fischer wrote: > >> Hi, >> >> I have a Cassandra setup with multiple data centres. The vast majority of >> writes are LOCAL_ONE writes to data center DC-A. One node (lets call this >> node A1) in DC-A has accumulated large amounts of hint files (~100 GB). In >> the logs of this node I see lots of messages like the following: >> >> INFO [HintsDispatcher:26] 2019-03-28 01:49:25,217 >> HintsDispatchExecutor.java:289 - Finished hinted handoff of file >> db485ac6-8acd-4241-9e21-7a2b540459de-1553419324363-1.hints to endpoint / >> 10.10.2.55: db485ac6-8acd-4241-9e21-7a2b540459de >> >> The node 10.10.2.55 is in DC-B, lets call this node B1. There is no >> indication whatsoever that B1 was down: Nothing in our monitoring, nothing >> in the logs of B1, nothing in the logs of A1. Are there any other >> situations where hints to B1 are stored at A1? Other than A1's failure >> detection detecting B1 as down I mean. For example could the reason for the >> hints be that B1 is overloaded and can not handle the intake from the A1? 
>> Or that the network connection between DC-A and DC-B is to slow? >> >> While researching this I also found the following information on Stack >> Overflow from Ben Slater regarding hints and multi-dc replication: >> >> Another factor here is the consistency level you are using - a LOCAL_* >> consistency level will only require writes to be written to the local DC >> for the operation to be considered a success (and hints will be stored for >> replication to the other DC). >> (…) >> The hints are the records of writes that have been made in one DC that >> are not yet replicated to the other DC (or even nodes within a DC). I think >> your options to avoid them are: (1) write with ALL or QUOROM (not LOCAL_*) >> consistency - this will slow down your writes but will ensure writes go >> into both DCs before the op completes (2) Don't replicate the data to the >> second DC (by setting the replication factor to 0 for the second DC in the >> keyspace definition) (3) Increase the capacity of the second DC so it can >> keep up with the writes (4) Slow down your writes so the second DC can keep >> up. >> >> >> Source: https://stackoverflow.com/a/37382726 >> >> This reads like hints are used for “normal” (async) replication between >> data centres, i.e. hints could show up without any nodes being down >> whatsoever. This could explain what I am seeing. Does anyone now more about >> this? Does that mean I will see hints even if I disable hinted handoff? >> >> Any pointers or help are greatly appreciated! >> >> Thanks in advance >> Jens >> >> Geschäftsführer: Christoph Ostermann (CEO), Oliver Koch, Steffen >> Schneider, Hermann Schweizer. >> Amtsgericht Kempten/Allgäu, Registernummer: 10655, Steuernummer >> 127/137/50792, USt.-IdNr. DE272208908 >> > > Geschäftsführer: Christoph Ostermann (CEO), Oliver Koch, Steffen > Schneider, Hermann Schweizer. > Amtsgericht Kempten/Allgäu, Registernummer: 10655, Steuernummer > 127/137/50792, USt.-IdNr. DE272208908 >
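The write-path behaviour described in this thread, where a hint is stored for any replica that misses the write timeout even if it is merely slow rather than down, can be sketched as coordinator-side logic. This is a heavily simplified illustration (the real logic lives in StorageProxy and is far more involved; names and the timeout value here are assumptions):

```python
def coordinate_write(replica_acks_ms: dict, consistency_required: int,
                     write_timeout_ms: int = 2000):
    """Simulate a coordinator: the write succeeds if enough replicas ack
    within the timeout; every replica that missed the deadline (or never
    acked at all, modeled as None) gets a hint queued for later replay."""
    on_time = [r for r, ms in replica_acks_ms.items()
               if ms is not None and ms <= write_timeout_ms]
    hinted = [r for r in replica_acks_ms if r not in on_time]
    return len(on_time) >= consistency_required, hinted

# LOCAL_ONE write: A1 acks fast, overloaded B1 in the remote DC acks late.
ok, hinted = coordinate_write({"A1": 5, "B1": 3500}, consistency_required=1)
print(ok, hinted)  # write succeeds, yet a hint is stored for B1
```

This matches the observation above: hint files can pile up for a node that is "up" the whole time, purely because it (or the inter-DC link) is too slow to ack within the write timeout.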
Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist ingossip
It is just a C* in Docker Compose with static IP addresses as long as all containers run. I am just killing Cassandra process and starting it again in each container. On Fri, 15 Mar 2019 at 10:47, Jeff Jirsa wrote: > Are your IPs changing as you restart the cluster? Kubernetes or Mesos or > something where your data gets scheduled on different machines? If so, if > it gets an IP that was previously in the cluster, it’ll stomp on the old > entry in the gossiper maps > > > > -- > Jeff Jirsa > > > On Mar 14, 2019, at 3:42 PM, Fd Habash wrote: > > I can conclusively say, none of these commands were run. However, I think > this is the likely scenario … > > > > If you have a cluster of three nodes 1,2,3 … > >- If 3 shows as DN >- Restart C* on 1 & 2 >- Nodetool status should NOT show node 3 IP at all. > > > > Restarting the cluster while a node is down resets gossip state. > > > > There is a good chance this is what happened. > > > > Plausible? > > > > > Thank you > > > > *From: *Jeff Jirsa > *Sent: *Thursday, March 14, 2019 11:06 AM > *To: *cassandra > *Subject: *Re: Cannot replace_address /10.xx.xx.xx because it doesn't > exist ingossip > > > > Two things that wouldn't be a bug: > > > > You could have run removenode > > You could have run assassinate > > > > Also could be some new bug, but that's much less likely. > > > > > > On Thu, Mar 14, 2019 at 2:50 PM Fd Habash wrote: > > I have a node which I know for certain was a cluster member last week. It > showed in nodetool status as DN. 
When I attempted to replace it today, I > got this message > > > > ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception > encountered during startup > > java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx > because it doesn't exist in gossip > > at > org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449) > ~[apache-cassandra-2.2.8.jar:2.2.8] > > > > > > DN 10.xx.xx.xx 388.43 KB 256 6.9% > bdbd632a-bf5d-44d4-b220-f17f258c4701 1e > > > > Under what conditions does this happen? > > > > > > > Thank you > > > > > > -- *Stefan Miklosovic**Senior Software Engineer* M: +61459911436 <https://www.instaclustr.com> <https://www.facebook.com/instaclustr> <https://twitter.com/instaclustr> <https://www.linkedin.com/company/instaclustr> Read our latest technical blog posts here <https://www.instaclustr.com/blog/>. This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA). This email and any attachments may contain confidential and legally privileged information. If you are not the intended recipient, do not copy or disclose its content, but please reply to this email immediately and highlight the error to the sender and then immediately delete the message. Instaclustr values your privacy. Our privacy policy can be found at https://www.instaclustr.com/company/policies/privacy-policy
Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist ingossip
Hi Fd, I tried this on a 3-node cluster. I killed node2; both node1 and node3 reported node2 as DN. Then I killed node1 and node3 and restarted them, and node2 was reported like this:

[root@spark-master-1 /]# nodetool status
Datacenter: DC1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
DN  172.19.0.8  ?           256     64.0%             bd75a5e2-2890-44c5-8f7a-fca1b4ce94ab  r1
Datacenter: dc1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.19.0.5  382.75 KiB  256     64.4%             2a062140-2428-4092-b48b-7495d083d7f9  rack1
UN  172.19.0.9  171.41 KiB  256     71.6%             9590b791-ad53-4b5a-b4c7-b00408ed02dd  rack3

Prior to killing node1 and node3, node2 was indeed marked as DN, but it was part of the "Datacenter: dc1" output together with node1 and node3. After killing both node1 and node3 (so the cluster was totally down) and restarting them, node2 was reported as above, and I do not know what makes the difference. Is gossip data stored somewhere on disk? I would say so; otherwise there is no way node1 / node3 could report node2 as down, but at the same time I do not get why node2 ends up "out of the list" where node1 and node3 are. On Fri, 15 Mar 2019 at 02:42, Fd Habash wrote: > I can conclusively say, none of these commands were run. However, I think > this is the likely scenario … > > > > If you have a cluster of three nodes 1,2,3 … > >- If 3 shows as DN >- Restart C* on 1 & 2 >- Nodetool status should NOT show node 3 IP at all. > > > > Restarting the cluster while a node is down resets gossip state. > > > > There is a good chance this is what happened. > > > > Plausible? 
> > > > > Thank you > > > > *From: *Jeff Jirsa > *Sent: *Thursday, March 14, 2019 11:06 AM > *To: *cassandra > *Subject: *Re: Cannot replace_address /10.xx.xx.xx because it doesn't > exist ingossip > > > > Two things that wouldn't be a bug: > > > > You could have run removenode > > You could have run assassinate > > > > Also could be some new bug, but that's much less likely. > > > > > > On Thu, Mar 14, 2019 at 2:50 PM Fd Habash wrote: > > I have a node which I know for certain was a cluster member last week. It > showed in nodetool status as DN. When I attempted to replace it today, I > got this message > > > > ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception > encountered during startup > > java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx > because it doesn't exist in gossip > > at > org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449) > ~[apache-cassandra-2.2.8.jar:2.2.8] > > > > > > DN 10.xx.xx.xx 388.43 KB 256 6.9% > bdbd632a-bf5d-44d4-b220-f17f258c4701 1e > > > > Under what conditions does this happen? > > > > > > > Thank you > > > > > Stefan Miklosovic
Re: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.
Hi Leena, as already suggested in my previous email, you could use Apache Spark and the Cassandra Spark connector (1). I have checked TTLs and I believe you should especially read this section (2) about TTLs; it seems that is what you need to do, TTLs per row. The workflow would be to read from your source table, transform each row (via some mapping) and then save it to the new table. This would import it "all", but as long as records are still being written to the original table before you switch over, there is "a gap" to cover: once you make the switch, you would miss records which were created in the first table after you did the loading. You could maybe leverage Spark Streaming (the Cassandra connector supports that too) so the transformation is applied on the fly to new records as well. (1) https://github.com/datastax/spark-cassandra-connector (2) https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md#using-a-different-value-for-each-row On Thu, 14 Mar 2019 at 00:13, Leena Ghatpande wrote: > Understand, 2nd table would be a better approach. So what would be the > best way to copy 70M rows from current table to the 2nd table with ttl set > on each record as the first table? > > -- > *From:* Durity, Sean R > *Sent:* Wednesday, March 13, 2019 8:17 AM > *To:* user@cassandra.apache.org > *Subject:* RE: [EXTERNAL] Re: Migrate large volume of data from one table > to another table within the same cluster when COPY is not an option. > > > Correct, there is no current flag. I think there SHOULD be one. > > > > > > *From:* Dieudonné Madishon NGAYA > *Sent:* Tuesday, March 12, 2019 7:17 PM > *To:* user@cassandra.apache.org > *Subject:* [EXTERNAL] Re: Migrate large volume of data from one table to > another table within the same cluster when COPY is not an option. > > > > Hi Sean, you can’t flag in Cassandra.yaml not allowing allow filtering , > the only thing you can do will be from your data model . 
> > Don’t ask Cassandra to query all data from table but the ideal query will > be using single partition. > > > > On Tue, Mar 12, 2019 at 6:46 PM Stefan Miklosovic < > stefan.mikloso...@instaclustr.com> wrote: > > Hi Sean, > > > > for sure, the best approach would be to create another table which would > treat just that specific query. > > > > How do I set the flag for not allowing allow filtering in cassandra.yaml? > I read a doco and there seems to be nothing about that. > > > > Regards > > > > On Wed, 13 Mar 2019 at 06:57, Durity, Sean R > wrote: > > If there are 2 access patterns, I would consider having 2 tables. The > first one with the ID, which you say is the majority use case. Then have a > second table that uses a time-bucket approach as others have suggested: > > (time bucket, id) as primary key > > Choose a time bucket (day, week, hour, month, whatever) that would hold > less than 100 MB of data in the time-bucket partition. > > > > You could include all relevant data in the second table to meet your > query. OR, if that data seems too large or too volatile to duplicate, just > include your primary key and look-up the data in the primary table as > needed. > > > > If you use allow filtering, you are setting yourself up for failure to > scale. I tell my developers, “if you use allow filtering, you are doing it > wrong.” In fact, I think the Cassandra admin should be able to set a flag > in cassandra.yaml to not allow filtering at all. The cluster should be able > to protect itself from bad queries. > > > > > > > > *From:* Leena Ghatpande > *Sent:* Tuesday, March 12, 2019 9:02 AM > *To:* Stefan Miklosovic ; > user@cassandra.apache.org > *Subject:* [EXTERNAL] Re: Migrate large volume of data from one table to > another table within the same cluster when COPY is not an option. > > > > Our data model cannot be like below as you have recommended as majority of > the reads need to select the data by the partition key (id) only, not by > date. 
> > You could remodel your data in such way that you would make primary key > like this > > ((date), hour-minute, id) > > or > > ((date, hour-minute), id) > > > > > > By adding the date as clustering column, yes the idea was to use the Allow > Filtering on the date and pull the records. Understand that it is not > recommended to do this, but we have been doing this on another existing > large table and have not run into any issue so far. But want to understand > if there is a better approach to this? > > > > Thanks > > > -- > > *From
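Since COPY drops TTLs, the copy has to carry each row's remaining TTL along. Below is a minimal sketch of the per-row TTL idea; the table and column names (`ks.t1`, `ks.t2`, `id`, `val`) are made up for illustration. With the Spark connector the same effect comes from its per-row TTL write option (2 above); with a plain driver you would read `TTL(col)` from the source and re-insert with `USING TTL`:

```python
# Sketch: preserve each row's remaining TTL when copying between tables.
# Table/column names (ks.t1, ks.t2, id, val) are hypothetical. Rows are
# assumed to look like driver output for:
#   SELECT id, val, TTL(val) AS ttl FROM ks.t1;

def copy_statements(rows):
    """Build INSERT statements for ks.t2, re-applying each row's TTL."""
    stmts = []
    for row in rows:
        if row["ttl"] is None:
            # No TTL on the source row: plain insert.
            stmts.append(("INSERT INTO ks.t2 (id, val) VALUES (%s, %s)",
                          (row["id"], row["val"])))
        else:
            # Re-apply the remaining TTL so expiry lines up with the source.
            stmts.append(("INSERT INTO ks.t2 (id, val) VALUES (%s, %s) USING TTL %s",
                          (row["id"], row["val"], row["ttl"])))
    return stmts

rows = [{"id": 1, "val": "a", "ttl": 3600},
        {"id": 2, "val": "b", "ttl": None}]
for cql, params in copy_statements(rows):
    print(cql, params)
```

Because the TTL read back is the *remaining* time to live, re-applying it keeps expiry aligned with the original table even though the copy happens later.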
Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.
Hi Sean, for sure, the best approach would be to create another table which would treat just that specific query. How do I set the flag for not allowing allow filtering in cassandra.yaml? I read a doco and there seems to be nothing about that. Regards On Wed, 13 Mar 2019 at 06:57, Durity, Sean R wrote: > If there are 2 access patterns, I would consider having 2 tables. The > first one with the ID, which you say is the majority use case. Then have a > second table that uses a time-bucket approach as others have suggested: > > (time bucket, id) as primary key > > Choose a time bucket (day, week, hour, month, whatever) that would hold > less than 100 MB of data in the time-bucket partition. > > > > You could include all relevant data in the second table to meet your > query. OR, if that data seems too large or too volatile to duplicate, just > include your primary key and look-up the data in the primary table as > needed. > > > > If you use allow filtering, you are setting yourself up for failure to > scale. I tell my developers, “if you use allow filtering, you are doing it > wrong.” In fact, I think the Cassandra admin should be able to set a flag > in cassandra.yaml to not allow filtering at all. The cluster should be able > to protect itself from bad queries. > > > > > > > > *From:* Leena Ghatpande > *Sent:* Tuesday, March 12, 2019 9:02 AM > *To:* Stefan Miklosovic ; > user@cassandra.apache.org > *Subject:* [EXTERNAL] Re: Migrate large volume of data from one table to > another table within the same cluster when COPY is not an option. > > > > Our data model cannot be like below as you have recommended as majority of > the reads need to select the data by the partition key (id) only, not by > date. 
> > You could remodel your data in such way that you would make primary key > like this > > ((date), hour-minute, id) > > or > > ((date, hour-minute), id) > > > > > > By adding the date as clustering column, yes the idea was to use the Allow > Filtering on the date and pull the records. Understand that it is not > recommended to do this, but we have been doing this on another existing > large table and have not run into any issue so far. But want to understand > if there is a better approach to this? > > > > Thanks > > > -- > > *From:* Stefan Miklosovic > *Sent:* Monday, March 11, 2019 7:12 PM > *To:* user@cassandra.apache.org > *Subject:* Re: Migrate large volume of data from one table to another > table within the same cluster when COPY is not an option. > > > > The query which does not work should be like this, I made a mistake there > > > > cqlsh> SELECT * from my_keyspace.my_table where number > 2; > > InvalidRequest: Error from server: code=2200 [Invalid query] > message="Cannot execute this query as it might involve data filtering and > thus may have unpredictable performance. If you want to execute this query > despite the performance unpredictability, use ALLOW FILTERING" > > > > > > On Tue, 12 Mar 2019 at 10:10, Stefan Miklosovic < > stefan.mikloso...@instaclustr.com> wrote: > > Hi Leena, > > > > "We are thinking of creating a new table with a date field as a > clustering column to be able to query for date ranges, but partition key to > clustering key will be 1-1. Is this a good approach?" > > > > If you want to select by some time range here, I am wondering how would > making datetime a clustering column help you here? You still have to > provide primary key, right? > > > > E.g. 
select * from your_keyspace.your_table where id=123 and my_date > > yesterday and my_date < tomorrow (you got the idea) > > > > If you make my_date clustering column, you cant not do this below, because > you still have to specify partition key fully and then clustering key > (optionally) where you can further order and do ranges. But you cant do a > query without specifying partition key. Well, you can use ALLOW FILTERING > but you do not want to do this at all in your situation as it would scan > everything. > > > > select * from your_keyspace.your_table where my_date > yesterday and > my_date < tomorrow > > > > cqlsh> create KEYSPACE my_keyspace WITH replication = {'class': > 'NetworkTopologyStrategy', 'dc1': '1'}; > > cqlsh> CREATE TABLE my_keyspace.my_table (id uuid, number int, PRIMARY KEY > ((id), number)); > > > > cqlsh> SELECT * from my_keyspace.my_table ; > > > > id | number > > --+ > > 6e23f79a-8b67-47
Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.
The query which does not work should be like this; I made a mistake there: cqlsh> SELECT * from my_keyspace.my_table where number > 2; InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING" On Tue, 12 Mar 2019 at 10:10, Stefan Miklosovic < stefan.mikloso...@instaclustr.com> wrote: > Hi Leena, > > "We are thinking of creating a new table with a date field as a > clustering column to be able to query for date ranges, but partition key to > clustering key will be 1-1. Is this a good approach?" > > If you want to select by some time range here, I am wondering how would > making datetime a clustering column help you here? You still have to > provide primary key, right? > > E.g. select * from your_keyspace.your_table where id=123 and my_date > > yesterday and my_date < tomorrow (you got the idea) > > If you make my_date a clustering column, you cannot do the query below, because > you still have to specify the partition key fully and then the clustering key > (optionally), where you can further order and do ranges. But you can't do a > query without specifying the partition key. Well, you can use ALLOW FILTERING, > but you do not want to do this at all in your situation as it would scan > everything. 
> > select * from your_keyspace.your_table where my_date > yesterday and > my_date < tomorrow > > cqlsh> create KEYSPACE my_keyspace WITH replication = {'class': > 'NetworkTopologyStrategy', 'dc1': '1'}; > cqlsh> CREATE TABLE my_keyspace.my_table (id uuid, number int, PRIMARY KEY > ((id), number)); > > cqlsh> SELECT * from my_keyspace.my_table ; > > id | number > --+ > 6e23f79a-8b67-47e0-b8e0-50be78bb1c7f | 3 > abdc0184-a695-427d-b63b-57cdf7a45f00 | 1 > 90fe112e-0f74-4cbc-8767-67bdc9c8c3b0 | 4 > 8cff3eb7-1aff-4dc7-9969-60190c7e4675 | 2 > > cqlsh> SELECT * from my_keyspace.my_table where id = > '6e23f79a-8b67-47e0-b8e0-50be78bb1c7f' and number > 2; > InvalidRequest: Error from server: code=2200 [Invalid query] > message="Invalid STRING constant (6e23f79a-8b67-47e0-b8e0-50be78bb1c7f) for > "id" of type uuid" > > cqlsh> SELECT * from my_keyspace.my_table where id = > 6e23f79a-8b67-47e0-b8e0-50be78bb1c7f and number > 2; > > id | number > --+ > 6e23f79a-8b67-47e0-b8e0-50be78bb1c7f | 3 > > You could remodel your data in such way that you would make primary key > like this > > ((date), hour-minute, id) > > or > > ((date, hour-minute), id) > > I would prefer the second one because if you expect a lot of data per day, > they would all end up on same set of replicas as hash of partition key > would be same whole day if you have same date all day so I think you would > end up with hotspots. You want to have your data spread more evenly so the > second one seems to be better to me. > > You can also investigate how to do this with materialized view but I am > not sure about the performance here. > > If you want to copy data you can do this e.g. by Cassandra Spark > connector, you would just read table and as you read it you would write to > another one. That is imho the fastest approach and the least error prone. > You can do that on live production data and you can just make a "switch" > afterwards. Not sure about ttls but that should be transparent while > copying that. 
> > On Tue, 12 Mar 2019 at 03:04, Leena Ghatpande > wrote: > >> We have a table with over 70M rows with a partition key that is unique. We >> have a created datetime stamp on each record, and we have a need to >> select all rows created for a date range. Secondary index is not an option >> as its high cardinality and could slow performance doing a full scan on 70M >> rows. >> >> >> We are thinking of creating a new table with a date field as a clustering >> column to be able to query for date ranges, but partition key to clustering >> key will be 1-1. Is this a good approach? >> >> To do this, we need to copy this large volume of data from table1 to >> table2 within the same cluster, while updates are still happening to >> table1. We need to do this real time without impacting our customers. COPY >> is not an option, as we have ttl's on each row on table1 that need to be >> applied to table2 as well. >> >> >> So what would be the best approach >> >>1. To be able select data using date range without impacting >>performance. This operation will be needed only on adhoc basis and it wont >>be as frequent . >>2. Best way to migrate large volume of data with ttl from one table >>to another within the same cluster. >> >> >> Any other suggestions also will be greatly appreciated. >> >> >> > > Stefan Miklosovic
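The hotspot argument for ((date, hour-minute), id) over ((date), hour-minute, id) can be made concrete by counting how many distinct partitions a single day of writes produces under each key shape. A toy sketch (timestamps and formats are illustrative):

```python
from datetime import datetime, timedelta

# Toy model: one event per minute over a single day, keyed two ways.
start = datetime(2019, 3, 12)
events = [start + timedelta(minutes=m) for m in range(24 * 60)]

# ((date), hour-minute, id): every event that day shares ONE partition,
# so the whole day's traffic lands on one replica set.
date_only = {e.strftime("%Y-%m-%d") for e in events}

# ((date, hour-minute), id): the day spreads over many partitions,
# whose token hashes scatter across the ring.
date_hm = {(e.strftime("%Y-%m-%d"), e.strftime("%H:%M")) for e in events}

print(len(date_only))  # 1
print(len(date_hm))    # 1440
```

One partition per day versus 1440 per day is exactly the hotspot trade-off described above: the finer key spreads load, at the cost of fanning reads out over more partitions.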
Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.
Hi Leena, "We are thinking of creating a new table with a date field as a clustering column to be able to query for date ranges, but partition key to clustering key will be 1-1. Is this a good approach?" If you want to select by some time range here, I am wondering how making datetime a clustering column would help you. You still have to provide the partition key, right? E.g. select * from your_keyspace.your_table where id=123 and my_date > yesterday and my_date < tomorrow (you got the idea) If you make my_date a clustering column, you cannot do the query below, because you still have to specify the partition key fully and then the clustering key (optionally), where you can further order and do ranges. But you can't do a query without specifying the partition key. Well, you can use ALLOW FILTERING, but you do not want to do this at all in your situation as it would scan everything. select * from your_keyspace.your_table where my_date > yesterday and my_date < tomorrow cqlsh> create KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '1'}; cqlsh> CREATE TABLE my_keyspace.my_table (id uuid, number int, PRIMARY KEY ((id), number)); cqlsh> SELECT * from my_keyspace.my_table ; id | number --+ 6e23f79a-8b67-47e0-b8e0-50be78bb1c7f | 3 abdc0184-a695-427d-b63b-57cdf7a45f00 | 1 90fe112e-0f74-4cbc-8767-67bdc9c8c3b0 | 4 8cff3eb7-1aff-4dc7-9969-60190c7e4675 | 2 cqlsh> SELECT * from my_keyspace.my_table where id = '6e23f79a-8b67-47e0-b8e0-50be78bb1c7f' and number > 2; InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid STRING constant (6e23f79a-8b67-47e0-b8e0-50be78bb1c7f) for "id" of type uuid" cqlsh> SELECT * from my_keyspace.my_table where id = 6e23f79a-8b67-47e0-b8e0-50be78bb1c7f and number > 2; id | number --+ 6e23f79a-8b67-47e0-b8e0-50be78bb1c7f | 3 You could remodel your data in such a way that you would make the primary key like this ((date), hour-minute, id) or ((date, hour-minute), id) I would prefer the second one because if you expect a 
lot of data per day, it would all end up on the same set of replicas, as the hash of the partition key would be the same for the whole day, so I think you would end up with hotspots. You want your data spread more evenly, so the second form seems better to me. You can also investigate how to do this with a materialized view, but I am not sure about the performance there. If you want to copy data you can do this e.g. with the Cassandra Spark connector: you would just read the table and, as you read it, write to the other one. That is IMHO the fastest and least error-prone approach. You can do it on live production data and just make a "switch" afterwards. Not sure about TTLs, but they should be transparent while copying. On Tue, 12 Mar 2019 at 03:04, Leena Ghatpande wrote: > We have a table with over 70M rows with a partition key that is unique. We > have a created datetime stamp on each record, and we have a need to > select all rows created for a date range. Secondary index is not an option > as its high cardinality and could slow performance doing a full scan on 70M > rows. > > > We are thinking of creating a new table with a date field as a clustering > column to be able to query for date ranges, but partition key to clustering > key will be 1-1. Is this a good approach? > > To do this, we need to copy this large volume of data from table1 to > table2 within the same cluster, while updates are still happening to > table1. We need to do this real time without impacting our customers. COPY > is not an option, as we have ttl's on each row on table1 that need to be > applied to table2 as well. > > > So what would be the best approach > >1. To be able select data using date range without impacting >performance. This operation will be needed only on adhoc basis and it wont >be as frequent . >2. Best way to migrate large volume of data with ttl from one table to >another within the same cluster. 
> > > Any other suggestions also will be greatly appreciated. > > > Stefan Miklosovic
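Once the data is bucketed, the ad hoc "all rows in a date range" read becomes a fan-out: enumerate every bucket key covering the range and issue one partition-restricted query per bucket, with no ALLOW FILTERING involved. A sketch with hypothetical table and column names (`ks.events_by_day`, `bucket`):

```python
from datetime import date, timedelta

def buckets_for_range(start: date, end: date):
    """Enumerate daily bucket keys covering [start, end] inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=d)).isoformat() for d in range(days + 1)]

def range_queries(start: date, end: date):
    """One partition-restricted SELECT per bucket (names are illustrative)."""
    return [("SELECT * FROM ks.events_by_day WHERE bucket = %s", (b,))
            for b in buckets_for_range(start, end)]

print(buckets_for_range(date(2019, 3, 10), date(2019, 3, 12)))
# ['2019-03-10', '2019-03-11', '2019-03-12']
```

Each query hits exactly one partition, so the cost of the range read grows linearly and predictably with the number of buckets, which suits the "ad hoc, infrequent" access pattern described above.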
Re: data modelling
Hi Bobbie, as Kenneth already mentioned, you should model your schema based on the queries you expect to run, and read the related literature. From what I see, your table is named "customer_sensor_tagids", so it's quite possible you would have tag_ids as part of the primary key? Something like: select * from keyspace.customer_sensor_tagids where tag_id = 11358097. This implies you would have as many records per customer and sensor id as there are tag_ids. If you want to query such a table and you know customerid and sensorid in advance, you could query like select * from keyspace.customer_sensor_tagids where customerid = X and sensorid = Y and tag_id = 11358097, so your primary key would look like (customerid, sensorid, tagid) or ((customerid, sensorid), tagid). If you do not know customerid or sensorid when making the query, you would have to make tag_id the partition key and customerid and sensorid clustering columns, optionally ordered; that's up to you. Now you may object that there would be data duplication, as you would have to have "as many tables as queries", which might be true, but that's not in general a problem. That's the cost you "pay" for having queries super fast and tailored to your use case. I suggest reading more about data modelling in general. On Wed, 6 Mar 2019 at 11:19, Bobbie Haynes wrote: > Hi >Could you help modelling this usecase > >I have below table ..I will update tagid's columns set(bigit) based on > PK. I have created the secondary index column on tagid to query like below.. > > Select * from keyspace.customer_sensor_tagids where tagids CONTAINS > 11358097; > > this query is doing the range scan because of the secondary index.. and > causing performance issues > > If i create a MV on Tagid's can i be able to query like above.. please > suggest a Datamodel for this scenario.Apprecite your help on this. 
> > --- > > --- > example of Tagids for each row:- >4608831, 608886, 608890, 609164, 615024, 679579, 814791, 830404, 71756, > 8538307, 9936868, 10883336, 10954034, 10958062, 10976553, 10976554, > 10980255, 11009971, 11043805, 11075379, 11078819, 11167844, 11358097, > 11479340, 11481769, 11481770, 11481771, 11481772, 11693597, 11709012, > 12193230, 12421500, 12421516, 12421781, 12422011, 12422368, 12422501, > 12422512, 12422553, 12422555, 12423381, 12423382 > > > > --- > > --- > >CREATE TABLE keyspace.customer_sensor_tagids ( > customerid bigint, > sensorid bigint, > XXX frozen, > XXX frozen, > XXX text, > XXX text, > XXX frozen, > XXX bigint, > XXX bigint, > XXX list>, > XXX frozen, > XXX boolean, > XXX bigint, > XXX list>, > XXX frozen, > XXX bigint, > XXX bigint, > XXX list>, > XXX list>, > XXX set>, > XXX set, > XXX set, > tagids set, > XXX bigint, > XXX list>, > PRIMARY KEY ((customerid, sensorid)) > ) WITH bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32', 'min_threshold': '4'} > AND compression = {'chunk_length_in_kb': '64', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE'; > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(tagids)); > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX)); > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); > CREATE INDEX XXX 
ON keyspace.customer_sensor_tagids (values(XXX)); > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); > -- *Stefan Miklosovic* *Senior Software Engineer*
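The "as many tables as queries" duplication described above can be sketched as a small denormalization step: each (customerid, sensorid, tagids) row is exploded into one row per tag for a second, query-specific table. The target table name and key shape here are an illustrative assumption, e.g. PRIMARY KEY ((tag_id), customerid, sensorid):

```python
def denormalize_by_tag(rows):
    """Explode each sensor row into one row per tag_id.

    Output rows would be written to a hypothetical table like:
      CREATE TABLE ks.customers_by_tag (
          tag_id bigint, customerid bigint, sensorid bigint,
          PRIMARY KEY ((tag_id), customerid, sensorid));
    so that "which customers/sensors have tag X?" is a single-partition read.
    """
    out = []
    for row in rows:
        for tag in sorted(row["tagids"]):
            out.append({"tag_id": tag,
                        "customerid": row["customerid"],
                        "sensorid": row["sensorid"]})
    return out

rows = [{"customerid": 1, "sensorid": 10, "tagids": {11358097, 608886}}]
print(denormalize_by_tag(rows))
```

The application (or a batch job) keeps the second table in step with the first on every tag update; that write amplification is the cost paid so the tag lookup never needs a secondary index or ALLOW FILTERING.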