Missing data
Hi,

I have reloaded the data in my cluster of 3 nodes, RF: 2. I have loaded about 2 billion rows in one table. I use LeveledCompactionStrategy on my table. I use version 2.1.6. I use the default cassandra.yaml; only the IP address for the seeds and the throughput have been changed. I loaded my data with simple insert statements. This took a bit more than one day to load the data… and one more day to compact the data on all nodes. For me this is quite acceptable since I should not be doing this again. I have done this with previous versions like 2.1.3 and others and I basically had absolutely no problems. Now I read the log files on the client side; there I see no warnings and no errors. On the node side I see many WARNINGs, all related to tombstones, but there are no ERRORs. My problem is that I see *many missing records* in the DB, and I have never observed this with previous versions.

1) Is this a known problem?
2) Do you have any idea how I could track down this problem?
3) What is the meaning of this WARNING (the only type of ERROR | WARN I could find)?

WARN [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, slices=[388:201001-388:201412:!]

4) Is it possible to have tombstones when we make no DELETE statements?

I’m lost… Thanks for your help.
RE: PrepareStatement problem
This only applies to “select *” queries where you don’t specify the column names. There is a reported bug, fixed in 2.1.3. See https://issues.apache.org/jira/browse/CASSANDRA-7910

From: joseph gao [mailto:gaojf.bok...@gmail.com]
Sent: Monday, June 15, 2015 10:52 AM
To: user@cassandra.apache.org
Subject: PrepareStatement problem

[...]
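For illustration, a minimal sketch of the caching approach Joseph describes, plus the explicit-column-list workaround suggested above (this assumes the DataStax Java driver 2.1.x; the keyspace "mykeyspace" and the "users" table are hypothetical):

    // Cache PreparedStatements per CQL string so each statement is prepared
    // only once, and name the columns explicitly instead of "select *" so a
    // cached statement is not invalidated when someone adds a column.
    import java.util.concurrent.ConcurrentHashMap;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class StatementCache {
        private final Session session;
        private final ConcurrentHashMap<String, PreparedStatement> cache =
            new ConcurrentHashMap<String, PreparedStatement>();

        public StatementCache(Session session) {
            this.session = session;
        }

        // Prepare each distinct CQL string once; later calls reuse the cache.
        // (A rare duplicate prepare under concurrency is harmless.)
        PreparedStatement prepare(String cql) {
            PreparedStatement ps = cache.get(cql);
            if (ps == null) {
                ps = session.prepare(cql);
                cache.putIfAbsent(cql, ps);
            }
            return ps;
        }

        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("mykeyspace"); // hypothetical keyspace
            StatementCache stmts = new StatementCache(session);
            // Explicit column list: adding a new column to the table later
            // does not change the metadata of this prepared statement.
            PreparedStatement ps = stmts.prepare("SELECT id, name FROM users WHERE id = ?");
            Row row = session.execute(ps.bind(42)).one();
            System.out.println(row == null ? "not found" : row.getString("name"));
            cluster.close();
        }
    }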
Catastrophe Recovery.
Hi,

I have a cluster of 3 nodes, RF: 2. There are about 2 billion rows in one table. I use LeveledCompactionStrategy on my table. I use version 2.1.6. I use the default cassandra.yaml; only the IP address for the seeds and the throughput have been changed.

I have tested a scenario where one node crashes and loses all its data. I deleted all data on this node after having stopped Cassandra. At this point I noticed that the cluster was giving proper results, which is what I was expecting from a clustered DB. I then restarted that node and observed that it was joining the cluster. After an hour or so the old “defect” node was up and normal. I noticed that its hard disk held much less data than its neighbours'. When I was querying the DB, the cluster was giving me different results for successive identical queries. I guess the old “defect” node was giving me fewer rows than it should have.

1) From what I understand, if you have a fixed node with no data it will automatically bootstrap and recover all its old data from its neighbours while doing the joining phase. Is this correct?
2) After such a catastrophe, and after the joining phase is done, should the cluster not be ready to deliver always-consistent data if there were no inserts or deletes during the catastrophe?
3) After the bootstrap of a broken node is finished, i.e. after the joining phase, is there not simply a repair to be done on that node using “nodetool repair”?

Thanks for your comments.

Kind regards

Jean
Re: Missing data
Hi Jean,

The problem with that warning is that you are reading too many tombstones per request. If you do have tombstones without doing DELETEs, it is probably because you TTL'ed the data when inserting (by mistake? or did you set default_time_to_live on your table?). You can use nodetool cfstats to see how many tombstones per read slice you have. This is probably also the cause of your missing data: data was tombstoned, so it is not available.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay jean.tremb...@zen-innovations.com wrote:
[...]
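To check the TTL theory from a client, CQL can report both the remaining TTL of a cell and the table's configured default. A minimal sketch (assuming the DataStax Java driver 2.1.x; "value" is a placeholder for one of the table's regular, non-key columns):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class TtlCheck {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            // TTL(column) returns the remaining time-to-live in seconds,
            // or null if the cell has no TTL at all.
            Row ttlRow = session.execute(
                "SELECT TTL(value) FROM gttdata.alltrades_co_rep_pcode LIMIT 1").one();
            System.out.println("row TTL: "
                + (ttlRow.isNull(0) ? "none" : ttlRow.getInt(0) + "s"));
            // The table-level default TTL applied to every insert (0 = disabled).
            Row opts = session.execute(
                "SELECT default_time_to_live FROM system.schema_columnfamilies "
                + "WHERE keyspace_name = 'gttdata' "
                + "AND columnfamily_name = 'alltrades_co_rep_pcode'").one();
            System.out.println("default_time_to_live: "
                + opts.getInt("default_time_to_live"));
            cluster.close();
        }
    }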
Re: Catastrophe Recovery.
That is really wonderful. Thank you very much Alain. You gave me a lot of trails to investigate. Thanks again for your help.

On 15 Jun 2015, at 17:49, Alain RODRIGUEZ arodr...@gmail.com wrote:
[...]
Re: Catastrophe Recovery.
Hi, it looks like you're starting to use Cassandra. Welcome. I invite you to read from here as much as you can: http://docs.datastax.com/en/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html

When a node loses some data you have various anti-entropy mechanisms:

Hinted Handoff -- for writes that occurred while the node was down and known as such by other nodes (exclusively).
Read repair -- on each read, you can set a chance to check other nodes for auto correction.
Repair (called either manual / anti-entropy / full / ...) -- takes care of giving a node back its missing data, either only for the ranges this node handles (-pr) or for all its data (its range plus its replicas). This is something you generally want to perform on all nodes on a regular basis (more often than the lowest gc_grace_seconds set on any of your tables).

Also, you are getting wrong values because you probably have a Consistency Level (CL) that is too low. If you want this to never happen you have to set the Read (R) / Write (W) consistency levels as follows: R + W > RF (Replication Factor); if not, you can get what you are currently seeing. I advise you to set your consistency to local_quorum or quorum in a single-DC environment. Also, with 3 nodes, you should set RF to 3; if not you won't be able to reach strong consistency due to the formula I just gave you. There is a lot more to know; you should read about all this. Using Cassandra without knowing about its internals will lead you to very poor and unexpected results.

To answer your questions:

"For what I understand, if you have a fixed node with no data it will automatically bootstrap and recover all its old data from its neighbour while doing the joining phase. Is this correct?" -- Not at all, unless it joins the ring for the first time, which is not your case. Though it will (by default) slowly recover as you read.

"After such catastrophe, and after the joining phase is done should the cluster not be ready to deliver always consistent data if there was no inserts or delete during the catastrophe?" -- No, we can't ensure that, except by dropping the node and bootstrapping a new one. What we can make sure of is that there are enough replicas remaining to serve consistent data (search for RF and CL).

"After the bootstrap of a broken node is finish, i.e. after the joining phase, is there not simply a repair to be done on that node using nodetool repair?" -- The premise is false: the bootstrap / joining phase is not the same thing as a broken node coming back. You are right about repair: if a broken node (or one down for too long - default 3 hours) comes back, you have to repair it. But repair is slow; make sure you can afford it (see my previous answer).

Testing is a really good idea but you also have to read a lot imho.

Good luck,

C*heers,

Alain

2015-06-15 11:13 GMT+02:00 Jean Tremblay jean.tremb...@zen-innovations.com:
[...]
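To make the R + W > RF rule concrete on the client side, here is a minimal sketch (assuming the DataStax Java driver 2.1.x and RF = 3, so QUORUM reads plus QUORUM writes give 2 + 2 > 3; the keyspace and table names are placeholders):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class QuorumExample {
        public static void main(String[] args) {
            // Default every query in this Cluster to QUORUM...
            Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withQueryOptions(new QueryOptions()
                    .setConsistencyLevel(ConsistencyLevel.QUORUM))
                .build();
            Session session = cluster.connect("mykeyspace"); // hypothetical keyspace
            // ...or override the consistency level per statement:
            SimpleStatement read =
                new SimpleStatement("SELECT * FROM mytable WHERE id = 1");
            read.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            session.execute(read);
            cluster.close();
        }
    }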
RE: Lucene index plugin for Apache Cassandra
Hi Andres,

This looks awesome, many thanks for your work on this. Just out of curiosity, how does this compare to DSE Cassandra with embedded Solr? Do they provide very similar functionality? Is there a list of obvious pros and cons of one versus the other?

Thanks!
Matthew

From: Andres de la Peña [mailto:adelap...@stratio.com]
Sent: 13 June 2015 13:20
To: user@cassandra.apache.org
Subject: Re: Lucene index plugin for Apache Cassandra

Thanks for showing interest. Faceting is not yet supported, but it is in our roadmap. Our goal is to add to Cassandra as many Lucene features as possible.

2015-06-12 18:21 GMT+02:00 Mohammed Guller moham...@glassbeam.com:

The plugin looks cool. Thank you for open sourcing it. Does it support faceting and other Solr functionality?

Mohammed

From: Andres de la Peña [mailto:adelap...@stratio.com]
Sent: Friday, June 12, 2015 3:43 AM
To: user@cassandra.apache.org
Subject: Re: Lucene index plugin for Apache Cassandra

I really appreciate your interest. Well, the first recommendation is to not use it unless you need it, because a properly denormalized Cassandra model is almost always preferable to indexing. Lucene indexing is a good option when there is no viable denormalization alternative. This is the case for range queries over multiple dimensions, full-text search, or maybe complex boolean predicates. It's also appropriate for Spark/Hadoop jobs mapping a small fraction of the total amount of rows in a certain table, if you can pay the cost of indexing. Lucene indexes run inside C*, so users should closely monitor the amount of used memory. It's also a good idea to put the Lucene directory files on a separate disk from those used by C* itself. Additionally, you should consider that an indexed table's write throughput will be appreciably reduced, maybe to a few thousand rows per second. It's really hard to estimate the amount of resources needed by the index due to the great variety of indexing and querying ways that Lucene offers, so the only thing we can suggest is to empirically find the optimal setup for your use case.

2015-06-12 12:00 GMT+02:00 Carlos Rolo r...@pythian.com:

Seems like an interesting tool! What operational recommendations would you make to users of this tool (extra hardware capacity, extra metrics to monitor, etc.)?

Regards,

Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Fri, Jun 12, 2015 at 11:07 AM, Andres de la Peña adelap...@stratio.com wrote:

Unfortunately, we haven't published any benchmarks yet, but we have plans to do so as soon as possible. However, you can expect behavior similar to that of Elasticsearch or Solr, with some overhead due to the need for indexing both the Cassandra row key and the partition's token. You can also take a look at this presentation to see how cluster distribution is done: http://planetcassandra.org/video-presentations/vp/cassandra-summit-europe-2014/vd/stratio-advanced-search-and-top-k-queries-in-cassandra/

2015-06-12 0:45 GMT+02:00 Ben Bromhead b...@instaclustr.com:

Looks awesome, do you have any examples/benchmarks of using these indexes for various cluster sizes e.g. 20 nodes, 60 nodes, 100s+?

On 10 June 2015 at 09:08, Andres de la Peña adelap...@stratio.com wrote:

Hi all,

With the release of Cassandra 2.1.6, Stratio is glad to present its open source Lucene-based implementation of C* secondary indexes (https://github.com/Stratio/cassandra-lucene-index) as a plugin that can be attached to Apache Cassandra. Before the above changes, the Lucene index was distributed inside a fork of Apache Cassandra, with all the difficulties implied. As of now, the fork is discontinued and new users should use the recently created plugin, which maintains all the features of Stratio Cassandra (https://github.com/Stratio/stratio-cassandra).

Stratio's Lucene index extends Cassandra’s functionality to provide near-real-time distributed search engine capabilities such as those of ElasticSearch or Solr, including full-text search, free multivariable search, relevance queries and field-based sorting. Each node indexes its own data, so high availability and scalability are guaranteed.

We hope this will be useful to the Apache Cassandra community.

Regards,

--
Andrés de la Peña
http://www.stratio.com/
Avenida de Europa, 26. Ática 5. 3ª Planta. 28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // @stratiobd https://twitter.com/StratioBD

--
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692

--
Andrés de la Peña
http://www.stratio.com/
Avenida de Europa, 26. Ática 5. 3ª Planta. 28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // @stratiobd
Re: Nodetool ring and Replicas after 1.2 upgrade
Maybe check the system.log to see if there are any exceptions and/or errors? Check as well whether the nodes have a consistent schema for the keyspace.

hth

jason

On Tue, Jun 16, 2015 at 7:17 AM, Michael Theroux mthero...@yahoo.com wrote:
[...]
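On the schema-consistency point, every node exposes the schema version it believes it is on, and disagreeing UUIDs across nodes indicate a schema mismatch. A sketch of checking this over CQL, with heavy assumptions: Cassandra 1.2 ships with the native transport disabled by default, and its protocol requires an old client such as the DataStax Java driver 1.0.x:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class SchemaVersions {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            // The coordinator's own schema version...
            Row local = session.execute(
                "SELECT schema_version FROM system.local").one();
            System.out.println("local: " + local.getUUID("schema_version"));
            // ...and the versions it has gossiped from every peer.
            // All UUIDs should match when the schema has settled.
            for (Row peer : session.execute(
                    "SELECT peer, schema_version FROM system.peers")) {
                System.out.println(peer.getInet("peer") + ": "
                    + peer.getUUID("schema_version"));
            }
            cluster.shutdown(); // driver 1.0.x uses shutdown() rather than close()
        }
    }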
Re: Missing data
You can get tombstones from inserting null values. Not sure if that’s the problem, but it is another way of getting tombstones in your data.

On Jun 15, 2015, at 10:50 AM, Jean Tremblay jean.tremb...@zen-innovations.com wrote:
[...]
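For reference, binding a null value in an INSERT writes a tombstone for that column. The usual 2.1-era workaround is to simply omit the column when there is no value (protocol v4 / driver 3.0 later added "unset" values for exactly this). A minimal sketch (assuming the DataStax Java driver 2.1.x; the "trades" table with its optional "note" column is hypothetical):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class NoNullInserts {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("mykeyspace"); // hypothetical keyspace
            // Two statements: with and without the optional column.
            PreparedStatement full = session.prepare(
                "INSERT INTO trades (id, price, note) VALUES (?, ?, ?)");
            PreparedStatement noNote = session.prepare(
                "INSERT INTO trades (id, price) VALUES (?, ?)");
            insert(session, full, noNote, 1, 99.5, "filled");
            insert(session, full, noNote, 2, 100.25, null); // no tombstone written
            cluster.close();
        }

        static void insert(Session s, PreparedStatement full, PreparedStatement noNote,
                           int id, double price, String note) {
            if (note != null) {
                s.execute(full.bind(id, price, note)); // binds all three columns
            } else {
                s.execute(noNote.bind(id, price));     // omits "note": no null, no tombstone
            }
        }
    }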
Re: Seed Node OOM
On Sat, Jun 13, 2015 at 4:39 AM, Oleksandr Petrov oleksandr.pet...@gmail.com wrote:

We're using Cassandra, recently migrated to 2.1.6, and we're experiencing constant OOMs in one of our clusters.

Maybe this memory leak? https://issues.apache.org/jira/browse/CASSANDRA-9549

=Rob
Re: Catastrophe Recovery.
Alain, great write-up on the recovery procedure. You covered both the RF factor and consistency levels. As mentioned, the two anti-entropy mechanisms, hinted handoffs and read repair, work for temporary node outages and incremental recovery. In case of disaster/catastrophic recovery, nodetool repair is the best way to recover. Would the procedure below have ensured the node was added properly to the cluster?

Adding nodes to an existing cluster | DataStax Cassandra 2.0 Documentation (docs.datastax.com): steps to add nodes when using virtual nodes.

Naidu Saladi

From: Jean Tremblay jean.tremb...@zen-innovations.com
To: user@cassandra.apache.org
Sent: Monday, June 15, 2015 10:58 AM
Subject: Re: Catastrophe Recovery.

That is really wonderful. Thank you very much Alain. You gave me a lot of trails to investigate. Thanks again for your help.

On 15 Jun 2015, at 17:49, Alain RODRIGUEZ arodr...@gmail.com wrote:
[...]
Re: Missing data
Thanks Robert, but I don’t insert NULL values. Thanks anyway.

On 15 Jun 2015, at 19:16, Robert Wille rwi...@fold3.com wrote:

You can get tombstones from inserting null values. Not sure if that’s the problem, but it is another way of getting tombstones in your data.

On Jun 15, 2015, at 10:50 AM, Jean Tremblay jean.tremb...@zen-innovations.com wrote:
[...]
Re: Missing data
Dear all,

I identified a bit more closely the root cause of my missing data. The problem occurs when I use

    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-core</artifactId>
        <version>2.1.6</version>
    </dependency>

on my client against Cassandra 2.1.6. I did not have the problem when I was using driver 2.1.4 with C* 2.1.4. Interestingly enough, I don’t have the problem with driver 2.1.4 against C* 2.1.6!! So as far as I can locate the problem, I would say that version 2.1.6 of the driver is not working properly and is losing some of my records!!!

——

As far as my tombstones are concerned, I don’t understand their origin. I removed all locations in my code where I delete items, and I do not use TTL anywhere (I don’t need this feature in my project). And yet I have many tombstones building up. Is there another origin for tombstones besides TTL and deleting items? Could the compaction of LeveledCompactionStrategy be the origin of them?

@Carlos thanks for your guidance.

Kind regards

Jean

On 15 Jun 2015, at 11:17, Carlos Rolo r...@pythian.com wrote:
[...]
Re: Missing data
There's your problem, you're using the DataStax java driver :) I just ran into this issue in the last week and it was incredibly frustrating. If you are doing a simple loop on a "select *" query, then the DataStax java driver will only process 2^31 rows (i.e. the Java Integer max, 2,147,483,647) before it stops w/o any error or output in the logs. The fact that you said you only had about 2 billion rows but you are seeing missing data is a red flag. I found the only way around this is to do your "select *" in chunks based on the token range (see this gist for an example: https://gist.github.com/baholladay/21eb4c61ea8905302195 ). Just loop for every 100 million rows and make a new query: "select * from TABLE where token(key) > lastToken"

Thanks,
Bryan

On Mon, Jun 15, 2015 at 12:50 PM, Jean Tremblay jean.tremb...@zen-innovations.com wrote:
[...]
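A minimal sketch of the token-range chunking Bryan describes (assuming the DataStax Java driver 2.1.x and the default Murmur3 partitioner, whose tokens are signed 64-bit values; the table and key names are placeholders):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class TokenRangeScan {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("mykeyspace"); // hypothetical keyspace
            long total = 0;
            long lastToken = Long.MIN_VALUE;
            boolean firstChunk = true;
            while (true) {
                // Restart the scan above the last token seen, so no single
                // ResultSet iteration ever approaches 2^31 rows.
                String cql = firstChunk
                    ? "SELECT key, token(key) FROM mytable LIMIT 100000000"
                    : "SELECT key, token(key) FROM mytable WHERE token(key) > "
                        + lastToken + " LIMIT 100000000";
                firstChunk = false;
                long rowsInChunk = 0;
                for (Row row : session.execute(cql)) {
                    lastToken = row.getLong(1); // Murmur3 tokens map to bigint
                    rowsInChunk++;
                    total++;
                }
                if (rowsInChunk == 0) break; // empty chunk: the scan is complete
            }
            System.out.println("scanned " + total + " rows");
            cluster.close();
        }
    }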
Re: Missing data
Thanks Bryan. I believe I have a different problem with the DataStax 2.1.6 driver. My problem is not that I make huge selects; my problem seems to occur on some inserts. I insert MANY rows, and with version 2.1.6 of the driver I seem to be losing some records. But thanks anyway, I will remember your mail when I bump into the select problem.

Cheers

Jean

On 15 Jun 2015, at 19:13, Bryan Holladay holla...@longsight.com wrote:
[...]
counters still inconsistent after repair
Currently on 2.1.6 I'm seeing behavior like the following:

cqlsh:walker> select * from counter_table where field = 'test';

 field | value
-------+-------
 test  |    30

(1 rows)

cqlsh:walker> select * from counter_table where field = 'test';

 field | value
-------+-------
 test  |    90

(1 rows)

cqlsh:walker> select * from counter_table where field = 'test';

 field | value
-------+-------
 test  |    30

(1 rows)

Using tracing I can see that one node has wrong data. However, running repair on this table does not seem to have done anything; I still see the wrong value returned from this same node.

Potentially relevant facts:
- Recently upgraded to 2.1.6 from 2.0.14
- This table has ~million rows, low contention, and a fairly high increment rate

Mainly wondering:
- Is this known or expected? I know Cassandra counters have had issues but thought by now it should be able to keep a consistent counter or at least repair it...
- Any way to reset this counter?
- Any other stuff I can check?
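For what it's worth, the tracing used above can also be driven from the Java driver to see which replicas serve each read (a minimal sketch, assuming the DataStax Java driver 2.1.x and the counter_table schema from the post):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.QueryTrace;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class TraceCounterRead {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("walker");
            SimpleStatement stmt =
                new SimpleStatement("SELECT * FROM counter_table WHERE field = 'test'");
            stmt.enableTracing(); // ask the coordinator to record a trace
            ResultSet rs = session.execute(stmt);
            System.out.println("value = " + rs.one().getLong("value"));
            // The trace events show which replicas were contacted, so a read
            // that returns the stale value points at the bad node.
            QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
            for (QueryTrace.Event e : trace.getEvents()) {
                System.out.println(e.getSource() + " : " + e.getDescription());
            }
            cluster.close();
        }
    }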
Nodetool ring and Replicas after 1.2 upgrade
Hello,

We (finally) have just upgraded from Cassandra 1.1 to Cassandra 1.2.19. Everything appears to be up and running normally; however, we have noticed unusual output from nodetool ring. There is a new (to us) field, "Replicas", in the nodetool output, and this field, seemingly at random, is changing from 2 to 3 and back to 2. We are using the byte-ordered partitioner (we hash our own keys) and have a replication factor of 3. We are also on AWS and utilize the Ec2Snitch in a single datacenter. Other calls appear to be normal: nodetool getEndpoints returns the proper endpoints when querying various keys, and nodetool ring and status show that all nodes appear healthy. Does anyone have any hints on what may be happening, or whether this is a problem we should be concerned with?

Thanks,
-Mike
Re: counters still inconsistent after repair
On Mon, Jun 15, 2015 at 2:52 PM, Dan Kinder dkin...@turnitin.com wrote:

Potentially relevant facts:
- Recently upgraded to 2.1.6 from 2.0.14
- This table has ~million rows, low contention, and a fairly high increment rate

Can you repro on a counter that was created after the upgrade?

Mainly wondering:
- Is this known or expected? I know Cassandra counters have had issues but thought by now it should be able to keep a consistent counter or at least repair it...

All counters which haven't been written to since the 2.1 "new counters" change are still on disk as old counters, and will remain that way until UPDATEd and then compacted together with all old shards. Old counters can exhibit this behavior.

- Any way to reset this counter?

Per Aleksey (in IRC) you can turn a replica for an old counter into a new counter by UPDATEing it once. In order to do that without modifying the count, you can [1]:

UPDATE tablename SET countercolumn = countercolumn + 0 WHERE id = 1;

The important caveat is that this must be done at least once per shard, with one shard per RF. The only way one can be sure that all shards have been UPDATEd is by contacting each replica node and doing the UPDATE + 0 there, because local writes are preferred. To summarize, the optimal process to upgrade your pre-existing counters to 2.1-era new counters:

1) get a list of all counter keys
2) get a list of replicas per counter key
3) connect to each replica for each counter key and issue an UPDATE + 0 for that counter key
4) run a major compaction

As an aside, Aleksey suggests that the above process is so heavyweight that it may not be worth it. If you just leave them be, all counters you're actually using will become progressively more accurate over time.

=Rob

[1] Special thanks to Jeff Jirsa for verifying that this syntax works.
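A minimal sketch of step 3, pinning the driver to one replica at a time so the UPDATE + 0 is coordinated by that replica (assuming the DataStax Java driver 2.1.x and the counter_table/field/value schema from the original post; the replica addresses are placeholders):

    import java.net.InetSocketAddress;
    import java.util.Arrays;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.RoundRobinPolicy;
    import com.datastax.driver.core.policies.WhiteListPolicy;

    public class CounterUpgrade {
        public static void main(String[] args) {
            // One entry per replica of the counter key (placeholder addresses).
            String[] replicas = {"10.0.0.1", "10.0.0.2", "10.0.0.3"};
            for (String replica : replicas) {
                // Whitelist a single host so that host coordinates the write,
                // exercising the "local writes are preferred" path Rob mentions.
                Cluster cluster = Cluster.builder()
                    .addContactPoint(replica)
                    .withLoadBalancingPolicy(new WhiteListPolicy(
                        new RoundRobinPolicy(),
                        Arrays.asList(new InetSocketAddress(replica, 9042))))
                    .build();
                Session session = cluster.connect("walker"); // keyspace from the example
                session.execute(
                    "UPDATE counter_table SET value = value + 0 WHERE field = 'test'");
                cluster.close();
            }
        }
    }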
PrepareStatement problem
hi, all

I'm using PreparedStatement. If I prepare a statement every time I use it, Cassandra will give me a warning telling me NOT to PREPARE EVERY TIME. So I cache the PreparedStatement locally. But when another client changes the table's schema, for example adds a new column, and I still use the previously cached PreparedStatement, the metadata will mismatch the data: the metadata says n columns, and the data has n+1 columns. So what should I do to avoid this problem?

--
Joseph Gao
PhoneNum: 15210513582
QQ: 409343351