Re: Cassandra Counters and TTL
Hello,

Thanks for your answer. See my reply in-line.

On 11/04/2011 01:46 PM, Amit Chavan wrote: Answers inline.

On Fri, Nov 4, 2011 at 4:59 PM, Vlad Paiu vladp...@opensips.org wrote:

Hello, I'm a new user of Cassandra and I think it's great. Still, while developing my app using Cassandra, I got stuck on some things and I'm not really sure that Cassandra can handle them at the moment.

So, first of all, does Cassandra allow Counters and regular keys to be located in the same ColumnFamily?

What do you mean when you say regular keys? If you are hinting at columns apart from counters, then the answer is *no*: only counters can exist in a CounterColumnFamily, and other column families cannot hold counters.

Yes, this is what I was asking. Thanks for the answer.

Secondly, is there any way to dynamically set the TTL for a key? In the sense that I have a key, I initially set it with no TTL, but after a while I decide that it should expire in 100 seconds. Can Cassandra do this?

TTL is not for one key, it is for one column.

When I said 'key' I actually meant to say column; it seems I'm not yet very acquainted with Cassandra terminology. So in the end, can you dynamically alter the TTL of a column?

3. Can counters have a TTL?

No. Currently, counters do not (or, if I am correct, cannot) have a TTL.

Ok. Any info on whether this will be implemented anytime soon?

4. Is there any way to atomically reset a counter? I read on the website that the only way to do it is to read the counter value and then set it to -value, which seems rather bogus to me.

I think that is the only way to reset a counter. I would like to know if there is another way.

Ok then, waiting for someone to confirm. It's bad that you cannot atomically reset a counter value, as a two-step reset might lead to undetermined behaviour. Also, can I set the counter to a specific value without keeping state on the client? For example, suppose the client does not know that the current counter value is 3. Can it set the counter value to 10 without first getting the counter value and then incrementing by 7?

Background: I have been using Cassandra for the past two months. Hope the community corrects me if I am wrong.

Regards,

-- Vlad Paiu OpenSIPS Developer

-- Regards Amit S. Chavan

Regards, Vlad Paiu OpenSIPS Developer
Re: Second Cassandra users survey
Take a look at this: http://www.oracle.com/technetwork/database/nosqldb/overview/index.html

I understand the limitations/advantages of the architecture. Read this: http://en.wikipedia.org/wiki/CAP_theorem
Re: Modeling big data to allow filtering with a lot of distinct combinations of dimensions, in real time and with no latency
Hi again.

Did you receive my mail? It's the first time I have used this mailing list. If you received it, has anybody faced this problem?

It looks like this subject is going to be discussed at the Cassandra NYC meeting. http://www.datastax.com/2011/11/joe-stein-of-medialets-to-speak-at-cassandra-nyc Any idea what they are going to say about this subject, or do I have to wait? Will the video recording of this conference be public?

thanks, Alain

2011/11/4 Alain RODRIGUEZ arodr...@gmail.com

Hi all, I started this thread in the phpCassa Google group, but I think its place is here. Here is my first post:

I was wondering about a specific point of Cassandra modeling. If I need to know the number of connections to my website using each browser, every hour, I can do: row key: $browser, column key: date('YmdH', $timestamp), value: counter. I can increment this counter for any visit; this should work.

The point is that I want to be able to render the results of a lot of statistics used as filters. I mean, I will have information such as browser, browser version, screen resolution, OS, OS version, localization... And I want to allow users to get data (number of views) filtering it as much as they want.

For example, if I want to know how many people visited my website with Safari, on Windows, and from New York, every hour, I can store: row key: $browser:$os:$localization, column key: date('YmdH', $timestamp), value: counter.

This can't be the best solution, because by combinatorics I will have to store n! counters to be able to store data with all filters. If I have 10 filters I would increment 3 628 800 counters. That's not the good solution, for sure. How am I supposed to model this to be able to read data with any filter I want?

Thanks, Alain

And here is the first answer given (thanks to Tyler Hobbs):

Technically, the number of potential different counters would be the cardinality of each field multiplied together. (Since one of the fields holds a time, this number would continue to grow.) However, in practice you'll have far fewer than this number of counters, because not every possible combination of these will happen.

That's not the good solution, for sure. How am I supposed to model this to be able to read data with any filter I want?

It's a reasonable solution if you want to be able to drill down and filter by any attribute. If you want to be able to filter based on all of these attributes, you have to store that information about every request in one way or another.

I know it's a non-trivial problem, but I'm sure that some people have already faced this problem before me. I'll allow users to filter however they want, choosing dimensions with checkboxes. They will be able to combine dimensions and ask for any combination. So, with this solution, I will have to store every event n times, with n = number of possible combinations.

I saw this yesterday: http://t.co/EXL6yAO8 (thanks to Dave Gardner). This company seems to do something equivalent to the idea exposed in my first post.

Any experience to share with this kind of problem?

thank you, Alain
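A minimal sketch of the per-combination counter approach discussed above, using the pycassa client (the keyspace 'stats' and counter column family 'hourly_counters' are hypothetical names, and 'hourly_counters' is assumed to be created as a counter CF). Note that with n filterable dimensions each event touches 2^n - 1 rows, which is exactly why the number of counters explodes:

    import itertools
    import time

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('stats', ['localhost:9160'])
    counters = ColumnFamily(pool, 'hourly_counters')

    def record_event(dimensions):
        # dimensions is e.g. {'browser': 'safari', 'os': 'windows', 'loc': 'ny'}.
        # Increment one counter row per non-empty subset of dimensions so any
        # later filter combination has a precomputed counter to read.
        hour = time.strftime('%Y%m%d%H')
        names = sorted(dimensions)
        for k in range(1, len(names) + 1):
            for combo in itertools.combinations(names, k):
                row_key = ':'.join('%s=%s' % (n, dimensions[n]) for n in combo)
                counters.add(row_key, hour)  # add() increments the counter column

    record_event({'browser': 'safari', 'os': 'windows', 'loc': 'new_york'})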
Multiple Keyword Lookup Indexes
Hello,

We are implementing a Cassandra-backed user database. The challenge is that there are 4 different sorts of user IDs that all need to be indexed in order to access user data via them quickly. For example, the user has a unique UUID, but also a LoginName and an email address, which can all be used for authentication. How do I model this in Cassandra?

My approach would be to have one main table which is indexed by the most frequently used lookup value as the row key; let's say this is the UUID. This table would contain all customer data. Then I would create an index table for each of the other login alternatives, where I just reference the UUID. So each alternative login which is not using the UUID would require two Cassandra queries.

Are there any better approaches to model this? Also, I read somewhere that Cassandra is not optimized for these reference tables, which are very short with two columns only. What is the reason for that?

thanks, Felix
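For illustration, a minimal pycassa sketch of the two-query index-table lookup described above (the keyspace 'userdb' and the column family names are hypothetical):

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('userdb', ['localhost:9160'])
    users = ColumnFamily(pool, 'users')              # row key: UUID, holds all user data
    by_login = ColumnFamily(pool, 'users_by_login')  # row key: LoginName, one 'uuid' column

    def get_user_by_login(login_name):
        # Query 1: resolve the alternative key to the canonical UUID.
        uuid = by_login.get(login_name)['uuid']
        # Query 2: fetch the actual user row.
        return users.get(uuid)

The same shape works for the email index; writes just have to update the main row and every index row together.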
Re: Modeling big data to allow filtering with a lot of distinct combinations of dimensions, in real time and with no latency
On Mon, 7 Nov 2011 11:18:12 +0100, Alain RODRIGUEZ arodr...@gmail.com wrote: [...]

Looks like your mail has been received, but for now nobody has an answer. As for me, I'm a Cassandra newbie and definitely can't help :-(

Best regards, Alexander
Re: Cassandra Counters and TTL
On Mon, Nov 7, 2011 at 10:12 AM, Vlad Paiu vladp...@opensips.org wrote: [...]

So in the end, can you dynamically alter the TTL of a column?

You'll have to update the column with the new TTL. This does mean that you need to know the column value, and so may require reading the column first.

3. Can counters have a TTL? [...] Ok. Any info on whether this will be implemented anytime soon?

The current status is not anytime soon, because we don't have a good solution for it so far. See https://issues.apache.org/jira/browse/CASSANDRA-2103 for more details.

4. Is there any way to atomically reset a counter? [...] Ok then, waiting for someone to confirm. It's bad that you cannot atomically reset a counter value, as a two-step reset might lead to undetermined behaviour.

There is no other way. This does mean that you need some external way to make sure that no two clients will attempt to reset the same counter at the same time. Or model so that you don't need counter resets (I'm not saying this is always possible, but there are probably a number of cases where resetting a counter could be replaced by switching to a brand new counter).

Also, can I set the counter to a specific value, without keeping state on the client? For example, if the client does not know that the current counter value is 3, can it set the counter value to 10 without first getting the counter value and then incrementing by 7?

No.

-- Sylvain

[...]
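Minimal pycassa sketches of the two workarounds discussed in this thread; the column families 'sessions' and 'counters' are hypothetical, and 'counters' is assumed to be a counter column family:

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('demo', ['localhost:9160'])
    sessions = ColumnFamily(pool, 'sessions')
    counters = ColumnFamily(pool, 'counters')

    # "Changing" a column's TTL means rewriting the column with the new TTL,
    # which requires knowing (or first reading) its current value.
    value = sessions.get('session42')['token']
    sessions.insert('session42', {'token': value}, ttl=100)  # now expires in 100 s

    # Resetting a counter is read-then-negative-increment, and it is NOT
    # atomic: another client may increment between the get() and the add().
    current = counters.get('page_hits')['hits']
    counters.add('page_hits', 'hits', -current)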
Re: Debian package jna bug workaround
Thanks - this is working correctly. I have checked the classpath in VisualVM and it contains jna, and cassandra-cli describe reports SerializingCacheProvider. If you add jna a second time with the Sun JVM you seem to get the exception, which had me wondering what cache provider was active.

p

--

java.class.path=/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.2.jar:/usr/share/cassandra/lib/guava-r08.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.6.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.3.jar:/usr/share/cassandra/apache-cassandra-1.0.1.jar:/usr/share/cassandra/apache-cassandra-thrift-1.0.1.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar

java.class.version=50.0

From: paul cannon p...@datastax.com To: user@cassandra.apache.org Sent: Friday, 4 November 2011, 19:05 Subject: Re: Debian package jna bug workaround

The cassandra-cli tool will show you, if you're using at least Cassandra 1.0.1, in a describe command. If not, you can make a Thrift describe_keyspace() call some other way and check the value of the appropriate CfDef's row_cache_provider string. If it's SerializingCacheProvider, it's off-heap. Note that I think you need to create the column family while JNA is present, not just have JNA present when Cassandra starts. Might be wrong on that.

p

On Thu, Nov 3, 2011 at 4:10 PM, Peter Tillotson slatem...@yahoo.co.uk wrote:

Cassandra 1.0.1, and it only seemed to happen with * JAVA_HOME=/usr/lib/jvm/java-6-sun and jna.jar copied into /usr/share/cassandra(/lib). I then saw the detail in the init script and how it was being linked.

Is there a way I can verify which provider is being used? I want to make sure off-heap is being used in the default config.

On 03/11/11 19:06, paul cannon wrote:

I can't reproduce this. What version of the Cassandra deb are you using, exactly, and why are you symlinking or copying jna.jar into /usr/share/cassandra? The initscript should be adding /usr/share/java/jna.jar to the classpath, and that should be all you need. The failure you see with o.a.c.cache.FreeableMemory is not because the JRE can't find the class, it's just that it can't initialize the class (because it needs JNA, and it can't find JNA).
p

On Wed, Nov 2, 2011 at 4:42 AM, Peter Tillotson slatem...@yahoo.co.uk wrote:

see below * JAVA_HOME=/usr/lib/jvm/java-6-openjdk works

--

Reading the documentation over at DataStax, “The Debian and RPM packages of Cassandra install JNA automatically” http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management

And indeed the Debian package depends on jna, and /etc/init.d/cassandra looks as though it adds /usr/share/java/jna.jar to the classpath. And here is the 'but': if I copy or symlink jna.jar into:

* /usr/share/cassandra
* /usr/share/cassandra/lib

whenever a column family initialises I get:

java.lang.NoClassDefFoundError: Could not initialize class org.apache.cassandra.cache.FreeableMemory

This suggests to me that:

1) By default, for me at least, on Debian jna.jar is not on the classpath
2) There is an additional classpath issue

jar -tf apache-cassandra.jar | grep FreeableMemory succeeds

I'm running on:
* Ubuntu 10.04 x64
* JAVA_HOME=/usr/lib/jvm/java-6-sun

Full stack traces:

java.lang.NoClassDefFoundError: Could not initialize class com.sun.jna.Native
at com.sun.jna.Pointer.<clinit>(Pointer.java:42)
at org.apache.cassandra.cache.SerializingCache.serialize(SerializingCache.java:92)
at org.apache.cassandra.cache.SerializingCache.put(SerializingCache.java:154)
at org.apache.cassandra.cache.InstrumentingCache.put(InstrumentingCache.java:63)
at org.apache.cassandra.db.ColumnFamilyStore.cacheRow(ColumnFamilyStore.java:1150)
at
Re: Will writes with ALL consistency eventually propagate?
Anthony and Jaydeep, thank you for weighing in. I am glad to see that they are two different values (makes more sense mentally to me).

Anthony, what you said caught my attention: to ensure all nodes have a copy you may not be able to survive the loss of a single node. Why would this be the case? I assumed (incorrectly?) that a node would simply disappear off the map until I could bring it back up again, at which point it would slowly retrieve from other members of the ring all the missing values that it didn't get while it was down. Is this the wrong understanding?

If forcing a replication factor equal to the number of nodes in my ring will cause a hard stop when one node goes down (as I understood your comment to mean), it seems to me I should go with a much lower replication factor... something along the lines of 3, or roughly ceiling(N / 2), and just deal with the latency when one of the nodes has to route a request to another server when it doesn't contain the value.

Is there a better way to accomplish what I want, or is keeping the replication factor that aggressively high generally a bad thing and using Cassandra in the wrong way?

Thank you for the help.

-Riyad

On Sun, Nov 6, 2011 at 11:14 PM, chovatia jaydeep chovatia_jayd...@yahoo.co.in wrote:

Hi Riyad, You can set replication = 5 (number of replicas) and write with CL = ONE. There is no hard requirement from Cassandra to write with CL = ALL to replicate the data unless you need it. Considering your example, if you write with CL = ONE then it will still replicate your data to all 5 replicas eventually.

Thank you, Jaydeep

From: Riyad Kalla rka...@gmail.com To: user@cassandra.apache.org Sent: Sunday, 6 November 2011 9:50 PM Subject: Will writes with ALL consistency eventually propagate?

I am new to Cassandra and was curious about the following scenario...

Let's say I have a ring of 5 servers. Ultimately I would like each server to be a full replication of the next (master-master-*). In a presentation I watched today on Cassandra, the presenter mentioned that the ring members will shard data and route your requests to the right host when they come in to a server that doesn't physically contain the value you wanted. To the client requesting, this is seamless except for the added latency.

If I wanted to avoid the routing and latency and ensure every server had the full data set, do I have to write with a consistency level of ALL and wait for all of those writes to return in my code, or can I write with a CL of 1 or 2 and let the ring propagate the rest of the copies to the other servers in the background after my code has continued executing? I don't mind eventual consistency in my case, but I do (eventually) want all nodes to have all values, and I cannot tell if this is the default behavior, or if sharding is the default and I can only force duplicates onto the other servers explicitly with a CL of ALL.

Best, Riyad
Re: Will writes with ALL consistency eventually propagate?
Riyad, I'm also just getting to know the different settings and values myself :)

I believe (and it also depends on your config) CL.ONE should ignore the loss of a node if your RF is 5; once you increase the CL, then if you lose a node the CL is not met and you will get exceptions returned.

Sent from my iPhone

On 07/11/2011, at 4:32, Riyad Kalla rka...@gmail.com wrote: [...]
Re: Second Cassandra users survey
Allow for deterministic / manual sharding of rows.

Right now it seems that there is no way to force rows with different row keys to be stored on the same nodes in the ring. This is our number one reason why we get data inconsistencies when nodes fail. Sometimes a logical transaction requires writing rows with different row keys. If we could use something like prefix.uniquekey and let the partitioner use only the prefix, the probability that only part of the transaction would be written could be reduced considerably.

On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote:

Hi all, Two years ago I asked for Cassandra use cases and feature requests. [1] The results [2] have been extremely useful in setting and prioritizing goals for Cassandra development. But with the release of 1.0 we've accomplished basically everything from our original wish list. [3]

I'd love to hear from modern Cassandra users again, especially if you're usually a quiet lurker. What does Cassandra do well? What are your pain points? What's your feature wish list?

As before, if you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record.

[1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
[2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
[3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
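A toy illustration (plain Python, not Cassandra code) of the proposal above: if the partitioner hashed only the part of the key before the first '.', all rows of one logical transaction would land on the same replicas. The key scheme shown is hypothetical:

    from hashlib import md5

    def placement_token(row_key):
        prefix = row_key.split('.', 1)[0]  # hash only the prefix, not the full key
        return int(md5(prefix.encode()).hexdigest(), 16)

    # Both rows map to the same token, hence to the same nodes:
    assert placement_token('order1234.header') == placement_token('order1234.item1')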
Re: Will writes with ALL consistency eventually propagate?
Ah! OK, I was interpreting what you were saying to mean that if my RF was too high, then the ring would die if I lost one node. Ultimately what I want (I think) is:

Replication Factor: 5 (aka all of my nodes)
Consistency Level: 2

Put another way, when I write a value, I want it to exist on two servers *at least* before I consider that write successful enough for my code to continue, but in the background I would like Cassandra to keep copying that value around at its leisure until all the ring nodes know about it.

This sounds like what I need. Thanks for pointing me in the right direction.

Best, Riyad

On Mon, Nov 7, 2011 at 5:47 AM, Anthony Ikeda anthony.ikeda@gmail.com wrote: [...]
Re: Second Cassandra users survey
This feature interests me, so I thought I'd add some comments.

Having used partition features in existing databases like DB2 and Oracle, as well as manual partitioning, one of the biggest challenges is keeping the partitions balanced. What I've seen with manual partitioning is that often the partitions get unbalanced. Usually the developers take a best guess and hope it ends up balanced. Some of the approaches I've used in the past were zip code, area code, state, and some kind of hash.

So my question related to deterministic sharding is this: what rebalance feature(s) would be useful or needed once the partitions get unbalanced? Without a decent plan for rebalancing, it often ends up being a very painful problem to solve in production.

Back when I worked on mobile apps, we saw issues with how OpenWave WAP servers partitioned the accounts. The early versions randomly assigned a phone to a server when it was provisioned the first time. Once the phone was associated to that server, it was stuck on that server. If the load on that server was heavier than the others, the only choice was to scale up the hardware.

My understanding of Cassandra's current sharding is that it is consistent and random. Does the new feature sit somewhere in-between? Are you thinking of a pluggable API so that you can provide your own hash algorithm for Cassandra to use?

On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday daniel.double...@gmx.net wrote: [...]
order of output in get_slice
Does a call to list<ColumnOrSuperColumn> get_slice(binary key, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level) give us any guarantees on the order of the returned list?

I understand that when the predicate actually contains a SliceRange, the order _is_ guaranteed to be increasing (decreasing if the reverse flag is set). But when the predicate contains a list of column names instead of a range, do we also have the guarantee that the order is increasing (with no decreasing option, because there is no reverse flag here)?

Greetings, Roland
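Not an authoritative answer to the Thrift question, but one way to sidestep it: fetch the named columns and impose the order client-side, which costs little compared to the round trip. A pycassa sketch (the CF name 'events' is hypothetical; the comparator is assumed to be UTF8Type):

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('demo', ['localhost:9160'])
    events = ColumnFamily(pool, 'events')

    # Named-column predicate (no SliceRange, so no reverse flag either).
    cols = events.get('row1', columns=['a', 'c', 'b'])

    # If your code depends on increasing comparator order, sorting yourself
    # removes any doubt about what the server guarantees.
    for name in sorted(cols):
        print('%s = %s' % (name, cols[name]))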
Determining Strategy options for a particular Strategy class
Is there a programmatic way to determine what the valid 'keys' are for the strategy options for a particular strategy class?
Re: Second Cassandra users survey
We are using Cassandra for time series storage.

Strong points: write performance.

Pain points: dynamically adding column families as new time series come in. This caused a lot of headaches, mismatches between nodes, etc. In the end we just put everything together in a single (huge) column family.

Wish list: a decent GUI to explore data kept in Cassandra would be very valuable. It should also be extensible to provide viewers for custom data.

On 11/1/2011 23:59 PM, Jonathan Ellis wrote: [...]
Re: Will writes with ALL consistency eventually propagate?
Consistency Level is a pseudo-enum... you have the choice between ONE, QUORUM (and there are different types of this) and ALL.

At CL=ONE, only one node is guaranteed to have got the write if the operation is a success.

At CL=ALL, all nodes that the RF says it should be stored at must confirm the write before the operation succeeds, but a partial write will succeed eventually if at least one node recorded the write.

At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the operation to succeed, otherwise failure, but a partial write will succeed eventually if at least one node recorded the write. Read repair will eventually ensure that the write is replicated across all RF nodes in the cluster. The N in QUORUM above depends on the type of QUORUM you choose; in general think N=RF unless you choose a fancy QUORUM.

To have a consistent read, CL of write + CL of read must be > RF...

Write at ONE, read at ONE = may not get the most recent write if RF > 1 [fastest write, fastest read] {data loss possible if node lost before read repair}
Write at QUORUM, read at ONE = consistent read [moderate write, fastest read] {multiple nodes must be lost for data loss to be possible}
Write at ALL, read at ONE = consistent read, writes may be blocked if any node fails [slowest write, fastest read]
Write at ONE, read at QUORUM = may not get the most recent write if RF > 2 [fastest write, moderate read] {data loss possible if node lost before read repair}
Write at QUORUM, read at QUORUM = consistent read [moderate write, moderate read] {multiple nodes must be lost for data loss to be possible}
Write at ALL, read at QUORUM = consistent read, writes may be blocked if any node fails [slowest write, moderate read]
Write at ONE, read at ALL = consistent read, reads may fail if any node fails [fastest write, slowest read] {data loss possible if node lost before read repair}
Write at QUORUM, read at ALL = consistent read, reads may fail if any node fails [moderate write, slowest read] {multiple nodes must be lost for data loss to be possible}
Write at ALL, read at ALL = consistent read, writes may be blocked if any node fails, reads may fail if any node fails [slowest write, slowest read]

Note: You can choose the CL for each and every operation. This is something that you should design into your application (unless you exclusively use QUORUM for all operations, in which case you are advised to bake the logic in, but it is less necessary).

The other thing to remember is that RF does not have to equal the number of nodes in your cluster... in fact I would recommend designing your app on the basis that RF < number of nodes in your cluster... because at some point, when your data set grows big enough, you will end up with RF < number of nodes.

-Stephen

On 7 November 2011 13:03, Riyad Kalla rka...@gmail.com wrote: [...]
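For what it's worth, a pycassa sketch of choosing the CL per operation as described above; the keyspace 'demo' and CF 'users' are hypothetical, and the import path for ConsistencyLevel may differ slightly between pycassa versions:

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily
    from pycassa.cassandra.ttypes import ConsistencyLevel

    pool = ConnectionPool('demo', ['localhost:9160'])

    # Defaults for the column family...
    users = ColumnFamily(pool, 'users',
                         write_consistency_level=ConsistencyLevel.QUORUM,
                         read_consistency_level=ConsistencyLevel.QUORUM)

    # ...and per-operation overrides where latency matters more:
    users.insert('riyad', {'email': 'rk@example.com'},
                 write_consistency_level=ConsistencyLevel.ALL)
    row = users.get('riyad', read_consistency_level=ConsistencyLevel.ONE)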
Re: Multiple Keyword Lookup Indexes
You could directly use secondary indexes on the other fields instead of maintaining your indexes yourself: define your global ID (it can be a UUID), and have columns loginName, email, etc. with a secondary index on each. Retrieval will then be fast.

2011/11/7 Felix Sprick fspr...@gmail.com: [...]

-- sent from my Nokia 3210
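A pycassa sketch of the secondary-index suggestion; it assumes a 'users' CF whose login_name column already carries a KEYS index (created, for example, via the CLI's column_metadata with index_type: KEYS):

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily
    from pycassa.index import create_index_clause, create_index_expression

    pool = ConnectionPool('userdb', ['localhost:9160'])
    users = ColumnFamily(pool, 'users')

    expr = create_index_expression('login_name', 'felix')  # equality match
    clause = create_index_clause([expr], count=1)
    for key, columns in users.get_indexed_slices(clause):
        print('%s = %s' % (key, columns))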
Re: Will writes with ALL consistency eventually propagate?
Stephen,

Excellent breakdown; I appreciate all the detail.

Your last comment about RF being smaller than N (number of nodes): in my particular case my data set isn't particularly large (a few GB) and is distributed globally across a handful of data centers. What I am utilizing Cassandra for is the replication, in order to minimize latency for requests. So when a request comes into any location, I want each node in the ring to contain the full data set, so it never needs to defer to another member of the ring to answer a question (even if this means eventual consistency, that is alright in my case).

Given that, the way I've understood this discussion so far is that I would have an RF of N (my total node count), but my consistency level for all my writes will *likely* be QUORUM. I think that is a good/safe default for me to use, as writes aren't the scenario I need to optimize for latency; that being said, I also don't want to wait for a consistency level of ALL to complete before my code continues.

Would you agree with this assessment, or am I missing the boat on something?

Best, Riyad

On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly stephen.alan.conno...@gmail.com wrote: [...]
Counters and replication factor
Hi,

I am trying to switch from an RF = 1 to an RF = 3, but I get wrong values from counters when doing so...

I have a CF that contains many counters of some events. At RF = 1, when I simulate 10 events, they are well counted. However, when I switch to RF = 3, my counters show wrong values that sometimes change when requested twice (it can return 7, then 5, instead of 10 all the time).

I first thought that it was a problem of CL, because I seem to remember reading once that I had to use CL.ONE for reads and writes with counters. So I tried with CL.ONE, without success...

What am I doing wrong? Is there some precaution to take when replicating counters?

Alain
Re: Second Cassandra users survey
So my question related to deterministic sharding is this: what rebalance feature(s) would be useful or needed once the partitions get unbalanced?

In current Cassandra you can use nodetool move for rebalancing. It's a fast operation; a portion of the existing data is moved to the new server.
Re: Counters and replication factor
Alain,

Try using a CL of 3 or ALL and see if the problem goes away.

Your replication factor (as I just learned) dictates how many nodes each piece of data is replicated to; by using an RF of 3 you are saying replicate all my data to all my nodes (in this case counters). This doesn't happen immediately, but you can *force* it to happen on write by specifying a CL of ALL.

If you specify 1, then your counter value is written to one member of the ring before your command returns. If you keep querying, you will bounce around your ring, reading the values from the different nodes, until a future date at *which point* all the values will likely agree.

Keep all the code you have now exactly the same; just change the code at the end, where you read the counter value back, to keep reading the counter value back every second for 60 seconds, and see if all the values eventually match up. They should (as the counter value is replicated to all the nodes and their old values discarded).

-R

On Mon, Nov 7, 2011 at 8:15 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: [...]
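A sketch of the "keep reading until the replicas agree" check suggested above, using pycassa; the CF name 'counters' is hypothetical and assumed to be a counter column family:

    import time

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('demo', ['localhost:9160'])
    counters = ColumnFamily(pool, 'counters')

    for _ in range(60):
        # At CL.ONE each read may hit a different replica, so the value can
        # bounce around until replication and read repair catch up.
        print(counters.get('counter1')['count'])
        time.sleep(1)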
Re: Second Cassandra users survey
Actually, the data will be visible at QUORUM as well if you can see it with ONE. QUORUM actually gives you a higher chance of seeing the new value than ONE does: in the case of RF=3 you have a 2/3 chance of seeing the new value with QUORUM; with ONE you have 1/3...

And this JIRA fixed an issue where two QUORUM reads in a row could give you the NEW value and then the OLD value: https://issues.apache.org/jira/browse/CASSANDRA-2494

So a QUORUM read, even after a failed write, for a single row always gives consistent results now. For multiple rows you still have issues, but you can always mitigate that in the app with something like giving all of the changes the same timestamp, and then on read checking to make sure the timestamps match, and reading the data again if they don't.

I'm not arguing against atomic batch operations, they would be nice =). Just clarifying how things work now.

-Jeremiah

On 11/06/2011 02:05 PM, Pierre Chalamet wrote:

- support for atomic operations or batches (if QUORUM fails, data should not be visible with ONE) ZooKeeper is solving that.

I might have screwed up a little bit since I didn't talk about isolation; let's reformulate: support for read committed (using DB terminology). Cassandra is more like read uncommitted. Even if row mutations in one CF for one key are atomic on one server, stuff is not rolled back when the CL can't be satisfied at the coordinator level. Data won't be visible at QUORUM level, but when using weaker CLs, invalid data can appear, IMHO.

Also, it should be possible to tell which operations failed with batch_mutate, but unfortunately it is not
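A sketch of the timestamp-matching mitigation Jeremiah describes, using pycassa (the CF 'orders' and the row keys are hypothetical): give every row of the logical change one shared write timestamp, and have readers retry when the timestamps they see disagree.

    import time

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('demo', ['localhost:9160'])
    orders = ColumnFamily(pool, 'orders')

    ts = int(time.time() * 1e6)  # one timestamp for the whole logical change
    orders.insert('order1.header', {'state': 'paid'}, timestamp=ts)
    orders.insert('order1.items', {'state': 'paid'}, timestamp=ts)

    def read_consistent_pair():
        while True:
            a = orders.get('order1.header', include_timestamp=True)
            b = orders.get('order1.items', include_timestamp=True)
            # With include_timestamp=True, values come back as (value, timestamp).
            if a['state'][1] == b['state'][1]:
                return a['state'][0], b['state'][0]
            # Timestamps differ: we caught the rows mid-update; try again.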
Re: Second Cassandra users survey
- Batch read/slice from multiple column families.

On 11/01/2011 05:59 PM, Jonathan Ellis wrote: [...]
Re: Counters and replication factor
Alain,

When you tried CL.ALL, was that only after you had made the change to ReplicationFactor=3 and restarted all the servers? If you hadn't restarted the servers with the new RF, I am not sure that CL.ALL would have the intended effect.

Also, I wasn't sure what you meant by but now every request returns me always the same count value... -- didn't you want the requests to always return the same values? Or maybe you are saying that it always returns the same *wrong* value? Like you do:

counter.increment (v=1)
counter.increment (v=2)
counter.increment (v=3)
counter.getValue = returns 7
counter.getValue = returns 7
counter.getValue = returns 7

or something inconsistent like that?

On Mon, Nov 7, 2011 at 9:09 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

I've tried with CL.ALL, but it doesn't work better. I still have strange values (between 4 and 10 events counted instead of 10), but now every request returns me always the same count value... It's very strange. Any other idea?

Alain

2011/11/7 Riyad Kalla rka...@gmail.com [...]
Re: Second Cassandra users survey
This is basically what entity groups are about - https://issues.apache.org/jira/browse/CASSANDRA-1684

On Mon, Nov 7, 2011 at 5:26 AM, Peter Lin wool...@gmail.com wrote: [...]
Re: Counters and replication factor
I retried it after restarting all the servers. I still have wrong results (I simulated an event 5 times and it was counted 3 times by some counters, 4 or 5 times by others). What I meant by "but now every request always returns me the same count value..." will be easier to explain with an example:

event 1: counter1.increment counter2.increment counter3.increment . . . event 5: counter1.increment counter2.increment counter3.increment

Show results:
counter1.getValue = returns 4
counter2.getValue = returns 3
counter3.getValue = returns 5
counter1.getValue = returns 5
counter2.getValue = returns 3
counter3.getValue = returns 5
counter1.getValue = returns 4
counter2.getValue = returns 4
counter3.getValue = returns 5
...

So I've got wrong values, and not always the same ones. In my previous email, by saying "but now every request always returns me the same count value..." I was trying to tell you that I got the same wrong values every time, let us say:
counter1.getValue = returns 4
counter2.getValue = returns 3
counter3.getValue = returns 5
counter1.getValue = returns 4
counter2.getValue = returns 3
counter3.getValue = returns 5
counter1.getValue = returns 4
counter2.getValue = returns 3
counter3.getValue = returns 5

But that is not true; I still get some random wrong values. Maybe I didn't query the counter values often enough to see it last time. Sorry for not being clearer; it is not easy to explain, nor to understand, for me. Thanks for the help. Alain

2011/11/7 Riyad Kalla rka...@gmail.com Alain, When you tried CL.All, was that only after you had made the change to ReplicationFactor=3 and restarted all the servers? If you hadn't restarted the servers with the new RF, I am not sure that CL.All would have the intended effect. Also, I wasn't sure what you meant by "but now every request always returns me the same count value..." -- didn't you want the requests to always return the same values? Or maybe you are saying that it always returns the same *wrong* value? Like you do: counter.increment (v=1) counter.increment (v=2) counter.increment (v=3) counter.getValue = returns 7 counter.getValue = returns 7 counter.getValue = returns 7 or something inconsistent like that?

On Mon, Nov 7, 2011 at 9:09 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: I've tried with CL.All, but it doesn't work any better. I still have strange values (between 4 and 10 events counted instead of 10), but now every request always returns me the same count value... It's very strange. Any other idea? Alain

2011/11/7 Riyad Kalla rka...@gmail.com Alain, Try using a CL of 3 or ALL and see if the problem goes away. Your replication factor (as I just learned) dictates how many nodes each piece of data is replicated to; by using an RF of 3 you are saying "replicate all my data to all my nodes" (in this case counters). This doesn't happen immediately, but you can *force* it to happen on write by specifying a CL of ALL. If you specify 1, then your counter value is written to one member of the ring, and then your command returns. If you keep querying, you will bounce around your ring, reading the values from the different nodes, until a future date at *which point* all the values will likely agree. Keep all the code you have now exactly the same; just change the code at the end, where you read the counter value back, so that it keeps reading the value every second for 60 seconds, and see if all the values eventually match up -- they should (as the counter value is replicated to all the nodes and their old values discarded).
-R

On Mon, Nov 7, 2011 at 8:15 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi, I'm trying to switch from RF = 1 to RF = 3, but I get wrong values from counters when doing so... I've got a CF that contains many counters of some events. When I'm at RF = 1 and simulate 10 events, they are counted correctly. However, when I switch to RF = 3, my counters show wrong values that sometimes change when requested twice (it can return 7, then 5, instead of 10 every time). I first thought it was a problem of CL, because I seem to remember reading once that I had to use CL.One for reads and writes with counters. So I tried with CL.One, without success... What am I doing wrong? Is there some precaution to take when replicating counters? Alain
Re: Counters and replication factor
This sounds like a bug, a priori. Do you mind opening a ticket at https://issues.apache.org/jira/browse/CASSANDRA? It will help if you can specify which version you are using and the exact procedure that leads to it. If you know how to reproduce it, that would be even better. -- Sylvain

On Mon, Nov 7, 2011 at 5:57 PM, Alain RODRIGUEZ arodr...@gmail.com wrote: I retried it after restarting all the servers. I still have wrong results (I simulated an event 5 times and it was counted 3 times by some counters, 4 or 5 times by others). What I meant by "but now every request always returns me the same count value..." will be easier to explain with an example: event 1: counter1.increment counter2.increment counter3.increment . . . event 5: counter1.increment counter2.increment counter3.increment Show results: counter1.getValue = returns 4 counter2.getValue = returns 3 counter3.getValue = returns 5 counter1.getValue = returns 5 counter2.getValue = returns 3 counter3.getValue = returns 5 counter1.getValue = returns 4 counter2.getValue = returns 4 counter3.getValue = returns 5 ... So I've got wrong values, and not always the same ones. In my previous email, by saying "but now every request always returns me the same count value..." I was trying to tell you that I got the same wrong values every time, let us say: counter1.getValue = returns 4 counter2.getValue = returns 3 counter3.getValue = returns 5 counter1.getValue = returns 4 counter2.getValue = returns 3 counter3.getValue = returns 5 counter1.getValue = returns 4 counter2.getValue = returns 3 counter3.getValue = returns 5 But that is not true; I still get some random wrong values. Maybe I didn't query the counter values often enough to see it last time. Sorry for not being clearer; it is not easy to explain, nor to understand, for me. Thanks for the help. Alain

2011/11/7 Riyad Kalla rka...@gmail.com Alain, When you tried CL.All, was that only after you had made the change to ReplicationFactor=3 and restarted all the servers? If you hadn't restarted the servers with the new RF, I am not sure that CL.All would have the intended effect. Also, I wasn't sure what you meant by "but now every request always returns me the same count value..." -- didn't you want the requests to always return the same values? Or maybe you are saying that it always returns the same *wrong* value? Like you do: counter.increment (v=1) counter.increment (v=2) counter.increment (v=3) counter.getValue = returns 7 counter.getValue = returns 7 counter.getValue = returns 7 or something inconsistent like that?

On Mon, Nov 7, 2011 at 9:09 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: I've tried with CL.All, but it doesn't work any better. I still have strange values (between 4 and 10 events counted instead of 10), but now every request always returns me the same count value... It's very strange. Any other idea? Alain

2011/11/7 Riyad Kalla rka...@gmail.com Alain, Try using a CL of 3 or ALL and see if the problem goes away. Your replication factor (as I just learned) dictates how many nodes each piece of data is replicated to; by using an RF of 3 you are saying "replicate all my data to all my nodes" (in this case counters). This doesn't happen immediately, but you can *force* it to happen on write by specifying a CL of ALL. If you specify 1, then your counter value is written to one member of the ring, and then your command returns.
If you keep querying, you will bounce around your ring, reading the values from the different nodes, until a future date at *which point* all the values will likely agree. Keep all the code you have now exactly the same; just change the code at the end, where you read the counter value back, so that it keeps reading the value every second for 60 seconds, and see if all the values eventually match up -- they should (as the counter value is replicated to all the nodes and their old values discarded). -R

On Mon, Nov 7, 2011 at 8:15 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi, I'm trying to switch from RF = 1 to RF = 3, but I get wrong values from counters when doing so... I've got a CF that contains many counters of some events. When I'm at RF = 1 and simulate 10 events, they are counted correctly. However, when I switch to RF = 3, my counters show wrong values that sometimes change when requested twice (it can return 7, then 5, instead of 10 every time). I first thought it was a problem of CL, because I seem to remember reading once that I had to use CL.One for reads and writes with counters. So I tried with CL.One, without success... What am I doing wrong? Is there some precaution to take when replicating counters? Alain
Re: Counters and replication factor
Alain, thank you for all the clarification; I understand exactly what you meant now... and as a result am just as confused as you are :) What version of Cassandra are you using? Can you share the important parts of your config? (You double-checked that the replication factor is set to 3 on all 3 nodes?) Also, out of curiosity, if you keep querying for up to 5 mins (say every 10 seconds), do counter1, 2 and 3 still show the same wrong values for getValue, or do the values eventually converge on the correct amounts? (I assume 5 mins is a long enough window to test; maybe I'm wrong and another Cassandra dev can correct me here.) -R

On Mon, Nov 7, 2011 at 9:57 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: I retried it after restarting all the servers. I still have wrong results (I simulated an event 5 times and it was counted 3 times by some counters, 4 or 5 times by others). What I meant by "but now every request always returns me the same count value..." will be easier to explain with an example: event 1: counter1.increment counter2.increment counter3.increment . . . event 5: counter1.increment counter2.increment counter3.increment Show results: counter1.getValue = returns 4 counter2.getValue = returns 3 counter3.getValue = returns 5 counter1.getValue = returns 5 counter2.getValue = returns 3 counter3.getValue = returns 5 counter1.getValue = returns 4 counter2.getValue = returns 4 counter3.getValue = returns 5 ... So I've got wrong values, and not always the same ones. In my previous email, by saying "but now every request always returns me the same count value..." I was trying to tell you that I got the same wrong values every time, let us say: counter1.getValue = returns 4 counter2.getValue = returns 3 counter3.getValue = returns 5 counter1.getValue = returns 4 counter2.getValue = returns 3 counter3.getValue = returns 5 counter1.getValue = returns 4 counter2.getValue = returns 3 counter3.getValue = returns 5 But that is not true; I still get some random wrong values. Maybe I didn't query the counter values often enough to see it last time. Sorry for not being clearer; it is not easy to explain, nor to understand, for me. Thanks for the help. Alain

2011/11/7 Riyad Kalla rka...@gmail.com Alain, When you tried CL.All, was that only after you had made the change to ReplicationFactor=3 and restarted all the servers? If you hadn't restarted the servers with the new RF, I am not sure that CL.All would have the intended effect. Also, I wasn't sure what you meant by "but now every request always returns me the same count value..." -- didn't you want the requests to always return the same values? Or maybe you are saying that it always returns the same *wrong* value? Like you do: counter.increment (v=1) counter.increment (v=2) counter.increment (v=3) counter.getValue = returns 7 counter.getValue = returns 7 counter.getValue = returns 7 or something inconsistent like that?

On Mon, Nov 7, 2011 at 9:09 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: I've tried with CL.All, but it doesn't work any better. I still have strange values (between 4 and 10 events counted instead of 10), but now every request always returns me the same count value... It's very strange. Any other idea? Alain

2011/11/7 Riyad Kalla rka...@gmail.com Alain, Try using a CL of 3 or ALL and see if the problem goes away. Your replication factor (as I just learned) dictates how many nodes each piece of data is replicated to; by using an RF of 3 you are saying "replicate all my data to all my nodes" (in this case counters).
This doesn't happen immediately, but you can *force* it to happen on write by specifying a CL of ALL. If you specify 1, then your counter value is written to one member of the ring, and then your command returns. If you keep querying, you will bounce around your ring, reading the values from the different nodes, until a future date at *which point* all the values will likely agree. Keep all the code you have now exactly the same; just change the code at the end, where you read the counter value back, so that it keeps reading the value every second for 60 seconds, and see if all the values eventually match up -- they should (as the counter value is replicated to all the nodes and their old values discarded). -R

On Mon, Nov 7, 2011 at 8:15 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi, I'm trying to switch from RF = 1 to RF = 3, but I get wrong values from counters when doing so... I've got a CF that contains many counters of some events. When I'm at RF = 1 and simulate 10 events, they are counted correctly. However, when I switch to RF = 3, my counters show wrong values that sometimes change when requested twice (it can return 7, then 5, instead of 10 every time). I first thought it was a problem of CL, because I seem to remember reading once that I had to use CL.One for reads and writes
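As a concrete version of what this thread converges on, here is a rough Hector sketch (class and method names are from the Hector 0.8-era API as best I recall them, and the cluster, keyspace, CF, and key names are placeholders) that pins both the counter writes and the read-back to QUORUM so that, with RF=3, W + R > RF holds:

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HCounterColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.CounterQuery;

public class CounterQuorumSketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

        // Force QUORUM for both writes and reads so that, with RF=3,
        // W + R > RF holds and a read sees its own preceding writes.
        ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
        ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
        ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
        Keyspace ks = HFactory.createKeyspace("MyKeyspace", cluster, ccl);

        // Increment the counter once per simulated event.
        Mutator<String> mutator = HFactory.createMutator(ks, StringSerializer.get());
        for (int i = 0; i < 10; i++) {
            mutator.incrementCounter("event-row", "Counters", "counter1", 1L);
        }

        // Read it back at QUORUM; with consistent writes this should print 10.
        CounterQuery<String, String> query =
                HFactory.createCounterColumnQuery(ks, StringSerializer.get(), StringSerializer.get());
        query.setColumnFamily("Counters").setKey("event-row").setName("counter1");
        HCounterColumn<String> col = query.execute().get();
        System.out.println(col == null ? "missing" : col.getValue());
    }
}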
Re: Will writes with ALL consistency eventually propagate?
Plan for the future. At some point your data set will become too big for the node it is running on, or your load will force you to split nodes; once you do that, RF < N. To solve performance issues with C*, the solution is: add more nodes. To solve storage issues with C*, the solution is: add more nodes. In most cases, the solution in C* is: add more nodes. Don't assume RF = number of nodes as a core design decision of your application, and you will not have your ass bitten ;-) -Stephen P.S. making the point more extreme to make it clear

On 7 November 2011 15:04, Riyad Kalla rka...@gmail.com wrote: Stephen, Excellent breakdown; I appreciate all the detail. Your last comment about RF being smaller than N (number of nodes) -- in my particular case my data set isn't particularly large (a few GB) and is distributed globally across a handful of data centers. What I am utilizing Cassandra for is the replication, in order to minimize latency for requests. So when a request comes into any location, I want each node in the ring to contain the full data set, so it never needs to defer to another member of the ring to answer a question (even if this means eventual consistency, that is alright in my case). Given that, the way I've understood this discussion so far is that I would have an RF of N (my total node count), but my Consistency Level with all my writes will *likely* be QUORUM -- I think that is a good/safe default for me to use, as writes aren't the scenario I need to optimize for latency; that being said, I also don't want to wait for a ConsistencyLevel of ALL to complete before my code continues. Would you agree with this assessment, or am I missing the boat on something? Best, Riyad

On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly stephen.alan.conno...@gmail.com wrote: Consistency Level is a pseudo-enum... you have the choice between ONE, QUORUM (and there are different types of this), and ALL. At CL=ONE, only one node is guaranteed to have got the write if the operation is a success. At CL=ALL, all nodes that the RF says it should be stored at must confirm the write before the operation succeeds, but a partial write will succeed eventually if at least one node recorded the write. At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the operation to succeed, otherwise failure, but a partial write will succeed eventually if at least one node recorded the write. Read repair will eventually ensure that the write is replicated across all RF nodes in the cluster. The N in QUORUM above depends on the type of QUORUM you choose; in general think N=RF unless you choose a fancy QUORUM. To have a consistent read, CL of write + CL of read must be > RF...
Write at ONE, read at ONE = may not get the most recent write if RF > 1 [fastest write, fastest read] {data loss possible if node lost before read repair}
Write at QUORUM, read at ONE = consistent read [moderate write, fastest read] {multiple nodes must be lost for data loss to be possible}
Write at ALL, read at ONE = consistent read, writes may be blocked if any node fails [slowest write, fastest read]
Write at ONE, read at QUORUM = may not get the most recent write if RF > 2 [fastest write, moderate read] {data loss possible if node lost before read repair}
Write at QUORUM, read at QUORUM = consistent read [moderate write, moderate read] {multiple nodes must be lost for data loss to be possible}
Write at ALL, read at QUORUM = consistent read, writes may be blocked if any node fails [slowest write, moderate read]
Write at ONE, read at ALL = consistent read, reads may fail if any node fails [fastest write, slowest read] {data loss possible if node lost before read repair}
Write at QUORUM, read at ALL = consistent read, reads may fail if any node fails [moderate write, slowest read] {multiple nodes must be lost for data loss to be possible}
Write at ALL, read at ALL = consistent read, writes may be blocked if any node fails, reads may fail if any node fails [slowest write, slowest read]

Note: You can choose the CL for each and every operation. This is something that you should design into your application (unless you exclusively use QUORUM for all operations, in which case you are advised to bake the logic in, but it is less necessary). The other thing to remember is that RF does not have to equal the number of nodes in your cluster... in fact I would recommend designing your app on the basis that RF < number of nodes in your cluster... because at some point, when your data set grows big enough, you will end up with RF < number of nodes. -Stephen

On 7 November 2011 13:03, Riyad Kalla rka...@gmail.com wrote: Ah! Ok, I was interpreting what you were saying to mean that if my RF was too high, then the ring would die if I lost one. Ultimately what I want (I think) is: Replication Factor: 5 (aka all of my nodes); Consistency Level: 2. Put another way, when I write a
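Stephen's "CL of write + CL of read must be > RF" rule is easy to sanity-check mechanically; the sketch below is plain arithmetic with no Cassandra APIs involved:

public class ConsistencyMath {

    // Nodes involved in a QUORUM operation for a given replication factor.
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // The rule: a read is guaranteed consistent when write CL + read CL > RF,
    // because the write set and the read set must then overlap in >= 1 node.
    static boolean consistentRead(int writeNodes, int readNodes, int rf) {
        return writeNodes + readNodes > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        System.out.println(quorum(rf));                                 // 2
        System.out.println(consistentRead(1, 1, rf));                   // ONE + ONE: false
        System.out.println(consistentRead(quorum(rf), quorum(rf), rf)); // QUORUM + QUORUM: true
        System.out.println(consistentRead(1, rf, rf));                  // ONE + ALL: true
    }
}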
Reminder: Cassandra Meetup, Thursday Nov. 10th in Vancouver
Just a reminder; If you're planning to be at ApacheCon, or are otherwise able to be in Vancouver on the 10th, we're having a Cassandra Meetup. There is no cost to attend (you don't even need to be registered for the conference), and beer will be provided. As a special treat, Chris Burroughs of Clearspring, and Paul Querna of Rackspace have each agreed to spend a few minutes talking about their use-cases, and answering any question you might have. Hope to see you there! http://wiki.apache.org/cassandra/Meetup_ApacheConNA2011 -- Eric Evans Acunu | http://www.acunu.com | @acunu
Re: Reminder: Cassandra Meetup, Thursday Nov. 10th in Vancouver
I'll be there! On Mon, Nov 7, 2011 at 5:23 PM, Eric Evans eev...@acunu.com wrote: Just a reminder; If you're planning to be at ApacheCon, or are otherwise able to be in Vancouver on the 10th, we're having a Cassandra Meetup. There is no cost to attend (you don't even need to be registered for the conference), and beer will be provided. As a special treat, Chris Burroughs of Clearspring, and Paul Querna of Rackspace have each agreed to spend a few minutes talking about their use-cases, and answering any question you might have. Hope to see you there! http://wiki.apache.org/cassandra/Meetup_ApacheConNA2011 -- Eric Evans Acunu | http://www.acunu.com | @acunu -- http://twitter.com/tjake
Re: Key count mismatch in cluster for a column family
Sylvain - We have a similar problem, but the discrepancy is not that big. Do we have to do a major compaction to fix it? We did not do 'nodetool compact'; we just did repair regularly, which triggers minor compaction. Thanks, Daning

On 10/26/2011 03:23 AM, Sylvain Lebresne wrote: The estimate for the number of keys is computed by summing the key estimate for each sstable of the CF. For each sstable, the estimate should be fairly good. However, it's when we sum all the sstable estimates that we can potentially lose a lot of precision, if there are a lot of rows that have parts in different sstables. But that in turn would suggest a problem with compaction lagging badly behind, especially with leveled compaction. -- Sylvain

On Wed, Oct 26, 2011 at 3:58 AM, Terry Cumaranatunge cumar...@gmail.com wrote: I have a cluster of 8 nodes, all running 1.0. The stats shown on the 1st node on one of the CFs for the number of keys are much larger than expected. The first node shows the key count estimate to be 9.2M, whereas the rest report ~650K on each node. The 650K is in the correct neighborhood of the number of keys that have been inserted. The counts are comparable for all other CFs across the cluster. I'm using leveled compaction, but no compression. 'nodetool ring' shows that the load is equal across all nodes. What could cause this large disparity in the number of keys? Is this just a stats issue, or does it suggest a functional problem?

1st node:
Column Family: uid
SSTable count: 395
Space used (live): 1375262
Space used (total): 5482088532
Number of Keys (estimate): 9215104
Memtable Columns Count: 514952
Memtable Data Size: 295213448
Memtable Switch Count: 290
Read Count: 193102511
Read Latency: 0.146 ms.
Write Count: 176934874
Write Latency: 0.018 ms.
Pending Tasks: 0
Key cache capacity: 8302131
Key cache size: 8302131
Key cache hit rate: 0.8644664668071792
Row cache: disabled
Compacted row minimum size: 87
Compacted row maximum size: 7007506
Compacted row mean size: 8944

2nd node:
Column Family: uid
SSTable count: 402
Space used (live): 13723958304
Space used (total): 4044833220
Number of Keys (estimate): 652928
Memtable Columns Count: 170290
Memtable Data Size: 102378904
Memtable Switch Count: 272
Read Count: 192463595
Read Latency: 0.289 ms.
Write Count: 176527238
Write Latency: 0.014 ms.
Pending Tasks: 0
Key cache capacity: 8783058
Key cache size: 8783058
Key cache hit rate: 0.7865727464740025
Row cache: disabled
Compacted row minimum size: 87
Compacted row maximum size: 7007506
Compacted row mean size: 12151

3rd node:
Column Family: uid
SSTable count: 401
Space used (live): 13204714872
Space used (total): 4030024144
Number of Keys (estimate): 675968
Memtable Columns Count: 42881
Memtable Data Size: 30992298
Memtable Switch Count: 304
Read Count: 190769879
Read Latency: 0.224 ms.
Write Count: 175381826
Write Latency: 0.014 ms.
Pending Tasks: 0
Key cache capacity: 8920108
Key cache size: 8920108
Key cache hit rate: 0.8053563128870577
Row cache: disabled
Compacted row minimum size: 87
Compacted row maximum size: 4866323
Compacted row mean size: 12074
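For reference, a major compaction can be forced per column family from the command line; the host, keyspace, and CF names below are placeholders, and whether it actually collapses the estimate here is exactly the open question in this thread (a forced major compaction may also behave differently under leveled compaction than under size-tiered):

nodetool -h 127.0.0.1 compact MyKeyspace uid
nodetool -h 127.0.0.1 cfstats

The second command re-reads the per-CF statistics so the "Number of Keys (estimate)" line can be compared before and after.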
Re: Cassandra NYC Summit on December 6th
Very cool Nate, when will the tracks be locked in? On Mon, Nov 7, 2011 at 11:14 AM, Nate McCall n...@datastax.com wrote: The first East Coast Apache Cassandra conference - Cassandra NYC - will be held on Tuesday, December 6, at the Lighthouse International conference center in New York City. This is a one day two-track event with lectures and workshops by leading Cassandra experts. Jonathan Ellis, head of the Apache Cassandra Project, will give the keynote. There is an initial list of speakers up, with others to be announced in the next few days. Continental breakfast, lunch, and continuous beverage service will be provided for all attendees. Following the conference there will be an after party worthy of NYC. For additional details and registration, please visit the event page: http://www.datastax.com/events/cassandranyc2011 Hope to see you there! -Nate
Re: Cassandra NYC Summit on December 6th
We should have the schedule set by the end of the week. The confirmed speaker list can be found here (and should be growing by the day as bios and summaries come in): http://www.datastax.com/events/cassandranyc2011 Thanks, -Nate On Mon, Nov 7, 2011 at 12:25 PM, Riyad Kalla rka...@gmail.com wrote: Very cool Nate, when will the tracks be locked in? On Mon, Nov 7, 2011 at 11:14 AM, Nate McCall n...@datastax.com wrote: The first East Coast Apache Cassandra conference - Cassandra NYC - will be held on Tuesday, December 6, at the Lighthouse International conference center in New York City. This is a one day two-track event with lectures and workshops by leading Cassandra experts. Jonathan Ellis, head of the Apache Cassandra Project, will give the keynote. There is an initial list of speakers up, with others to be announced in the next few days. Continental breakfast, lunch, and continuous beverage service will be provided for all attendees. Following the conference there will be an after party worthy of NYC. For additional details and registration, please visit the event page: http://www.datastax.com/events/cassandranyc2011 Hope to see you there! -Nate
RE: Second Cassandra users survey
I second transparent disk encryption. Also:
- Matching column names via 'like' and % wildcards
- Parameterized CQL, plus support for 'AND' and 'OR'
- Bulk row deletion
- Also, more clarification on various parameters and configuration - If you are doing this, change
Thanks for the opportunity, -Derek -- Derek Deeter, Sr. Software Engineer, Intuit Financial Services, (818) 597-5932 (x76932), 5601 Lindero Canyon Rd., Westlake, CA 91362, derek.dee...@digitalinsight.com

-----Original Message----- From: Mohit Anchlia [mailto:mohitanch...@gmail.com] Sent: Sunday, November 06, 2011 10:58 AM To: user@cassandra.apache.org Subject: Re: Second Cassandra users survey Transparent on-disk encryption with a pluggable key provider will also be really helpful to secure sensitive information.

On Sun, Nov 6, 2011 at 9:42 AM, Aaron Turner synfina...@gmail.com wrote: The intent was to have a lighter solution for common problems than having to go with Hadoop or streaming large quantities of data back to the client. Is this feature creep? Yeah, prolly. Is it useful? Yes. If it can't be done well, then it probably shouldn't be done, but it never hurts to ask. :)

On Sun, Nov 6, 2011 at 9:13 AM, Sarah Baker sba...@mspot.com wrote: Isn't this sort of heading down the slippery slope of things that weigh you down? It was my understanding that Cassandra was a "stick to your core competency" sort of database that really wanted to leave such utilities external. At its core was get and put. Did I miss something in my reading of intent? -Sarah

-----Original Message----- From: Aaron Turner [mailto:synfina...@gmail.com] Sent: Sunday, November 06, 2011 8:25 AM To: user@cassandra.apache.org Subject: Re: Second Cassandra users survey 1. Basic SQL-like summary transforms for both CQL and Thrift API clients, like: SUM, AVG, MIN, MAX -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows "Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." -- Benjamin Franklin "carpe diem quam minimum credula postero"
Re: Second Cassandra users survey
Wish list: A decent GUI to explore data kept in Cassandra would be very valuable. It should also be extensible, to provide viewers for custom data. +1 to that. @jonathan - This is what Google Moderator is really good at. Perhaps start one and move the idea creation / voting there.
Re: Determining Strategy options for a particular Strategy class
No, there isn't. On Mon, Nov 7, 2011 at 8:04 AM, Dave Brosius dbros...@mebigfatguy.com wrote: Is there a programmatic way to determine what the valid 'keys' are for the strategy options for a particular strategy class? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Will writes with ALL consistency eventually propagate?
at that point, your cluster will either have so much data on each node that you will need to split them, keeping rf=5 so you have 10 nodes... or the intra-cluster traffic will swamp you and you will split each node keeping rf=5, so you have 10 nodes again. safest thing is not to design with the assumption that rf=n - Stephen --- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen

On 7 Nov 2011 17:47, Riyad Kalla rka...@gmail.com wrote: Stephen, I appreciate you making the point more strongly; I won't make this decision lightly given the stress you are putting on it, but the technical aspects of this make me curious... If I start with RF=N (number of nodes) now, and in 2 years (hypothetically) my dataset is too large and I say to myself "Dangit, Stephen was right...", couldn't I just change the RF to some smaller value, say 3, at that point? Or would the Cassandra ring not rebalance the data set nicely at that point? More specifically, would it not know how best to slowly remove extraneous copies from the nodes and make the data more sparse among the ring members? Thanks for the hand-holding; it is helping me understand the operational landscape quickly. -R

On Mon, Nov 7, 2011 at 10:18 AM, Stephen Connolly stephen.alan.conno...@gmail.com wrote: Plan for the future. At some point your data set will become too big for the node it is running on, or your load will force you to split nodes; once you do that, RF < N. To solve performance issues with C*, the solution is: add more nodes. To solve storage issues with C*, the solution is: add more nodes. In most cases, the solution in C* is: add more nodes. Don't assume RF = number of nodes as a core design decision of your application, and you will not have your ass bitten ;-) -Stephen P.S. making the point more extreme to make it clear

On 7 November 2011 15:04, Riyad Kalla rka...@gmail.com wrote: Stephen, Excellent breakdown; I appreciate all the detail. Your last comment about RF being smaller than N (number of nodes) -- in my particular case my data set isn't particularly large (a few GB) and is distributed globally across a handful of data centers. What I am utilizing Cassandra for is the replication, in order to minimize latency for requests. So when a request comes into any location, I want each node in the ring to contain the full data set, so it never needs to defer to another member of the ring to answer a question (even if this means eventual consistency, that is alright in my case). Given that, the way I've understood this discussion so far is that I would have an RF of N (my total node count), but my Consistency Level with all my writes will *likely* be QUORUM -- I think that is a good/safe default for me to use, as writes aren't the scenario I need to optimize for latency; that being said, I also don't want to wait for a ConsistencyLevel of ALL to complete before my code continues. Would you agree with this assessment, or am I missing the boat on something? Best, Riyad

On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly stephen.alan.conno...@gmail.com wrote: Consistency Level is a pseudo-enum... you have the choice between ONE, QUORUM (and there are different types of this), and ALL. At CL=ONE, only one node is guaranteed to have got the write if the operation is a success.
At CL=ALL, all nodes that the RF says it should be stored at must confirm the write before the operation succeeds, but a partial write will succeed eventually if at least one node recorded the write. At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the operation to succeed, otherwise failure, but a partial write will succeed eventually if at least one node recorded the write. Read repair will eventually ensure that the write is replicated across all RF nodes in the cluster. The N in QUORUM above depends on the type of QUORUM you choose; in general think N=RF unless you choose a fancy QUORUM. To have a consistent read, CL of write + CL of read must be > RF...

Write at ONE, read at ONE = may not get the most recent write if RF > 1 [fastest write, fastest read] {data loss possible if node lost before read repair}
Write at QUORUM, read at ONE = consistent read [moderate write, fastest read] {multiple nodes must be lost for data loss to be possible}
Write at ALL, read at ONE = consistent read, writes may be blocked if any node fails [slowest write, fastest read]
Write at ONE, read at QUORUM = may not get the most recent write if RF > 2 [fastest write, moderate read] {data loss possible if node lost before read repair}
Write at QUORUM, read at QUORUM = consistent read [moderate write, moderate read] {multiple nodes must be lost for data loss to be possible}
Write at ALL, read at QUORUM = consistent read,
Re: Second Cassandra users survey
Well - given the example, in our case the prefix that determines the endpoints a token should be routed to could be something like a user id. So with key = userid + "." + userthingid, instead of getEndpoints(hash(key)) (which is what happens right now) you would have getEndpoints(userid). Since count(users) is much larger than the number of nodes in the ring, we would still have a balanced cluster. I guess what we would need is something like a compound row key. You could almost do something like this with the current code base, but I remember that there are certain assumptions about how keys translate to tokens on the ring that make this impossible. But in essence this would result in another partitioner implementation. So you'd have OrderPreservingPartitioner, RandomPartitioner, and maybe a ShardedPartitioner.

On Nov 7, 2011, at 2:26 PM, Peter Lin wrote: This feature interests me, so I thought I'd add some comments. Having used partition features in existing databases like DB2 and Oracle, as well as manual partitioning, one of the biggest challenges is keeping the partitions balanced. What I've seen with manual partitioning is that the partitions often get unbalanced. Usually the developers take a best guess and hope it ends up balanced. Some of the approaches I've used in the past were zip code, area code, state, and some kind of hash. So my question related to deterministic sharding is this: what rebalance feature(s) would be useful or needed once the partitions get unbalanced? Without a decent plan for rebalancing, it often ends up being a very painful problem to solve in production. Back when I worked on mobile apps, we saw issues with how OpenWave WAP servers partitioned the accounts. The early versions randomly assigned a phone to a server when it was provisioned the first time. Once the phone was associated with that server, it was stuck on that server. If the load on that server was heavier than the others, the only choice was to scale up the hardware. My understanding of Cassandra's current sharding is consistent and random. Does the new feature sit somewhere in between? Are you thinking of a pluggable API, so that you can provide your own hash algorithm for Cassandra to use?

On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday daniel.double...@gmx.net wrote: Allow for deterministic / manual sharding of rows. Right now it seems that there is no way to force rows with different row keys to be stored on the same nodes in the ring. This is our number one reason why we get data inconsistencies when nodes fail. Sometimes a logical transaction requires writing rows with different row keys. If we could use something like prefix.uniquekey and let the partitioner use only the prefix, the probability that only part of the transaction gets written could be reduced considerably.

On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote: Hi all, Two years ago I asked for Cassandra use cases and feature requests. [1] The results [2] have been extremely useful in setting and prioritizing goals for Cassandra development. But with the release of 1.0 we've accomplished basically everything from our original wish list. [3] I'd love to hear from modern Cassandra users again, especially if you're usually a quiet lurker. What does Cassandra do well? What are your pain points? What's your feature wish list? As before, if you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record.
[1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html [2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Will writes with ALL consistency eventually propagate?
Ahh, I see your point. Thanks for the help, Stephen.

On Mon, Nov 7, 2011 at 12:43 PM, Stephen Connolly stephen.alan.conno...@gmail.com wrote: at that point, your cluster will either have so much data on each node that you will need to split them, keeping rf=5 so you have 10 nodes... or the intra-cluster traffic will swamp you and you will split each node keeping rf=5, so you have 10 nodes again. safest thing is not to design with the assumption that rf=n - Stephen --- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen

On 7 Nov 2011 17:47, Riyad Kalla rka...@gmail.com wrote: Stephen, I appreciate you making the point more strongly; I won't make this decision lightly given the stress you are putting on it, but the technical aspects of this make me curious... If I start with RF=N (number of nodes) now, and in 2 years (hypothetically) my dataset is too large and I say to myself "Dangit, Stephen was right...", couldn't I just change the RF to some smaller value, say 3, at that point? Or would the Cassandra ring not rebalance the data set nicely at that point? More specifically, would it not know how best to slowly remove extraneous copies from the nodes and make the data more sparse among the ring members? Thanks for the hand-holding; it is helping me understand the operational landscape quickly. -R

On Mon, Nov 7, 2011 at 10:18 AM, Stephen Connolly stephen.alan.conno...@gmail.com wrote: Plan for the future. At some point your data set will become too big for the node it is running on, or your load will force you to split nodes; once you do that, RF < N. To solve performance issues with C*, the solution is: add more nodes. To solve storage issues with C*, the solution is: add more nodes. In most cases, the solution in C* is: add more nodes. Don't assume RF = number of nodes as a core design decision of your application, and you will not have your ass bitten ;-) -Stephen P.S. making the point more extreme to make it clear

On 7 November 2011 15:04, Riyad Kalla rka...@gmail.com wrote: Stephen, Excellent breakdown; I appreciate all the detail. Your last comment about RF being smaller than N (number of nodes) -- in my particular case my data set isn't particularly large (a few GB) and is distributed globally across a handful of data centers. What I am utilizing Cassandra for is the replication, in order to minimize latency for requests. So when a request comes into any location, I want each node in the ring to contain the full data set, so it never needs to defer to another member of the ring to answer a question (even if this means eventual consistency, that is alright in my case). Given that, the way I've understood this discussion so far is that I would have an RF of N (my total node count), but my Consistency Level with all my writes will *likely* be QUORUM -- I think that is a good/safe default for me to use, as writes aren't the scenario I need to optimize for latency; that being said, I also don't want to wait for a ConsistencyLevel of ALL to complete before my code continues. Would you agree with this assessment, or am I missing the boat on something? Best, Riyad

On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly stephen.alan.conno...@gmail.com wrote: Consistency Level is a pseudo-enum... you have the choice between ONE, QUORUM (and there are different types of this), and ALL. At CL=ONE, only one node is guaranteed to have got the write if the operation is a success.
At CL=ALL, all nodes that the RF says it should be stored at must confirm the write before the operation succeeds, but a partial write will succeed eventually if at least one node recorded the write. At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the operation to succeed, otherwise failure, but a partial write will succeed eventually if at least one node recorded the write. Read repair will eventually ensure that the write is replicated across all RF nodes in the cluster. The N in QUORUM above depends on the type of QUORUM you choose; in general think N=RF unless you choose a fancy QUORUM. To have a consistent read, CL of write + CL of read must be > RF...

Write at ONE, read at ONE = may not get the most recent write if RF > 1 [fastest write, fastest read] {data loss possible if node lost before read repair}
Write at QUORUM, read at ONE = consistent read [moderate write, fastest read] {multiple nodes must be lost for data loss to be possible}
Write at ALL, read at ONE = consistent read, writes may be blocked if any node fails [slowest write, fastest read]
Write at ONE, read at QUORUM = may not get the most recent write if RF > 2 [fastest write, moderate read] {data loss possible if node lost before read repair}
Write at QUORUM, read at QUORUM =
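On Riyad's question about lowering RF later: it is an online schema change rather than an automatic rebalance. A hedged sketch of the usual sequence, using 1.0-era CLI syntax with a placeholder keyspace name (check the syntax for your version):

update keyspace MyKeyspace with strategy_options = {replication_factor:3};

After raising RF you would run nodetool repair so the new replicas receive their data; after lowering it, nodetool cleanup on each node drops the copies that node is no longer responsible for -- the "removing extraneous copies" step asked about above.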
Re: Running java stress tests
I managed to fix my own problem. I had rpc_address set to localhost. Running strace on the stress command showed it attempting to bind to an IPv6 address. Leaving rpc_address blank fixes the problem, since host resolution works fine on my cluster. Thanks, Joe

On Sun, Nov 6, 2011 at 8:15 PM, Joe Kaiser joe.kai...@stackiq.com wrote: Hi, I am attempting to run the Java stress tests. Tests to the local machine work fine:

# sh stress -d localhost -n 100
Unable to create stress keyspace: Keyspace names must be case-insensitively unique
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
321975,32197,32197,0.0011982358878794939,10
608889,28691,28691,0.0014594442934119632,20
897246,28835,28835,0.0014675003554621528,30
..
9894395,11904,11904,0.002851321382369248,505
1000,10560,10560,0.003761071918943232,513
END

Tests to a number of the remote machines do not:

# sh stress -d 10.1.255.254 -n 1000
Exception in thread main java.lang.RuntimeException: java.net.ConnectException: Connection refused
at org.apache.cassandra.stress.Session.getClient(Unknown Source)
at org.apache.cassandra.stress.Session.createKeySpaces(Unknown Source)
at org.apache.cassandra.stress.StressAction.run(Unknown Source)
at org.apache.cassandra.stress.Stress.main(Unknown Source)

This is without firewalls, on the same private subnet, with DNS resolving hostnames correctly. Nothing shows up in the Cassandra logs, either on the machine where the stress test command was invoked or where it was intended to run. Has anyone seen this problem before? Thanks, Joe -- Joe Kaiser StackIQ Systems Engineer -- Joe Kaiser Systems Engineer 801-477-0272 AIM: kobudojoe GTalk: joe.kai...@stackiq.com Skype: joe.kaiser
Re: Running java stress tests
Thanks for the update, Joe. Good to know!

On Mon, Nov 7, 2011 at 2:44 PM, Joe Kaiser joe.kai...@stackiq.com wrote: I managed to fix my own problem. I had rpc_address set to localhost. Running strace on the stress command showed it attempting to bind to an IPv6 address. Leaving rpc_address blank fixes the problem, since host resolution works fine on my cluster. Thanks, Joe

On Sun, Nov 6, 2011 at 8:15 PM, Joe Kaiser joe.kai...@stackiq.com wrote: Hi, I am attempting to run the Java stress tests. Tests to the local machine work fine: # sh stress -d localhost -n 100 Unable to create stress keyspace: Keyspace names must be case-insensitively unique total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time 321975,32197,32197,0.0011982358878794939,10 608889,28691,28691,0.0014594442934119632,20 897246,28835,28835,0.0014675003554621528,30 .. 9894395,11904,11904,0.002851321382369248,505 1000,10560,10560,0.003761071918943232,513 END Tests to a number of the remote machines do not: # sh stress -d 10.1.255.254 -n 1000 Exception in thread main java.lang.RuntimeException: java.net.ConnectException: Connection refused at org.apache.cassandra.stress.Session.getClient(Unknown Source) at org.apache.cassandra.stress.Session.createKeySpaces(Unknown Source) at org.apache.cassandra.stress.StressAction.run(Unknown Source) at org.apache.cassandra.stress.Stress.main(Unknown Source) This is without firewalls, on the same private subnet, with DNS resolving hostnames correctly. Nothing shows up in the Cassandra logs, either on the machine where the stress test command was invoked or where it was intended to run. Has anyone seen this problem before? Thanks, Joe -- Joe Kaiser StackIQ Systems Engineer -- Joe Kaiser Systems Engineer 801-477-0272 AIM: kobudojoe GTalk: joe.kai...@stackiq.com Skype: joe.kaiser -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
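For anyone hitting the same thing, the setting in question lives in conf/cassandra.yaml; the fragment below is illustrative (the port shown is the default Thrift port). With rpc_address set to localhost, only clients on the node itself can connect; leaving it blank makes Cassandra bind to the address the node's own hostname resolves to:

# conf/cassandra.yaml (illustrative fragment)
rpc_address:
rpc_port: 9160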
Error connection to remote JMX agent during repair
Hello, I'm trying to run repair on one of my nodes, which needs to be repopulated after a failure of the hard drive. What I'm getting is below. Note: I'm not loading JMX with Cassandra; it always worked before... The version is 0.8.6. Any help will be appreciated, Maxim

Error connection to remote JMX agent!
java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: java.net.SocketTimeoutException: Read timed out]
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:338)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)
at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:140)
at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:110)
at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:582)
Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: java.net.SocketTimeoutException: Read timed out]
at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:101)
at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:185)
at javax.naming.InitialContext.lookup(InitialContext.java:392)
at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1886)
at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1856)
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:257)
... 4 more
Caused by: java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: java.net.SocketTimeoutException: Read timed out
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:286)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322)
at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:97)
... 9 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readByte(DataInputStream.java:248)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:228)
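For reference, nodetool always goes through JMX; a typical invocation looks like the following (the host is a placeholder, and 7199 is the default JMX_PORT configured in conf/cassandra-env.sh):

nodetool -h 10.0.0.1 -p 7199 repair

A read timeout like the one above means the initial registry lookup reached something but the JRMP handshake stalled, so it is worth confirming the JMX_PORT value on the node and that no firewall sits between nodetool and that port.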
[RELEASE] Apache Cassandra 1.0.2 released
The Cassandra team is pleased to announce the release of Apache Cassandra version 1.0.2. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here: http://cassandra.apache.org/ Downloads of source and binary distributions are listed in our download section: http://cassandra.apache.org/download/ This version is a maintenance/bug-fix release [1]. It comes quickly after the release of 1.0.1 mainly because it fixes a bug that was making compression unusable for some, and we preferred delivering the fix quickly. It contains a few other fixes as well, and upgrading is encouraged. As always, please pay attention to the release notes [2] and let us know [3] if you encounter any problems. Have fun! [1]: http://goo.gl/81Xbe (CHANGES.txt) [2]: http://goo.gl/XUedS (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA
Secondary index issue, unable to query for records that should be there
Hello, I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF with several secondary indexes to try out some options. Right now I have the following to create my CF using the CLI:

create column family MyTest
  with key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
    -- absolute timestamp for this message, also indexed year/month/day/hour/minute
    -- index these as they are low cardinality
    {column_name:messageTimestamp, validation_class:LongType},
    {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
    {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
    {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
    {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
    {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},
    ... other non-indexed columns defined
  ];

So when I insert data, I calculate a year/month/day/hour/minute and set these values on a Hector ColumnFamilyUpdater instance and update that way. Then later I can query from the command line with CQL such as:

get MyTest where messageYear=2011 and messageMonth=6 and messageDay=1 and messageHour=13 and messageMinute=44;

etc. This generally works; however, at some point queries that I know should return data no longer return any rows. So for instance, part way through my test (inserting 250K rows), I can query for what should be there and get data back, such as the above query, but later that same query returns 0 rows. Similarly, with fewer clauses in the expression, like this:

get MyTest where messageYear=2011 and messageMonth=6;

will also return 0 rows. ??? Any idea what could be going wrong? I'm not getting any exceptions in my client during the write, and I don't see anything in the logs (no errors anyway).

A second question - is what I'm doing insane? I'm not sure that performance on CQL queries with multiple indexed columns is good (does Cassandra intelligently use all available indexes on these queries?)

Thanks, -nate
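For comparison, the same filter expressed through Hector's index query support looks roughly like the sketch below (from memory of the Hector API, with value types simplified to longs rather than the IntegerType used above, and the keyspace assumed to be set up elsewhere):

import me.prettyprint.cassandra.model.IndexedSlicesQuery;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;

public class IndexedQuerySketch {
    // Counts rows matching messageYear=2011 AND messageMonth=6 via the KEYS indexes.
    static int countMatches(Keyspace ks) {
        IndexedSlicesQuery<String, String, Long> q = HFactory.createIndexedSlicesQuery(
                ks, StringSerializer.get(), StringSerializer.get(), LongSerializer.get());
        q.setColumnFamily("MyTest");
        q.addEqualsExpression("messageYear", 2011L);
        q.addEqualsExpression("messageMonth", 6L);
        q.setRange(null, null, false, 10); // fetch up to 10 columns per row
        q.setRowCount(100);                // cap the result set for the sketch
        QueryResult<OrderedRows<String, String, Long>> result = q.execute();
        int n = 0;
        for (Row<String, String, Long> row : result.get()) {
            n++;
        }
        return n;
    }
}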
Re: jamm - memory meter
Currently, cassandra/conf/cassandra-env.sh disables the use of jamm if OpenJDK is detected. I enabled it and tested it on OpenJDK 1.6 b23, and it works as expected. That OpenJDK test can probably be removed.
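From memory, the relevant fragment of conf/cassandra-env.sh looks roughly like the sketch below (paraphrased, not the exact shipped script, and the jar version varies by release); removing the OpenJDK branch, as suggested, means the agent flag is always passed:

# paraphrased sketch of the jamm section of conf/cassandra-env.sh
jvm_output=`"${JAVA:-java}" -version 2>&1`
case "$jvm_output" in
    *OpenJDK*)
        # jamm javaagent currently skipped for OpenJDK
        ;;
    *)
        JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.2.2.jar"
        ;;
esac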
Re: jamm - memory meter
What version is shipping on debian stable / RHEL? On Mon, Nov 7, 2011 at 4:13 PM, Radim Kolar h...@sendmail.cz wrote: Currently cassandra/conf/cassandra-env.sh disables use of jamm if openjdk is detected. I enabled it and tested it on openjdk 1.6 b23 and it works as expected. That openjdk test can be probably removed. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Secondary index issue, unable to query for records that should be there
Nate, is this all against a single Cassandra server, or do you have a ring setup? If you do have a ring setup, what is your replication factor set to? Also, what ConsistencyLevel are you writing with when storing the values? -R

On Mon, Nov 7, 2011 at 2:43 PM, Nate Sammons nsamm...@ften.com wrote: Hello, I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF with several secondary indexes to try out some options. Right now I have the following to create my CF using the CLI:

create column family MyTest
  with key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
    -- absolute timestamp for this message, also indexed year/month/day/hour/minute
    -- index these as they are low cardinality
    {column_name:messageTimestamp, validation_class:LongType},
    {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
    {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
    {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
    {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
    {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},
    ... other non-indexed columns defined
  ];

So when I insert data, I calculate a year/month/day/hour/minute and set these values on a Hector ColumnFamilyUpdater instance and update that way. Then later I can query from the command line with CQL such as:

get MyTest where messageYear=2011 and messageMonth=6 and messageDay=1 and messageHour=13 and messageMinute=44;

etc. This generally works; however, at some point queries that I know should return data no longer return any rows. So for instance, part way through my test (inserting 250K rows), I can query for what should be there and get data back, such as the above query, but later that same query returns 0 rows. Similarly, with fewer clauses in the expression, like this:

get MyTest where messageYear=2011 and messageMonth=6;

will also return 0 rows. ??? Any idea what could be going wrong? I'm not getting any exceptions in my client during the write, and I don't see anything in the logs (no errors anyway). A second question - is what I'm doing insane? I'm not sure that performance on CQL queries with multiple indexed columns is good (does Cassandra intelligently use all available indexes on these queries?) Thanks, -nate
RE: Using Cli to create a column family with column name metadata question
Hi, Thanks for the reply. I'm not talking about the column name; I'm talking about the column metadata's column name. Right now the CLI cannot display the column metadata's column name correctly if the comparator type is not UTF8. Regards, Arsene

-----Original Message----- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Friday, November 04, 2011 11:09 PM To: user Subject: Re: Using Cli to create a column family with column name metadata question [Moving to user@] Because Cassandra's sparse data model supports using rows as materialized views, having non-UTF8 column names is common and totally valid.

On Fri, Nov 4, 2011 at 5:19 AM, Arsene Lee arsene@ruckuswireless.com wrote: Hi, I'm trying to use a Column Family's metadata to do some validation. I found out that in Cassandra's CLI (CliClient.java), when creating a column family with column name metadata, the name String is converted to a ByteBuffer based on the CF's comparator type. I'm wondering if there is any particular reason for this. For the column name metadata, wouldn't it be easier to always use UTF8Type? Because if the CF's comparator is other than UTF8Type, it is hard to convert the column name back. Regards, Arsene Lee -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Using Cli to create a column family with column name metadata question
On Mon, Nov 7, 2011 at 7:36 PM, Arsene Lee arsene@ruckuswireless.com wrote: Hi, Thanks for the reply. I'm not talking about the column name; I'm talking about the column metadata's column name. Right now the CLI cannot display the column metadata's column name correctly if the comparator type is not UTF8. Try 'help assume;' -Brandon
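For anyone finding this thread later: assume tells the CLI how to decode and display values for the current session without changing the schema. A sketch, with a placeholder column family name:

assume MyColumnFamily comparator as utf8;
assume MyColumnFamily validator as utf8;
assume MyColumnFamily keys as utf8;

As Arsene goes on to point out, this does not help when the metadata's column names themselves were serialized with a non-UTF8 comparator.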
propertyfilesnitch problem
Hi, We have a 2-DC setup on version 0.7.9 and have observed the following:

1. Using the property file snitch, with dynamic snitch turned on, the performance of LOCAL_QUORUM operations is poor for a while (around a minute) after a cluster restart, before drastically improving.
2. With the same setup, after each period defined by dynamic_snitch_reset_interval_in_ms, LOCAL_QUORUM performance greatly degrades before drastically improving again within a minute.
3. With the dynamic snitch turned off, LOCAL_QUORUM operations perform extremely poorly... the same as in the first minute after a restart.
4. With the dynamic snitch turned on, QUORUM operations' performance is about the same as using LOCAL_QUORUM when the dynamic snitch is off, or during the first minute after a restart with the snitch turned on.

All of this seems to point to LOCAL_QUORUM operations not differentiating our DCs using the property file snitch, with performance effectively degrading to that of QUORUM when the dynamic snitch doesn't have appropriate scores. Our main concern is the performance degradation at the periods defined by dynamic_snitch_reset_interval_in_ms. The DynamicEndpointSnitch in steady state assigns scores that match the DCs we've configured through the network topology property file. Our network topology property file appears to be properly configured, and this has been confirmed through the EndpointSnitchInfo mbean. Please advise. Thanks, Shu Medio Systems
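For context, PropertyFileSnitch reads conf/cassandra-topology.properties, with one ip=DC:rack entry per node plus a default; the addresses and names below are placeholders:

# conf/cassandra-topology.properties (illustrative)
10.0.1.1=DC1:RAC1
10.0.1.2=DC1:RAC2
10.1.1.1=DC2:RAC1
10.1.1.2=DC2:RAC2
# any node not listed above falls back to this
default=DC1:RAC1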
Re: Second Cassandra users survey
It should be dead-simple to build a slick GUI on the REST layer. (@Virgil http://code.google.com/a/apache-extras.org/p/virgil/ ) I had planned to crank one out this week (using ExtJS) that mimicked the Squirrel/Toad look and feel. The UI would have a tree panel of keyspaces and column families on the left. The main panel would be split in two: the top would allow a user to type in CQL/Pig, etc., and the bottom would show the data contained in the column family / result set. Any other thoughts on design before I get started? If we build this on the JSON/REST interface, it should be pretty easy to embed in other applications. -brian On Mon, Nov 7, 2011 at 2:36 PM, Ian Danforth idanfo...@numenta.com wrote: Wish list: A decent GUI to explore data kept in Cassandra would be very valuable. It should also be extensible to provide viewers for custom data. +1 to that. @jonathan - This is what Google Moderator is really good at. Perhaps start one and move the idea creation / voting there. -- Brian O'Neill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile: 215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
RE: Using Cli to create a column family with column name metadata question
Hi, I tried assume, and the column metadata's column name is still not displayed correctly. I think the CLI shouldn't use the comparator type to convert the column metadata string; it should always use UTF8 to convert column name metadata. Regards, Arsene -----Original Message----- From: Brandon Williams [mailto:dri...@gmail.com] Sent: Tuesday, November 08, 2011 9:49 AM To: user@cassandra.apache.org Subject: Re: Using Cli to create a column family with column name metadata question On Mon, Nov 7, 2011 at 7:36 PM, Arsene Lee arsene@ruckuswireless.com wrote: Hi, Thanks for the reply. I'm not talking about the column name itself; I'm talking about the column metadata's column name. Right now the CLI can't display the column's metadata name correctly if the comparator type is not UTF8. Try 'help assume;' -Brandon
Re: Will writes with ALL consistency eventually propagate?
Given that, the way I've understood this discussion so far is that I would have an RF of N (my total node count), but my ConsistencyLevel for all my writes will *likely* be QUORUM. I think that is a good/safe default for me to use, as writes aren't the scenario I need to optimize for latency; that being said, I also don't want to wait for a ConsistencyLevel of ALL to complete before my code continues. Would you agree with this assessment, or am I missing the boat on something? Are you *sure* you care about latency to the degree that data being non-local actually matters to your application? Normally you don't set RF=N unless you have particularly special requirements. The extra latency implied by another network round-trip is certainly greater than zero, but in many practical situations outliers and the behavior in case of e.g. node problems are much more important than an extra millisecond or two on the average request. Setting RF=N causes a larger data set on each node, in addition to causing more nodes to be involved in every request. Consider whether it's a better use of resources to set RF to e.g. 3 instead, and let the ring grow independently. That is what one normally does. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
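For concreteness, a minimal Hector sketch of making QUORUM the default consistency level for a keyspace; the cluster and keyspace names are placeholders, and the quorum arithmetic in the comment is the standard floor(RF/2) + 1:

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class QuorumSetup {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");

            // All reads and writes through this Keyspace default to QUORUM.
            ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
            policy.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
            policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);

            Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster, policy);

            // QUORUM waits for floor(RF/2) + 1 replicas: 2 of 3 with RF=3,
            // but 6 of 10 with RF=N on a 10-node ring. This is one reason
            // RF=N makes every quorum operation touch most of the cluster.
        }
    }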
Re: Will writes with ALL consistency eventually propagate?
Peter, Thanks for the additional insight on this. Think of a CDN that needs to respond to requests, distributed around the globe. Ultimately you would hope that each edge location could respond as quickly as possible (RF=N). But if each of the ring members keeps open/active connections to the others, and a request comes in to an edge location that does not hold a copy of the data, does that node request the data from the node that does and then cache it (in case more requests for the same data come into that edge location), or does it reply once and forget it, requiring *each* subsequent request to that node to phone back home to the node that actually holds the data? The CDN/edge-server scenario works particularly well to illustrate my goals, if visualizing that helps. Look forward to your thoughts. -R On Mon, Nov 7, 2011 at 8:05 PM, Peter Schuller peter.schul...@infidyne.com wrote: Given that, the way I've understood this discussion so far is that I would have an RF of N (my total node count), but my ConsistencyLevel for all my writes will *likely* be QUORUM. I think that is a good/safe default for me to use, as writes aren't the scenario I need to optimize for latency; that being said, I also don't want to wait for a ConsistencyLevel of ALL to complete before my code continues. Would you agree with this assessment, or am I missing the boat on something? Are you *sure* you care about latency to the degree that data being non-local actually matters to your application? Normally you don't set RF=N unless you have particularly special requirements. The extra latency implied by another network round-trip is certainly greater than zero, but in many practical situations outliers and the behavior in case of e.g. node problems are much more important than an extra millisecond or two on the average request. Setting RF=N causes a larger data set on each node, in addition to causing more nodes to be involved in every request. Consider whether it's a better use of resources to set RF to e.g. 3 instead, and let the ring grow independently. That is what one normally does. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
Re: Second Cassandra users survey
Decompression without compression (for lack of a better name). We store log batches into Cassandra that come in over HTTP either uncompressed, deflated, or snappy-compressed. We just add a 'magic' prefix (e.g. a \0 byte followed by 'snappy') to the column value so we can decode it when we serve it back up. It seems like Cassandra could detect data with the appropriate magic, store it as-is, and decode it for us automatically on the way back. Colin.
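As a sketch of the client-side scheme Colin describes: tag each value with a magic prefix and dispatch on it at read time. The exact prefix bytes here are invented placeholders, and the snappy branch assumes a library such as snappy-java (org.xerial.snappy.Snappy):

    import java.io.ByteArrayOutputStream;
    import java.util.Arrays;
    import java.util.zip.Inflater;

    public class ValueCodec {
        // Placeholder magic prefixes; any scheme works as long as reads and
        // writes agree and the prefix can't collide with real payload starts.
        private static final byte[] MAGIC_DEFLATE = "\0deflate".getBytes();
        private static final byte[] MAGIC_SNAPPY  = "\0snappy".getBytes();

        static byte[] decode(byte[] stored) throws Exception {
            if (hasPrefix(stored, MAGIC_DEFLATE)) {
                return inflate(Arrays.copyOfRange(stored, MAGIC_DEFLATE.length, stored.length));
            }
            if (hasPrefix(stored, MAGIC_SNAPPY)) {
                byte[] body = Arrays.copyOfRange(stored, MAGIC_SNAPPY.length, stored.length);
                return org.xerial.snappy.Snappy.uncompress(body);
            }
            return stored; // no magic: value was stored uncompressed
        }

        private static boolean hasPrefix(byte[] data, byte[] prefix) {
            if (data.length < prefix.length) return false;
            for (int i = 0; i < prefix.length; i++) {
                if (data[i] != prefix[i]) return false;
            }
            return true;
        }

        private static byte[] inflate(byte[] body) throws Exception {
            Inflater inflater = new Inflater();
            inflater.setInput(body);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) break;
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }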
Re: order of output in get_slice
The results still come back in comparator order (increasing). 2011/11/7 Roland Hänel rol...@haenel.me: Does a call to list<ColumnOrSuperColumn> get_slice(binary key, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level) give us any guarantees on the order of the returned list? I understand that when the predicate actually contains a SliceRange, the order _is_ guaranteed to be increasing (decreasing if the reverse flag is set). But when the predicate contains a list of column names instead of a range, do we also have the guarantee that the order is increasing (with no decreasing option, because there is no reverse flag here)? Greetings, Roland
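To illustrate the two predicate flavors being discussed, a minimal raw-Thrift sketch; the keyspace, column family, and row key are placeholders:

    import java.nio.ByteBuffer;
    import java.util.Arrays;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class SliceOrderDemo {
        public static void main(String[] args) throws Exception {
            TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();
            client.set_keyspace("MyKeyspace");

            ColumnParent parent = new ColumnParent("MyColumnFamily");
            ByteBuffer row = ByteBuffer.wrap("row1".getBytes("UTF-8"));

            // Predicate by explicit column names: per Brandon's answer, results
            // come back in comparator order regardless of the order listed here.
            SlicePredicate byNames = new SlicePredicate();
            byNames.setColumn_names(Arrays.asList(
                    ByteBuffer.wrap("c".getBytes("UTF-8")),
                    ByteBuffer.wrap("a".getBytes("UTF-8"))));
            System.out.println(client.get_slice(row, parent, byNames, ConsistencyLevel.ONE));

            // Predicate by range with reversed=true: decreasing comparator order.
            SlicePredicate byRange = new SlicePredicate();
            byRange.setSlice_range(new SliceRange(
                    ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), true, 100));
            System.out.println(client.get_slice(row, parent, byRange, ConsistencyLevel.ONE));

            transport.close();
        }
    }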
Re: Will writes with ALL consistency eventually propagate?
Thanks for the additional insight on this. Think of a CDN that needs to respond to requests, distributed around the globe. Ultimately you would hope that each edge location could respond as quickly as possible (RF=N). But if each of the ring members keeps open/active connections to the others, and a request comes in to an edge location that does not hold a copy of the data, does that node request the data from the node that does and then cache it (in case more requests for the same data come into that edge location), or does it reply once and forget it, requiring *each* subsequent request to that node to phone back home to the node that actually holds the data? The CDN/edge-server scenario works particularly well to illustrate my goals, if visualizing that helps. Look forward to your thoughts. Nodes will never cache any data. Nodes have the data that they own according to the ring topology and the replication factor (to the extent that the data has been replicated); the node you happen to talk to is merely a coordinator of a request: essentially a proxy with intelligent routing to the correct hosts. In the CDN situation, if you're talking about e.g. having a group of servers in one place (a network-topologically distinct location, such as a geographically distinct one), then a better fit than RF=N is probably to use multi-site support: say that you want a certain number of copies for each location, and have all clients talk to the most local site. But that's assuming you want to model this using just Cassandra's replication to begin with. Dynamically caching data wherever it is accessed is (probably) a good idea for a CDN use case, but it is not something that Cassandra does itself, internally. It's really difficult to know what the best solution is for a CDN; and in your case you imply that it's really *not* a CDN and it's just an analogy ;) -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
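For the multi-site setup Peter mentions, the usual mechanism is NetworkTopologyStrategy with per-DC replica counts. A minimal CLI sketch (the keyspace and DC names are placeholders, and the exact strategy_options syntax varies slightly between CLI versions):

    create keyspace EdgeData
        with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
        and strategy_options = [{DC1:3, DC2:3}];

Clients in each data center can then read and write at LOCAL_QUORUM against their local coordinators, which keeps the common-case round trips inside one site.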
Re: shutdown by KILL
For things like rolling restarts, we do:
disablethrift
disablegossip
(...wait for all nodes to see this node go down...)
drain
I implemented this in our batch scripts for Cassandra as:
disablegossip
sleep 10 seconds
disablethrift
drain
kill -TERM
A similar thing should be added to bin/stop-server.
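A sketch of that sequence as actual nodetool invocations (the host and pid-file path are illustrative; adjust for your init setup):

    # stop one node cleanly before the kill, per the sequence above
    nodetool -h localhost disablegossip
    sleep 10                      # let the rest of the ring see the node as down
    nodetool -h localhost disablethrift
    nodetool -h localhost drain
    kill -TERM $(cat /var/run/cassandra.pid)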