Re: Any tools like phpMyAdmin to see data stored in Cassandra ?
You might run it from a VM?

2012/1/30 Ertio Lew ertio...@gmail.com:
> On Mon, Jan 30, 2012 at 7:16 AM, Frisch, Michael michael.fri...@nuance.com wrote:
>> OpsCenter? http://www.datastax.com/products/opscenter
>>
>> - Mike
>>
>>> I have tried Sebastien's phpMyAdmin For Cassandra (https://github.com/sebgiroux/Cassandra-Cluster-Admin) to see the data stored in Cassandra in the same manner that phpMyAdmin allows. But since it makes assumptions about the datatypes of the column names/values and doesn't allow configuring, on a per-CF basis, the datatype the data should be read as, I couldn't make the best use of it. Are there any other similar tools out there that can do the job better?
>
> Thanks, that's a great product but unfortunately it doesn't work with Windows. Any tools for Windows?
Re: two dimensional slicing
On Sun, 29 Jan 2012 23:26:52 +1300 aaron morton aa...@thelastpickle.com wrote:
>> and compare them, but at this point I need to focus on one to get things working, so I'm trying to make a best initial guess.
>
> I would go for RP then, BOP may look like less work to start with but it *will* bite you later. If you use an increasing version number as a key you will get a hot spot. Get it working with RP and Standard CF's, accept the extra lookups, and then see where you are performance / complexity wise. Cassandra can be pretty fast.

The keys are (random uuid)-(version), because there are many lists and they already have a random id associated with them. Some of the lists will be much larger than others, but with the random prefix the large lists will be evenly distributed across the cluster; this is pretty much the same as having some rows that are bigger than others with RP. There is a small amount of other data that has non-random keys and would require an artificial MD5(key) prefix, but it's (at least currently) an insignificant subset of the total data. I do appreciate the warning though - if things change and we end up with a lot of keys that aren't naturally random, I can see how it would be a pain to manage.

The reason I'm concerned about one more query (especially when it can't be done in parallel) is that the overarching structure is actually a tree, and the data payload under a name will often be a pointer to another list. Each query required in the list lookup will be repeated at each level.

Anyway, I don't want to turn this into a BOP vs RP thread. I'm really interested in the underlying modeling issues, and how it plays out using different partitioning is instructive. I'm willing to use BOP _if_ it has real concrete advantages, because it seems very unlikely to cause balance/hotspot issues for our application. That being said, all other things being equal (or almost equal) I would use RP, and actually our latest design uses RP...

> I still don't really understand the problem, but I think you have many lists of names, and when each list is updated you consider it a version. You then want to answer a query such as "get all the names between foo and bar that were written to between version 100 and 200". Can this query be re-written as "get all the names between foo and bar that existed at version 200 and were created on or after version 100"?

There are two queries we need to answer. One is "get the first N names from version V of list L, starting at name n0" (chunked listing). The second is "get name n from list L version V" (or determine that it doesn't exist).

In many cases the list is too big to re-write on every update, so storing deltas instead of the whole thing becomes attractive. There can be additions, deletes, updates, and renames (which can be modeled as deletion + addition). A background process creates complete lists from the deltas at certain versions (compacts them), to prevent having to replay the entire history. The queries are actually done based on timestamp, not on a specific version (e.g. what was the state of the list at time T), and the passed timestamp won't in general correspond to the time of an update.

With this model, fetching a chunk of a list version requires pulling the range of names from the most recent complete/compacted list less than or equal to the desired version, and then fetching the relevant deltas between that and the desired version. Fetching the relevant deltas is where it gets complicated.
We've gone through many iterations - this is our latest model (very much still subject to change):

CF: List
  row key: list id (random uuid)
  columns: latest version and unversioned metadata about the list

CF: ListVersionIndex
  row key: (list id)
  columns: ts -> version, compact?

CF: ListCompact
  row key: (list id)-(version)
  columns: name -> associated data

CF: ListDelta
  row key: (list id)-(version)
  columns: name -> operation (create, delete, update) + associated data

With BOP and timestamp versions, ListVersionIndex isn't necessary: a row range scan can be done to get the latest compact list, and then another to get all the deltas since compaction, all with an appropriate column offset and limit. Timestamp versions make cleaning up partial updates more complicated though, since the version numbers aren't known.

With RP, the idea is to query many versions in ListVersionIndex, starting at the desired version and going backward, hoping to hit a compact version. We could also maintain a separate CompactVersion index, and accept another query.

In any case I think this model demonstrates a key point about two dimensional range queries: RP really only requires one extra query on an index to get the row range, and then replaces the BOP row range query with a multiget. A multiget can be done in parallel (correct me if I'm wrong?), so it seems reasonable that in some cases it could actually be faster than the row range query (but still at the cost of the extra RTT).
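To make the multiget point concrete, here is a rough sketch of that RP read path using pycassa. The CF names follow the model above, but the "(list id)-(version)" key encoding, the "version;is_compact" column-value encoding, and the delta-merge helper are illustrative assumptions, not details from the thread:

# Sketch only: CF names follow the model above; key/value encodings and
# apply_deltas are hypothetical stand-ins.
import pycassa

pool = pycassa.ConnectionPool('Lists')
version_index = pycassa.ColumnFamily(pool, 'ListVersionIndex')
compact_cf = pycassa.ColumnFamily(pool, 'ListCompact')
delta_cf = pycassa.ColumnFamily(pool, 'ListDelta')

def apply_deltas(names, delta_row):
    # Hypothetical merge: each delta column value is "op;data".
    for name, value in delta_row.items():
        op, _, data = value.partition(';')
        if op == 'delete':
            names.pop(name, None)
        else:  # create / update
            names[name] = data

def fetch_chunk(list_id, ts, start_name, count):
    # The one extra index query: walk versions at or before ts, newest
    # first, until a compacted version turns up.
    index = version_index.get(list_id, column_start=ts,
                              column_reversed=True, column_count=100)
    delta_versions, base = [], None
    for _, value in index.items():
        version, _, is_compact = value.partition(';')
        if is_compact == '1':
            base = version
            break
        delta_versions.append(version)

    # Base chunk: a column slice within the compacted list's row.
    names = compact_cf.get('%s-%s' % (list_id, base),
                           column_start=start_name, column_count=count)

    # Where BOP would do a row range scan, RP does a multiget over the
    # delta rows (whether the coordinator runs these lookups in parallel
    # is the open question raised above).
    delta_keys = ['%s-%s' % (list_id, v) for v in delta_versions]
    for row in delta_cf.multiget(delta_keys, column_start=start_name,
                                 column_count=count).values():
        apply_deltas(names, row)
    return names

Whether that multiget ends up cheaper than BOP's contiguous row range scan is exactly the RTT trade-off raised in the paragraph above.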
recovering from network partition
I'm trying to work through various failure modes to figure out the proper operating procedure and proper client coding practices. I'm a little unclear about what happens when a network partition gets repaired. Take the following scenario:

- cluster with 5 nodes: A thru E; RF = 3; read CL = ONE; write CL = ONE
- network partition divides A-C off from D-E
- operation continues on both sides, obviously some data is unavailable from D-E
- hinted handoffs accumulate

Now the network partition is repaired. The question I have is what the sequencing of events is, in particular between processing HH and forwarding read requests across the former partition. I'm hoping that there is a time period to process HH *before* nodes forward requests. E.g. it would be really good for A not to forward read requests to D until D is done with HH processing. Otherwise, clients of A may see a discontinuity where data that was available during the partition goes away and then comes back. Is there a manual or wiki section that discusses some of this and I just missed it?
Re: two dimensional slicing
(not trolling) but do you have any ideas on how?

The token produced by the partitioner is used as the key in the distributed hash table, so we can map keys to nodes and evenly distribute load. If the range of tokens for the DHT is infinite, it's difficult to evenly map them to a finite set of nodes. So...

If you know that the number of DHT keys (and so row keys) is finite, then it is easier to use the BOP. Or if you know that the row keys are something like a time series, you could use the sort of approach used with Horizontal Partitioning in an RDBMS and run a sliding window of nodes: every month drop the oldest partition / node off the end and add a new one for the next month.

Just some thoughts.
A

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/01/2012, at 7:19 PM, Terje Marthinussen wrote:
> On Sun, Jan 29, 2012 at 7:26 PM, aaron morton aa...@thelastpickle.com wrote:
>>> and compare them, but at this point I need to focus on one to get things working, so I'm trying to make a best initial guess.
>> I would go for RP then, BOP may look like less work to start with but it *will* bite you later. If you use an increasing version number as a key you will get a hot spot. Get it working with RP and Standard CF's, accept the extra lookups, and then see where you are performance / complexity wise. Cassandra can be pretty fast.
>
> Of course, there is no guarantee that it will bite you. Whatever data hotspot you may get may very well be minor vs. the advantage of slicing continuous blocks of data on a single server vs. random bits and pieces all over the place. For instance, there are many large data repositories out there of analytic data which only see a few queries per hour. BOP will most likely cause no performance problems at all for many of these; indeed, it may be much faster than the alternatives.
>
> BOP is very useful and powerful for many things, and saves a fair chunk of development time vs. the alternatives when you can use it. If we really want everybody to stop using it, we should change Cassandra so that by default it can provide the same function in some other way, without adding days and maybe weeks of development and extra complexity to your project.
>
> Terje
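As an illustration of the token mechanics (an addition for clarity, not from the thread): RP's token is a hash of the row key onto a finite ring, which is what spreads load evenly, while BOP effectively uses the raw key bytes as the token, so ordered keys pile onto one node. A minimal sketch of the idea:

# Rough approximation of RandomPartitioner's BigIntegerToken: MD5 of the
# row key mapped onto a ring of 2**127 values. Illustrative only, not
# Cassandra's actual Java implementation.
import hashlib

RING = 2 ** 127

def rp_token(row_key):
    return int(hashlib.md5(row_key.encode('utf-8')).hexdigest(), 16) % RING

# Sequential version keys: under BOP these sort contiguously onto one
# node (a hot spot); the hashed tokens scatter them across the ring.
for key in ('mylist-v100', 'mylist-v101', 'mylist-v102'):
    print(key, rp_token(key))

The flip side, as Terje notes, is that hashing destroys the contiguous on-disk ordering that makes BOP range slices cheap.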
Re: recovering from network partition
If you are working at CL ONE you are accepting that *any* value stored on a replica for a key+col combination is a valid response, and that includes no value.

After the nodes have detected that the others are UP they will start their HH in a staggered fashion, and will rate limit themselves to avoid overwhelming the node. It may take some time to complete.

> Otherwise, clients of A may see a discontinuity where data that was available during the partition goes away and then comes back.

If you are concerned about reads being consistent, then use CL QUORUM.

If you are reading at CL ONE (in 1.0*) the read will go to one replica 90% of the time, and you will only get the result from that one replica - which may be any value the key+col has been set to, including no value. The other 10% of the time Read Repair will kick in (this is the configured value for read_repair_chance in 1.0; you can change this value). The purpose of RR is to make it so that the next time a read happens the data is consistent. So when RR kicks in on a CL ONE read, the read will go to all nodes, but you still get a response from one and only one of them; in the background the responses from the others will be checked and consistency repaired.

If you were working at a higher CL, the responses from CL nodes are checked as part of the read request, synchronous to the read, and you get a consistent result from those nodes. RR may still run in the background, since CL nodes may be less than RF nodes.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 31/01/2012, at 6:51 AM, Thorsten von Eicken wrote:
> I'm trying to work through various failure modes to figure out the proper operating procedure and proper client coding practices. I'm a little unclear about what happens when a network partition gets repaired. Take the following scenario:
> - cluster with 5 nodes: A thru E; RF = 3; read CL = ONE; write CL = ONE
> - network partition divides A-C off from D-E
> - operation continues on both sides, obviously some data is unavailable from D-E
> - hinted handoffs accumulate
> Now the network partition is repaired. The question I have is what the sequencing of events is, in particular between processing HH and forwarding read requests across the former partition. I'm hoping that there is a time period to process HH *before* nodes forward requests. E.g. it would be really good for A not to forward read requests to D until D is done with HH processing. Otherwise, clients of A may see a discontinuity where data that was available during the partition goes away and then comes back. Is there a manual or wiki section that discusses some of this and I just missed it?
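To illustrate the trade-off Aaron describes, a small pycassa sketch (the keyspace, CF, and key are hypothetical; the per-request consistency levels are standard pycassa/Cassandra):

# Hypothetical keyspace/CF/key; choosing a consistency level per request.
import pycassa

pool = pycassa.ConnectionPool('MyKeyspace')
cf = pycassa.ColumnFamily(pool, 'MyCF')

# CL ONE: one replica's answer. After a partition heals this may be stale
# (or missing) until hinted handoff / read repair catches that replica up.
row = cf.get('some-key',
             read_consistency_level=pycassa.ConsistencyLevel.ONE)

# CL QUORUM with RF=3: two replicas must agree, so a write acknowledged
# at QUORUM is always visible to a later QUORUM read.
row = cf.get('some-key',
             read_consistency_level=pycassa.ConsistencyLevel.QUORUM)
cf.insert('some-key', {'col': 'val'},
          write_consistency_level=pycassa.ConsistencyLevel.QUORUM)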
Re: How much has Cassandra improved from 0.8.6 to 1.0+?
Well, as they say: "Lies, damned lies, and statistics."

Here is an alternate comparison you can review: http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/

YCSB is a known and agreed-upon benchmark. The benchmark you link includes no source code to reproduce with, and as the author mentions, "For Cassandra this was single node cluster, for Mongo simply one server with no replication. Cluster tests were run for functionality."

-Jake

On Mon, Jan 30, 2012 at 1:56 PM, Kevin klawso...@gmail.com wrote:
> I'm currently using 0.8.6 and want to know how much (performance wise) Cassandra has improved. Specifically read performance. This benchmark (http://amesar.wordpress.com/2011/10/19/mongodb-vs-cassandra-benchmarks/) illustrates my concerns. I don't know whether it was a fair comparison (especially since the conductor did not perform any tweaks or optimizations beforehand), but from all the resources I've read it seems that Cassandra still has quite a way to go before matching the read performance of MongoDB and some of the other NoSQL alternatives.
>
> Is this still true, and if so, how far down the line can we expect to see work on this specific area?

--
http://twitter.com/tjake
Re: two dimensional slicing
On Mon, 30 Jan 2012 11:14:37 -0600 Bryce Allen bal...@ci.uchicago.edu wrote:
> With RP, the idea is to query many versions in ListVersionIndex, starting at the desired version and going backward, hoping to hit a compact version. We could also maintain a separate CompactVersion index, and accept another query.

Actually, a better way to handle this is to store the latest compacted version with each delta version in the index. When doing compaction, all the deltas between it and the next compaction (or the end) are updated to point at the new compaction. E.g.:

ts0:  20;20 <- compacted version
ts1:  21;20
ts2:  22;20
...
ts9:  29;20
ts10: 30;20
ts11: 31;20

After compaction is done on version 30:

...
ts9:  29;20
ts10: 30;30 <- new compacted version
ts11: 31;30

Perhaps "compaction" is a bad term because it already has a meaning in Cassandra, but I can't think of a better name at the moment.

-Bryce
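A sketch of that repointing step in pycassa, reusing the hypothetical "version;compacted-version" column encoding from the example shown above (the CF names and batching are assumptions, not Bryce's actual code):

# Hypothetical: after compacting at new_base, repoint every later delta
# entry in the index so readers can jump straight to the compact row.
import pycassa

pool = pycassa.ConnectionPool('Lists')
version_index = pycassa.ColumnFamily(pool, 'ListVersionIndex')

def repoint_deltas(list_id, entries, new_base):
    # entries: [(ts, version), ...] for deltas at/after the new compaction
    b = version_index.batch()
    for ts, version in entries:
        b.insert(list_id, {ts: '%s;%s' % (version, new_base)})
    b.send()

With this in place a reader needs only one index column to find the compact base, instead of walking backward and hoping to hit one.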
Re: how stable is 1.0 these days?
Could you also elaborate on creating/dropping column families? We're currently working on moving to 1.0 and using dynamically created tables, so I'm very interested in what issues we might encounter. So far the only thing I've encountered (with 1.0.7 + hector 1.0-2) is that dropping a cf may sometimes fail with UnavailableException. I think this happens when the cf is busy being compacted. When I sleep/retry within a loop it eventually succeeds.

Thanks,
Jim

On 1/26/2012 7:32 AM, Pierre-Yves Ritschard wrote:
> Can you elaborate on the composite types instabilities? Is this specific to hector, as Radim's posts suggest? These one-liner answers are quite stressful :)
>
> On Thu, Jan 26, 2012 at 1:28 PM, Carlo Pires carlopi...@gmail.com wrote:
>> If you need to use composite types and create/drop column families on the fly, you must be prepared for instabilities.
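The sleep/retry loop Jim describes might look roughly like this. It is sketched with pycassa's SystemManager rather than Hector (his actual Java code isn't shown), and the retry count, delay, and names are assumptions:

# Hypothetical workaround sketch: retry a CF drop that fails while the
# CF is busy (e.g. being compacted).
import time
from pycassa.system_manager import SystemManager

def drop_cf_with_retry(server, keyspace, cf, attempts=10, delay=2.0):
    sys_mgr = SystemManager(server)
    for _ in range(attempts):
        try:
            sys_mgr.drop_column_family(keyspace, cf)
            return
        except Exception:  # e.g. UnavailableException during a compaction
            time.sleep(delay)
    raise RuntimeError('could not drop %s.%s after %d attempts'
                       % (keyspace, cf, attempts))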
Re: SSTable compaction issue in our system
Thanks Aaron for the perfect explanation. Decided to go with automatic compaction. Thanks again.

On Wed, Jan 25, 2012 at 11:19 AM, aaron morton aa...@thelastpickle.com wrote:
> The issue with major / manual compaction is that it creates one file. One big old file. That one file will not be compacted unless there are (min_compaction_threshold - 1) other files of a similar size, so tombstones and overwrites in that file may not be purged for a long time. If you go down the manual compaction path you need to keep doing it. If you feel you need to do it, do it; otherwise let automatic compaction do its thing.
>
> Cheers
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/01/2012, at 12:47 PM, Roshan wrote:
>> Thanks for the reply. Is major compaction not recommended for Cassandra 1.0.6?
WARN [Memtable] live ratio
Hi All

From time to time I am seeing the warning below in the Cassandra logs:

WARN [Memtable] setting live ratio to minimum of 1.0 instead of 0.21084217381985554

I am not sure of the exact cause, or how to eliminate it. Any help is appreciated. Thanks.
Re: WARN [Memtable] live ratio
I have the same experience, and I'm wondering what's causing this. One thing I noticed is that it happens when the server has been idle for some time and then the load starts going up - that's when I start to see these messages.

On Mon, Jan 30, 2012 at 4:54 PM, Roshan codeva...@gmail.com wrote:
> Hi All
>
> From time to time I am seeing the warning below in the Cassandra logs:
>
> WARN [Memtable] setting live ratio to minimum of 1.0 instead of 0.21084217381985554
>
> I am not sure of the exact cause, or how to eliminate it. Any help is appreciated. Thanks.
Re: WARN [Memtable] live ratio
Exactly - I am also getting this when the server moves from idle to high load. Maybe the Cassandra experts can help us.
Re: Any tools like phpMyAdmin to see data stored in Cassandra ?
I think his development environment is Windows.

On Mon, Jan 30, 2012 at 7:29 PM, R. Verlangen ro...@us2.nl wrote:
> You might run it from a VM?
>
> 2012/1/30 Ertio Lew ertio...@gmail.com:
>> Thanks, that's a great product but unfortunately it doesn't work with Windows. Any tools for Windows?

--
Best Regards
Bob Bao
baohan...@gmail.com
Re: how stable is 1.0 these days?
I'm not sure what Carlo is referring to, but generally if you have done thousands of migrations you can end up in a situation where the migrations take a long time to replay, and there are some race conditions that can be problematic when thousands of migrations need to be replayed while a node is bootstrapping. If you get into this situation it can be fixed by copying migrations from a known good schema to the node that you are trying to bootstrap.

Generally I would advise against frequent schema updates. Unlike rows in column families, the schema itself is designed to be relatively static.

On Mon, Jan 30, 2012 at 2:14 PM, Jim Newsham jnews...@referentia.com wrote:
> Could you also elaborate on creating/dropping column families? We're currently working on moving to 1.0 and using dynamically created tables, so I'm very interested in what issues we might encounter. So far the only thing I've encountered (with 1.0.7 + hector 1.0-2) is that dropping a cf may sometimes fail with UnavailableException. I think this happens when the cf is busy being compacted. When I sleep/retry within a loop it eventually succeeds.
>
> Thanks,
> Jim

--
Ben Coverston
DataStax -- The Apache Cassandra Company
Re: Any tools like phpMyAdmin to see data stored in Cassandra ?
On Sun, Jan 29, 2012 at 11:52 PM, Ertio Lew ertio...@gmail.com wrote:
> On Mon, Jan 30, 2012 at 7:16 AM, Frisch, Michael michael.fri...@nuance.com wrote:
>> OpsCenter? http://www.datastax.com/products/opscenter
>
> Thanks, that's a great product but unfortunately it doesn't work with Windows.

Now it does: http://www.datastax.com/products/opscenter/platforms

-Brandon