Effective allocation of multiple disks
Based on the documentation, it is clear that with Cassandra you want to have one disk for the commitlog and one disk for data.

My question is: if you think your workload is going to require more I/O performance to the data disks than a single disk can handle, how would you recommend effectively utilizing additional disks? A number of vendors sell 1U boxes with four 3.5-inch disks. If we use one for the commitlog, is there a way to have Cassandra itself split data equally across the three remaining disks? Or is this something that needs to be handled at the hardware level, or the operating system/file system level? Options include a hardware RAID controller in a RAID 0 stripe (this is more $$$, and for what gain?), or a volume manager like LVM.

Along those same lines, if you do implement some type of striping, what RAID stripe size is recommended? (I think Todd Burruss asked this earlier but I did not see a response.)

Thanks for any input! -Eric
RE: Effective allocation of multiple disks
You can list multiple DataFileDirectories, and Cassandra will scatter files across all of them. Use 1 disk for the commitlog, and 3 disks for data directories. See http://wiki.apache.org/cassandra/CassandraHardware#Disk Thanks, Stu
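A sketch of what that layout looks like in storage-conf.xml — the mount points are hypothetical, adjust to your own disks:

```xml
<CommitLogDirectory>/mnt/disk1/cassandra/commitlog</CommitLogDirectory>
<DataFileDirectories>
    <DataFileDirectory>/mnt/disk2/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/mnt/disk3/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/mnt/disk4/cassandra/data</DataFileDirectory>
</DataFileDirectories>
```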
Re: Effective allocation of multiple disks
Ahh, thanks! I had read that, but I had assumed the reference to using one or more devices for DataFileDirectories was referring to somehow making multiple physical devices into one logical device via some underlying RAID system.

As far as free space on the disks goes, I have seen references to keeping utilization below 50% to handle compaction. Would it not be true to say that you only need as much free space as is needed to handle another copy of the largest data file you have? (i.e. perhaps less than 50% of the disk) Due to the compaction space requirement, would it be more efficient to do RAID 0 somewhere under the hood? Simply being able to specify multiple DataFileDirectories does indeed sound appealing... Thanks. -Eric

On Wed, Mar 10, 2010 at 12:08 AM, Stu Hood stu.h...@rackspace.com wrote: You can list multiple DataFileDirectories, and Cassandra will scatter files across all of them. Use 1 disk for the commitlog, and 3 disks for data directories.
CassandraHardware link on the wiki FrontPage
Would it be possible to add a link to the CassandraHardware page from the FrontPage of the wiki? I think other new folks to Cassandra may find it useful. ;-) (I would do it myself, though that page is Immutable) http://wiki.apache.org/cassandra/FrontPage http://wiki.apache.org/cassandra/CassandraHardware Thanks! -Eric
RE: CassandraHardware link on the wiki FrontPage
Anyone can edit any page once they have an account: click the Login link at the top right next to the search box to create an account. Thanks, Stu
Re: schema design question
Well, I don't like clunky and I'm java friendly. I'll go for the abstract class. Thanks for the help. On Tue, Mar 9, 2010 at 7:33 PM, Jonathan Ellis jbel...@gmail.com wrote: On Tue, Mar 9, 2010 at 7:30 AM, Matteo Caprari matteo.capr...@gmail.com wrote: On Tue, Mar 9, 2010 at 1:23 PM, Jonathan Ellis jbel...@gmail.com wrote: That's true. So you'd want to use a custom comparator where first 64 bits is the Long and the rest is the userid, for instance. (Long + something else is common enough that we might want to add it to the defaults...) What about using a SuperColumn for each like-count and then the list of users that hit that level? That would also work, it's just a little clunky pulling things out of a nested structure when really you want a flat list. But if you are allergic to Java that is the way to go so you don't have to write a custom AbstractType subclass. :) -Jonathan -- :Matteo Caprari matteo.capr...@gmail.com
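The comparator being suggested — first 64 bits the Long, the rest the userid — amounts to a byte-comparable composite column name. A small Python sketch of the encoding idea (the packing scheme here is an illustration of the principle, not the actual AbstractType subclass you would write in Java):

```python
import struct

def pack_key(count, userid):
    """Encode (count, userid) so that plain byte comparison sorts by
    count first, then userid. A big-endian fixed-width long keeps
    lexicographic byte order equal to numeric order for non-negative
    values, so Cassandra's byte-wise column sort does the right thing."""
    return struct.pack('>q', count) + userid.encode('utf-8')

# Columns named this way sort by like-count, then by userid:
a = pack_key(5, 'user_42')
b = pack_key(17, 'user_07')
assert a < b  # 5 < 17, regardless of the userid suffix
```

A Java AbstractType subclass would implement the same comparison over these byte strings.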
Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'
Well, I've found the reason. The default cassandra configuration uses a 10% row cache, and the row cache reads the entire row each time. So it was indeed reading the full row on every request, even though the request was asking for only one column. My bad (at least I learned something). -- Sylvain

On Tue, Mar 9, 2010 at 9:49 PM, Brandon Williams dri...@gmail.com wrote: On Tue, Mar 9, 2010 at 2:28 PM, Sylvain Lebresne sylv...@yakaz.com wrote: A row causes a disk seek while columns are contiguous. So if the row isn't in the cache, you're being impaired by the seeks. In general, fatter rows should be more performant than skinny ones. Sure, I understand that. Still, I get 400 columns per second (i.e., 400 seeks per second) when the rows have only one column each, while I get 10 columns per second when the rows have 100 columns, even though I read only the first column. Doesn't that imply the disk is having to seek further for the rows with more columns? -Brandon
Login Failure Error
hello, I have just downloaded the source code from trunk using svn, and set up the following configuration: created a different user and group named cassandra. When I run cassandra -f the following is the output I get:

INFO 18:02:16,697 Auto DiskAccessMode determined to be standard
INFO 18:02:16,995 Saved Token not found. Using 4812241153415237834436824812586788175
INFO 18:02:17,008 Creating new commitlog segment /u02/cassandra/commitlog/CommitLog-1268224337008.log
INFO 18:02:17,105 Starting up server gossip
INFO 18:02:17,163 Binding thrift service to localhost/127.0.0.1:9160
INFO 18:02:17,169 Cassandra starting up...

Next I ran cassandra-cli --host 127.0.0.1 --port 9160 and I get the following:

Login failure. Did you specify 'keyspace', 'username' and 'password'?
Welcome to cassandra CLI.
Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.

What could have gone wrong?
Re: Login Failure Error Attached to storage-conf.xml file
Shirish Reddy P (Student) Indian Institute Of Information Technology, Allahabad Mob No. +919651418099

On Wed, Mar 10, 2010 at 6:16 PM, shirish shirishredd...@gmail.com wrote: hello, I have just downloaded the source code from trunk using svn... What could have gone wrong?

storage-conf.xml:

<!--
 ~ Licensed to the Apache Software Foundation (ASF) under one
 ~ or more contributor license agreements. See the NOTICE file
 ~ distributed with this work for additional information
 ~ regarding copyright ownership. The ASF licenses this file
 ~ to you under the Apache License, Version 2.0 (the
 ~ "License"); you may not use this file except in compliance
 ~ with the License. You may obtain a copy of the License at
 ~
 ~    http://www.apache.org/licenses/LICENSE-2.0
 ~
 ~ Unless required by applicable law or agreed to in writing,
 ~ software distributed under the License is distributed on an
 ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 ~ KIND, either express or implied. See the License for the
 ~ specific language governing permissions and limitations
 ~ under the License.
-->
<Storage>
<!--======================-->
<!-- Basic Configuration  -->
<!--======================-->

<!--
 ~ The name of this cluster. This is mainly used to prevent machines in
 ~ one logical cluster from joining another.
-->
<ClusterName>Test Cluster</ClusterName>

<!--
 ~ Turn on to make new [non-seed] nodes automatically migrate the right data
 ~ to themselves. (If no InitialToken is specified, they will pick one
 ~ such that they will get half the range of the most-loaded node.)
 ~ If a node starts up without bootstrapping, it will mark itself bootstrapped
 ~ so that you can't subsequently accidently bootstrap a node with
 ~ data on it. (You can reset this by wiping your data and commitlog
 ~ directories.)
 ~
 ~ Off by default so that new clusters and upgraders from 0.4 don't
 ~ bootstrap immediately. You should turn this on when you start adding
 ~ new nodes to a cluster that already has data on it. (If you are upgrading
 ~ from 0.4, start your cluster with it off once before changing it to true.
 ~ Otherwise, no data will be lost but you will incur a lot of unnecessary
 ~ I/O before your cluster starts up.)
-->
<AutoBootstrap>false</AutoBootstrap>

<!--
 ~ Keyspaces and ColumnFamilies:
 ~ A ColumnFamily is the Cassandra concept closest to a relational
 ~ table. Keyspaces are separate groups of ColumnFamilies. Except in
 ~ very unusual circumstances you will have one Keyspace per application.
 ~ There is an implicit keyspace named 'system' for Cassandra internals.
-->
<Keyspaces>
  <Keyspace Name="Keyspace1">
    <!--
     ~ ColumnFamily definitions have one required attribute (Name)
     ~ and several optional ones.
     ~
     ~ The CompareWith attribute tells Cassandra how to sort the columns
     ~ for slicing operations. The default is BytesType, which is a
     ~ straightforward lexical comparison of the bytes in each column.
     ~ Other options are AsciiType, UTF8Type, LexicalUUIDType, TimeUUIDType,
     ~ and LongType. You can also specify the fully-qualified class
     ~ name to a class of your choice extending
     ~ org.apache.cassandra.db.marshal.AbstractType.
     ~
     ~ SuperColumns have a similar CompareSubcolumnsWith attribute.
     ~
     ~ BytesType: Simple sort by byte value. No validation is performed.
     ~ AsciiType: Like BytesType, but validates that the input can be
     ~            parsed as US-ASCII.
     ~ UTF8Type: A string encoded as UTF8
     ~ LongType: A 64bit long
     ~ LexicalUUIDType: A 128bit UUID, compared lexically (by byte value)
     ~ TimeUUIDType: a 128bit version 1 UUID, compared by timestamp
     ~
     ~ (To get the closest approximation to 0.3-style supercolumns, you
     ~ would use CompareWith="UTF8Type" CompareSubcolumnsWith="LongType".)
     ~
     ~ An optional `Comment` attribute may be used to attach additional
Re: Login Failure Error
Please don't use trunk unless you're actively fixing bugs. If you want the latest greatest, get the 0.6 branch from svn.
Re: Login Failure Error
Everything ran fine using the stable release. I wanted to start contributing and hence downloaded the source code. What could possibly be giving this error?

On Wed, Mar 10, 2010 at 6:49 PM, Jonathan Ellis jbel...@gmail.com wrote: Please don't use trunk unless you're actively fixing bugs. If you want the latest greatest, get the 0.6 branch from svn.
exception with python client
Hi. On Cassandra 0.6 beta-2 I have this schema:

<Keyspace Name="KS">
  <ColumnFamily Name="Users" CompareWith="BytesType"/>
  <ColumnFamily Name="Items" CompareWith="BytesType" ColumnType="Super" CompareSubcolumnsWith="BytesType"/>
</Keyspace>

I'm trying the batch_mutate api using python:

socket = TSocket.TSocket('localhost', 9160)
transport = TTransport.TBufferedTransport(socket)
protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)
client = Cassandra.Client(protocol)
transport.open()

m = { 'exmpl_item_id': { 'Items': [Mutation(ColumnOrSuperColumn(super_column=SuperColumn('users', [Column('name', 'matteo')])))] }}
client.batch_mutate('KS', m, ConsistencyLevel.ONE)

I get an exception, but it's a shy one and I can't figure out what it is that I'm doing wrong. Thanks.

Traceback (most recent call last):
  File "test-migrate.py", line 23, in <module>
    client.batch_mutate('KS', m, ConsistencyLevel.ONE)
  File "/Users/dikappa/Documents/workspace/likelike/python/cassandra/Cassandra.py", line 771, in batch_mutate
    self.recv_batch_mutate()
  File "/Users/dikappa/Documents/workspace/likelike/python/cassandra/Cassandra.py", line 784, in recv_batch_mutate
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "build/bdist.macosx-10.6-i386/egg/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
  File "build/bdist.macosx-10.6-i386/egg/thrift/protocol/TBinaryProtocol.py", line 203, in readI32
  File "build/bdist.macosx-10.6-i386/egg/thrift/transport/TTransport.py", line 58, in readAll
  File "build/bdist.macosx-10.6-i386/egg/thrift/transport/TTransport.py", line 155, in read
  File "build/bdist.macosx-10.6-i386/egg/thrift/transport/TSocket.py", line 94, in read
thrift.transport.TTransport.TTransportException: None

-- :Matteo Caprari matteo.capr...@gmail.com
Re: exception with python client
On Wed, Mar 10, 2010 at 08:33, Matteo Caprari matteo.capr...@gmail.com wrote: protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport) client = Cassandra.Client(protocol) transport.open() before attempting the mutation, try adding: client.transport = transport Gary.
RE: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'
So did you disable the row cache entirely?
Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'
So did you disable the row cache entirely? Yes (and I'm getting reasonable performance back).
Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'
For the record, I note that no row cache is the default on user-defined CFs; we include it in the sample configuration file as an example only.

On Wed, Mar 10, 2010 at 9:58 AM, Sylvain Lebresne sylv...@yakaz.com wrote: So did you disable the row cache entirely? Yes (getting back reasonable performance).
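For reference, row caching in 0.6 is set per ColumnFamily in storage-conf.xml; a sketch, with illustrative names and values:

```xml
<!-- RowsCached="0" disables the row cache; a value like "10%" caches
     that fraction of rows, and the whole row is read into the cache -->
<ColumnFamily Name="Standard1" CompareWith="BytesType" RowsCached="0"/>
```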
Re: cassandra 0.6.0 beta 2 download contains beta 1?
On Tue, 2010-03-09 at 12:38 -0800, Omer van der Horst Jansen wrote: The apache-cassandra-0.6.0-beta2-bin.tar.gz download contains both these files in the apache-cassandra-0.6.0-beta2/lib directory: apache-cassandra-0.6.0-beta1.jar apache-cassandra-0.6.0-beta2.jar Ugh, my bad. I must have failed to `clean' in between the aborted beta1 and beta2. Given the way the classpath is constructed, it's possible that anyone using this download is actually running beta 1 rather than beta 2... The only difference between beta1 and beta2 was a couple of bytes worth of version metadata, so it really wouldn't matter in this context. -- Eric Evans eev...@rackspace.com
Re: Effective allocation of multiple disks
Thanks for testing that; added a note to http://wiki.apache.org/cassandra/CassandraHardware on stripe size.

On Wed, Mar 10, 2010 at 11:03 AM, B. Todd Burruss bburr...@real.com wrote: with the file sizes we're talking about with cassandra and other database products, the stripe size doesn't seem to matter. i suppose there may be a modicum of overhead with a small stripe size, but i'm not sure. mine is set to 128k, which produced the same results as 16k and 256k. i will say the number of drives within the RAID 0 setup does seem to matter. the more you have, the more parallelism you can get with a good RAID controller.
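For anyone who does want to stripe under the hood instead of (or alongside) multiple DataFileDirectories, a Linux software-RAID sketch along the lines discussed — the device names and the 128 KB chunk size are illustrative, not a tested recommendation:

```shell
# RAID 0 across the three data disks with a 128 KB chunk (stripe) size
mdadm --create /dev/md0 --level=0 --raid-devices=3 --chunk=128 \
      /dev/sdb /dev/sdc /dev/sdd
mkfs.ext3 /dev/md0
mount /dev/md0 /var/lib/cassandra/data

# Or the LVM equivalent: a logical volume striped across the same disks
pvcreate /dev/sdb /dev/sdc /dev/sdd
vgcreate cassdata /dev/sdb /dev/sdc /dev/sdd
lvcreate --stripes 3 --stripesize 128 --extents 100%FREE --name data cassdata
```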
Re: schema design question
if you want to select stuff out w/ one query, then a single CF is the only sane choice; if not, then 2 CFs may be more performant

On Wed, Mar 10, 2010 at 4:42 AM, Matteo Caprari matteo.capr...@gmail.com wrote: I can't quite decide whether to go with a flat schema, with keys repeated in different CFs, or one CF with nested supercolumns. I guess there is no straight answer here, but what's a good way of reasoning about the choice? These two mutation maps should clarify my dilemma:

deep_mutation_map = {
    'example_item': {
        'Items': [
            Mutation(SuperColumn('details', [
                Column('title', 'an article'),
                Column('link', 'www.example.com')
            ])),
            Mutation(SuperColumn('likers', [
                Column('user_1', 'xx'),
                Column('user_2', 'xx')
            ]))
        ]
    }
}

flat_mutation_map = {
    'example_item': {
        'Item_Info': [
            Mutation(Column('title', 'an_article')),
            Mutation(Column('link', 'www.example.com')),
        ],
        'Item_likers': [
            Mutation(Column('user_1', 'xx')),
            Mutation(Column('user_2', 'xx'))
        ]
    }
}

-- :Matteo Caprari matteo.capr...@gmail.com
Re: exception with python client
There was indeed a very clear message in the logs. I was missing the timestamp in the Column declaration. Thanks

On Wed, Mar 10, 2010 at 3:42 PM, Eric Evans eev...@rackspace.com wrote: On Wed, 2010-03-10 at 14:33 +, Matteo Caprari wrote: I get an exception, but it's a shy one and can't figure out what it is that I'm doing wrong. I believe this simply means that the read didn't return a response. Start by checking the cassandra logs to see if there are any exceptions, and double check your connection parameters, network setup, etc. -- Eric Evans eev...@rackspace.com

-- :Matteo Caprari matteo.capr...@gmail.com
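For anyone hitting the same TTransportException: a Column in the 0.6 Thrift API carries a name, a value, and a timestamp, and the missing timestamp is what broke the batch_mutate above. A sketch of the corrected mutation map — simple namedtuples stand in for the Thrift-generated classes here so the structure is self-contained; real code would use the cassandra.ttypes classes and a live connection:

```python
import time
from collections import namedtuple

# Stand-ins for the Thrift-generated classes (illustrative only).
Column = namedtuple('Column', 'name value timestamp')
SuperColumn = namedtuple('SuperColumn', 'name columns')
ColumnOrSuperColumn = namedtuple('ColumnOrSuperColumn', 'super_column')
Mutation = namedtuple('Mutation', 'column_or_supercolumn')

# Timestamps are conventionally microseconds since the epoch.
ts = int(time.time() * 1e6)

m = {
    'exmpl_item_id': {
        'Items': [
            Mutation(ColumnOrSuperColumn(
                super_column=SuperColumn('users', [
                    Column('name', 'matteo', ts)  # timestamp now supplied
                ])))
        ]
    }
}

# With the real client this map is what gets passed to:
# client.batch_mutate('KS', m, ConsistencyLevel.ONE)
```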
Re: Hackathon?!?
Sweet I'm in! Is there going to be a more formal invite? If not, can we get the details on where Digg is and where at Digg? Peter

On Tue, Mar 9, 2010 at 9:28 PM, Dan Di Spaltro dan.dispal...@gmail.com wrote: Great, that would probably get us a lot more room. Sweet, so its settled, we'll do it at Digg WHQ!

On Tue, Mar 9, 2010 at 9:13 PM, Chris Goffinet goffi...@digg.com wrote: +1 from Digg if you wanna have it at our place as well, got the OK from the boss. -Chris

On Mar 9, 2010, at 6:05 PM, Dan Di Spaltro wrote: Alright guys, we have settled on a date for the Cassandra meetup on... April 15th, better known as Tax day! We can host it here at Cloudkick, unless a cooler startup wants to host it. http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=100290781618196563860.000478354937656785449&z=19 1499 Potrero Ave, San Francisco, CA 94110. Bottom line, it would be great to get some folks together and spend some time doing an intro, cover some deployments, data models, and try to address all the other burning questions out there. We pushed it out from PyCon and hopefully settled on a good day; let's get a count of how many folks are interested! Thanks,

On Tue, Feb 9, 2010 at 3:10 PM, Reuben Smith reuben.sm...@gmail.com wrote: I live in the city and I'd like to add my vote for an Intro to Cassandra night. Reuben

On Tue, Feb 9, 2010 at 10:43 AM, Dan Di Spaltro dan.dispal...@gmail.com wrote: I think the tentative plans would be to push this out a bit farther away from PyCon, to get a bigger attendance. It sounds like an Intro to Cassandra would be a better theme; focus on the education piece. But it will happen! So stay tuned.

On Tue, Feb 9, 2010 at 3:53 AM, Wayne Lewis wa...@lewisclan.org wrote: Hi Dan, Are you still planning for end of Feb? Please add me to the very interested list. Thanks! Wayne Lewis

On Jan 26, 2010, at 8:42 PM, Dan Di Spaltro wrote: Would anyone be interested in a Cassandra hack-a-thon at the end of February in San Francisco? I think it would be great to get everyone together, since the last hack-a-thon was at the Twitter office back around OSCON time. We could provide space in the Mission area or someone else could too; our office is in a pretty interesting area (http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=100290781618196563860.000478354937656785449&z=17). Tell me what you guys think! -- Dan Di Spaltro
Re: Hackathon?!?
I'm in either way, but if we push it a week later then the twitter guys could (a) make it and (b) pimp it at their own conference. On Wed, Mar 10, 2010 at 12:26 AM, Jeff Hodges jhod...@twitter.com wrote: Ah, hell. Thought this was the first day. Can't make it. -- Jeff On Mar 9, 2010 9:32 PM, Ryan King r...@twitter.com wrote: I'm already committed to talking about cassandra that day at our company's developer conference (chirp.twitter.com). -ryan On Tue, Mar 9, 2010 at 6:26 PM, Jeff Hodges jhod...@twitter.com wrote: I'm down. -- Jeff ...
Re: Hackathon?!?
I'll work on putting together the formal invite. Stay tuned. -Chris

On Mar 10, 2010, at 9:54 AM, Peter Chang wrote: Sweet I'm in! Is there going to be a more formal invite? If not, can we get the details on where Digg is and where at Digg?
Re: cassandra 0.6.0 beta 2 download contains beta 1?
On Wed, Mar 10, 2010 at 11:30 AM, Eric Evans eev...@rackspace.com wrote: apache-cassandra-0.6.0-beta1.jar apache-cassandra-0.6.0-beta2.jar Ugh, my bad. I must have failed to `clean' in between the aborted beta1 and beta2. The beta2 also does not include the other support jar files like log4j. Not being a java person, I didn't know what to do so I just started my experimentation with the 0.5.1 release which has it all bundled.
Re: NoSQL live tomorrow
Hey Jonathan, What event is this, and will it be livecast/recorded? Cheers, Tim.

On Thu, Mar 11, 2010 at 10:21 AM, Jonathan Ellis jbel...@gmail.com wrote: Ryan King and I will have 20 minutes to talk about Cassandra in the Lab part of the program. 20 minutes isn't enough to present a whole lot in a structured manner, so we are planning to just do Q&A the whole time. So if you are going to be there, come with your questions. I will also bring a few slides about 0.6 / 0.7 features to kick things off in case we have a slow start. -Jonathan
Re: Strategy to delete/expire keys in cassandra
Hi Sylvain, I applied your patch to 0.5 but it seems that it doesn't compile:

1) column.getTtl() is not defined in RowMutation.java:

public static RowMutation getRowMutation(String table, String key, Map<String, List<ColumnOrSuperColumn>> cfmap)
{
    RowMutation rm = new RowMutation(table, key.trim());
    for (Map.Entry<String, List<ColumnOrSuperColumn>> entry : cfmap.entrySet())
    {
        String cfName = entry.getKey();
        for (ColumnOrSuperColumn cosc : entry.getValue())
        {
            if (cosc.column == null)
            {
                assert cosc.super_column != null;
                for (org.apache.cassandra.service.Column column : cosc.super_column.columns)
                {
                    rm.add(new QueryPath(cfName, cosc.super_column.name, column.name), column.value, column.timestamp, column.getTtl());
                }
            }
            else
            {
                assert cosc.super_column == null;
                rm.add(new QueryPath(cfName, null, cosc.column.name), cosc.column.value, cosc.column.timestamp, cosc.column.getTtl());
            }
        }
    }
    return rm;
}

2) CassandraServer.java: Column.setTtl() is not defined:

if (column instanceof ExpiringColumn)
{
    thrift_column.setTtl(((ExpiringColumn) column).getTimeToLive());
}

3) CliClient.java: type mismatch for ColumnParent:

thriftClient_.insert(tableName, key, new ColumnParent(columnFamily, superColumnName),
    new Column(columnName, value.getBytes(), System.currentTimeMillis()), ConsistencyLevel.ONE);

It seems that the patch doesn't add the getTtl()/setTtl() stuff to Column.java? Thanks, -Weijun

-----Original Message----- From: Sylvain Lebresne [mailto:sylv...@yakaz.com] Sent: Thursday, February 25, 2010 2:23 AM To: Weijun Li Cc: cassandra-user@incubator.apache.org Subject: Re: Strategy to delete/expire keys in cassandra

Hi, Should I just run a command (in the Cassandra 0.5 source folder) like: patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch for each of the five patches in your ticket? Well, actually I lied. The patches were made for a version a little after 0.5.
If you really want to try, I attach a version of those patches that (should) work with 0.5 (there are only the first 3 patches; the fourth one is for tests, so not necessary per se). Apply them with your patch command. Still, to compile that you will have to regenerate the thrift java interface (with ant gen-thrift-java), but for that you will have to install the right svn revision of thrift (which is libthrift-r820831 for 0.5). And if you manage to make it work, you will have to dig into cassandra.thrift, as the patches make changes to it. In the end, remember that this is not an official patch yet and it *will not* make it into Cassandra in its current form. All I can tell you is that I need those expiring columns for quite a few of my use cases and I will do what I can to get this feature included if and when possible. Also, what's your opinion on extending ExpiringColumn to expire a key completely? Otherwise it will be difficult to track which rows in Cassandra are expired or old. I'm not sure how to make full rows (or even full superColumns, for that matter) expire. What if you set a row to expire after some time and add new columns before this expiration? Should you update the expiration of the row? Which is to say that a row will expire when its last column expires, which is almost what you get with expiring columns. The one thing you may want, though, is that when all the columns of a row expire (or, to be precise, get physically deleted), the row itself is deleted. Looking at the code, I'm not convinced this happens, and I'm not sure why. -- Sylvain
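For reference, the core of the expiring-column idea being discussed can be sketched in a few lines. This follows the description in the thread, but the class shape and method names below are assumptions for illustration, not the actual patch code: a column carries a time-to-live and is treated as deleted once its deadline passes.

```java
// Hedged sketch of an expiring column: not the patch's real implementation.
public class ExpiringColumnSketch {
    final byte[] name;
    final byte[] value;
    final long timestamp;       // client-supplied, as with a normal Column
    final int timeToLiveSec;    // TTL requested by the writer
    final long expiresAtMs;     // wall-clock deadline computed at insert time

    public ExpiringColumnSketch(byte[] name, byte[] value, long timestamp, int ttlSec) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
        this.timeToLiveSec = ttlSec;
        this.expiresAtMs = System.currentTimeMillis() + ttlSec * 1000L;
    }

    // Once the deadline passes, reads treat the column as a tombstone;
    // compaction can then physically drop it.
    public boolean isMarkedForDelete() {
        return System.currentTimeMillis() >= expiresAtMs;
    }
}
```

This also shows why whole-row expiration is awkward, as Sylvain notes: expiration is per column, so a row only disappears once every one of its columns has expired and been compacted away.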
Re: NoSQL live tomorrow
http://nosqlboston.eventbrite.com/ don't know about recording / casting plans. On Wed, Mar 10, 2010 at 3:25 PM, Tim Haines tmhai...@gmail.com wrote: Hey Jonathan, What event is this and will it be livecasted/recorded? Cheers, Tim. On Thu, Mar 11, 2010 at 10:21 AM, Jonathan Ellis jbel...@gmail.com wrote: Ryan King and I will have 20 minutes to talk about Cassandra in the Lab part of the program. 20 minutes isn't enough to present a whole lot in a structured manner so we are planning to just do QA the whole time. So if you are going to be there, come with your questions. I will also bring a few slides about 0.6 / 0.7 features to kick things off in case we have a slow start. -Jonathan
Re: Effective allocation of multiple disks
Yea, I suppose major compactions are the wildcard here. Nonetheless, the situation where you only have 1 SSTable should be very rare. I'll open a ticket though, because we really ought to be able to utilize those disks more thoroughly, and I have some ideas there. -Original Message- From: Anthony Molinaro antho...@alumni.caltech.edu Sent: Wednesday, March 10, 2010 3:38pm To: cassandra-user@incubator.apache.org Subject: Re: Effective allocation of multiple disks This is incorrect, as discussed a few weeks ago. I have a setup with multiple disks, and as soon as compaction occurs all the data ends up on one disk. If you need the additional io, you will want raid0. But simply listing multiple DataFileDirectories will not work. -Anthony On Wed, Mar 10, 2010 at 02:08:13AM -0600, Stu Hood wrote: You can list multiple DataFileDirectories, and Cassandra will scatter files across all of them. Use 1 disk for the commitlog, and 3 disks for data directories. See http://wiki.apache.org/cassandra/CassandraHardware#Disk Thanks, Stu -Original Message- From: Eric Rosenberry epros...@gmail.com Sent: Wednesday, March 10, 2010 2:00am To: cassandra-user@incubator.apache.org Subject: Effective allocation of multiple disks Based on the documentation, it is clear that with Cassandra you want to have one disk for commitlog, and one disk for data. My question is: If you think your workload is going to require more io performance to the data disks than a single disk can handle, how would you recommend effectively utilizing additional disks? It would seem a number of vendors sell 1U boxes with four 3.5 inch disks. If we use one for commitlog, is there a way to have Cassandra itself equally split data across the three remaining disks? Or is this something that needs to be handled by the hardware level, or operating system/file system level? Options include a hardware RAID controller in a RAID 0 stripe (this is more $$$ and for what gain?), or utilizing a volume manager like LVM. 
Along those same lines, if you do implement some type of striping, what RAID stripe size is recommended? (I think Todd Burruss asked this earlier but I did not see a response) Thanks for any input! -Eric -- Anthony Molinaro antho...@alumni.caltech.edu
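The scatter behavior Stu describes, and the compaction caveat Anthony raises, can be illustrated with a small sketch. This is not Cassandra's actual allocation code; the class and method names are made up for illustration. New SSTable files are handed out round-robin across the configured data directories, which spreads IO across disks — until a major compaction rewrites everything into a single file that lands on a single disk.

```java
import java.io.File;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: scatter new data files across multiple
// DataFileDirectories round-robin, one directory per new SSTable.
public class DataDirAllocator {
    private final List<File> dirs;
    private final AtomicInteger next = new AtomicInteger();

    public DataDirAllocator(List<File> dirs) {
        this.dirs = dirs;
    }

    // Each new file lands on the next directory in turn, so three data
    // disks each receive roughly a third of the SSTables. A major
    // compaction defeats this: its single output file uses one directory.
    public File nextDirectory() {
        return dirs.get(Math.floorMod(next.getAndIncrement(), dirs.size()));
    }
}
```

OS-level striping (RAID 0 or LVM) avoids the compaction problem entirely, because every file — including a large compacted SSTable — is spread across all member disks.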
Re: Testing row cache feature in trunk: write should put record in cache
Thanks for that, Daniel. I'm pretty heads down finishing off the last 0.6 issues right now, but this is on my list to get to. On Mon, Mar 8, 2010 at 1:25 PM, Daniel Kluesing d...@bluekai.com wrote: This is interesting for the use cases I'm looking at Cassandra for, so if that offer still stands I'll take you up on it. I took a crack at it in https://issues.apache.org/jira/browse/CASSANDRA-860 - also in large part to get my feet wet with the code. -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Tuesday, February 16, 2010 9:22 PM To: cassandra-user@incubator.apache.org Subject: Re: Testing row cache feature in trunk: write should put record in cache ... tell you what, if you write the option-processing part in DatabaseDescriptor I will do the actual cache part. :) On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com wrote: https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but this is pretty low priority for me. On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li weiju...@gmail.com wrote: Just tried to make quick change to enable it but it didn't work out :-( ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key()); // What I modified if( cachedRow == null ) { cfs.cacheRow(mutation.key()); cachedRow = cfs.getRawCachedRow(mutation.key()); } if (cachedRow != null) cachedRow.addAll(columnFamily); How can I open a ticket for you to make the change (enable row cache write through with an option)? Thanks, -Weijun On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com wrote: On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com wrote: On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com wrote: Just started to play with the row cache feature in trunk: it seems to be working fine so far except that for RowsCached parameter you need to specify number of rows rather than a percentage (e.g., 20% doesn't work). 20% works, but it's 20% of the rows at server startup. So on a fresh start that is zero. 
Maybe we should just get rid of the % feature... (Actually, it shouldn't be hard to update this on flush, if you want to open a ticket.)
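The write-through idea Weijun is after can be sketched outside of Cassandra's code. This is a minimal illustration, not the CASSANDRA-860 patch: a bounded LRU row cache where writes only update rows that are already cached, so a heavy write load does not evict the hot read set. All names here are invented for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal write-through row cache sketch (hypothetical, not Cassandra code).
public class RowCacheSketch<K, V> {
    private final int capacity;
    private final Map<K, V> cache;

    public RowCacheSketch(int capacity) {
        this.capacity = capacity;
        // access-order LinkedHashMap gives simple LRU eviction
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > RowCacheSketch.this.capacity;
            }
        };
    }

    // Write path: only refresh rows already in the cache, instead of
    // pulling every written row in (which would churn the cache).
    public void onWrite(K key, V row) {
        if (cache.containsKey(key)) {
            cache.put(key, row);
        }
    }

    // Read path: populate the cache as usual.
    public void onRead(K key, V row) {
        cache.put(key, row);
    }

    public V get(K key) {
        return cache.get(key);
    }
}
```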
problem with running simple example using cassandra-cli with 0.6.0-beta2
I am checking out 0.6.0-beta2 since I need the batch-mutate function. I am just trying to run the example in the cassandra-cli Wiki: http://wiki.apache.org/cassandra/CassandraCli Here is what I am getting: cassandra set Keyspace1.Standard1['jsmith']['first'] = 'John' Value inserted. cassandra get Keyspace1.Standard1['jsmith'] = (column=6669727374, value=John, timestamp=1268261785077) Returned 1 results. The column name being returned by get (6669727374) does not match what is set (first). This is true for all column names. cassandra set Keyspace1.Standard1['jsmith']['last'] = 'Smith' Value inserted. cassandra set Keyspace1.Standard1['jsmith']['age'] = '42' Value inserted. cassandra get Keyspace1.Standard1['jsmith'] = (column=6c617374, value=Smith, timestamp=1268262480130) = (column=6669727374, value=John, timestamp=1268261785077) = (column=616765, value=42, timestamp=1268262484133) Returned 3 results. Is this a problem in 0.6.0-beta2 or am I doing anything wrong? Bill
Re: cassandra 0.6.0 beta 2 download contains beta 1?
I am building from source and found the same problem. I manually copied all the jar files from build/lib/jars to lib and that seems to do the trick. Bill On Wed, Mar 10, 2010 at 1:39 PM, Vick Khera vi...@khera.org wrote: On Wed, Mar 10, 2010 at 11:30 AM, Eric Evans eev...@rackspace.com wrote: apache-cassandra-0.6.0-beta1.jar apache-cassandra-0.6.0-beta2.jar Ugh, my bad. I must have failed to `clean' in between the aborted beta1 and beta2. The beta2 also does not include the other support jar files like log4j. Not being a java person, I didn't know what to do so I just started my experimentation with the 0.5.1 release which has it all bundled.
Re: problem with running simple example using cassandra-cli with 0.6.0-beta2
On Wed, Mar 10, 2010 at 5:09 PM, Bill Au bill.w...@gmail.com wrote: I am checking out 0.6.0-beta2 since I need the batch-mutate function. I am just trying to run the example is the cassandra-cli Wiki: http://wiki.apache.org/cassandra/CassandraCli Here is what I am getting: cassandra set Keyspace1.Standard1['jsmith']['first'] = 'John' Value inserted. cassandra get Keyspace1.Standard1['jsmith'] = (column=6669727374, value=John, timestamp=1268261785077) Returned 1 results. The column name being returned by get (6669727374) does not match what is set (first). This is true for all column names. cassandra set Keyspace1.Standard1['jsmith']['last'] = 'Smith' Value inserted. cassandra set Keyspace1.Standard1['jsmith']['age'] = '42' Value inserted. cassandra get Keyspace1.Standard1['jsmith'] = (column=6c617374, value=Smith, timestamp=1268262480130) = (column=6669727374, value=John, timestamp=1268261785077) = (column=616765, value=42, timestamp=1268262484133) Returned 3 results. Is this a problem in 0.6.0-beta2 or am I doing anything wrong? Bill This is normal. You've added the 'first', 'last', and 'age' columns to the 'jsmith' row, and then asked for the entire row, so you got all 3 columns back. -Brandon
Re: Strategy to delete/expire keys in cassandra
Never mind. Figured out I forgot to compile thrift :) Thanks, -Weijun On Wed, Mar 10, 2010 at 1:43 PM, Weijun Li weiju...@gmail.com wrote: Hi Sylvain, I applied your patch to 0.5 but it seems that it's not compilable: 1) column.getTtl() is not defined in RowMutation.java public static RowMutation getRowMutation(String table, String key, Map<String, List<ColumnOrSuperColumn>> cfmap) { RowMutation rm = new RowMutation(table, key.trim()); for (Map.Entry<String, List<ColumnOrSuperColumn>> entry : cfmap.entrySet()) { String cfName = entry.getKey(); for (ColumnOrSuperColumn cosc : entry.getValue()) { if (cosc.column == null) { assert cosc.super_column != null; for (org.apache.cassandra.service.Column column : cosc.super_column.columns) { rm.add(new QueryPath(cfName, cosc.super_column.name, column.name), column.value, column.timestamp, column.getTtl()); } } else { assert cosc.super_column == null; rm.add(new QueryPath(cfName, null, cosc.column.name), cosc.column.value, cosc.column.timestamp, cosc.column.getTtl()); } } } return rm; } 2) CassandraServer.java: Column.setTtl() is not defined. if (column instanceof ExpiringColumn) { thrift_column.setTtl(((ExpiringColumn) column).getTimeToLive()); } 3) CliClient.java: type mismatch for ColumnParent thriftClient_.insert(tableName, key, new ColumnParent(columnFamily, superColumnName), new Column(columnName, value.getBytes(), System.currentTimeMillis()), ConsistencyLevel.ONE); It seems that the patch doesn't add the getTtl()/setTtl() stuff to Column.java? Thanks, -Weijun -Original Message- From: Sylvain Lebresne [mailto:sylv...@yakaz.com] Sent: Thursday, February 25, 2010 2:23 AM To: Weijun Li Cc: cassandra-user@incubator.apache.org Subject: Re: Strategy to delete/expire keys in cassandra Hi, Should I just run a command (in the Cassandra 0.5 source folder?) like: patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch for all of the five patches in your ticket? Well, actually I lied. The patches were made for a version a little after 0.5. 
If you really want to try, I attach a version of those patches that (should) work with 0.5 (there are only the first 3 patches; the fourth one is for tests, so not necessary per se). Apply them with your patch command. Still, to compile that you will have to regenerate the thrift java interface (with ant gen-thrift-java), but for that you will have to install the right svn revision of thrift (which is libthrift-r820831 for 0.5). And if you manage to make it work, you will have to dig into cassandra.thrift, as the patches make changes to it. In the end, remember that this is not an official patch yet and it *will not* make it into Cassandra in its current form. All I can tell you is that I need those expiring columns for quite a few of my use cases and I will do what I can to get this feature included if and when possible. Also, what's your opinion on extending ExpiringColumn to expire a key completely? Otherwise it will be difficult to track which rows in Cassandra are expired or old. I'm not sure how to make full rows (or even full superColumns, for that matter) expire. What if you set a row to expire after some time and add new columns before this expiration? Should you update the expiration of the row? Which is to say that a row will expire when its last column expires, which is almost what you get with expiring columns. The one thing you may want, though, is that when all the columns of a row expire (or, to be precise, get physically deleted), the row itself is deleted. Looking at the code, I'm not convinced this happens, and I'm not sure why. -- Sylvain
Re: NoSQL live tomorrow
does anyone know if there is a plan for nosql seattle anytime soon? Jonathan Ellis wrote: http://nosqlboston.eventbrite.com/ don't know about recording / casting plans. On Wed, Mar 10, 2010 at 3:25 PM, Tim Haines tmhai...@gmail.com wrote: Hey Jonathan, What event is this and will it be livecasted/recorded? Cheers, Tim. On Thu, Mar 11, 2010 at 10:21 AM, Jonathan Ellis jbel...@gmail.com wrote: Ryan King and I will have 20 minutes to talk about Cassandra in the Lab part of the program. 20 minutes isn't enough to present a whole lot in a structured manner so we are planning to just do QA the whole time. So if you are going to be there, come with your questions. I will also bring a few slides about 0.6 / 0.7 features to kick things off in case we have a slow start. -Jonathan
Re: NoSQL live tomorrow
I will be at NoSQL LIve, but I have a client call for most of the lab part. --Original Message-- From: Jonathan Ellis To: cassandra-user@incubator.apache.org ReplyTo: cassandra-user@incubator.apache.org Subject: NoSQL live tomorrow Sent: Mar 10, 2010 21:21 Ryan King and I will have 20 minutes to talk about Cassandra in the Lab part of the program. 20 minutes isn't enough to present a whole lot in a structured manner so we are planning to just do QA the whole time. So if you are going to be there, come with your questions. I will also bring a few slides about 0.6 / 0.7 features to kick things off in case we have a slow start. -Jonathan
Re: Hackathon?!?
We could do it on April 22 (1 week later), that's my birthday :-) What better way to celebrate haha. -Chris On Mar 10, 2010, at 9:58 AM, Jonathan Ellis wrote: I'm in either way, but if we push it a week later then the twitter guys could (a) make it and (b) pimp it at their own conference. On Wed, Mar 10, 2010 at 12:26 AM, Jeff Hodges jhod...@twitter.com wrote: Ah, hell. Thought this was the first day. Can't make it. -- Jeff On Mar 9, 2010 9:32 PM, Ryan King r...@twitter.com wrote: I'm already committed to talking about cassandra that day at our company's developer conference (chirp.twitter.com). -ryan On Tue, Mar 9, 2010 at 6:26 PM, Jeff Hodges jhod...@twitter.com wrote: I'm down. -- Jeff ...
Re: problem with running simple example using cassandra-cli with 0.6.0-beta2
I think he means how the column names are rendered as bytes but the values are strings. On Wed, Mar 10, 2010 at 5:22 PM, Brandon Williams dri...@gmail.com wrote: On Wed, Mar 10, 2010 at 5:09 PM, Bill Au bill.w...@gmail.com wrote: I am checking out 0.6.0-beta2 since I need the batch-mutate function. I am just trying to run the example is the cassandra-cli Wiki: http://wiki.apache.org/cassandra/CassandraCli Here is what I am getting: cassandra set Keyspace1.Standard1['jsmith']['first'] = 'John' Value inserted. cassandra get Keyspace1.Standard1['jsmith'] = (column=6669727374, value=John, timestamp=1268261785077) Returned 1 results. The column name being returned by get (6669727374) does not match what is set (first). This is true for all column names. cassandra set Keyspace1.Standard1['jsmith']['last'] = 'Smith' Value inserted. cassandra set Keyspace1.Standard1['jsmith']['age'] = '42' Value inserted. cassandra get Keyspace1.Standard1['jsmith'] = (column=6c617374, value=Smith, timestamp=1268262480130) = (column=6669727374, value=John, timestamp=1268261785077) = (column=616765, value=42, timestamp=1268262484133) Returned 3 results. Is this a problem in 0.6.0-beta2 or am I doing anything wrong? Bill This is normal. You've added the 'first', 'last', and 'age' columns to the 'jsmith' row, and then asked for the entire row, so you got all 3 columns back. -Brandon
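To see that the CLI output is only a display issue: the hex strings are just the raw UTF-8 bytes of the column names you set, rendered as hex because the CLI does not know the names are text. A small decoder (a hypothetical helper, not part of the CLI) confirms that 6669727374 really is 'first':

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing that the CLI's hex column names are just
// the UTF-8 bytes of the names that were set.
public class HexColumnName {
    public static String decodeHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // each pair of hex digits is one byte of the original name
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeHex("6669727374")); // prints "first"
        System.out.println(decodeHex("6c617374"));   // prints "last"
        System.out.println(decodeHex("616765"));     // prints "age"
    }
}
```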
Re: Effective allocation of multiple disks
On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro antho...@alumni.caltech.edu wrote: I would almost recommend just keeping things simple and removing multiple data directories from the config altogether and just documenting that you should plan on using OS level mechanisms for growing diskspace and io. I think that is a pretty sane suggestion actually. -Jonathan
Strategies for storing lexically ordered data in supercolumns
I'm wondering about good strategies for picking keys that I want to be lexically sorted in a super column family. For example, my data looks like this: [user1_uuid][connections][some_key_for_user2] = [user1_uuid][connections][some_key_for_user3] = I was thinking that I wanted some_key_for_user2 to be sorted by a user's name. So I was thinking I'd set the subcolumn compareWith to UTF8Type or BytesType and construct a key [user's lastname + user's firstname + user's uuid]. This would result in a sorted subcolumn and user list. That's fine. But I wonder what would happen if, say, a user changes their last name. It happens rarely, but I imagine people getting married and modifying their name. Now the sort is no longer correct. There seem to be some bad consequences to creating keys based on data that can change. So what is the general (elegant, easy to maintain) strategy here? Always sort in your server-side code and don't bother trying to have the data sorted? I'm a cassandra noob with all my experience in relational DBMS. TIA Pete
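For what it's worth, the composite-key approach described above can be sketched as follows. The delimiter and helper names are assumptions for illustration; the point is that UTF8Type compares raw bytes, so for ASCII names the subcolumn ordering matches plain String comparison. The rename problem is visible in the same sketch: a changed last name produces a different key, so the old subcolumn has to be deleted and the connection re-inserted under the new key.

```java
// Hypothetical composite subcolumn key: lastname + firstname + uuid,
// chosen so that UTF8Type byte ordering sorts connections by name.
public class ConnectionKey {
    public static String key(String last, String first, String uuid) {
        // delimiter is an assumption; pick one that cannot occur in names
        return last + "|" + first + "|" + uuid;
    }

    public static void main(String[] args) {
        String ann = key("Adams", "Ann", "uuid-2");
        String bob = key("Baker", "Bob", "uuid-3");
        // byte-wise UTF-8 ordering of ASCII matches String.compareTo
        if (!(ann.compareTo(bob) < 0)) throw new AssertionError();

        // a rename changes the key, so the old subcolumn must be removed
        // and the connection written again under the new key
        String annRenamed = key("Zimmer", "Ann", "uuid-2");
        if (annRenamed.equals(ann)) throw new AssertionError();
    }
}
```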