Re: Cassandra users survey
On Mon, 23 Nov 2009 23:30:51 -0500 Matt Revelle mreve...@gmail.com wrote:

MR Are you both using timestamps as row keys? Would be great to hear
MR more details.

I'm using super column keys in a super column. So let's say your resource is routerA. Your data will be:

Row routerA
  SuperColumn Status
    SuperColumn key T0 (this morning)
      Columns { status: connected, location: USA, ... }
    SuperColumn key T1 (T0 + 10 seconds for example)
      Columns { status: disconnected, location: Europe, ... }
    SuperColumn key T2 (T1 + 10 seconds for example)
      Columns { status: connected, ... } // no location specified

Then you can say "give me the latest super column key" (limit = 1, order = reversed, start == end == 0) and you'll get T2, the most recent entry.

Ted
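The layout Ted describes can be illustrated with a small in-memory sketch. This is not a Cassandra client: the dictionary, the `latest` helper, and the epoch value chosen for T0 are all assumptions made up for illustration. It only shows that asking for one entry in reversed key order returns the newest time-keyed entry.

```python
# Hypothetical in-memory stand-in for the layout described above;
# this does not talk to Cassandra.
T0 = 1259000000  # assumed epoch seconds standing in for "this morning"

row_routerA = {
    "Status": {
        T0: {"status": "connected", "location": "USA"},
        T0 + 10: {"status": "disconnected", "location": "Europe"},
        T0 + 20: {"status": "connected"},  # no location specified
    }
}


def latest(super_column, limit=1):
    """Return up to `limit` (key, columns) pairs, newest key first."""
    keys = sorted(super_column, reverse=True)[:limit]
    return [(key, super_column[key]) for key in keys]


newest_key, newest_columns = latest(row_routerA["Status"])[0]
print(newest_key - T0, newest_columns)  # 20 {'status': 'connected'}
```

Because the keys are timestamps, "reversed order, limit 1" is all that is needed to read the current state of a resource without scanning its history.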
Re: cassandra over hbase
On Mon, 23 Nov 2009 11:58:08 -0800 Jun Rao jun...@almaden.ibm.com wrote:

JR After chatting with some Facebook guys, we realized that one potential
JR benefit from using HDFS is that the recovery from losing partial data in a
JR node is more efficient. Suppose that one lost a single disk at a node. HDFS
JR can quickly rebuild the blocks on the failed disk in parallel. This is a
JR bit hard to do in cassandra, since we can't easily find the data on the
JR failed disk from another node.

This is an architectural issue, right? IIUC Cassandra simply doesn't care about disks. I think that's a plus, actually, because it simplifies the code, and filesystems in my experience are better left up to the OS. For instance, we're evaluating Lustre, and for many specific reasons it's significantly better for our needs than HDFS, so HDFS would be a tough sell.

JR So, when this happens, the whole node probably has to be taken out
JR and bootstrapped. The same problem exists when a single sstable file
JR is corrupted.

I think recovering a single sstable is a useful thing, and it seems like a better problem to solve.

Ted
Re: Wish list [from users survey thread]
On Mon, 23 Nov 2009 13:45:09 -0600 Jonathan Ellis jbel...@gmail.com wrote:

JE 1. Increment/decrement: atomic is a dirty word in a system
JE emphasizing availability, but incr/decr can be provided in an
JE eventually consistent manner with vector clocks. There are other
JE possible approaches but this is probably the best fit for us. We'd
JE want to allow ColumnFamilies with either traditional (for Cassandra)
JE long timestamps, or vector clocks, but not mixed. The bad news is,
JE this is a very substantial change and will probably not be in 0.9
JE unless someone steps up to do the work. (This would also cover
JE flexible conflict resolution, which came up as well.)

Just for my benefit, can someone explain the reasons why atomic inc/dec are needed inside Cassandra if 64-bit time stamps and UUIDs are available? I have not needed them in my usage but am curious about other schemas that do.

Thanks
Ted
Re: Wish list [from users survey thread]
well. I'd like to see how many times a specific user hits the site, without having to add them up every time.

On Nov 24, 2009, at 9:47 AM, Ted Zlatanov wrote:

On Mon, 23 Nov 2009 13:45:09 -0600 Jonathan Ellis jbel...@gmail.com wrote:

JE 1. Increment/decrement: atomic is a dirty word in a system
JE emphasizing availability, but incr/decr can be provided in an
JE eventually consistent manner with vector clocks. There are other
JE possible approaches but this is probably the best fit for us. We'd
JE want to allow ColumnFamilies with either traditional (for Cassandra)
JE long timestamps, or vector clocks, but not mixed. The bad news is,
JE this is a very substantial change and will probably not be in 0.9
JE unless someone steps up to do the work. (This would also cover
JE flexible conflict resolution, which came up as well.)

Just for my benefit, can someone explain the reasons why atomic inc/dec are needed inside Cassandra if 64-bit time stamps and UUIDs are available? I have not needed them in my usage but am curious about other schemas that do.

Thanks
Ted

--
Ian Holsman
i...@holsman.net
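The trade-off behind this exchange can be sketched with a toy model. Everything here is hypothetical: `LwwColumn` is a made-up last-write-wins cell, not Cassandra code, and the timestamps are arbitrary. The sketch contrasts a read-modify-write counter, which loses a concurrent hit under timestamp-based conflict resolution, with one UUID-keyed column per hit, which loses nothing but has to be added up on every read.

```python
import uuid


class LwwColumn:
    """Hypothetical last-write-wins cell: the highest timestamp wins."""

    def __init__(self, value=0, timestamp=0):
        self.value = value
        self.timestamp = timestamp

    def write(self, value, timestamp):
        if timestamp > self.timestamp:
            self.value = value
            self.timestamp = timestamp


# Approach 1: read-modify-write on a single counter column.
hits = LwwColumn(value=5, timestamp=100)
seen_by_a = hits.value  # client A reads 5
seen_by_b = hits.value  # client B also reads 5, before A writes
hits.write(seen_by_a + 1, timestamp=101)
hits.write(seen_by_b + 1, timestamp=102)
print(hits.value)  # 6: one of the two hits is lost

# Approach 2: one column per hit, keyed by a UUID. Nothing is overwritten,
# but the total has to be computed on every read.
hit_columns = {uuid.uuid4(): 1 for _ in range(5)}
hit_columns[uuid.uuid4()] = 1  # hit recorded by client A
hit_columns[uuid.uuid4()] = 1  # hit recorded by client B
print(len(hit_columns))  # 7
```

A server-side incr/decr aims to give the correctness of the second approach with the read cost of the first, which is roughly what Ian is asking for.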
Re: Wish list [from users survey thread]
On Mon, Nov 23, 2009 at 1:45 PM, Jonathan Ellis jbel...@gmail.com wrote:

9. Design documentation: also agreed. Chris has started on this (http://wiki.apache.org/cassandra/ArchitectureSSTable) and I will try to at least sketch out some more this week.

http://wiki.apache.org/cassandra/ArchitectureInternals
Re: ring state out of sync in build 883477
Looks like this is another symptom of https://issues.apache.org/jira/browse/CASSANDRA-150, which is on track to be fixed soon.

On Tue, Nov 24, 2009 at 11:19 AM, B. Todd Burruss bburr...@real.com wrote:

they all were restarted at various times. for vmguest85 the other three are seed nodes.

On Mon, 2009-11-23 at 19:21 -0600, Jonathan Ellis wrote:

So vmguest85 was restarted, but gen-app02 hasn't told it that there are 2 other nodes that are down? Which one is the seed node?

On Mon, Nov 23, 2009 at 6:38 PM, B. Todd Burruss bburr...@real.com wrote:

i'm observing the following on a cluster that started with 4 nodes. i have been killing and restarting the various nodes as i test cassandra and now i'm seeing a lot of NotFoundException exceptions in the client because of what i believe is ring state out of sync between the two nodes that are still up and available. The first ring state shown below reflects the current state of the cluster.

Also I have seen similar issues when one of the nodes thinks another node is still available when in fact it has been killed. it seems to be related to bringing up and killing nodes too fast and not letting them figure out when a node is dead. in this case i see TimedOutException related to the NIO SocketChannel class.

thx!
[cassandra.883477]$ bin/nodeprobe -host gen-app02.dev.real.com -port 8080 ring
Address         Status  Load      Range                                     Ring
                                  144038903974614862325597275257769797985
172.27.128.186  Down    22.17 MB  31124469348629903091013930339840898757    |--|
172.27.128.23   Down    22.17 MB  64378740291415296162944450043143967518    |  |
172.27.128.22   Up      22.17 MB  121134220722269938669001112695509564769   |  |
172.27.128.185  Up      14.69 MB  144038903974614862325597275257769797985   |--|

[cassandra.883477]$ bin/nodeprobe -host vmguest85.prognet.com -port 8080 ring
Address         Status  Load      Range                                     Ring
                                  144038903974614862325597275257769797985
172.27.128.22   Up      22.17 MB  121134220722269938669001112695509564769   |--|
172.27.128.185  Up      14.69 MB  144038903974614862325597275257769797985   |--|
[cassandra.883477]$
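The disagreement between the two ring views above can be made explicit with a small comparison. This is only a sketch: the addresses and statuses are copied by hand from the two nodeprobe outputs, and `diff_ring_views` is a hypothetical helper, not part of nodeprobe or Cassandra.

```python
# Ring views copied from the two nodeprobe outputs: address -> status.
view_gen_app02 = {
    "172.27.128.186": "Down",
    "172.27.128.23": "Down",
    "172.27.128.22": "Up",
    "172.27.128.185": "Up",
}
view_vmguest85 = {
    "172.27.128.22": "Up",
    "172.27.128.185": "Up",
}


def diff_ring_views(a, b):
    """Return (only_in_a, only_in_b, status_mismatches) for two ring views."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    mismatched = sorted(addr for addr in set(a) & set(b) if a[addr] != b[addr])
    return only_a, only_b, mismatched


only_a, only_b, mismatched = diff_ring_views(view_gen_app02, view_vmguest85)
print("known only to gen-app02:", only_a)
print("known only to vmguest85:", only_b)
print("status disagreements:", mismatched)
```

Here the two nodes agree on the status of the endpoints they share; the difference is that vmguest85 has no entry at all for the two nodes that gen-app02 reports as Down.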
Re: Cassandra access control
Looks like I could use:

PAM auth: http://jpam.sourceforge.net/
LDAP/AD auth: http://www.openldap.org/jldap/

The first is definitely OK (Apache license), but I'm not sure about the second one (OpenLDAP public license). Looks BSDish to me. It claims to support Windows auth and is officially provided by the OpenLDAP project. Has anyone used it?

Thanks
Ted
Re: Cassandra users survey
We at Platform46 are building an on-premise enterprise appliance which provides twitter-like, open-follower, short messaging services for internal corporate networks. We are using Cassandra, Python and RabbitMQ to help us build a scalable solution where appliances may be configured as true peers and installed in different data centers. -- rich http://www.platform46.com/
urgent: missing data!
hi guys

i have been using this version of cassandra:

Path: .
URL: http://svn.apache.org/repos/asf/incubator/cassandra/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 831540
Node Kind: directory
Schedule: normal
Last Changed Author: jbellis
Last Changed Rev: 831433
Last Changed Date: 2009-10-30 12:45:38 -0700 (Fri, 30 Oct 2009)

./bin/nodeprobe -host localhost info
29814395632524962303611017038378268216
Load : 753.64 MB
Generation No: 1259113951
Uptime (seconds) : 443
Heap Memory (MB) : 121.32 / 12285.94

i just have one server. and the config is here http://pastie.org/713889

all of a sudden super columns stored in the column family Supern1 are disappearing. I tried flushing the cassandra node and starting again, and still the same problem. Any suggestions how to figure out what the problem is and how to get the data back? The size of the cassandra data directory hasn't reduced.

thanks a lot!
Re: Cassandra users survey
I work for Comcast, and we have tons of data that we are migrating into non-relational storage. we recently evaluated cassandra, riak, voldemort, and hdfs. I focused on cassandra, which is why you may have seen me asking dumb questions over IRC :-)

A few desirables for cassandra:

1) I'm not a huge fan of thrift. it would be nice if the client jar came packaged with cassandra (I had to manually build it from the thrift-generated java). also, the lack of streaming support is troubling. a lot of our internal services are http, and I'd like to be able to connect a column's input stream to the output stream of an http response, instead of loading it all into memory.

2) a practical/situational view of managing a cassandra cluster (deployment guide, maybe) would be nice. for my evaluation, I was seeking answers to questions like:
- how do I add capacity?
- how do I remove capacity? (I believe you're calling it decommissioning)
- what files should I back up?
- how can I mitigate the risk of lost writes during a power failure?
- how can I ensure that my writes go to multiple data centers?

I think overall the docs are good (I found answers to most of my questions), but since a lot of groups are analyzing cassandra in this fashion, and needing to make a sales pitch to management, ops, etc. -- it would be nice to have a more comprehensive deployment guide.

you fellows at Rackspace should consider offering Cassandra support. I know that the ability to have some paid professionals come in and train our ops team on how to monitor + manage a cassandra cluster would have made a huge difference for us.

thanks!
-matt

On Fri, Nov 20, 2009 at 4:17 PM, Jonathan Ellis jbel...@gmail.com wrote:

Hi all,

I'd love to get a better feel for who is using Cassandra and what kind of applications it is seeing. If you are using Cassandra, could you share what you're using it for and what stage you are at with it (evaluation / testing / production)?

Also, what alternatives you evaluated/are evaluating would be useful. Finally, feel free to throw in "I'd love to use Cassandra if only it did X" wishes. :)

I can start: Rackspace is using Cassandra for stats collection (testing, almost production) and as a backend for the Mail Apps division (early testing). We evaluated HBase, Hypertable, dynomite, and Voldemort as well.

Thanks,
-Jonathan

(If you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record.)
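The streaming wish in Matt's first point (connecting a column's input stream to an HTTP response instead of loading the whole value into memory) is the usual chunked-copy pattern. The sketch below is generic and makes no claim about any Cassandra or Thrift API: `stream_copy` is a hypothetical helper, and the two `BytesIO` objects merely stand in for a large column value and a response body.

```python
import io


def stream_copy(source, sink, chunk_size=64 * 1024):
    """Copy source to sink one chunk at a time; return the bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)
    return total


column_value = io.BytesIO(b"x" * 200_000)  # stand-in for a large column value
http_body = io.BytesIO()                   # stand-in for an HTTP response body
copied = stream_copy(column_value, http_body)
print(copied)  # 200000
```

With this pattern, peak memory is bounded by `chunk_size` rather than by the size of the value, which is the property a streaming client interface would need to expose.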
Re: urgent: missing data!
On Tue, Nov 24, 2009 at 6:05 PM, kevin kevincastigli...@gmail.com wrote:

hi guys

i have been using cassandra this version

Path: .
URL: http://svn.apache.org/repos/asf/incubator/cassandra/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 831540
Node Kind: directory
Schedule: normal
Last Changed Author: jbellis
Last Changed Rev: 831433
Last Changed Date: 2009-10-30 12:45:38 -0700 (Fri, 30 Oct 2009)

./bin/nodeprobe -host localhost info
29814395632524962303611017038378268216
Load : 753.64 MB
Generation No: 1259113951
Uptime (seconds) : 443
Heap Memory (MB) : 121.32 / 12285.94

i just have one server. and the config is here http://pastie.org/713889

all of a sudden super columns stored in the column family Supern1 are disappearing. I tried flushing the cassandra node and starting again, and still the same problem. Any suggestions how to figure out what the problem is and to retrieve back the data? The size of the cassandra data directory hasn't reduced.

thanks a lot!

these are the settings in the file cassandra.in.sh: http://pastie.org/713922