Re: Social network feed/wall question
Hi Ian,
Thanks for your answer, I'll have a look at redis. I think it should be possible to do this in Cassandra if I sort the columns by TimeUUID.
Kristian
On 22 Nov 2009, at 23:11, Ian Holsman wrote:
One of the problems you may face is that the common operation is 'get last X'. You might want to look at redis as an alternative, as it supports this operation natively. I'm sure the Cassandra experts can help with your schema to optimize it as well.
--- Sent from my phone
Ian Holsman - 703 879-3128
On 23/11/2009, at 9:55 AM, Kristian Lunde kr.lu...@gmail.com wrote:
I am currently building a social network application where one of the important features is a feed / wall (something similar to the Facebook wall). We will have several feeds, one for each profile, one for each group, and so on. I have looked into using Cassandra for storing this, but I am not sure I am on the right track regarding my schema. My thought was that the schema would be similar to this:
Feed [SuperColumn]
- Row [user id as identifier]
  [Columns]
  - type
  - timestamp
  - message
  - url
Each user would have his own feed super column and store all feed items related to him in this super column. I am not sure this is the best idea, since it creates an insane amount of writes whenever someone writes to their wall (this will have to write to the feed of all his friends). Also, I read in this thread http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg00360.html that super columns are not suited for 60k rows in a super column. What would be the optimal way of storing a set of feeds in Cassandra?
Thanks
Kristian
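The write amplification Kristian is worried about can be sketched with a small in-memory model. This is only an illustration, not Cassandra client code: the `feeds` dict, the `post_to_wall` helper, the user names, and the sample timestamp are all hypothetical, and each feed item simply carries the four columns named in the proposed schema (type, timestamp, message, url).

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the proposed layout:
# feeds[user_id] is that user's feed, a list of items carrying the
# columns named in the schema (type, timestamp, message, url).
feeds = defaultdict(list)


def post_to_wall(author, friends, item):
    """Fan out on write: copy the item into the author's feed and into
    every friend's feed. Returns the number of writes performed."""
    targets = [author] + list(friends)
    for user_id in targets:
        feeds[user_id].append(dict(item))
    return len(targets)


# Made-up sample data, for illustration only.
item = {"type": "status", "timestamp": 1258969500, "message": "hello", "url": ""}
writes = post_to_wall("kristian", ["ian", "jonathan", "eric"], item)
print(writes)  # one write for the author plus one per friend
```

Under this model the number of writes per post grows with the size of the friend list, which is the cost the message describes; the alternative of assembling the feed at read time trades those writes for more reads.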
Exception in Cassandra-cli tool
Hello,
I am using 0.4.2, the last stable release, not trunk. I always get the following exception in cassandra-cli:
cassandra> show keyspaces
Exception Cannot read. Remote side has closed. Tried to read 4 bytes, but only got 0 bytes.
org.apache.thrift.transport.TTransportException: Cannot read. Remote side has closed. Tried to read 4 bytes, but only got 0 bytes.
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
    at org.apache.cassandra.service.Cassandra$Client.recv_get_string_list_property(Cassandra.java:532)
    at org.apache.cassandra.service.Cassandra$Client.get_string_list_property(Cassandra.java:517)
    at org.apache.cassandra.cli.CliClient.executeShowTables(CliClient.java:238)
    at org.apache.cassandra.cli.CliClient.executeCLIStmt(CliClient.java:72)
    at org.apache.cassandra.cli.CliMain.processCLIStmt(CliMain.java:103)
    at org.apache.cassandra.cli.CliMain.main(CliMain.java:143)
Any idea what's going on? Thanks
Re: Social network feed/wall question
Yes, sort by anything providing time ordering like TimeUUID and then use get_slice with reversed=True to get the most recent. On Mon, Nov 23, 2009 at 4:14 AM, Kristian Lunde kr.lu...@gmail.com wrote: Hi Ian Thanks for your answer, I'll have a look at redis. I think it should be possible doing this in cassandra if I sort the columns by timeUUID. Kristian
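The "newest first" read suggested above can be modelled in a few lines. This is a sketch of the ordering only, under the assumption that column names are version-1 (time-based) UUIDs; it makes no Thrift call, and `time_uuid`, `latest`, and the `wall` dict are hypothetical helpers that stand in for a row sorted by TimeUUID and a reversed slice with a count.

```python
import uuid


def time_uuid(timestamp, node=0x123456789ABC, clock_seq=0):
    """Build a version-1 UUID from a 60-bit timestamp so the example is
    deterministic (uuid.uuid1() would use the current clock instead)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def latest(columns, count):
    """Return the `count` newest (name, value) pairs, newest first,
    mimicking a reversed slice over time-ordered column names."""
    ordered = sorted(columns.items(), key=lambda kv: kv[0].time, reverse=True)
    return ordered[:count]


# Columns inserted out of order; the time embedded in the name decides the order.
wall = {time_uuid(t): "post %d" % t for t in (100, 300, 200, 400)}
newest = [value for _, value in latest(wall, 2)]
print(newest)
```

Because the ordering comes from the column name itself, no separate timestamp column is needed to answer "get last X".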
Re: Exception in Cassandra-cli tool
Sounds like the server is using framed transport but the cli is not.
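For readers unfamiliar with the distinction, here is a rough sketch of why a framed/unframed mismatch breaks a Thrift connection. It uses only the standard library and models the wire bytes, not the Thrift classes themselves. The assumptions: framed transport prefixes each message with a 4-byte big-endian length, and a strict binary-protocol call starts with the version word 0x80010000 combined with the message type. The helper names are hypothetical.

```python
import struct

VERSION_1 = 0x80010000  # strict binary protocol version marker (assumed)
CALL = 1                # message type for a request


def frame(payload):
    """Framed transport: 4-byte big-endian length prefix, then the payload."""
    return struct.pack(">i", len(payload)) + payload


def read_frame_length(wire):
    """What a framed reader takes to be the length of the next message."""
    return struct.unpack(">i", wire[:4])[0]


# An unframed message begins directly with the protocol header.
unframed = struct.pack(">I", VERSION_1 | CALL) + b"get_string_list_property"

# Framed peer talking to framed peer: the length is sensible.
print(read_frame_length(frame(unframed)))

# Framed reader fed an unframed message: the header is misread as a length.
print(read_frame_length(unframed))
```

A reader that sees a nonsensical length like the second one would likely give up and drop the connection, which from the other side could look like "Remote side has closed". Whether that is what happened in this thread is not established by the messages.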
Re: Exception in Cassandra-cli tool
If you mean the parameter <ThriftFramedTransport>false</ThriftFramedTransport>, all the instances have false here. On Mon, Nov 23, 2009 at 3:46 PM, Jonathan Ellis jbel...@gmail.com wrote: Sounds like the server is using framed transport but the cli is not.
Re: Problem with cassdict import
On Sat, 2009-11-21 at 12:20 -0600, Saran wrote:
In [2]: from cassdict.cass import *
---
ImportError Traceback (most recent call last)
/home/sridhar/Desktop/cassdict/<ipython console> in <module>()
/home/sridhar/Desktop/cassdict/cassdict/cass.py in <module>()
     13 import pdb
     14
---> 15 from cassandra.ttypes import Column, SuperColumn, ColumnOrSuperColumn, \
     16     SlicePredicate, SliceRange, ConsistencyLevel, ColumnPath, ColumnParent
     17
ImportError: cannot import name ColumnOrSuperColum
I've never used cassdict, but this looks to me like you either haven't generated the thrift code for cassandra, or it is not in your PYTHONPATH.
-- Eric Evans eev...@rackspace.com
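One way to tell the two cases apart (module not on the path at all, versus a module that loads but lacks the name) is a small check like the following. This is a generic diagnostic sketch using only the standard library; `diagnose_import` is a hypothetical helper, not part of cassdict or Cassandra.

```python
import importlib


def diagnose_import(module_name, attr):
    """Report whether a module can be imported, where it was loaded from,
    and whether it defines the given name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Nothing by that name on sys.path / PYTHONPATH.
        return "cannot import %s: %s" % (module_name, exc)
    location = getattr(module, "__file__", "<built-in>")
    if not hasattr(module, attr):
        # The module exists but may be stale or generated from another version.
        return "%s loaded from %s has no attribute %s" % (module_name, location, attr)
    return "ok: %s.%s found in %s" % (module_name, attr, location)


print(diagnose_import("cassandra.ttypes", "ColumnOrSuperColumn"))
```

If the module loads from an unexpected location, that points at PYTHONPATH; if it loads but lacks the name, the generated code may not match the interface file in use.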
Re: Exception in Cassandra-cli tool
On Mon, 2009-11-23 at 12:58 +0200, Richard Grossman wrote:
Using the 0.4.2 last stable not from trunk. I get always in cassandra-cli the following exception:
cassandra> show keyspaces
Exception Cannot read. Remote side has closed. Tried to read 4 bytes, but only got 0 bytes.
org.apache.thrift.transport.TTransportException: Cannot read. Remote side has closed. Tried to read 4 bytes, but only got 0 bytes.
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
    at org.apache.cassandra.service.Cassandra$Client.recv_get_string_list_property(Cassandra.java:532)
    at org.apache.cassandra.service.Cassandra$Client.get_string_list_property(Cassandra.java:517)
    at org.apache.cassandra.cli.CliClient.executeShowTables(CliClient.java:238)
    at org.apache.cassandra.cli.CliClient.executeCLIStmt(CliClient.java:72)
    at org.apache.cassandra.cli.CliMain.processCLIStmt(CliMain.java:103)
    at org.apache.cassandra.cli.CliMain.main(CliMain.java:143)
any idea what's going on ?
Maybe you aren't connected? You can launch the cli with -host and -port options, or you can (re)connect once started with connect host/port.
-- Eric Evans eev...@rackspace.com
Re: get_key_range timeouts
Have you tried get_range_slice on trunk instead? get_key_range's design is kind of fundamentally broken, so we're deprecating it in favor of get_range_slice starting in 0.5. (gkr will still be in the 0.5 series, but probably not after that.) On Mon, Nov 23, 2009 at 11:03 AM, Dan Di Spaltro dan.dispal...@gmail.com wrote: I am trying to use get_key_range for an offline type job over about 18k keys and it keeps timing out. My current setup is 3x4G memory machines with OPP and a replication factor of 2, and an rpctimeout of 180s. To combat this I've actually made the KeyCachedFraction 100 to see if this could help, and no matter how few of the key ranges I want, I am unable to get any response from the system. get_slice's work fine, among all the other commands, I just can't seem to get get_key_range working. My current workload has few keys and lots of columns. Any advice to debug this would be appreciated. -- Dan Di Spaltro
Re: get_key_range timeouts
I haven't. Is this common, though? It seems pretty bad to have key ranges time out like this... but you don't think this would happen in get_range_slice? Best, On Mon, Nov 23, 2009 at 9:07 AM, Jonathan Ellis jbel...@gmail.com wrote: Have you tried get_range_slice on trunk instead? get_key_range's design is kind of fundamentally broken, so we're deprecating it in favor of get_range_slice starting in 0.5. (gkr will still be in the 0.5 series, but probably not after that.) -- Dan Di Spaltro
Re: Cassandra users survey
On Fri, 20 Nov 2009 17:38:39 -0800 Dan Di Spaltro dan.dispal...@gmail.com wrote: DDS At Cloudkick we are using Cassandra to store monitoring statistics and DDS running analytics over the data. I would love to share some ideas DDS about how we set up our data-model, if anyone is interested. This DDS isn't the right thread to do it in, but I think it would be useful to DDS show how we store billions of points of data in Cassandra (and maybe DDS get some feedback). I'd like to see that. My Cassandra use is also for monitoring and so far it has been great. I store status updates in a SuperColumn indexed by date and each row represents a unique resource. It's really simple compared to your setup, I'm sure. Ted
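Ted's layout can be pictured as nested maps. The sketch below is one reading of his description, not his actual schema: the row key is the resource, the super column name is the date, and the columns inside it hold individual status updates. The column names (times of day), the resource name, and the helper functions are hypothetical.

```python
# row key (resource) -> super column name (date) -> column name -> value
store = {}


def record_status(resource, date, column, value):
    """Store one status update under the resource's row and the date's super column."""
    store.setdefault(resource, {}).setdefault(date, {})[column] = value


def statuses_on(resource, date):
    """Return all status updates recorded for a resource on a given date."""
    return store.get(resource, {}).get(date, {})


# Made-up sample data, for illustration only.
record_status("web01", "2009-11-23", "09:15", "OK")
record_status("web01", "2009-11-23", "09:20", "DISK FULL")
record_status("web01", "2009-11-24", "00:05", "OK")
print(statuses_on("web01", "2009-11-23"))
```

With dates as super column names, a slice over one row returns a resource's history day by day.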
Re: get_key_range timeouts
On Mon, Nov 23, 2009 at 11:17 AM, Dan Di Spaltro dan.dispal...@gmail.com wrote: I haven't, is this common though? Short answer? Yes. :)
Re: Cassandra access control
sysauth says it is GPL v2 (also not compatible) 2009/11/23 Ted Zlatanov t...@lifelogs.com: On Fri, 20 Nov 2009 15:22:07 -0600 Jonathan Ellis jbel...@gmail.com wrote: JE Kasai is LGPL, and thus not compatible w/ Cassandra. (See JE http://www.apache.org/legal/3party.html) How annoying, it was exactly what I needed. I hate reinventing the wheel. I can at least use SysAuth (www.scribblin.gs/software/sysauth.html), I think. Ted
Re: java.util.concurrent.TimeoutException: Operation timed out - received only 0 responses from .
This error keeps coming up... any ideas on how to avoid it?
On Wed, Nov 18, 2009 at 11:25 PM, mobiledream...@gmail.com wrote:
Can you please tell me what this error is?
ERROR - error writing key ruske
java.util.concurrent.TimeoutException: Operation timed out - received only 0 responses from .
    at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
    at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:164)
    at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:468)
    at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:448)
    at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:854)
    at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Errors
-- Bidegg worlds best auction site http://bidegg.com
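The trace goes through insertBlocking and QuorumResponseHandler, so the write was waiting for a number of replica acknowledgements before the timeout expired. A rough model of that wait is sketched below, assuming the usual majority rule for a quorum. The function names and the message wording are hypothetical, and the actual number of responses required depends on the consistency level the client asked for.

```python
def quorum(replication_factor):
    """Majority of replicas: more than half of them must respond."""
    return replication_factor // 2 + 1


def write_outcome(responses_received, required):
    """Describe a blocking write that waited for `required` acknowledgements."""
    if responses_received >= required:
        return "ok"
    return "timed out: received only %d of %d required responses" % (
        responses_received, required)


needed = quorum(3)
print(needed)
print(write_outcome(0, needed))
```

Receiving 0 responses means that no replica acknowledged the write in time, which suggests looking at whether the replicas for that key were reachable and how loaded they were, rather than at the client code.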
Re: cassandra over hbase
On Mon, Nov 23, 2009 at 10:02 AM, Eric Evans eev...@rackspace.com wrote:
What would people on the list say are the primary reasons to use Cassandra over HBase? HA and speed are very important for my application. HBase's tighter integration with Hadoop and therefore easier reporting and analytics using M/R appeals to me, but I intuitively prefer the Cassandra community and generally like the architectural approach. HBase's Hadoop foundations also strike me as both an advantage and a disadvantage, as it seems to tie their hands a bit.
For myself it would be:
* The flexibility to choose between consistency and availability.
* No single points of failure (every node is identical).
* Linear scalability (i.e. 20 nodes gives you 2x what 10 does, etc.).
I would add that "every node is identical" is a huge win in monitoring and troubleshooting as well. Other reasons to prefer Cassandra include clusters spanning multiple data centers, and at the API level, Cassandra provides row slicing and customizable CompareWith.
-Jonathan
Re: Cassandra users survey
Thanks for the replies, everyone. A couple people have suggested that we put this on the wiki to replace the old, never-updated PoweredBy page (maybe as UsersSurvey09). I'll pull the _public_ responses only into a wiki page later this week; if you replied to the list but don't want to be on the page, let me know in the next couple days and I'll leave you off. And of course since it is a publicly editable wiki, you can always remove yourself later too. -Jonathan
Re: Cassandra access control
On Mon, 23 Nov 2009 12:22:37 -0600 Jonathan Ellis jbel...@gmail.com wrote: JE sysauth says it is GPL v2 (also not compatible) Hmm. I guess I have to reimplement SysAuth. At least the code is not terribly complicated, but it's a shame to reinvent the cart and the wheel. Ted
Cassandra Database Format Compatibility
Hello Everyone, Will the Cassandra database format for the current Cassandra source trunk be compatible with the 0.5 Cassandra release? If there are database version differences, is there a migration path to convert older data formats to the new versions? Is there an estimated release date for the 0.5 release? Thanks, Jon
Re: Cassandra Database Format Compatibility
On Mon, Nov 23, 2009 at 4:47 PM, Jon Graham sjclou...@gmail.com wrote: Hello Everyone, Will the Cassandra database format for the current Cassandra source trunk be compatible with the 0.5 Cassandra release? Yes. Only the commitlog format changes between 0.4 and 0.5, and that isn't a problem since you can just nodeprobe flush to get around it. If there are database version differences, is there a migration path to convert older data formats to the new versions? In the future if the format changes, there will be a migration path. Is there an estimated release date for the 0.5 release? Beta should happen fairly soon -- it's up for a vote in the IPMC right now. -Brandon
ring state out of sync in build 883477
I'm observing the following on a cluster that started with 4 nodes. I have been killing and restarting the various nodes as I test Cassandra, and now I'm seeing a lot of NotFoundException errors in the client because of what I believe is ring state that is out of sync between the two nodes that are still up and available. The first ring state shown below reflects the current state of the cluster.
Also, I have seen similar issues when one of the nodes thinks another node is still available when in fact it has been killed. It seems to be related to bringing up and killing nodes too fast, without letting them figure out when a node is dead. In this case I see a TimedOutException related to the NIO SocketChannel class. Thx!
[cassandra.883477]$ bin/nodeprobe -host gen-app02.dev.real.com -port 8080 ring
Address         Status  Load      Range                                    Ring
                                  144038903974614862325597275257769797985
172.27.128.186  Down    22.17 MB  31124469348629903091013930339840898757   |--|
172.27.128.23   Down    22.17 MB  64378740291415296162944450043143967518   |  |
172.27.128.22   Up      22.17 MB  121134220722269938669001112695509564769  |  |
172.27.128.185  Up      14.69 MB  144038903974614862325597275257769797985  |--|
[cassandra.883477]$ bin/nodeprobe -host vmguest85.prognet.com -port 8080 ring
Address         Status  Load      Range                                    Ring
                                  144038903974614862325597275257769797985
172.27.128.22   Up      22.17 MB  121134220722269938669001112695509564769  |--|
172.27.128.185  Up      14.69 MB  144038903974614862325597275257769797985  |--|
[cassandra.883477]$
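The disagreement between the two ring views can be made explicit with a small comparison. The addresses and statuses below are copied from the two nodeprobe outputs in the message; the `ring_differences` helper is a hypothetical illustration and not a Cassandra tool.

```python
# Ring as reported by gen-app02.dev.real.com
view_gen_app02 = {
    "172.27.128.186": "Down",
    "172.27.128.23": "Down",
    "172.27.128.22": "Up",
    "172.27.128.185": "Up",
}

# Ring as reported by vmguest85.prognet.com
view_vmguest85 = {
    "172.27.128.22": "Up",
    "172.27.128.185": "Up",
}


def ring_differences(a, b):
    """Compare two ring views (address -> status).

    Returns (nodes only in a, nodes only in b, nodes whose status differs)."""
    only_in_a = sorted(set(a) - set(b))
    only_in_b = sorted(set(b) - set(a))
    conflicts = sorted(addr for addr in set(a) & set(b) if a[addr] != b[addr])
    return only_in_a, only_in_b, conflicts


only_a, only_b, conflicts = ring_differences(view_gen_app02, view_vmguest85)
print(only_a)     # nodes vmguest85 does not know about
print(conflicts)  # nodes the two views disagree on
```

Here the two views agree on the status of the nodes they share; the difference is that the second view has no entry at all for the two nodes that are down.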
Re: ring state out of sync in build 883477
So vmguest85 was restarted, but gen-app02 hasn't told it that there are 2 other nodes that are down? Which one is the seed node?
Re: Cassandra Database Format Compatibility
On Mon, Nov 23, 2009 at 3:27 PM, Brandon Williams dri...@gmail.com wrote: On Mon, Nov 23, 2009 at 4:47 PM, Jon Graham sjclou...@gmail.com wrote: Is there an estimated release date for the 0.5 release? Beta should happen fairly soon -- it's up for a vote in the IPMC right now. what does ipmc mean?
Re: Cassandra users survey
Hi Dan and Ted, Are you both using timestamps as row keys? Would be great to hear more details. -Matt