Re: cassandra as user-profile data store

2011-03-01 Thread Dave Gardner
Dave, Tyler's answer already covers CFs etc. We are using Cassandra to store user profile data for exactly the sort of use case you describe. We don't yet store _all_ the data in Cassandra; currently we are focusing on the stuff we need available for real-time access. We use Hadoop to analyse

Re: node failure, and automatic decommission (or removetoken)

2011-03-01 Thread Mimi Aluminium
It helps. Thanks a lot, Miriam On Mon, Feb 28, 2011 at 9:50 PM, Aaron Morton aa...@thelastpickle.com wrote: I thought there was more to it. The steps for moving or removing nodes are outlined on the operations page of the wiki, as you probably know. What approach are you considering to rebalancing

Re: A simple script that creates multi node clusters on a single machine.

2011-03-01 Thread Sylvain Lebresne
I also wrote a script to help with firing up a Cassandra cluster on localhost. It is a bit more extensive in that it deals with how many nodes you want (though on a local host you probably won't be able to go too crazy) and allows you to easily create/start/stop/remove a cluster (and a few more

Re: Question about insert performance in multiple node cluster

2011-03-01 Thread Oleg Anastasyev
Does your test client talk to a single node or to both?

backup strategies

2011-03-01 Thread Sasha Dolgy
Hi Everyone, One of the fears I've always had with DFS and building up these large data stores, like Cassandra, is what to do in the event of an unexpected fault or failure. This happened with gmail yesterday ... http://gmailblog.blogspot.com/2011/02/gmail-back-soon-for-everyone.html ... Is

Re: I: Re: Are row-keys sorted by the compareWith?

2011-03-01 Thread Matthew Dennis
I'm not really familiar with pelops code, but I found two implementations (~ line 454 and ~ line 559) of getColumnsFromRows in Selector.java in pelops trunk. The first uses a HashMap so it clearly isn't ordered, the second uses a LinkedHashMap but it inserts the keys in the order returned by C*

Seed Nodes

2011-03-01 Thread shan.lu
How many seed nodes should I have for a cluster of 100 nodes, each with about 500 GB of data? Also, to add seed nodes, must I change the seed node list on all existing nodes through the cassandra.yaml file? Will changes take effect without restarting the node? Shan (Susie) Lu, Analyst

Re: cassandra as user-profile data store

2011-03-01 Thread Dave Viner
Hi Dave, Glad to hear others are using it in this fashion! Are you using Tyler's suggested strategy for user-profile data - one CF that stores the timeline, with rows of user-ids, and TimeUUID columns for each data-collection-time. Then some post-processing with Hadoop over the timelines for
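The layout described above (one row per user id, one TimeUUID-named column per data-collection time) can be sketched in plain Python. This is only an in-memory illustration, not a Cassandra client: the `timeline` dict and the `record_event` and `events_in_time_order` helpers are hypothetical names, and `uuid.uuid1()` stands in for a TimeUUID.

```python
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-in for a single column family:
# row key (user id) -> {TimeUUID column name -> collected data}
timeline = defaultdict(dict)

def record_event(user_id, payload):
    """Store one data-collection event under a time-based (version 1) UUID."""
    column_name = uuid.uuid1()
    timeline[user_id][column_name] = payload
    return column_name

def events_in_time_order(user_id):
    """Return a user's payloads ordered by the timestamp embedded in each UUID."""
    row = timeline[user_id]
    return [row[name] for name in sorted(row, key=lambda u: u.time)]

record_event("user-123", "login")
record_event("user-123", "view")
print(events_in_time_order("user-123"))
```

Ordering by the embedded timestamp is what a TimeUUID comparator provides within a row, which is why a post-processing job can walk each user's timeline chronologically.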

Storing photos, images, docs etc.

2011-03-01 Thread mcasandra
Is it advisable or ok to store photos, images and docs in Cassandra where you expect a high volume of uploads and views? I was reading about Facebook's implementation of Haystack to store photos. They don't put anything in their MySQL db. Since Cassandra is different from MySQL I was wondering

Re: Storing photos, images, docs etc.

2011-03-01 Thread Edward Capriolo
On Tue, Mar 1, 2011 at 1:43 PM, mcasandra mohitanch...@gmail.com wrote: Is it advisable or ok to store photos, images and docs in Cassandra where you expect a high volume of uploads and views? I was reading about Facebook's implementation of Haystack to store photos. They don't put anything

how large can a cluster over the WAN be?

2011-03-01 Thread Mimi Aluminium
Hi, Are there clusters of 100 nodes? More? Can you please refer me to such installations/systems? Can you comment on over-the-WAN clusters of this size or smaller? And can you point to systems with nodes in different DCs connected by WAN (dedicated or internet)? Thanks a lot, Miriam

Re: backup strategies

2011-03-01 Thread Aaron Morton
Does this help? http://wiki.apache.org/cassandra/Operations#Backing_up_data Aaron On 2/03/2011, at 1:07 AM, Sasha Dolgy sdo...@gmail.com wrote: Hi Everyone, One of the fears I've always had with DFS and building up these large data stores, like Cassandra, is what to do in the event of an

Re: Seed Nodes

2011-03-01 Thread Aaron Morton
AFAIK it's recommended to have two seed nodes per dc. Some info on seeds here http://www.datastax.com/docs/0.7/operations/clustering You will need a restart. Aaron On 2/03/2011, at 6:08 AM, shan...@accenture.com wrote: How many seed nodes should I have for a cluster of 100 nodes each with

Re: Storing photos, images, docs etc.

2011-03-01 Thread mcasandra
Thanks! If I am reading it correctly, it looks like Cassandra is not a good solution for storing photos/images/blobs etc., even though it says it's fixed in version 0.7.

Re: I: Re: Are row-keys sorted by the compareWith?

2011-03-01 Thread Dan Washusen
Pelops moved to github several months ago... https://github.com/s7/scale7-pelops/blob/master/src/main/java/org/scale7/cassandra/pelops/Selector.java#L1179 Cheers, -- Dan Washusen On Wednesday, 2 March 2011 at 3:35 AM, Matthew Dennis wrote: I'm not really familiar with pelops code, but I found

Re: Storing photos, images, docs etc.

2011-03-01 Thread A J
Depends on the specs of your large files. If the files are less than 64 MB, there will be no splitting. Cassandra (actually Thrift) has no streaming abilities. But if your objects are small (a few MB) they would fit in memory easily. I will have a lot of binaries smaller than a few MB in size. I am

Re: limit on rows in a cf

2011-03-01 Thread Shaun Cutts
This isn't quite true, I think. RandomPartitioner uses MD5. So if you had 10^16 rows, you would have a 10^-6 chance of a collision, according to http://en.wikipedia.org/wiki/Birthday_attack ... and apparently MD5 isn't quite balanced, so your actual odds of a collision are worse (though I'm not
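The collision odds mentioned above can be checked with the standard birthday-bound approximation. This is a minimal sketch, assuming MD5 behaves like an ideal, uniformly distributed 128-bit hash (the message itself notes the real function is not perfectly balanced); `collision_probability` is a hypothetical helper name.

```python
import math

def collision_probability(num_rows: int, hash_bits: int = 128) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2 ** hash_bits
    exponent = -num_rows * (num_rows - 1) / (2 * space)
    # expm1 keeps precision when the probability is very small
    return -math.expm1(exponent)

for rows in (10**16, 26 * 10**15):
    print(rows, collision_probability(rows))
```

Under this idealised model, 10^16 rows gives a probability of roughly 1.5 x 10^-7, and a probability of about 10^-6 is reached at around 2.6 x 10^16 rows.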

Re: Storing photos, images, docs etc.

2011-03-01 Thread mcasandra
Why do we think it's good to keep files under 64 MB? How did one arrive at this number? If I understand correctly, the problem is that Java heap usage might grow because of the large files. But doesn't it really depend on the concurrent requests * size of the response? What are other options then? Store
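The "concurrent requests * size of the response" estimate in the question above can be put into numbers. This is a back-of-the-envelope sketch, assuming each in-flight request holds one whole object in heap (following the no-streaming remark earlier in the thread); the request count and the `peak_payload_bytes` name are hypothetical.

```python
MIB = 1024 * 1024

def peak_payload_bytes(concurrent_requests: int, object_size_bytes: int) -> int:
    """Worst-case heap held by payloads if every request buffers a full object."""
    return concurrent_requests * object_size_bytes

# Hypothetical load: 100 simultaneous reads of 64 MB objects
peak = peak_payload_bytes(100, 64 * MIB)
print(peak / (1024 * MIB), "GiB")
```

The estimate ignores serialisation copies and everything else on the heap, so it is a lower bound on memory pressure rather than a sizing rule.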

Re: Advice on a design

2011-03-01 Thread Burc Sade
' '123458789_136456':'136456' } 't_shared_docs': { //time + id_doc '123456789_123456':'123456' '123458789_136456':'136456' } } } users_docs // all action by users on docs { '123_123456': // id_user + id_doc { 'id_doc':'123456' 'id_user':'123' 'd_readed':'20110301

RE: Column family cannot be removed

2011-03-01 Thread George Ciubotaru
Thank you guys, this solved the issue indeed. George -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: 28 February 2011 19:00 To: user@cassandra.apache.org Cc: George Ciubotaru Subject: Re: Column family cannot be removed drop and truncate both snapshot first,

Re: Seed Nodes

2011-03-01 Thread Eric Gilmore
Yes, two per DC is a recommendation I've heard from Jonathan Ellis. We put that in yet more documentation at http://www.datastax.com/dev/tutorials/getting_started_0.7/configuring#seed-list (appreciate the citation, Aaron :) I had a recent conversation with a Cassandra expert who had me convinced