Dave
Tyler's answer already covers CFs etc.
We are using Cassandra to store user profile data for exactly the sort of
use case you describe. We don't yet store _all_ the data in Cassandra;
currently we are focusing on the stuff we need available for real-time
access. We use Hadoop to analyse
It helps, Thanks a lot,
miriam
On Mon, Feb 28, 2011 at 9:50 PM, Aaron Morton aa...@thelastpickle.com wrote:
I thought there was more to it.
The steps for move or removing nodes are outlined on the operations page
wiki as you probably know.
What approach are you considering for rebalancing?
I also wrote a script to help with firing up a Cassandra cluster on
localhost. It is a bit more extensive in that it deals with how many
nodes you want (though on a local host you probably won't be able to go too
crazy) and allows you to easily create/start/stop/remove a cluster (and a few more
Does your test client talk to a single node or to both?
Hi Everyone,
One of the fears I've always had with DFS and building up these large
data stores, like Cassandra, is what to do in the event of an
unexpected fault or failure. This happened with gmail yesterday ...
http://gmailblog.blogspot.com/2011/02/gmail-back-soon-for-everyone.html
... Is
I'm not really familiar with pelops code, but I found two implementations (~
line 454 and ~ line 559) of getColumnsFromRows in Selector.java in pelops
trunk.
The first uses a HashMap, so it clearly isn't ordered; the second uses a
LinkedHashMap, but it inserts the keys in the order returned by C*
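The distinction is easy to see with plain JDK maps (a standalone sketch, not the actual Pelops internals):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapOrderDemo {
    public static void main(String[] args) {
        List<String> insertionOrder = Arrays.asList("zebra", "apple", "mango", "kiwi");

        // HashMap: iteration order follows hash buckets, not insertion order.
        Map<String, Integer> hash = new HashMap<>();
        // LinkedHashMap: iteration order is exactly insertion order -- which, for
        // getColumnsFromRows, is just the order the server happened to return keys.
        Map<String, Integer> linked = new LinkedHashMap<>();
        for (String k : insertionOrder) {
            hash.put(k, k.length());
            linked.put(k, k.length());
        }

        System.out.println("HashMap order:       " + hash.keySet());
        System.out.println("LinkedHashMap order: " + linked.keySet());
        // LinkedHashMap preserves insertion order:
        System.out.println(new ArrayList<>(linked.keySet()).equals(insertionOrder)); // true
    }
}
```

So even the LinkedHashMap version only reflects the order C* returned, not any ordering the client asked for.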
How many seed nodes should I have for a cluster of 100 nodes, each with about
500 GB of data? Also, to add seed nodes, must I change the seed node list
on all existing nodes through the cassandra.yaml file? Will changes take effect
without restarting the node?
Shan (Susie) Lu, Analyst
Hi Dave,
Glad to hear others are using it in this fashion!
Are you using Tyler's suggested strategy for user-profile data - one CF that
stores the timeline, with rows keyed by user-id and a TimeUUID column for each
data-collection time? Then some post-processing with Hadoop over the
timelines for
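As a sketch of why TimeUUID column names sort chronologically in such a timeline row, here is a minimal version-1 UUID builder using only the JDK (clock sequence and node bits are zeroed, so this is illustrative, not production-safe):

```java
import java.util.UUID;

public class TimeUuidSketch {
    // Millis between the UUID epoch (1582-10-15) and the Unix epoch.
    static final long UUID_EPOCH_OFFSET_MS = 12219292800000L;

    // Build a version-1 (time-based) UUID from a millisecond timestamp.
    static UUID timeUuidFor(long unixMillis) {
        long ticks = (unixMillis + UUID_EPOCH_OFFSET_MS) * 10000L; // 100-ns units
        long msb = (ticks & 0xFFFFFFFFL) << 32        // time_low
                 | ((ticks >>> 32) & 0xFFFFL) << 16   // time_mid
                 | 0x1000L                            // version 1
                 | ((ticks >>> 48) & 0x0FFFL);        // time_hi
        long lsb = 0x8000000000000000L;               // RFC 4122 variant; node/clockseq zeroed
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID earlier = timeUuidFor(1_000L);
        UUID later   = timeUuidFor(2_000L);
        // Columns named by TimeUUID compare by their embedded timestamp,
        // so the timeline row stays in collection-time order.
        System.out.println(earlier.timestamp() < later.timestamp()); // true
    }
}
```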
Is it advisable or OK to store photos, images and docs in Cassandra where you
expect a high volume of uploads and views?
I was reading about Facebook's Haystack implementation for storing photos.
They don't put anything in their MySQL db.
Since Cassandra is different from MySQL, I was wondering
On Tue, Mar 1, 2011 at 1:43 PM, mcasandra mohitanch...@gmail.com wrote:
Is it advisable or OK to store photos, images and docs in Cassandra where you
expect a high volume of uploads and views?
I was reading about Facebook's Haystack implementation for storing photos.
They don't put anything
Hi,
Are there clusters of 100 nodes or more? Can you refer me to such
installations/systems?
Can you comment on over-the-WAN clusters of this size or smaller? And can you
point me to a system with nodes in different DCs connected by WAN (could be
dedicated or internet)?
Thanks a lot,
Miriam
Does this help? http://wiki.apache.org/cassandra/Operations#Backing_up_data
Aaron
On 2/03/2011, at 1:07 AM, Sasha Dolgy sdo...@gmail.com wrote:
Hi Everyone,
One of the fears I've always had with DFS and building up these large
data stores, like Cassandra, is what to do in the event of an
AFAIK it's recommended to have two seed nodes per DC.
Some info on seeds here: http://www.datastax.com/docs/0.7/operations/clustering
You will need a restart.
Aaron
On 2/03/2011, at 6:08 AM, shan...@accenture.com wrote:
How many seed nodes should I have for a cluster of 100 nodes each with
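For reference, a minimal sketch of how the seed list appears in cassandra.yaml (0.7-era layout; the addresses are hypothetical, and every node in the cluster should carry the same list):

```yaml
# cassandra.yaml (0.7) -- hypothetical seed addresses, two per DC
seeds:
    - 10.0.1.1   # seed 1, DC1
    - 10.0.1.2   # seed 2, DC1
    - 10.0.2.1   # seed 1, DC2
    - 10.0.2.2   # seed 2, DC2
```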
Thanks! If I am reading it correctly, it looks like Cassandra is not a good
solution for storing photos/images/blobs etc., even though it says it's fixed
in version 0.7.
Pelops moved to github several months ago...
https://github.com/s7/scale7-pelops/blob/master/src/main/java/org/scale7/cassandra/pelops/Selector.java#L1179
Cheers,
--
Dan Washusen
On Wednesday, 2 March 2011 at 3:35 AM, Matthew Dennis wrote:
I'm not really familiar with pelops code, but I found
Depends on the specs of your large files.
If the files are less than 64 MB, there will be no splitting.
Cassandra (actually Thrift) has no streaming abilities. But if your
objects are small (a few MBs) they will fit in memory easily.
I will have a lot of binaries less than a few MBs in size. I am
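A common workaround for the lack of streaming is to chunk blobs client-side into fixed-size columns; a minimal sketch (the key scheme and chunk size here are hypothetical choices, not a Cassandra API):

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class BlobChunker {
    // Split a blob into fixed-size chunks keyed "<blobKey>:<index>", so no
    // single read or write has to hold the whole file in memory at once.
    static Map<String, byte[]> chunk(String blobKey, byte[] blob, int chunkSize) {
        Map<String, byte[]> chunks = new LinkedHashMap<>();
        for (int off = 0, n = 0; off < blob.length; off += chunkSize, n++) {
            chunks.put(blobKey + ":" + n,
                       Arrays.copyOfRange(blob, off, Math.min(off + chunkSize, blob.length)));
        }
        return chunks;
    }

    // Reassemble by reading chunks back in index order.
    static byte[] reassemble(String blobKey, Map<String, byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int n = 0; chunks.containsKey(blobKey + ":" + n); n++) {
            byte[] c = chunks.get(blobKey + ":" + n);
            out.write(c, 0, c.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] blob = new byte[25];
        Arrays.fill(blob, (byte) 7);
        Map<String, byte[]> chunks = chunk("photo42", blob, 10);
        System.out.println(chunks.size() + " chunks"); // 3 chunks
        System.out.println(Arrays.equals(reassemble("photo42", chunks), blob)); // true
    }
}
```

In practice each chunk would be written as a separate column (or row) rather than kept in a map, but the split/rejoin logic is the same.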
This isn't quite true, I think. RandomPartitioner uses MD5. So if you had 10^16
rows, you would have a 10^-6 chance of a collision, according to
http://en.wikipedia.org/wiki/Birthday_attack ... and apparently MD5 isn't quite
balanced, so your actual odds of a collision are worse (though I'm not
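The estimate can be sanity-checked with the standard birthday approximation p ≈ n²/2^(d+1) for n keys hashed into a d-bit space (a back-of-the-envelope sketch that ignores MD5's imperfect balance):

```java
public class BirthdayBound {
    // Birthday approximation: probability of at least one collision when
    // hashing n items into a d-bit space is roughly n^2 / 2^(d+1).
    static double collisionProbability(double n, int bits) {
        return (n * n) / Math.pow(2.0, bits + 1);
    }

    public static void main(String[] args) {
        // MD5 row keys: d = 128
        System.out.printf("1e16 rows:   p ~ %.2g%n", collisionProbability(1e16, 128));
        System.out.printf("2.6e16 rows: p ~ %.2g%n", collisionProbability(2.6e16, 128));
    }
}
```

With 2.6×10^16 keys the approximation gives about 10^-6, matching the Wikipedia table; 10^16 keys come out closer to 1.5×10^-7, so either way collisions are vanishingly unlikely at these scales.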
Why do we think it's good to have 64 MB files? How did one arrive at this
number?
If I understand correctly, the problem is that the Java heap space might grow
because of the large files. But doesn't it really depend on the concurrent
requests * size of the response?
What are other options then? Store
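Exactly - a rough upper bound on the heap tied up by in-flight responses is just that product (an illustrative calculation with made-up numbers, ignoring copies and per-request overhead):

```java
public class HeapEstimate {
    // Rough upper bound on heap held by in-flight responses:
    // concurrent requests x response size.
    static long inFlightBytes(int concurrentRequests, long responseBytes) {
        return concurrentRequests * responseBytes;
    }

    public static void main(String[] args) {
        // e.g. 50 concurrent reads of 5 MB blobs ~ 250 MB of heap
        System.out.println(inFlightBytes(50, 5L << 20) / (1 << 20) + " MB"); // 250 MB
    }
}
```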
        '123458789_136456': '136456'
    }
    't_shared_docs': {
        // time + id_doc
        '123456789_123456': '123456'
        '123458789_136456': '136456'
    }
  }
}

users_docs { // all actions by users on docs
    '123_123456': { // id_user + id_doc
        'id_doc': '123456'
        'id_user': '123'
        'd_readed': '20110301
Thank you guys, this solved the issue indeed.
George
-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: 28 February 2011 19:00
To: user@cassandra.apache.org
Cc: George Ciubotaru
Subject: Re: Column family cannot be removed
drop and truncate both snapshot first,
Yes, two per DC is a recommendation I've heard from Jonathan Ellis. We put
that in yet more documentation at
http://www.datastax.com/dev/tutorials/getting_started_0.7/configuring#seed-list
(appreciate the citation, Aaron :)
I had a recent conversation with a Cassandra expert who had me convinced