On 12 March 2010 03:34, Bill Au wrote:
> Let's take Twitter as an example. All the tweets are timestamped. I want to
> keep only a month's worth of tweets for each user. The number of tweets
> that fit within this one month window varies from user to user. What is the
> best way to accomplish t
On 7 March 2010 19:05, Jonathan Ellis wrote:
> Yes, but I would guess 90% of workloads are better served with
> spending the extra money on more machines w/ cheap sata disks and lots
> of ram.
>
I'm not an expert, but I imagine that in many cases, capex is not the
limiting factor. In data centre
On 6 March 2010 09:23, Hubert Chang wrote:
> Like Category, Taxonomy, or folder/file, there will be multi-level
> hierarchical relationships.
> How to model it in Cassandra?
> Serialize all the parent id and the item id together as the key?
> How to model it when one child has many parents?
>
On 1 March 2010 13:16, HHB wrote:
>
> Hey,
> What are the typical use cases for Cassandra?
>
I'm not sure there are *really* typical use cases.
Our use case is likely to be big audit data.
But Cassandra doesn't yet support a mechanism for efficiently expiring old
data; I'm waiting for various
Hiya,
I'm looking at
http://wiki.apache.org/cassandra/RecentChanges
And there's an error.
Can someone look into it please?
Ta
Mark
I think you really want to be using the OrderPreservingPartitioner and using
time-based keys.
It depends exactly how you're querying it. All querying use-cases need to be
taken into account when deciding how to structure your data.
If you use a time-based key with OPP, typically data become very
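A minimal sketch of what time-based keys buy you under the OrderPreservingPartitioner: an in-memory `TreeMap` stands in for OPP's key ordering, and a time window becomes a single range scan. The key format (a 13-digit zero-padded millisecond timestamp plus an event id), the class name and the method names are assumptions for illustration; zero-padding keeps lexical order identical to chronological order.

```java
import java.util.SortedMap;
import java.util.TreeMap;

class TimeKeyScan {
    // Hypothetical key format: 13-digit zero-padded epoch millis + "-" + event id.
    // Zero-padding keeps lexical (byte) order identical to chronological order.
    static String timeKey(long epochMillis, String eventId) {
        return String.format("%013d-%s", epochMillis, eventId);
    }

    // Stand-in for an OPP key range scan: every key in [startMillis, endMillis).
    static SortedMap<String, String> scan(TreeMap<String, String> rows,
                                          long startMillis, long endMillis) {
        return rows.subMap(String.format("%013d", startMillis),
                           String.format("%013d", endMillis));
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put(timeKey(1_000L, "a"), "first");
        rows.put(timeKey(2_000L, "b"), "second");
        rows.put(timeKey(30_000L, "c"), "third");
        // Prints [0000000002000-b, 0000000030000-c]
        System.out.println(scan(rows, 1_500L, 40_000L).keySet());
    }
}
```

Because the timestamp leads the key, "everything from last Tuesday" is one contiguous range; the flip side is that all recent writes land on the same part of the ring.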
2010/1/14 shiv shivaji
> I have looked at performance posts in the forum but was wondering if there
> are general suggestions for using Cassandra in production.
>
I'd say pretty obvious stuff:
- Performance test as much as you can
- Choose the following carefully as they're difficult to change:
I also agree: some mechanism to expire rolling data would be really good if
we can incorporate it. Using the existing client interface, deleting old
data is very cumbersome.
We want to store lots of audit data in Cassandra; this will need to be
expired eventually.
Nodes should be able to do expir
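For comparison, this is roughly the chore a client has without server-side expiry: find the stale keys, then remove them one at a time. A sketch only; the map stands in for the cluster, the zero-padded timestamp prefix on keys is an assumed convention, and the returned count is the number of individual remove operations a real client would have to send.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ExpirySweep {
    // Deletes every row whose key sorts before the cutoff and returns how many
    // individual removes that took. Keys are assumed (hypothetically) to start
    // with a 13-digit zero-padded timestamp, so "older" means "sorts earlier".
    static int expireBefore(TreeMap<String, String> rows, long cutoffMillis) {
        String cutoffKey = String.format("%013d", cutoffMillis);
        // Collect the keys first; a real client would page through a key range scan.
        List<String> stale = new ArrayList<>(rows.headMap(cutoffKey).keySet());
        int removeCalls = 0;
        for (String key : stale) {
            rows.remove(key); // one remove per key, as in a client-side sweep
            removeCalls++;
        }
        return removeCalls;
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put(String.format("%013d-a", 1_000L), "old");
        rows.put(String.format("%013d-b", 2_000L), "old");
        rows.put(String.format("%013d-c", 9_000L), "recent");
        // Prints "2 removed, left: [0000000009000-c]"
        System.out.println(expireBefore(rows, 5_000L) + " removed, left: " + rows.keySet());
    }
}
```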
I can't see any reason to make an "easy" Cassandra interface, as the Thrift
interface isn't really very difficult.
In any case the main problems with Cassandra will be design ones, i.e.
figuring out how to use it effectively in your application. No "Easy"
library is going to make that easier.
Mar
2010/1/7 JKnight JKnight
> Dear all,
>
> Because we want to traverse all data, we need to change the partitioner to
> an ordered type.
>
> I want to change Partitioner from
> org.apache.cassandra.dht.RandomPartitioner to
> org.apache.cassandra.dht.OrderPreservingPartitioner.
>
> How can I do that?
>
W
No, I can't.
But I'd imagine that Cassandra will be better off with a larger number of
nodes. Three seems a bit small to be useful. Particularly if your
ReplicationFactor is 3, you may as well use some other DB and high-availability
solution, seeing as everything will be written everywhere you
won
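A rough worked figure, assuming tokens are evenly balanced so that each node owns an equal share of the ring: with N nodes and replication factor RF, each node stores about RF/N of the data.

```latex
\[
\text{share per node} \approx \frac{RF}{N}:\qquad
N = 3,\ RF = 3 \;\Rightarrow\; \tfrac{3}{3} = 1,\qquad
N = 12,\ RF = 3 \;\Rightarrow\; \tfrac{3}{12} = \tfrac{1}{4}
\]
```

So with three nodes and RF 3, every node holds a full copy of everything.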
2009/12/31 Ran Tavory
> Does Cassandra/Thrift support asynchronous IO calls?
>
Asynchronous calls are really a client-side thing. I believe that there is
currently no async version of Thrift, but if one did arrive, it could be
used with Cassandra.
The fact that Cassandra uses one-thread-per-con
2009/12/27 August Zajonc
> Looking at the data model a simple solution is two column families,
> one containing items as the row-key with tags as columns, and a second
> with tags as the row-key with items as columns. This gives me fast
> access at the cost of 2x the writes (cheap) and storage (a
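A minimal sketch of the two-column-family layout described above, using in-memory maps as stand-ins for the column families; `TagIndex` and its method names are hypothetical. Each tag assignment costs two writes, and each lookup is a single read in whichever direction is needed.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class TagIndex {
    // Stand-ins for the two column families:
    // row key = item, column names = tags; and row key = tag, column names = items.
    final Map<String, Set<String>> itemTags = new TreeMap<>();
    final Map<String, Set<String>> tagItems = new TreeMap<>();

    // Every tag assignment is written twice, once per column family.
    void tag(String item, String tag) {
        itemTags.computeIfAbsent(item, k -> new TreeSet<>()).add(tag);
        tagItems.computeIfAbsent(tag, k -> new TreeSet<>()).add(item);
    }

    Set<String> tagsOf(String item) {
        return itemTags.getOrDefault(item, new TreeSet<>());
    }

    Set<String> itemsTagged(String tag) {
        return tagItems.getOrDefault(tag, new TreeSet<>());
    }

    public static void main(String[] args) {
        TagIndex idx = new TagIndex();
        idx.tag("photo1", "beach");
        idx.tag("photo2", "beach");
        idx.tag("photo1", "sunset");
        // Prints "[photo1, photo2] [beach, sunset]"
        System.out.println(idx.itemsTagged("beach") + " " + idx.tagsOf("photo1"));
    }
}
```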
I had in mind the idea that nodes would store at least 3 TB of native
capacity per node. I can't see how I can get cost-effective storage
otherwise (machines are quite expensive compared to disc, especially in
terms of power, which is usually the limiting factor).
Probably physically 4-6 drives per s
2009/12/13 Tatu Saloranta
> On Sat, Dec 12, 2009 at 3:08 PM, Ryan King wrote:
> > On Sat, Dec 12, 2009 at 12:05 PM, Ran Tavory wrote:
> >> As we're designing our systems for a move from mysql to Cassandra we're
> >> considering moving our file storage to Cassandra as well. Is this wise?
>
I'm
2009/12/10 JKnight JKnight
> Dear all,
>
> I wonder why Cassandra does not support the following methods:
> - multi_insert: insert multi keys
> - multi_remove: remove multi keys
> - multi_batchInsert: batch insert multi keys
>
I'm not sure how the 1st and 3rd are different, but my understanding is
2009/12/8 Jonathan Ellis
> I wrote http://wiki.apache.org/cassandra/Operations to answer the
> other questions in more detail. :)
>
Jonathan,
That is awesomely useful, it answers many questions that I've never quite
known the answers to.
I hope that decommissioning an entirely failed node becom
I'm not an expert in Cassandra yet but I do know a little, here are some
(attempted) answers:
2009/12/8 Rakesh Sharma
> b) Apart from nodeprobe and Jconsole is there any other node management
> tool?
>
You can use any JMX application, apparently, to monitor it. This could be
done e.g. via a com
2009/12/7 Jonathan Ellis
> Gary Dusbabek already did this, only better:
> https://issues.apache.org/jira/browse/CASSANDRA-535,
> http://issues.apache.org/jira/browse/CASSANDRA-596
>
>
So is there now support in trunk for a "remote clients API" version of
Cassandra? If so, are there any pointers o
2009/12/7 Ramzi Rabah
>TSocket socket = new TSocket(hostName, port);
>TBinaryProtocol binaryProtocol = new
> TBinaryProtocol(socket, false, false);
>Cassandra.Client client = new
> Cassandra.Client(binaryProtocol);
>socket.open();
>
2009/12/3 Coe, Robin
>
> So, considering that I currently have to take down a node to make a CF
> change, I'm wondering how to perform automatic failover from my application?
> Is there a mechanism by which I can request from Cassandra all the
> destination IP:ports for the nodes in a cluster, s
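One client-side approach, sketched under assumptions rather than taken from any Cassandra API: keep a list of node addresses and, when a call fails, retry it against the next node. `Operation` and `withFailover` are hypothetical names, and the addresses are made up; a real operation would open a Thrift connection to the given host.

```java
import java.io.IOException;
import java.util.List;

class FailoverClient {
    // Hypothetical unit of work against one node, e.g. a Thrift call on a fresh connection.
    interface Operation<T> {
        T run(String hostPort) throws IOException;
    }

    // Tries each known node in order and returns the first successful result.
    static <T> T withFailover(List<String> nodes, Operation<T> op) {
        IOException last = null;
        for (String node : nodes) {
            try {
                return op.run(node);
            } catch (IOException e) {
                last = e; // node down or unreachable: move on to the next one
            }
        }
        throw new IllegalStateException("all nodes failed", last);
    }

    public static void main(String[] args) {
        // Made-up addresses; 9160 is Cassandra's default Thrift port.
        List<String> nodes = List.of("10.0.0.1:9160", "10.0.0.2:9160");
        String result = withFailover(nodes, node -> {
            if (node.startsWith("10.0.0.1")) {
                throw new IOException("connection refused"); // simulate a dead node
            }
            return "answered by " + node;
        });
        System.out.println(result); // answered by 10.0.0.2:9160
    }
}
```

The node list could come from configuration or from whatever cluster-membership information the client can obtain; that part is left open here.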
How about we make authentication optional, and have the protocol being
stateful only if you want to authenticate?
That way we don't break backwards compatibility or introduce extra
complexity for people who don't need it.
Mark
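A rough sketch of the idea that the protocol only becomes stateful once you authenticate. The class, the password map and `mayProceed` are hypothetical illustrations, not Cassandra's authentication API; they only show that when authentication is switched off, a connection never has to log in and carries no extra state.

```java
import java.util.Map;

class OptionalAuthSession {
    private final boolean authRequired;
    private final Map<String, String> passwords; // illustration only: user -> password
    private String user; // the only per-connection state, set by a successful login

    OptionalAuthSession(boolean authRequired, Map<String, String> passwords) {
        this.authRequired = authRequired;
        this.passwords = passwords;
    }

    boolean login(String name, String password) {
        if (name != null && password != null && password.equals(passwords.get(name))) {
            user = name;
            return true;
        }
        return false;
    }

    // Requests are allowed if auth is switched off, or once someone has logged in.
    boolean mayProceed() {
        return !authRequired || user != null;
    }

    public static void main(String[] args) {
        OptionalAuthSession open = new OptionalAuthSession(false, Map.of());
        System.out.println(open.mayProceed()); // true: no login needed

        OptionalAuthSession secured = new OptionalAuthSession(true, Map.of("mark", "s3cret"));
        System.out.println(secured.mayProceed()); // false until login
        secured.login("mark", "s3cret");
        System.out.println(secured.mayProceed()); // true
    }
}
```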
2009/12/2 Ted Zlatanov
> OK. So what should the API be? Just one method, as Robin suggested?
>
> void login( Map credentials, String keyspace )
> throws AuthenticationException, AuthorizationException
>
> In this model the backend would still have login() and
> setKeyspace()/getKeyspace() sepa
2009/11/28
> Thanks. I don't have more than 1 node, i.e. just one-node operation,
> so I don't know if the timeout increase will help
>
Presumably this is a test system on VMware; it may have insufficient
memory.
I'd say make sure that your test virtual machine (VMware etc.) has at least
1.5G of RAM alloca
We are keeping an eye on Cassandra with a view to using it in a large-scale
audit data application. Currently I don't think it does quite what we want
but I'm still very impressed with what it does do.
We're not yet at the stage of really properly evaluating it for production
use, but I have had a
https://issues.apache.org/jira/browse/CASSANDRA-293
> > Project: Cassandra
> > Issue Type: New Feature
> > Components: Core
> >Reporter: Mark Robson
> >Assignee: Gary Dusbabek
> >Priority: Minor
On Tue, Nov 17, 2009 at 5:01 PM, wrote:
> > I keep getting the error
> > java.lang.OutOfMemoryError: unable to create new native thread
>
Perhaps it has run out of address space. You are running a 64-bit OS, right?
Mark
2009/11/17 Richard Grossman
> How do I evaluate the value I need to put here?
> The second point is that I have many column families, each with a different key,
> so how do I know what token to use to distribute the data?
>
It's not automatic at the moment.
If you leave it to make its own token,
If you only have one node in the cluster just now, would changing the
replication factor and then bootstrapping the new nodes "just work"?
Mark
2009/11/9 Ramzi Rabah
> Hello all:
> I am confused about the need of passing a timestamp for the remove
> operation. Why does the remove operation in Cassandra require a
> timestamp? What happens if I provide a remove call with a different
> timestamp than what I inserted, will the row still be
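A simplified model of why remove takes a timestamp, assuming last-write-wins reconciliation in which a delete is stored as a tombstone carrying its own timestamp. This is an illustration, not Cassandra's code: a remove with a timestamp older than the insert leaves the value visible, while a newer one hides it. Treating an equal timestamp as deleted is a choice made in this sketch.

```java
class TimestampedColumn {
    // Simplified model: a column keeps its newest write and newest delete,
    // each with the client-supplied timestamp.
    private String value;
    private long writeTs = Long.MIN_VALUE;
    private long deleteTs = Long.MIN_VALUE;

    void insert(String v, long ts) {
        if (ts > writeTs) { // last write wins
            value = v;
            writeTs = ts;
        }
    }

    void remove(long ts) {
        deleteTs = Math.max(deleteTs, ts); // recorded as a tombstone, not erased
    }

    // Visible only if written after the newest tombstone.
    String read() {
        return (value != null && writeTs > deleteTs) ? value : null;
    }

    public static void main(String[] args) {
        TimestampedColumn c = new TimestampedColumn();
        c.insert("hello", 100);
        c.remove(50);               // older than the write: value survives
        System.out.println(c.read()); // hello
        c.remove(150);              // newer than the write: value is shadowed
        System.out.println(c.read()); // null
    }
}
```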
2009/10/28 Brink
> Hi All,
>
> For a DMS, I want to replace MySQL with Cassandra to store file/folder
> nodes. Currently I use the adjacency list model to store the node hierarchy. The
> drawback of the adjacency list model is the expensive traversal cost. While
> I want to navigate the entire workspace
2009/10/27 Jonathan Ellis
>
>
> We're adding support for deleting ranges of data (similar to the range
> granularity you can get with get_slice), including across multiple
> rows, in https://issues.apache.org/jira/browse/CASSANDRA-336, but you
> can already delete row-at-a-time by specifying only
2009/10/10 Joe Stump
> I've got a guy doing a code test for us and he has some questions about
> custom partitioners:
> http://gist.github.com/205537
>
> Wondering if anyone could chime in.
>
I'm curious as to why you don't just use the OrderPreservingPartitioner and
apply the transformation to
I'm not sure there are any best practices, but I've replied with some ideas
here:
http://stackoverflow.com/questions/1502735/whats-the-best-practice-in-designing-a-cassandra-data-model/1512978#1512978
Cheers
Mark
2009/10/1 Joe Van Dyk
> Hi,
>
> How stupid would it be to use cassandra as a permanent datastore?
>
> Say I have a service that tracks clicks on ads running on other sites.
> I'd need to keep track of who clicked what when and where. And run
> reports on it. Cassandra is attractive because of
2009/9/15 Chris Goffinet
>
> Do you really expect a user to open up multiple tabs and start clicking
> concurrently? Is the use case for bots? Remember, if you're trying to
> capture a user's activity and think they might open up many windows, I
> wouldn't be saving that into a session in general
2009/9/15 Jonathan Ellis
> We don't currently have any optimizations to provide "lightweight"
> session consistency (see #132), but if you do quorum reads + quorum
> writes then you are guaranteed to read the most recent write which
> should be fine for most apps.
>
Quorum read / write would be
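The guarantee comes from overlap between replica sets. With replication factor N, a read that consults R replicas and a write acknowledged by W replicas must share at least one replica when R + W > N, and a quorum is floor(N/2) + 1, so two quorums always overlap. A small sketch of that arithmetic (class and method names are hypothetical):

```java
class QuorumMath {
    // Quorum for replication factor n: a strict majority of replicas.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // A read is guaranteed to include at least one replica that saw the
    // latest acknowledged write when the two replica sets must overlap.
    static boolean readSeesLatestWrite(int n, int r, int w) {
        return r + w > n;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            int q = quorum(n);
            System.out.printf("RF=%d quorum=%d overlap=%b%n", n, q, readSeesLatestWrite(n, q, q));
        }
        // Reading and writing a single replica each gives no such guarantee at RF=3.
        System.out.println(readSeesLatestWrite(3, 1, 1));
    }
}
```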
2009/9/15 Matt Kydd
> We need to persist the sessions and associated shopping baskets /
> activity summaries somewhere, and Cass seems like a good fit; without
> the restrictions imposed by SQL, there would be less necessity to purge
> old sessions.
>
Purging the old sessions in Cassandra would be
2009/9/13
> Thank you for your reply.
>
> So the best way to use Cassandra would be at least behind a firewall.
>
> In the future is it possible to add a username/password type security in? I
> plan to support the project, just as soon as I have some revenue coming in
> through my business.
>
I
> What type of username/password security is there? (for example sharing a
> Cassandra db between applications, and isolating their access controls)
>
>
Also I should point out that the default startup script for Cassandra also
enables the Java debugger and JMX connections from anywhere, both of
2009/9/13
> How exactly does the search work? Is it similar to fulltext searching?
>
>
No.
There are two ways you can find stuff - either by exact key (key must be
exactly right, it's byte-based), or (if you're using the
OrderPreservingPartitioner) a key range scan.
In the case of a key range
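A minimal sketch of the two access paths, using a sorted in-memory map as a stand-in for a column family's rows; the keys and method names are made up for illustration. An exact lookup needs the key byte for byte, and a range scan returns every key between two bounds, which only works because the keys are kept in order.

```java
import java.util.SortedMap;
import java.util.TreeMap;

class KeyLookup {
    // Stand-in rows, sorted by key. For ASCII keys, String's natural order
    // matches byte order.
    static TreeMap<String, String> sampleRows() {
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("user:alice", "1");
        rows.put("user:bob", "2");
        rows.put("user:carol", "3");
        rows.put("video:17", "4");
        return rows;
    }

    // Exact lookup: the key must match exactly; there is no fuzzy or full-text matching.
    static String exact(TreeMap<String, String> rows, String key) {
        return rows.get(key);
    }

    // Range scan over [start, end): only meaningful when keys are stored in order.
    static SortedMap<String, String> range(TreeMap<String, String> rows, String start, String end) {
        return rows.subMap(start, end);
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = sampleRows();
        System.out.println(exact(rows, "user:bob")); // 2
        System.out.println(exact(rows, "User:bob")); // null: different bytes
        // ';' is the character right after ':', so this covers every "user:" key.
        System.out.println(range(rows, "user:", "user;").keySet()); // [user:alice, user:bob, user:carol]
    }
}
```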
2009/8/7 Jonathan Ellis
> The default OPP now does comparisons based strictly on byte order, and
> is no longer collation aware. This is a better default choice for
> those who don't need collation since it's much faster. If you do need
> collation, the old partitioner is still available as Col
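To show what dropping collation means in practice, this sketch compares plain unsigned byte order with `java.text.Collator` from the JDK. The Collator is only a stand-in for a collation-aware ordering, not the partitioner class named above. Requires Java 9+ for `Arrays.compareUnsigned`.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class ByteOrderVsCollation {
    // Compares keys by their UTF-8 bytes, treated as unsigned values.
    static int byteOrder(String a, String b) {
        return Arrays.compareUnsigned(a.getBytes(StandardCharsets.UTF_8),
                                      b.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>(List.of("apple", "Banana", "cherry"));
        keys.sort(ByteOrderVsCollation::byteOrder);
        System.out.println("byte order: " + keys); // [Banana, apple, cherry]
        keys.sort(Collator.getInstance(Locale.ENGLISH));
        System.out.println("collated:   " + keys); // [apple, Banana, cherry]
    }
}
```

Byte order puts every uppercase ASCII letter before every lowercase one, which is cheap to compute but not what a human would call alphabetical.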
2009/7/27
> I'm trying to use Cassandra in a mode where every time I create a new
> column family I do not want to restart all the nodes
In my opinion you should not be doing that anyway.
Because families can have as many columns as you like anyway, it should not
normally be necessary to create
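A minimal sketch of that approach: one column family defined up front, with new kinds of data stored as new column names rather than new families. The in-memory maps, the `prefs:`/`stats:` naming convention and the class name are assumptions for illustration.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class SingleFamily {
    // One column family, configured once: row key -> (column name -> value).
    // New kinds of data become new column names at runtime, so nothing in the
    // schema changes and no node has to be restarted.
    private final Map<String, SortedMap<String, String>> rows = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Columns are kept sorted by name, so a prefix such as "prefs:" can be
    // read back as one slice over [from, to).
    SortedMap<String, String> slice(String rowKey, String from, String to) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? new TreeMap<String, String>() : row.subMap(from, to);
    }

    public static void main(String[] args) {
        SingleFamily users = new SingleFamily();
        users.put("user42", "prefs:theme", "dark");
        users.put("user42", "prefs:lang", "en");
        users.put("user42", "stats:logins", "7"); // a new kind of data, no new family
        // ';' sorts right after ':', so this slice holds every "prefs:" column.
        System.out.println(users.slice("user42", "prefs:", "prefs;")); // {prefs:lang=en, prefs:theme=dark}
    }
}
```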
2009/7/14
> *1. If you only have 3 production servers, Cassandra may not do much for
> you. You will probably only care if you have lots more servers. 3 servers is
> a reasonable minimum for a test / dev environment*
> At how many servers does Cassandra really start performing?
> or how many serv
2009/7/14
> Thanks a lot, makes sense, kinda like the LimeWire, Shareaza and Gnutella
> networks
>
Yes, Cassandra does some stuff that is similar to the file-sharing networks,
but it is intended for private use not on the public internet. I wouldn't
expose an instance to the internet or allow untrusted
2009/7/14
> But since the other servers join the cluster?
> Is there a limitation on where reads/writes can go, i.e.,
>
> reads can go to all servers - seeds+nonseeds?
>
> writes can go only to seeds?
>
No, there is not.
Reads and writes may go to any node, seed or not.
The seeds are ONLY used f
2009/7/14
> How do we add servers other than seeds, as there is no such place in the conf
> file?
Servers other than seeds are automatically picked up by the cluster when
they start up; the nodes talk amongst themselves to figure out who's there.
Only the seeds need to be explicitly configured.
Thi
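A toy model of the discovery process: each node starts out knowing only the configured seeds, and in each round it swaps membership lists with one random node it already knows. This is a simplified illustration under those assumptions, not Cassandra's gossip implementation; all names are hypothetical, and every seed is assumed to appear in the node list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class GossipSketch {
    static final int MAX_ROUNDS = 1000;

    // Rounds until every node knows every other node, or -1 if that does not
    // happen within MAX_ROUNDS (for example when no seeds are configured).
    static int roundsToConverge(List<String> nodes, Set<String> seeds, long randomSeed) {
        Map<String, Set<String>> view = new TreeMap<>();
        for (String n : nodes) {
            Set<String> known = new TreeSet<>(seeds); // only seeds are configured
            known.add(n);                              // and each node knows itself
            view.put(n, known);
        }
        Random rnd = new Random(randomSeed);
        int rounds = 0;
        while (!converged(view, nodes.size())) {
            if (rounds == MAX_ROUNDS) {
                return -1;
            }
            rounds++;
            for (String n : nodes) {
                // Contact one random node this node already knows; both merge views.
                List<String> peers = new ArrayList<>(view.get(n));
                String peer = peers.get(rnd.nextInt(peers.size()));
                Set<String> merged = new TreeSet<>(view.get(n));
                merged.addAll(view.get(peer));
                view.put(n, merged);
                view.put(peer, new TreeSet<>(merged));
            }
        }
        return rounds;
    }

    static boolean converged(Map<String, Set<String>> view, int clusterSize) {
        for (Set<String> known : view.values()) {
            if (known.size() != clusterSize) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("n1", "n2", "n3", "n4", "n5", "n6");
        System.out.println(roundsToConverge(nodes, Set.of("n1"), 7L) + " rounds");
    }
}
```

Non-seed nodes never appear in anyone's configuration, yet everyone learns about them, because contacting a seed is enough to get into the shared membership list.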
2009/7/14 Jonathan Ellis
> On Tue, Jul 14, 2009 at 8:33 AM, Mark Robson wrote:
> > Cassandra doesn't provide the guarantees about the latest changes being
> > available from any given node, so you can't really use it in such an
> > application.
> >
> >
2009/7/14 Johan Stuyts
> One of the purposes I want to use Cassandra for is custom HTTP session
> replication. Instead of storing the values in the session of the servlet
> container I want to store them individually using unique keys in Cassandra.
> I was hoping Cassandra would be fast enough fo
2009/7/14 Johan Stuyts
> Is it unwise to use Cassandra in production if you use less than n servers?
> I.e. is it better to use another solution for Cassandra once n is reached?
If you are not sure whether N will ever be reached, then you don't need to
deploy Cassandra until you reach a point
2009/7/14
> I have 3 production servers; is it better to
>
> A. start Cassandra on one node and add other seeds later
> or
> B. start Cassandra on all 3 nodes
>
> If I do A, when I later add 2 nodes, will Cassandra pick up the other two
> nodes and start distributing the load fairly?
M
2009/7/7 Vijay
> The reason I am asking is that I have multiple columns which a user can query on,
> like UID, URL, TAGS (all of them are unique), but how can I get to them
> without getting stuck with the row ID? Because the row ID can be one of those and
> the user at any time can know only one
Yo
>
> - what does get_key_range do? It looks like it returns a list of keys, but
> why does one have to specify a list of column family names?
It returns a list of keys which exist.
In my experiments, I think that a key "existing" is defined as having at
least one column in one column family that