Re: question about deleting from cassandra

2010-03-11 Thread Mark Robson
On 12 March 2010 03:34, Bill Au wrote: > Let take Twitter as an example. All the tweets are timestamped. I want to > keep only a month's worth of tweets for each user. The number of tweets > that fit within this one month window varies from user to user. What is the > best way to accomplish t

Re: Cassandra hardware - balancing CPU/memory/iops/disk space

2010-03-07 Thread Mark Robson
On 7 March 2010 19:05, Jonathan Ellis wrote: > Yes, but I would guess 90% of workloads are better served with > spending the extra money on more machines w/ cheap sata disks and lots > of ram. > I'm not an expert, but I imagine that in many cases, capex is not the limiting factor. In data centre

Re: How to model hierarchical structure?

2010-03-07 Thread Mark Robson
On 6 March 2010 09:23, Hubert Chang wrote: > Like Category, Taxonomy, or folder/file, there will be multiple level > hierarchical relationship. > How to model it in Cassandra? > Serialize all the parent id and the item id together as the key? > How to model it when one child has many parents? >

Re: Use cases for Cassandra

2010-03-01 Thread Mark Robson
On 1 March 2010 13:16, HHB wrote: > > Hey, > What are the typical use cases for Cassandra? > I'm not sure there are *really* typical use cases. Our use case is likely to be big audit data. But Cassandra doesn't yet support a mechanism for efficiently expiring old data; I'm waiting for various

Wiki permission denied

2010-02-24 Thread Mark Robson
Hiya, I'm looking at http://wiki.apache.org/cassandra/RecentChanges And there's an error. Can someone look into it please? Ta Mark

Re: Cassandra to store logs as a list

2010-01-20 Thread Mark Robson
I think you really want to be using the OrderPreservingPartitioner and using time-based keys. It depends exactly how you're querying it. All querying use-cases need to be taken into account when deciding how to structure your data. If you use a time-based key with OPP, typically data become very
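With an order-preserving partitioner, keys sort by their bytes, so a time-based key only scans in chronological order if its text encoding sorts the same way as the time it encodes. A small Python sketch, assuming a hypothetical key layout of a zero-padded epoch-millisecond prefix followed by a source name (neither the layout nor the function name comes from the thread):

```python
def log_key(epoch_ms: int, source: str) -> bytes:
    """Build a key whose byte order matches chronological order.

    Zero-padding to a fixed width keeps, for example, 999 before 1000;
    without padding, b"1000" would sort before b"999".
    """
    # Hypothetical layout: 13-digit millisecond timestamp, colon, source name.
    return f"{epoch_ms:013d}:{source}".encode("ascii")
```

Thirteen digits is enough for millisecond timestamps for the foreseeable future; a different resolution would need a different fixed width.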

Re: Best Practics for Cassandra in Production?

2010-01-14 Thread Mark Robson
2010/1/14 shiv shivaji > I have looked at performance posts in the forum but was wondering if there > are general suggestions for using cassandra in production. > I'd say pretty obvious stuff: - Performance test as much as you can - Choose the following carefully as they're difficult to change:

Re: Cassandra and TTL

2010-01-13 Thread Mark Robson
I also agree: Some mechanism to expire rolling data would be really good if we can incorporate it. Using the existing client interface, deleting old data is very cumbersome. We want to store lots of audit data in Cassandra, this will need to be expired eventually. Nodes should be able to do expir

Re: "easy" interface to Cassandra (was: EasyCassandra.pm Perl interface alpha 0.01)

2010-01-10 Thread Mark Robson
I can't see any reason to make an "easy" Cassandra interface, as the Thrift interface isn't really very difficult. In any case the main problems with Cassandra will be design ones, i.e. figuring out how to use it effectively in your application. No "Easy" library is going to make that easier. Mar

Re: Change Partitioner

2010-01-07 Thread Mark Robson
2010/1/7 JKnight JKnight > Dear all, > > Because we want to traverse all data, we need to change partitioner to > ordered type. > > I want to change Partitioner from > org.apache.cassandra.dht.RandomPartitioner to > org.apache.cassandra.dht.OrderPreservingPartitioner. > > How can I do that? > W

Re: Best config

2010-01-05 Thread Mark Robson
No, I can't. But I'd imagine that Cassandra will be better off with a larger number of nodes. Three seems a bit small to be useful, particularly if your ReplicationFactor is 3, you may as well use some other DB and high availability solution, seeing as everything will be written everywhere you won

Re: async calls in cassandra

2010-01-01 Thread Mark Robson
2009/12/31 Ran Tavory > Does cassandra/thrift support asynchronous IO calls? > asynchronous calls are really a client-side thing. I believe that there is currently no async version of Thrift, but if one did arrive, it could be used with Cassandra. The fact that Cassandra uses one-thread-per-con

Re: Design Pattern - Tag Cloud / Inverted Index

2009-12-27 Thread Mark Robson
2009/12/27 August Zajonc > Looking at the data model a simple solution is two column families, > one containing items as the row-key with tags as columns, and a second > with tags as the row-key with items as columns. This gives me fast > access at the cost of 2x the writes (cheap) and storage (a
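The two-column-family layout described in the quoted message can be sketched with plain dictionaries standing in for column families. Everything here (class and method names) is hypothetical and only models the shape of the data, not a real Cassandra client:

```python
from collections import defaultdict

class TagIndex:
    """Toy model: two 'column families' kept in step on every write."""

    def __init__(self):
        self.item_tags = defaultdict(set)   # row key: item, columns: tags
        self.tag_items = defaultdict(set)   # row key: tag, columns: items

    def tag(self, item, tag):
        # Two writes per tagging operation, one per column family.
        self.item_tags[item].add(tag)
        self.tag_items[tag].add(item)

    def tags_for(self, item):
        # Single-row read from the item-keyed family.
        return sorted(self.item_tags.get(item, ()))

    def items_for(self, tag):
        # Single-row read from the tag-keyed family.
        return sorted(self.tag_items.get(tag, ()))
```

Either lookup direction is a single-row read, which is the trade the message describes: faster access in exchange for doubling the writes and storage.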

Re: hard disk size

2009-12-18 Thread Mark Robson
I had in mind the idea that nodes would store at least 3Tb of native capacity per node. I can't see how I can get cost-effective storage otherwise (machines are quite expensive compared to disc, especially in terms of power which is usually the limiting factor) Probably physically 4-6 drives per s

Re: Images store in Cassandra

2009-12-13 Thread Mark Robson
2009/12/13 Tatu Saloranta > On Sat, Dec 12, 2009 at 3:08 PM, Ryan King wrote: > > On Sat, Dec 12, 2009 at 12:05 PM, Ran Tavory wrote: > >> As we're designing our systems for a move from mysql to Cassandra we're > >> considering moving our file storage to Cassandra as well. Is this wise? > I'm

Re: Why does Cassandra not support some method?

2009-12-10 Thread Mark Robson
2009/12/10 JKnight JKnight > Dear all, > > I wonder why Cassandra do not support the following method: > - multi_insert: insert multi keys > - multi_remove: remove multi keys > - multi_batchInsert: batch insert multi keys > I'm not sure how the 1st and 3rd are different, but my understanding is

Re: Configuring Cassandra

2009-12-09 Thread Mark Robson
2009/12/8 Jonathan Ellis > I wrote http://wiki.apache.org/cassandra/Operations to answer the > other questions in more detail. :) > Jonathan, That is awesomely useful; it answers many questions that I've never quite known the answers to. I hope that decommissioning an entirely failed node becom

Re: Configuring Cassandra

2009-12-08 Thread Mark Robson
I'm not an expert in Cassandra yet but I do know a little, here are some (attempted) answers: 2009/12/8 Rakesh Sharma > b) Apart from nodeprobe and Jconsole is there any other node management > tool? > You can use any JMX application, apparently, to monitor it. This could be done e.g. via a com

Re: Connecting to the cluster with failover (was: data modeling question)

2009-12-07 Thread Mark Robson
2009/12/7 Jonathan Ellis > Gary Dusbabek already did this, only better: > https://issues.apache.org/jira/browse/CASSANDRA-535, > http://issues.apache.org/jira/browse/CASSANDRA-596 > > So is there now support in trunk for a "remote clients api" version of Cassandra, if so, are there any pointers o

Re: Connecting to the cluster with failover (was: data modeling question)

2009-12-07 Thread Mark Robson
2009/12/7 Ramzi Rabah
> TSocket socket = new TSocket(hostName, port);
> TBinaryProtocol binaryProtocol = new TBinaryProtocol(socket, false, false);
> Cassandra.Client client = new Cassandra.Client(binaryProtocol);
> socket.open();

Connecting to the cluster with failover (was: data modeling question)

2009-12-07 Thread Mark Robson
2009/12/3 Coe, Robin > > So, considering that I currently have to take down a node to make a CF > change, I'm wondering how to perform automatic failover from my application? > Is there a mechanism by which I can request from Cassandra all the > destination IP:ports for the nodes in a cluster, s

Re: Cassandra access control

2009-12-02 Thread Mark Robson
How about we make authentication optional, and have the protocol being stateful only if you want to authenticate? That way we don't break backwards compatibility or introduce extra complexity for people who don't need it. Mark

Re: Cassandra access control

2009-12-02 Thread Mark Robson
2009/12/2 Ted Zlatanov > OK. So what should the API be? Just one method, as Robin suggested? > > void login( Map credentials, String keyspace ) > throws AuthenticationException, AuthorizationException > > In this model the backend would still have login() and > setKeyspace()/getKeyspace() sepa

Re: java.util.concurrent.TimeoutException: Operation timed out - received only 0 responses from .

2009-11-29 Thread Mark Robson
2009/11/28 > thanks i dont have more than 1 node ie just one node operation > so dunno if the timeout increase will help > Presumably this is a test system on a vmware; it may have insufficient memory. I'd say make sure that your test virtual machine (vmware etc) has at least 1.5G of ram alloca

Re: Cassandra users survey

2009-11-21 Thread Mark Robson
We are keeping an eye on Cassandra with a view to using it in a large-scale audit data application. Currently I don't think it does quite what we want but I'm still very impressed with what it does do. We're not yet at the stage of really properly evaluating it for production use, but I have had a

Re: [jira] Assigned: (CASSANDRA-293) remove_key_range operation

2009-11-19 Thread Mark Robson
https://issues.apache.org/jira/browse/CASSANDRA-293 > > Project: Cassandra > > Issue Type: New Feature > > Components: Core > >Reporter: Mark Robson > >Assignee: Gary Dusbabek > >Priority: Minor

Re: java.lang.OutOfMemoryError: unable to create new native thread

2009-11-17 Thread Mark Robson
On Tue, Nov 17, 2009 at 5:01 PM, wrote: > > I keep getting the error > > java.lang.OutOfMemoryError: unable to create new native thread > Perhaps it's run out of address-space. You are running a 64-bit OS, right? Mark

Re: Cassandra data distribution and configuration settings

2009-11-17 Thread Mark Robson
2009/11/17 Richard Grossman > How do I evaluate the value I need to put here ?? > The second point is that I've many column family each with a different key > then how do I know what is the token to distribute the data ?? > It's not automatic at the moment. If you leave it to make its own token,

Re: Changing Replication Factor?

2009-11-16 Thread Mark Robson
If you only have one node in the cluster just now, would changing the replication factor then bootstrapping the new nodes "Just work" ? Mark

Re: why does remove need a timestamp?

2009-11-09 Thread Mark Robson
2009/11/9 Ramzi Rabah > Hello all: > I am confused about the need of passing a timestamp for the remove > operation. Why does the remove operation in Cassandra require a > timestamp? What happens if I provide a remove call with a different > timestamp than what I inserted, will the row still be

Re: How to store tree structure in Cassandra

2009-10-28 Thread Mark Robson
2009/10/28 Brink > Hi All, > > For a DMS, I want to replace MySQL with Cassandra to store file/folder > nodes. Current I use adjacency list model to stores nodes hierarchy. The > shortage of the adjacency list model is the expensive traversal cost. While > I want to navigate the entire workspace

Re: Questions About Cassandra

2009-10-27 Thread Mark Robson
2009/10/27 Jonathan Ellis > > > We're adding support for deleting ranges of data (similar to the range > granularity you can get with get_slice), including across multiple > rows, in https://issues.apache.org/jira/browse/CASSANDRA-336, but you > can already delete row-at-a-time by specifying only

Re: Custom partitioners

2009-10-10 Thread Mark Robson
2009/10/10 Joe Stump > I've got a guy doing a code test for us and he has some questions about > custom partitioners: > http://gist.github.com/205537 > > Wondering if anyone could chime in. > I'm curious as to why you don't just use the OrderPreservingPartitioner and apply the transformation to

Re: What’s The Best Practice In Designing A Cassandra Data Model?

2009-10-02 Thread Mark Robson
I'm not sure there are any best practices, but I've replied with some ideas here: http://stackoverflow.com/questions/1502735/whats-the-best-practice-in-designing-a-cassandra-data-model/1512978#1512978 Cheers Mark

Re: cassandra as permanent datastore

2009-10-01 Thread Mark Robson
2009/10/1 Joe Van Dyk > Hi, > > How stupid would it be to use cassandra as a permanent datastore? > > Say I have a service that tracks clicks on ads running on other sites. > I'd need to keep track of who clicked what when and where. And run > reports on it. Cassandra is attractive because of

Re: random n00b question

2009-09-15 Thread Mark Robson
2009/9/15 Chris Goffinet > > Do you really expect a user to open up multiple tabs and start clicking > concurrently? Is the use case for bots? Remember, if you're trying to > capture a user's activity and think they might open up many windows, I > wouldn't be saving that into a session in general

Re: random n00b question

2009-09-15 Thread Mark Robson
2009/9/15 Jonathan Ellis > We don't currently have any optimizations to provide "lightweight" > session consistency (see #132), but if you do quorum reads + quorum > writes then you are guaranteed to read the most recent write which > should be fine for most apps. > Quorum read / write would be
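The quorum guarantee quoted above comes down to simple arithmetic: if a write is acknowledged by W replicas and a read consults R replicas out of N, the two sets must share at least one replica whenever R + W > N. A minimal Python sketch, assuming a majority quorum of floor(N/2) + 1; the function names are illustrative and not part of any Cassandra API:

```python
from itertools import combinations

def quorum(n: int) -> int:
    """Majority quorum size for a replication factor of n."""
    return n // 2 + 1

def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute-force check: does every set of r read replicas share at least
    one replica with every set of w write replicas?"""
    replicas = range(n)
    return all(
        set(readers) & set(writers)
        for readers in combinations(replicas, r)
        for writers in combinations(replicas, w)
    )
```

With a replication factor of 3, quorum reads and quorum writes each touch 2 replicas, so any read set intersects any write set; reading and writing a single replica each gives no such guarantee.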

Re: random n00b question

2009-09-15 Thread Mark Robson
2009/9/15 Matt Kydd > We need to persist the sessions and associated shopping baskets / > activity summaries somewhere and Cass seems like a good fit, without > the restrictions imposed by SQL there would be less necessity to purge > old sessions. > Purging the old sessions in Cassandra would be

Re: Search and ACL

2009-09-13 Thread Mark Robson
2009/9/13 > Thank you for your reply. > > So the best way to use Cassandra would be at least behind a firewall. > > In the future is it possible to add a username/password type security in? I > plan to support the project, just as soon as I have some revenue coming in > through my business. > I

Re: Search and ACL

2009-09-13 Thread Mark Robson
> What type of username/password security is there? (for example sharing a > Cassandra db between applications, and isolating their access controls) > > > Also I should point out, that the default startup script for Cassandra also enables the Java debugger and JMX connections from anywhere, both of

Re: Search and ACL

2009-09-13 Thread Mark Robson
2009/9/13 > How exactly does the search work? Is it similar to fulltext searching? > > No. There are two ways you can find stuff - either by exact key (key must be exactly right, it's byte-based), or (if you're using the OrderPreservingPartitioner) a key range scan. In the case of a key range
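The two lookup styles mentioned above, exact key and key range scan, can be illustrated with a sorted list of byte-string keys. This is only a sketch of the idea under an order-preserving layout; the helper names are made up and no Cassandra API is being shown:

```python
from bisect import bisect_left

def exact_lookup(sorted_keys, key):
    """Return True only if the key matches byte for byte."""
    i = bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key

def range_scan(sorted_keys, start, end, limit=100):
    """Return up to `limit` keys k with start <= k < end, in byte order."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi][:limit]
```

Neither operation looks inside the values, which is why this is not comparable to full-text search: a key that differs by a single byte, such as a capital letter, simply does not match.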

Re: default OrderPreservingPartitioner changed

2009-08-08 Thread Mark Robson
2009/8/7 Jonathan Ellis > The default OPP now does comparisons based strictly on byte order, and > is no longer collation aware. This is a better default choice for > those who don't need collation since it's much faster. If you do need > collation, the old partitioner is still available as Col

Re: lazy boy example using column family

2009-07-27 Thread Mark Robson
2009/7/27 > i m trying to use cassandra in a mode where everytime i create a new > columnfamily i do not want to restart all the nodes In my opinion you should not be doing that anyway. Because families can have as many columns as you like anyway, it should not normally be necessary to create

Re: one server or more servers?

2009-07-14 Thread Mark Robson
2009/7/14 > *1. If you only have 3 production servers, Cassandra may not do much for > you. You will probably only care if you have lots more servers. 3 servers is > a reasonable minimum for a test / dev environment* > At How many servers does cassandra start really performing? > or how many serv

Re: one server or more servers?

2009-07-14 Thread Mark Robson
2009/7/14 > thanks a lot, makes sense, kinda like limewire, shareaza and gnutella > networks > Yes, Cassandra does some stuff that is similar to the file-sharing networks, but it is intended for private use, not on the public internet. I wouldn't expose an instance to the internet or allow untrusted

Re: one server or more servers?

2009-07-14 Thread Mark Robson
2009/7/14 > But since the other servers join the cluster? > is there a limitation of where reads/writes can go ie., > > reads can go to all servers - seeds+nonseeds? > > writes can go only to seeds? > No, there is not. Reads and writes may go to any node, seed or not. The seeds are ONLY used f

Re: one server or more servers?

2009-07-14 Thread Mark Robson
2009/7/14 > How do we add servers other than Seeds as there is no such place in conf > file Servers other than seeds are automatically picked up by the cluster when they start up; the nodes talk amongst themselves to figure out who's there. Only the seeds need to be explicitly configured. Thi

Re: Scaling from 1 to x (was: one server or more servers?)

2009-07-14 Thread Mark Robson
2009/7/14 Jonathan Ellis > On Tue, Jul 14, 2009 at 8:33 AM, Mark Robson wrote: > > Cassandra doesn't provide the guarantees about the latest changes being > > available from any given node, so you can't really use it in such an > > application. > > > >

Re: Scaling from 1 to x (was: one server or more servers?)

2009-07-14 Thread Mark Robson
2009/7/14 Johan Stuyts > One of the purposes I want to use Cassandra for is custom HTTP session > replication. Instead of storing the values in the session of the servlet > container I want to store them individually using unique keys in Cassandra. > I was hoping Cassandra would be fast enough fo

Re: Scaling from 1 to x (was: one server or more servers?)

2009-07-14 Thread Mark Robson
2009/7/14 Johan Stuyts > Is it unwise to use Cassandra in production if you use less than n servers? > I.e. is it better to use another solution for Cassandra once n is reached? If you are not sure whether N will ever be reached, then you don't need to deploy Cassandra until you reach a point

Re: one server or more servers?

2009-07-14 Thread Mark Robson
2009/7/14 > I have 3 productions servers, is it better to > > A. start the cassandra in one node and add other seeds later > or > B. Start cassandra in all the 3 nodes > > if i do A, when i later add 2 nodes ,will cassandra pick up the other two > nodes and start distributing the loads fairly M

Re: Table Index

2009-07-06 Thread Mark Robson
2009/7/7 Vijay > The reason i am asking is i have multiple columns which a user can query on > like UID, URL, TAGS (all of them are unique) but how can i get to them > without getting stuck with the rowid? coz rowid can be one of those and > the user at any time can know only one Yo

Re: questions about operations

2009-06-04 Thread Mark Robson
> > - what does get_key_range do? It looks like it returns a list of keys, but > why does one have to specify a list of column family names? It returns a list of keys which exist. In my experiments, I think that a key "existing" is defined as having at least one column in one column family that