Re: Exceptions on 0.7.0

2011-02-23 Thread Stu Hood
I expect that this problem was due to https://issues.apache.org/jira/browse/CASSANDRA-2216 : I'll make noise to try and get it released soon as 0.7.3 On Tue, Feb 22, 2011 at 5:41 AM, David Boxenhorn da...@lookin2.com wrote: Thanks, Shimi. I'll keep you posted if we make progress. Riptano is

Re: How scalable are automatic secondary indexes in Cassandra 0.7?

2011-02-23 Thread Stu Hood
In practice, local secondary indexes scale to {RF * the limit of a single machine} for -low cardinality- values (ex: users living in a certain state) since the first node is likely to be able to answer your question. This also means they are good for performing filtering for analytics. On the

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Aaron Morton
In the case described below if less than CL nodes respond in rpc_timeout (from conf yaml) the client will get a timeout error. I think most higher level clients will automatically retry in this case. If there are not enough nodes to start the request you will get an Unavailable exception.

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Himanshi Sharma
Thanks Dave but I am able to telnet to other instances on port 7000 and when i run ./nodetool --host ec2-50-18-60-117.us-west-1.compute.amazonaws.com ring... I can see only one node. Do we need to configure anything else in Cassandra.yaml or Cassandra-env.sh ??? From: Dave Viner

Re: Reads and memory usage clarification

2011-02-23 Thread Matthew Dennis
Data is in Memtables from writes before they get flushed (based on first threshold of ops/size/time exceeded; all are configurable) to SSTables on disk. There is a keycache and a rowcache. The keycache caches offsets into SSTables for the rows. the rowcache caches the entire row. There is also

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Sasha Dolgy
Did you define the other host in the cassandra.yaml? Both servers need to know about each other. On Wed, Feb 23, 2011 at 10:16 AM, Himanshi Sharma himanshi.sha...@tcs.com wrote: Thanks Dave but I am able to telnet to other instances on port 7000 and when i run ./nodetool --host

dedication to read write

2011-02-23 Thread Sasha Dolgy
Hi , Is there benefit to delegate some nodes specifically for read operations, and others specifically for write? When designing a web app, I could create a connection pool for reads and one for writes ... or is this me falling back to my rdbms way of thinking? -sd -- Sasha Dolgy

Re: Does Cassandra use vector clocks

2011-02-23 Thread Oleg Anastasyev
Basically: vector clocks tell you there was a conflict, but not how to resolve it (that is, you simply don't have enough information to resolve it even if you push that back to the client a la Dynamo). What dynamo-like systems mostly use VC for is the trivial case of client X updated field 1,

Re: dedication to read write

2011-02-23 Thread Aaron Morton
Not necessary. All the nodes have the same function and the same access to data. Aaron On 23 Feb, 2011, at 10:27 PM, Sasha Dolgy sdo...@gmail.com wrote: Hi , Is there benefit to delegate some nodes specifically for read operations, and others specifically for write? When designing a web app, I could

Re: I: Re: Are row-keys sorted by the compareWith?

2011-02-23 Thread Matthew Dennis
The map returned by multiget_slice (what I suspect is the underlying thrift call for getColumnsFromRows) is not an order-preserving map; it's a HashMap, so the order of the returned results cannot be depended on. Even if it were an order-preserving map, not all languages would be able to make use of

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Himanshi Sharma
Yes, they do. I have specified the Public DNS in the seed field of each node in Cassandra.yaml... not able to figure out what the problem is. From: Sasha Dolgy sdo...@gmail.com To: user@cassandra.apache.org Date: 02/23/2011 02:56 PM Subject: Re: Cassandra nodes on EC2 in two different regions not

Re: Reads and memory usage clarification

2011-02-23 Thread Viktor Jevdokimov
Everything as I thought, thank you! 2011/2/23 Matthew Dennis mden...@datastax.com Data is in Memtables from writes before they get flushed (based on first threshold of ops/size/time exceeded; all are configurable) to SSTables on disk. There is a keycache and a rowcache. The keycache caches

Re: Is it possible to get list of row keys?

2011-02-23 Thread Ching-Cheng Chen
You can use the setRowCount() method to specify how many keys to return per call. The default is 100. Beware: don't set it too high, since it might cause OOM. The underlying code will pre-allocate an array list with the size you specify in setRowCount(). So you might get an OOM if you used something

Re: Is it possible to get list of row keys?

2011-02-23 Thread Sasha Dolgy
What if i want 20 rows and the next 20 rows in a subsequent query? can this only be achieved with OPP? -- Sasha Dolgy sasha.do...@gmail.com On 23 Feb 2011 13:54, Ching-Cheng Chen cc...@evidentsoftware.com wrote:

Re: Is it possible to get list of row keys?

2011-02-23 Thread Norman Maurer
query per ranges is only possible with OPP or BPP. Bye, Norman 2011/2/23 Sasha Dolgy sdo...@gmail.com: What if i want 20 rows and the next 20 rows in a subsequent query?  can this only be achieved with OPP? -- Sasha Dolgy sasha.do...@gmail.com On 23 Feb 2011 13:54, Ching-Cheng Chen

Re: Is it possible to get list of row keys?

2011-02-23 Thread Ching-Cheng Chen
Actually, if you want to get ALL keys, I believe you can still use RangeSliceQuery with RP. Just use setKeys(,) as first batch call. Then use the last key from previous batch as startKey for next batch. Beware that since startKey is inclusive, so you'd need to ignore first key from now on. Keep
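The batching approach described in this message can be sketched in a few lines. This is only an illustration under stated assumptions: `get_range` is a hypothetical stand-in for a range-slice call (not a real client API), an empty start key is assumed to mean "from the beginning", and a plain dict stands in for a column family whose iteration order plays the role of the partitioner's (non-sorted but stable) key order.

```python
from typing import Dict, Iterator, List


def get_range(store: Dict[str, dict], start_key: str, count: int) -> List[str]:
    """Hypothetical range-slice call: up to `count` keys in the store's own
    iteration order, starting at `start_key` INCLUSIVE ("" = from the start)."""
    keys = list(store)
    start = keys.index(start_key) if start_key else 0
    return keys[start:start + count]


def iter_all_keys(store: Dict[str, dict], page_size: int = 100) -> Iterator[str]:
    """Walk every key in batches, reusing the last key of each batch as the
    next start key and skipping it, because the start key is inclusive."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start_key = ""
    first_page = True
    while True:
        page = get_range(store, start_key, page_size)
        # Every page after the first begins with a key we already returned.
        fresh = page if first_page else page[1:]
        if not fresh:
            return
        yield from fresh
        if len(page) < page_size:
            return  # a short page means we reached the end
        start_key = page[-1]
        first_page = False
```

Keeping the page size small bounds how many keys are held in memory per call, which is the concern raised earlier in the thread about large row counts.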

Re: Is it possible to get list of row keys?

2011-02-23 Thread Roshan Dawrani
On Wed, Feb 23, 2011 at 7:17 PM, Ching-Cheng Chen cc...@evidentsoftware.com wrote: Actually, if you want to get ALL keys, I believe you can still use RangeSliceQuery with RP. Just use setKeys(,) as first batch call. Then use the last key from previous batch as startKey for next batch.

Re: Is it possible to get list of row keys?

2011-02-23 Thread Norman Maurer
Yes, but be aware that the keys will not be in the right order. Bye, Norman 2011/2/23 Roshan Dawrani roshandawr...@gmail.com: On Wed, Feb 23, 2011 at 7:17 PM, Ching-Cheng Chen cc...@evidentsoftware.com wrote: Actually, if you want to get ALL keys, I believe you can still use RangeSliceQuery

Re: Does Cassandra use vector clocks

2011-02-23 Thread Jonathan Ellis
On Wed, Feb 23, 2011 at 3:32 AM, Oleg Anastasyev olega...@gmail.com wrote: Basically: vector clocks tell you there was a conflict, but not how to resolve it (that is, you simply don't have enough information to resolve it even if you push that back to the client a la Dynamo). What dynamo-like

Re: Is it possible to get list of row keys?

2011-02-23 Thread Roshan Dawrani
Yes. But I don't think retrieving keys in the right order was part of the original question. :-) On Wed, Feb 23, 2011 at 7:50 PM, Norman Maurer nor...@apache.org wrote: Yes, but be aware that the keys will not be in the right order. Bye, Norman 2011/2/23 Roshan Dawrani

Re: Is it possible to get list of row keys?

2011-02-23 Thread Daniel Lundin
They are, however, in *stable* order, which is important. On Wed, Feb 23, 2011 at 3:20 PM, Norman Maurer nor...@apache.org wrote: Yes, but be aware that the keys will not be in the right order. Bye, Norman 2011/2/23 Roshan Dawrani roshandawr...@gmail.com: On Wed, Feb 23, 2011 at 7:17 PM,

Re: Does Cassandra use vector clocks

2011-02-23 Thread Oleg Anastasyev
From the article I linked: But wait, some might say, you can avoid all this by using vectors in a different way – to prevent update conflicts by issuing conditional writes which specify a version (vector) and only succeed if that version is still current. Sorry, but no, or at least not

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Dave Viner
Try using the IP address, not the dns name in the cassandra.yaml. If you can telnet from one to the other on port 7000, and both nodes have the other node in their config, it should work. Dave Viner On Wed, Feb 23, 2011 at 1:43 AM, Himanshi Sharma himanshi.sha...@tcs.com wrote: Ya they do.

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Frank LoVecchio
The internal Amazon IP address is what you will want to use so you don't have to go through DNS anyways; not sure if this works from US-East to US-West, but it does make things quicker in between zones, e.g. us-east-1a to us-east-1b. On Wed, Feb 23, 2011 at 9:09 AM, Dave Viner davevi...@gmail.com

Splitting a single row into multiple

2011-02-23 Thread Aditya Narayan
Does it make any difference if I split a row that needs to be accessed together into two or three rows and then read those multiple rows? (Assume the keys of all the three rows are known to me programmatically since I split columns by certain categories). Would the performance be any better if

Migrate from 0.6.5 to 0.7.2

2011-02-23 Thread Zhong Li
Hi all, We want to migrate from version 0.6.5 to version 0.7.2. Is there a step-by-step guide or document we can follow? Also, there is a new branch cassandra-0.7.2 on svn; what is the purpose of creating the new branch instead of one branch cassandra-0.7? Will you maintain both branches? Thanks,

Re: Does Cassandra use vector clocks

2011-02-23 Thread Jonathan Ellis
On Wed, Feb 23, 2011 at 9:57 AM, Oleg Anastasyev olega...@gmail.com wrote: From the other hand, the same article says: For conditional writes to work, the condition must be evaluated at all update sites before the write can be allowed to succeed. This means, that when doing such an update

Re: Migrate from 0.6.5 to 0.7.2

2011-02-23 Thread Jonathan Ellis
On Wed, Feb 23, 2011 at 11:18 AM, Zhong Li z...@voxeo.com wrote: Hi all, We want migrate from version 0.6.5 to version 0.7.2. Is there step by step  guide or document  we can follow? NEWS.txt Also there is new branch cassandra-0.7.2 on svn, what is purpose to create the new branch instead

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Peter Fales
I posted on this topic last September. (See http://www.mail-archive.com/user@cassandra.apache.org/msg05692.html) I was able to use Cassandra across EC2 regions. However, the trick is that you must use the external addresses in your storage-conf.xml, but since you don't have a NIC that

Multiple Seeds

2011-02-23 Thread Jeremy.Truelove
What's the best way to bring multiple seeds up, should only one of them have auto bootstrap set to true or should neither of them? Should they list themselves and the other seed in their seed section in the yaml config?

Re: Multiple Seeds

2011-02-23 Thread Eric Gilmore
The DataStax documentation offers some answers to those questions in the Getting Started (http://www.datastax.com/dev/tutorials/getting_started_0.7/configuring#adding-nodes-to-a-cassandra-cluster) section and the Clustering (http://www.datastax.com/docs/0.7/operations/clustering#adding-capacity) reference

RE: Multiple Seeds

2011-02-23 Thread Jeremy.Truelove
Yeah I set the tokens, I'm more asking if I start the first seed node with autobootstrap set to false the second seed should have it set to true as well as all the slave nodes correct? I didn't see this in the docs but I may have just missed it. From: Eric Gilmore [mailto:e...@datastax.com]

Re: Multiple Seeds

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 2:30 PM, jeremy.truel...@barclayscapital.com wrote: Yeah I set the tokens, I’m more asking if I start the first seed node with autobootstrap set to false the second seed should have it set to true as well as all the slave nodes correct? I didn’t see this in the docs but

RE: Multiple Seeds

2011-02-23 Thread Jeremy.Truelove
So all seeds should always be set to 'auto_bootstrap: false' in their .yaml file. -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Wednesday, February 23, 2011 2:36 PM To: user@cassandra.apache.org Cc: Truelove, Jeremy: IT (NYK) Subject: Re: Multiple Seeds

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Aaron Morton
At CLs higher than ANY, hinted handoff will be used if enabled. It does not contribute to the number of replicas considered written by the coordinator though. E.g. if you ask for quorum, and this is 3 nodes, and only 2 are up, the write will fail without starting. In this case the HH is

Re: Multiple Seeds

2011-02-23 Thread Eric Gilmore
Well -- when you first bring a node into a ring, you will probably want to stream data to it with auto_bootstrap: true. If you want that node to be a seed, then add it to the seeds list AFTER it has joined the ring. I'd refer you to the Seed List and Autobootstrapping sections of the Getting

Re: Splitting a single row into multiple

2011-02-23 Thread Aaron Morton
AFAIK performance in the single row case will be better. Multiget may require multiple seeks and reads in an sstable, versus a single seek and read for a single row, multiplied by the number of sstables that contain row data. Using the key cache would reduce the seeks. If it

RE: Multiple Seeds

2011-02-23 Thread Jeremy.Truelove
To add a host to the seeds list after it has had the data streamed to it I need to 1. stop it 2. edit the yaml file to a. include it in the seeds list b. set auto bootstrap to false 3. restart it, correct? Additionally you would need to add it to the other nodes

Re: Multiple Seeds

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 2:59 PM, jeremy.truel...@barclayscapital.com wrote: To add a host to the seeds list after it has had the data streamed to it I need to 1.   stop it 2.   edit the yaml file to a.   include it in the seeds list b.  set auto boostrap to false 3.  

RE: Multiple Seeds

2011-02-23 Thread Jeremy.Truelove
Also should non-seed host be perpetually set to auto_bootstrap: true ? From: Truelove, Jeremy: IT (NYK) Sent: Wednesday, February 23, 2011 3:00 PM To: user@cassandra.apache.org Subject: RE: Multiple Seeds To add a host to the seeds list after it has had the data streamed to it I need to 1.

RE: Multiple Seeds

2011-02-23 Thread Jeremy.Truelove
So does cassandra monitor the config file for changes? If it doesn't how else would it know unless you restart you had added a new seed? -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Wednesday, February 23, 2011 3:23 PM To: user@cassandra.apache.org Cc:

Is Cassandra suitable for my problem?

2011-02-23 Thread Alexandru Dan Sicoe
Hello, I'm currently doing my masters project. I need to store lots of time series data of any type (String, int, booleans, arrays of the previous) with a high writing rate(20MBytes/sec - 170TBytes/year - note not running continuously) but less strict read requirements. This is monitoring data

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Dave Revell
Ritesh, You have seen the problem. Clients may read the newly written value even though the client performing the write saw it as a failure. When the client reads, it will use the correct number of replicas for the chosen CL, then return the newest value seen at any replica. This newest value

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
Read repair will probably occur at that point (depending on your config), which would cause the newest value to propagate to more replicas. Is the newest value the quorum value which means it is the old value that will be written back to the nodes having newer non-quorum value or the newest

Re: Is Cassandra suitable for my problem?

2011-02-23 Thread Ritesh Tijoriwala
Hi Alexandru, I feel Cassandra can certainly be used to solve the problem you have, but if your requirements are not very strict, you need very high throughput, and it's okay for you to lose some data occasionally due to a machine crash, then I recommend you look at Redis (http://redis.io/). It is a high

Re: I: Re: Are row-keys sorted by the compareWith?

2011-02-23 Thread Dan Washusen
Hi Matthew, As you mention the map returned from multiget_slice is not order preserving, Pelops is doing this on the client side... Cheers, Dan -- Dan Washusen Sent with Sparrow On Wednesday, 23 February 2011 at 8:38 PM, Matthew Dennis wrote: The map returned by multiget_slice (what I

Will the large datafile size affect the performance?

2011-02-23 Thread buddhasystem
I know that theoretically it should not (apart from compaction issues), but maybe somebody has experience showing otherwise: My test cluster now has 250GB of data and will have 1.5TB in its reincarnation. If all these data is in a single CF -- will it cause read or write performance problems?

Can I count on Super Column Families why planing 3 years out?

2011-02-23 Thread buddhasystem
There was a discussion here on how well (or not so well) the Super CFs are supported. I now need to make a strategic decision as to how I plan my data. What's the consensus -- will the super CF be there 3 years out? TIA Maxim

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Anthony John
Seems to me that the explanations are getting incredibly complicated - while I submit the real issue is not! Salient points here:- 1. To be guaranteed data consistency - the writes and reads have to be at Quorum CL or more 2. Any W/R at lesser CL means that the application has to handle the

Re: Multiple Seeds

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 3:28 PM, jeremy.truel...@barclayscapital.com wrote: So does cassandra monitor the config file for changes? If it doesn't how else would it know unless you restart you had added a new seed? -Original Message- From: Edward Capriolo

Re: Does Cassandra use vector clocks

2011-02-23 Thread Oleg Anastastasyev
Jonathan Ellis jbellis at gmail.com writes: IMO if you only get CL.ALL it's not superior enough to pessimistic locking to justify the complexity of adding it. Yes, maybe you're right, but CL.ALL is necessary only to solve this problem in a generic way. In some (most?) cases, conflicts

Re: Will the large datafile size affect the performance?

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 4:51 PM, buddhasystem potek...@bnl.gov wrote: I know that theoretically it should not (apart from compaction issues), but maybe somebody has experience showing otherwise: My test cluster now has 250GB of data and will have 1.5TB in its reincarnation. If all these data

map reduce job over indexed range of keys

2011-02-23 Thread Matt Kennedy
Let me start out by saying that I think I'm going to have to write a patch to get what I want, but I'm fine with that. I just wanted to check here first to make sure that I'm not missing something obvious. I'd like to be able to run a MapReduce job that takes a value in an indexed column as a

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
hi Anthony, While you stated the facts right, I don't see how it relates to the question I ask. Can you elaborate specifically what happens in the case I mentioned above to Dave? thanks, Ritesh On Wed, Feb 23, 2011 at 1:57 PM, Anthony John chirayit...@gmail.com wrote: Seems to me that the

Understanding Indexes

2011-02-23 Thread mcasandra
So far my understanding about indexes is that you can create indexes only on column values (username in below eg). Does it make sense to also have index on the keys that columnFamily uses to store rows (row keys abc in below example). I am thinking in an event rows keep growing would search be

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Anthony John
Ritesh, At CL ANY - if all endpoints are down - a HH is written. And it is a successful write - not a failed write. Now that does not guarantee a READ of the value just written - but that is a risk that you take when you use the ANY CL! HTH, -JA On Wed, Feb 23, 2011 at 4:40 PM, Ritesh

cassandra as user-profile data store

2011-02-23 Thread Dave Viner
Hi all, I'm wondering if anyone has used cassandra as a datastore for a user-profile service. I'm thinking of applications like behavioral targeting, where there are lots lots of users (10s to 100s of millions), and lots lots of data about them intermixed in, say, weblogs (probably TBs worth).

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
Hi Anthony, I am not talking about the case of CL ANY. I am talking about the case where your consistency level is R + W > N and you want to write to W nodes but only succeed in writing to X (where X < W) nodes and hence fail the write to the client. thanks, Ritesh On Wed, Feb 23, 2011 at 2:48
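The condition discussed in this thread is usually written R + W > N, where N is the replication factor and R and W are the numbers of replicas that must answer a read and a write. A minimal sketch of the arithmetic follows; the helper names are illustrative assumptions, not Cassandra APIs.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a quorum: a strict majority."""
    return replication_factor // 2 + 1


def read_sees_latest_write(r: int, w: int, n: int) -> bool:
    """True when every read set must overlap every successful write set."""
    return r + w > n


def write_reported_success(acks: int, w: int) -> bool:
    """The coordinator reports success only if at least W replicas acked.
    With fewer acks (X < W) the client sees a failure, even though X
    replicas may already hold the new value."""
    return acks >= w
```

For example, with N = 3 and QUORUM for both reads and writes, R = W = 2, so R + W = 4 > 3 and the overlap holds; a write acknowledged by only one replica is reported as failed.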

How come key cache increases speed by x4?

2011-02-23 Thread buddhasystem
Well I know the cache is there for a reason, I just can't explain the factor of 4 when I run my queries on a hot vs cold cache. My queries are actually a chain of one on an inverted index, which produces a tuple of keys to be used in the main query. The inverted index query should be downright

Re: How come key cache increases speed by x4?

2011-02-23 Thread Robert Coli
On Wed, Feb 23, 2011 at 4:04 PM, buddhasystem potek...@bnl.gov wrote: Well I know the cache is there for a reason, I just can't explain the factor of 4 when I run my queries on a hot vs cold cache. My queries are actually a chain of one on an inverted index, which produces a tuple of keys to

Changing comparators

2011-02-23 Thread Narendra Sharma
Today it is not possible to change the comparators (compare_with and compare_subcolumns_with). I went through the discussion on thread http://comments.gmane.org/gmane.comp.db.cassandra.user/12466. Does it make sense to at least allow a one-way change, i.e. from specific types to a generic type? For eg

Re: Understand eventually consistent

2011-02-23 Thread mcasandra
I am reading this again http://wiki.apache.org/cassandra/HintedHandoff and got little confused. This is my understanding about how HH should work based on what I read in Dynamo Paper: 1) Say node A, B, C, D, E are in the cluster in a ring (in that order). 2) For a given key K RF=3. 3) Node B

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Narendra Sharma
Remember the simple rule. Column with highest timestamp is the one that will be considered correct EVENTUALLY. So consider following case: Cluster size = 3 (say node1, node2 and node3), RF = 3, Read/Write CL = QUORUM a. QUORUM in this case requires 2 nodes. Write failed with successful write to
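The "highest timestamp wins" rule and the three-node scenario in this message can be illustrated with a small simulation. This is a sketch under stated assumptions: the `Column` type and `reconcile` function are hypothetical names, replicas are modeled as a plain dict, and tie-breaking between equal timestamps is not modeled.

```python
from typing import Dict, Iterable, NamedTuple


class Column(NamedTuple):
    value: str
    timestamp: int  # client-supplied; larger means newer


def reconcile(replies: Iterable[Column]) -> Column:
    """Pick the column with the highest timestamp among the replica replies."""
    return max(replies, key=lambda column: column.timestamp)


# RF = 3, QUORUM = 2. A write of "new" reached only node1, so the client
# was told the write failed; node2 and node3 still hold "old".
replicas: Dict[str, Column] = {
    "node1": Column("new", timestamp=2),
    "node2": Column("old", timestamp=1),
    "node3": Column("old", timestamp=1),
}

# A quorum read that happens to include node1 returns the newer value ...
read_a = reconcile([replicas["node1"], replicas["node2"]])
# ... while a quorum read served by node2 and node3 still returns the old one.
read_b = reconcile([replicas["node2"], replicas["node3"]])
```

The two reads differ until the newer column has propagated to the other replicas, which is why the rule is stated as holding eventually.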

Re: Is it possible to get list of row keys?

2011-02-23 Thread Joshua Partogi
Hi everyone, Thank you to everyone that have responded to my email. I really appreciate that. I am sorry for not making it clear in my original post that what I am looking for is the list of keys in the database assuming that the client application does not know the keys. From what I understand,

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
Thanks Narendra. This is exactly what I was looking for. So the read will return with old value but at the same time, repair will occur and next reads will return new value. But the new value was never written successfully in the first place as Quorum was never achieved. Isn't that semantically

Re: Is it possible to get list of row keys?

2011-02-23 Thread Roshan Dawrani
On Thu, Feb 24, 2011 at 6:54 AM, Joshua Partogi joshua.j...@gmail.com wrote: I am sorry for not making it clear in my original post that what I am looking for is the list of keys in the database assuming that the client application does not know the keys. From what I understand,

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Anthony John
Remember the simple rule. Column with highest timestamp is the one that will be considered correct EVENTUALLY. So consider following case: I am sorry, that will return inconsistent results even a Q. Time stamp have nothing to do with this. It is just an application provided artifact and could be

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
In this case - N1 will be identified as a discrepancy and the change will be discarded via read repair Brilliant. This does sound correct :) One more related question - how are read repairs protected against a quorum write that is in-progress? For e.g. say nodes A, B, C and Client C1 intends to

New Chain for : Does Cassandra use vector clocks

2011-02-23 Thread Anthony John
Apologies : For some reason my response on the original mail keeps bouncing back, thus this new one! From the other hand, the same article says: For conditional writes to work, the condition must be evaluated at all update sites before the write can be allowed to succeed. This means, that

Re: Splitting a single row into multiple

2011-02-23 Thread Aditya Narayan
Thanks Aaron. I was looking at splitting the rows so that I could use a standard CF instead of super, but your argument also makes sense. On Thu, Feb 24, 2011 at 1:19 AM, Aaron Morton aa...@thelastpickle.com wrote: AFAIK performance in the single row case will be better. Multiget may require

Re: New Chain for : Does Cassandra use vector clocks

2011-02-23 Thread Ritesh Tijoriwala
I was about to ask what Anthony's latest post below captures - if we don't have vector clocks and no locking, how does cassandra prevent/detect conflicts? This is somewhat related to the question I asked in last post -

Re: New Chain for : Does Cassandra use vector clocks

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 9:28 PM, Ritesh Tijoriwala tijoriwala.rit...@gmail.com wrote: I was about to ask what Anthony's latest post below captures - if we don't have vector clocks and no locking, how does cassandra prevent/detect conflicts? This is somewhat related to the question I asked in

Fill disks more than 50%

2011-02-23 Thread Terje Marthinussen
Hi, Given that you have always increasing key values (timestamps) and never delete and hardly ever overwrite data. If you want to minimize work on rebalancing and statically assign (new) token ranges to new nodes as you add them so they always get the latest data. Let's say you add a new

Re: Is it possible to get list of row keys?

2011-02-23 Thread Joshua Partogi
Thanks Roshan, I think I understand now. The setRowCount() is in the Java Cassandra driver. I'll try to find the similar method in the Ruby API. Kind regards, Joshua On Thu, Feb 24, 2011 at 1:04 PM, Roshan Dawrani roshandawr...@gmail.com wrote: On Thu, Feb 24, 2011 at 6:54 AM, Joshua Partogi

A simple script that creates multi node clusters on a single machine.

2011-02-23 Thread Edward Capriolo
On the mailing list and IRC there are many questions about Cassandra internals. I understand where the questions are coming from because it took me a while to get a grip on it. However, if you have a laptop with a decent amount of RAM (2 GB is enough for 3-5 nodes; 4 GB is better), you can kick up

Re: Fill disks more than 50%

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 9:39 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: Hi, Given that you have always increasing key values (timestamps) and never delete and hardly ever overwrite data. If you want to minimize work on rebalancing and statically assign (new) token ranges to

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Himanshi Sharma
Hi Dave, Thanks for your reply. I tried using Elastic IPs, and below is the configuration of the cassandra.yaml on both nodes. seeds: - 50.18.60.117 - 175.41.143.192 Now when I run Cassandra I get the following exception: INFO 04:30:56,680 Heap size: 878116864/879165440 INFO

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Dave Viner
That looks like it's not an issue of communicating between nodes. It appears that the node can not bind to the address on the localhost that you're asking for. java.net.BindException: Cannot assign requested address I think the issue is that the Elastic IP address is not actually an IP address

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Narendra Sharma
c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data that was written to node1 will be returned. In this case - N1 will be identified as a discrepancy and the change will be discarded via read repair [Naren] How will Cassandra know this is a discrepancy? On Wed, Feb 23, 2011

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Himanshi Sharma
Hi Dave, I tried with the public IPs. If I mention the public IP in the rpc_address field, Cassandra gives the same exception, but if I leave it blank then Cassandra runs, but again nodetool with the ring option doesn't show the node in the other region. Thanks, Himanshi -Dave Viner

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Dave Viner
Try using the private ipv4 address in the rpc_address field, and the public ipv4 (NOT the elastic ip) in the listen_address. If that fails, go back to rpc_address empty, and start up cassandra. Then from the other node, please telnet to port 7000 on the first node. And show the output of that

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Himanshi Sharma
Giving the private IP to rpc_address gives the same exception, and keeping it blank and providing the public IP to listen_address also fails. I tried keeping both blank and did a telnet on 7000, and I get the following output: [root@ip-10-166-223-150 bin]# telnet 122.248.193.37 7000 Trying 122.248.193.37... Connected to