I expect that this problem was due to
https://issues.apache.org/jira/browse/CASSANDRA-2216 : I'll make noise to
try and get it released soon as 0.7.3
On Tue, Feb 22, 2011 at 5:41 AM, David Boxenhorn da...@lookin2.com wrote:
Thanks, Shimi. I'll keep you posted if we make progress. Riptano is
In practice, local secondary indexes scale to {RF * the limit of a single
machine} for -low cardinality- values (ex: users living in a certain state)
since the first node is likely to be able to answer your question. This also
means they are good for performing filtering for analytics.
On the
In the case described below, if fewer than CL nodes respond within rpc_timeout (from
the conf yaml), the client will get a timeout error. I think most higher-level
clients will automatically retry in this case.
If there are not enough nodes to start the request you will get an Unavailable
exception.
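The two failure modes described above can be sketched as a toy decision function. This is only an illustration of the rule as stated in this message, not Cassandra's coordinator code; the function name and its parameters (`live_replicas`, `acks_before_timeout`, `required`) are hypothetical.

```python
def classify_request(live_replicas: int, acks_before_timeout: int, required: int) -> str:
    """Toy model of the outcomes described above (not Cassandra's real code)."""
    # Not enough live replicas to even start the request.
    if live_replicas < required:
        return "Unavailable"
    # The request started, but fewer than CL replicas answered within rpc_timeout.
    if acks_before_timeout < required:
        return "Timeout"
    return "OK"


if __name__ == "__main__":
    print(classify_request(live_replicas=1, acks_before_timeout=0, required=2))
    print(classify_request(live_replicas=3, acks_before_timeout=1, required=2))
    print(classify_request(live_replicas=3, acks_before_timeout=2, required=2))
```

A client-side retry would normally only make sense for the "Timeout" case, since "Unavailable" means the request was never attempted.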
Thanks Dave but I am able to telnet to other instances on port 7000
and when i run ./nodetool --host
ec2-50-18-60-117.us-west-1.compute.amazonaws.com ring... I can see only
one node.
Do we need to configure anything else in Cassandra.yaml or
Cassandra-env.sh ???
From:
Dave Viner
Data is in Memtables from writes before they get flushed (based on first
threshold of ops/size/time exceeded; all are configurable) to SSTables on
disk.
There is a keycache and a rowcache. The keycache caches offsets into
SSTables for the rows. The rowcache caches the entire row. There is also
did you define the other host in the cassandra.yaml ? on both servers
they need to know about each other
On Wed, Feb 23, 2011 at 10:16 AM, Himanshi Sharma
himanshi.sha...@tcs.com wrote:
Thanks Dave but I am able to telnet to other instances on port 7000
and when i run ./nodetool --host
Hi ,
Is there benefit to delegate some nodes specifically for read
operations, and others specifically for write? When designing a web
app, I could create a connection pool for reads and one for writes ...
or is this me falling back to my rdbms way of thinking?
-sd
--
Sasha Dolgy
Basically: vector clocks tell you there was a conflict, but not how to
resolve it (that is, you simply don't have enough information to
resolve it even if you push that back to the client a la Dynamo).
What dynamo-like systems mostly VC for is the trivial case of client
X updated field 1,
Not necessary. All the nodes have the same function and the same access to data.
Aaron
On 23 Feb, 2011, at 10:27 PM, Sasha Dolgy sdo...@gmail.com wrote:
Hi ,
Is there benefit to delegate some nodes specifically for read
operations, and others specifically for write? When designing a web
app, I could
The map returned by multiget_slice (what I suspect is the underlying thrift
call for getColumnsFromRows) is not an order-preserving map; it's a HashMap,
so the order of the returned results cannot be depended on. Even if it were
an order-preserving map, not all languages would be able to make use of
Yes, they do. I have specified the public DNS name in the seeds field of each node in
cassandra.yaml... I am not able to figure out what the problem is.
From:
Sasha Dolgy sdo...@gmail.com
To:
user@cassandra.apache.org
Date:
02/23/2011 02:56 PM
Subject:
Re: Cassandra nodes on EC2 in two different regions not
Everything as I thought, thank you!
2011/2/23 Matthew Dennis mden...@datastax.com
Data is in Memtables from writes before they get flushed (based on first
threshold of ops/size/time exceeded; all are configurable) to SSTables on
disk.
There is a keycache and a rowcache. The keycache caches
You can use the setRowCount() method to specify how many keys to return per
call.
The default is 100.
Beware: don't set it too high, since it might cause an OOM.
The underlying code will pre-allocate an array list with the size you specify in
setRowCount(). So you might get an OOM if
you used something
What if i want 20 rows and the next 20 rows in a subsequent query? can this
only be achieved with OPP?
--
Sasha Dolgy
sasha.do...@gmail.com
On 23 Feb 2011 13:54, Ching-Cheng Chen cc...@evidentsoftware.com wrote:
query per ranges is only possible with OPP or BPP.
Bye,
Norman
2011/2/23 Sasha Dolgy sdo...@gmail.com:
What if i want 20 rows and the next 20 rows in a subsequent query? can this
only be achieved with OPP?
--
Sasha Dolgy
sasha.do...@gmail.com
On 23 Feb 2011 13:54, Ching-Cheng Chen
Actually, if you want to get ALL keys, I believe you can still use
RangeSliceQuery with RP.
Just use setKeys(,) as first batch call.
Then use the last key from previous batch as startKey for next batch.
Beware that startKey is inclusive, so you'd need to ignore the first key
of each batch from then on.
Keep
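The batching approach described in this message can be sketched as follows. This is a minimal simulation, not Hector code: `page_all_keys` and `make_fake_fetch` are hypothetical names, the empty string as "start from the beginning" is an assumption of the sketch, and the fake fetch orders keys by MD5 digest only to imitate keys coming back in hash order rather than key order.

```python
import hashlib
from typing import Callable, Iterator, List


def page_all_keys(fetch: Callable[[str, int], List[str]], page_size: int) -> Iterator[str]:
    """Yield every key once, using the last key of each batch as the next start key."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 because the start key is inclusive")
    start = ""  # empty start key means "from the beginning" in this sketch
    first_batch = True
    while True:
        batch = fetch(start, page_size)
        # The start key is inclusive, so every batch after the first repeats one key.
        fresh = batch if first_batch else batch[1:]
        for key in fresh:
            yield key
        if len(batch) < page_size:
            break  # a short batch means the range is exhausted
        start = batch[-1]
        first_batch = False


def make_fake_fetch(keys: List[str]) -> Callable[[str, int], List[str]]:
    """Hypothetical stand-in for a range slice call: keys come back in hash order."""
    ordered = sorted(keys, key=lambda k: hashlib.md5(k.encode()).hexdigest())

    def fetch(start: str, count: int) -> List[str]:
        begin = ordered.index(start) if start else 0
        return ordered[begin:begin + count]

    return fetch


if __name__ == "__main__":
    all_keys = [f"user{i}" for i in range(10)]
    for key in page_all_keys(make_fake_fetch(all_keys), page_size=4):
        print(key)
```

The page size must be at least 2 here: with a page size of 1, every batch after the first would contain only the repeated start key and the loop would never advance.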
On Wed, Feb 23, 2011 at 7:17 PM, Ching-Cheng Chen cc...@evidentsoftware.com
wrote:
Actually, if you want to get ALL keys, I believe you can still use
RangeSliceQuery with RP.
Just use setKeys(,) as first batch call.
Then use the last key from previous batch as startKey for next batch.
yes but be aware that the keys will not be in the right order.
Bye,
Norman
2011/2/23 Roshan Dawrani roshandawr...@gmail.com:
On Wed, Feb 23, 2011 at 7:17 PM, Ching-Cheng Chen
cc...@evidentsoftware.com wrote:
Actually, if you want to get ALL keys, I believe you can still use
RangeSliceQuery
On Wed, Feb 23, 2011 at 3:32 AM, Oleg Anastasyev olega...@gmail.com wrote:
Basically: vector clocks tell you there was a conflict, but not how to
resolve it (that is, you simply don't have enough information to
resolve it even if you push that back to the client a la Dynamo).
What dynamo-like
Yes. But I don't think the retrieving keys in the right order was part of
the original question. :-)
On Wed, Feb 23, 2011 at 7:50 PM, Norman Maurer nor...@apache.org wrote:
yes but be aware that the keys will not be in the right order.
Bye,
Norman
2011/2/23 Roshan Dawrani
They are, however, in *stable* order, which is important.
On Wed, Feb 23, 2011 at 3:20 PM, Norman Maurer nor...@apache.org wrote:
yes but be aware that the keys will not be in the right order.
Bye,
Norman
2011/2/23 Roshan Dawrani roshandawr...@gmail.com:
On Wed, Feb 23, 2011 at 7:17 PM,
From the article I linked:
But wait, some might say, you can avoid all this by using vectors in
a different way – to prevent update conflicts by issuing conditional
writes which specify a version (vector) and only succeed if that
version is still current. Sorry, but no, or at least not
Try using the IP address, not the dns name in the cassandra.yaml.
If you can telnet from one to the other on port 7000, and both nodes have
the other node in their config, it should work.
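The telnet check mentioned above can also be scripted. The following is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper name, and port 7000 is taken from this thread as the port the nodes use to talk to each other.

```python
import socket


def can_connect(host: str, port: int = 7000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

A successful connection only shows that the port is reachable; it does not prove that the node has joined the ring.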
Dave Viner
On Wed, Feb 23, 2011 at 1:43 AM, Himanshi Sharma himanshi.sha...@tcs.com wrote:
Ya they do.
The internal Amazon IP address is what you will want to use so you don't
have to go through DNS anyways; not sure if this works from US-East to
US-West, but it does make things quicker in between zones, e.g. us-east-1a
to us-east-1b.
On Wed, Feb 23, 2011 at 9:09 AM, Dave Viner davevi...@gmail.com
Does it make any difference if I split a row, that needs to be
accessed together, into two or three rows and then read those multiple
rows ??
(Assume the keys of all the three rows are known to me programmatically
since I split columns by certain categories).
Would the performance be any better if
Hi all,
We want to migrate from version 0.6.5 to version 0.7.2. Is there a step by
step guide or document we can follow?
Also, there is a new branch cassandra-0.7.2 on svn. What is the purpose of
creating the new branch instead of using one branch, cassandra-0.7? Will you
maintain both branches?
Thanks,
On Wed, Feb 23, 2011 at 9:57 AM, Oleg Anastasyev olega...@gmail.com wrote:
On the other hand, the same article says:
For conditional writes to work, the condition must be evaluated at all update
sites before the write can be allowed to succeed.
This means, that when doing such an update
On Wed, Feb 23, 2011 at 11:18 AM, Zhong Li z...@voxeo.com wrote:
Hi all,
We want migrate from version 0.6.5 to version 0.7.2. Is there step by step
guide or document we can follow?
NEWS.txt
Also there is new branch cassandra-0.7.2 on svn, what is purpose to create
the new branch instead
I posted on this topic last September. (See
http://www.mail-archive.com/user@cassandra.apache.org/msg05692.html)
I was able to use Cassandra across EC2 regions. However, the trick is
that you must use the external addresses in your storage-conf.xml,
but since you don't have a NIC that
What's the best way to bring multiple seeds up, should only one of them have
auto bootstrap set to true or should neither of them? Should they list
themselves and the other seed in their seed section in the yaml config?
The DataStax documentation offers some answers to those questions in
the Getting Started
(http://www.datastax.com/dev/tutorials/getting_started_0.7/configuring#adding-nodes-to-a-cassandra-cluster)
section and the Clustering
(http://www.datastax.com/docs/0.7/operations/clustering#adding-capacity) reference
Yeah I set the tokens, I'm more asking if I start the first seed node with
autobootstrap set to false the second seed should have it set to true as well
as all the slave nodes correct? I didn't see this in the docs but I may have
just missed it.
From: Eric Gilmore [mailto:e...@datastax.com]
On Wed, Feb 23, 2011 at 2:30 PM, jeremy.truel...@barclayscapital.com wrote:
Yeah I set the tokens, I’m more asking if I start the first seed node with
autobootstrap set to false the second seed should have it set to true as
well as all the slave nodes correct? I didn’t see this in the docs but
So all seeds should always be set to 'auto_bootstrap: false' in their .yaml
file.
-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Wednesday, February 23, 2011 2:36 PM
To: user@cassandra.apache.org
Cc: Truelove, Jeremy: IT (NYK)
Subject: Re: Multiple Seeds
At CL levels higher than ANY, hinted handoff will be used if enabled. It does not
contribute to the number of replicas considered written by the coordinator
though. E.g. If you ask for quorum, and this is 3 nodes, and only 2 are up the
write will fail without starting. In this case the HH is
Well -- when you first bring a node into a ring, you will probably want to
stream data to it with auto_bootstrap: true.
If you want that node to be a seed, then add it to the seeds list AFTER it
has joined the ring.
I'd refer you to the Seed List and Autoboostrapping sections of the
Getting
AFAIK performance in the single row case will be better. Multi get may require
multiple seeks and reads in an sstable, versus obviously a single seek and
read for a single row. Multiplied by the number of sstables that contain row
data.
Using the key cache would reduce the seeks.
If it
To add a host to the seeds list after it has had the data streamed to it I need
to
1. stop it
2. edit the yaml file to
a. include it in the seeds list
b. set auto bootstrap to false
3.restart it
correct? Additionally you would need to add it to the other nodes
On Wed, Feb 23, 2011 at 2:59 PM, jeremy.truel...@barclayscapital.com wrote:
To add a host to the seeds list after it has had the data streamed to it I
need to
1. stop it
2. edit the yaml file to
a. include it in the seeds list
b. set auto bootstrap to false
3.
Also should non-seed host be perpetually set to auto_bootstrap: true ?
From: Truelove, Jeremy: IT (NYK)
Sent: Wednesday, February 23, 2011 3:00 PM
To: user@cassandra.apache.org
Subject: RE: Multiple Seeds
To add a host to the seeds list after it has had the data streamed to it I need
to
1.
So does cassandra monitor the config file for changes? If it doesn't, how else
would it know that you had added a new seed unless you restart?
-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Wednesday, February 23, 2011 3:23 PM
To: user@cassandra.apache.org
Cc:
Hello,
I'm currently doing my masters project. I need to store lots of time series
data of any type (String, int, booleans, arrays of the previous) with a high
writing rate(20MBytes/sec - 170TBytes/year - note not running continuously)
but less strict read requirements. This is monitoring data
Ritesh,
You have seen the problem. Clients may read the newly written value even
though the client performing the write saw it as a failure. When the client
reads, it will use the correct number of replicas for the chosen CL, then
return the newest value seen at any replica. This newest value
Read repair will probably occur at that point (depending on your config),
which would cause the newest value to propagate to more replicas.
Is the newest value the quorum value which means it is the old value that
will be written back to the nodes having newer non-quorum value or the
newest
Hi Alexandru,
I feel Cassandra can certainly be used to solve the problem you have but if
your requirements are not very strict, you need very high throughput and it's
okay for you to lose some data occasionally due to a machine crash, then I
recommend you look at Redis (http://redis.io/). It is a high
Hi Matthew,
As you mention the map returned from multiget_slice is not order preserving,
Pelops is doing this on the client side...
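Restoring request order on the client side can be sketched like this. It is only an illustration of the idea, not Pelops' actual implementation; `in_request_order` is a hypothetical helper, and keys with no result are simply skipped.

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

V = TypeVar("V")


def in_request_order(requested_keys: Sequence[str], results: Dict[str, V]) -> List[Tuple[str, V]]:
    """Return (key, value) pairs in the order the keys were requested."""
    # Walk the caller's key list instead of relying on the map's own iteration order.
    return [(key, results[key]) for key in requested_keys if key in results]


if __name__ == "__main__":
    unordered = {"a": 1, "c": 3, "b": 2}
    print(in_request_order(["b", "a", "c"], unordered))
```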
Cheers,
Dan
--
Dan Washusen
On Wednesday, 23 February 2011 at 8:38 PM, Matthew Dennis wrote:
The map returned by multiget_slice (what I
I know that theoretically it should not (apart from compaction issues), but
maybe somebody has experience showing otherwise:
My test cluster now has 250GB of data and will have 1.5TB in its
reincarnation. If all this data is in a single CF -- will it cause read or
write performance problems?
There was a discussion here on how well (or not so well) the Super CFs are
supported. I now need to make a strategic decision as to how I plan my data.
What's the consensus -- will the super CF be there 3 years out?
TIA
Maxim
Seems to me that the explanations are getting incredibly complicated - while
I submit the real issue is not!
Salient points here:-
1. To be guaranteed data consistency - the writes and reads have to be at
Quorum CL or more
2. Any W/R at lesser CL means that the application has to handle the
On Wed, Feb 23, 2011 at 3:28 PM, jeremy.truel...@barclayscapital.com wrote:
So does cassandra monitor the config file for changes? If it doesn't how else
would it know unless you restart you had added a new seed?
-Original Message-
From: Edward Capriolo
Jonathan Ellis jbellis at gmail.com writes:
IMO if you only get CL.ALL it's not superior enough to pessimistic
locking to justify the complexity of adding it.
Yes, maybe you're right, but CL.ALL is necessary only to solve this problem in
a generic way.
In some (most?) cases, conflicts
On Wed, Feb 23, 2011 at 4:51 PM, buddhasystem potek...@bnl.gov wrote:
I know that theoretically it should not (apart from compaction issues), but
maybe somebody has experience showing otherwise:
My test cluster now has 250GB of data and will have 1.5TB in its
reincarnation. If all these data
Let me start out by saying that I think I'm going to have to write a patch
to get what I want, but I'm fine with that. I just wanted to check here
first to make sure that I'm not missing something obvious.
I'd like to be able to run a MapReduce job that takes a value in an indexed
column as a
hi Anthony,
While you stated the facts right, I don't see how it relates to the question
I ask. Can you elaborate specifically what happens in the case I mentioned
above to Dave?
thanks,
Ritesh
On Wed, Feb 23, 2011 at 1:57 PM, Anthony John chirayit...@gmail.com wrote:
Seems to me that the
So far my understanding about indexes is that you can create indexes only on
column values (username in below eg).
Does it make sense to also have index on the keys that columnFamily uses to
store rows (row keys abc in below example). I am thinking in an event rows
keep growing would search be
Ritesh,
At CL ANY - if all endpoints are down - a HH is written. And it is a
successful write - not a failed write.
Now that does not guarantee a READ of the value just written - but that is a
risk that you take when you use the ANY CL!
HTH,
-JA
On Wed, Feb 23, 2011 at 4:40 PM, Ritesh
Hi all,
I'm wondering if anyone has used cassandra as a datastore for a user-profile
service. I'm thinking of applications like behavioral targeting, where
there are lots and lots of users (10s to 100s of millions), and lots and lots of
data about them intermixed in, say, weblogs (probably TBs worth).
Hi Anthony,
I am not talking about the case of CL ANY. I am talking about the case where
your consistency level is R + W > N and you want to write to W nodes but
only succeed in writing to X (where X < W) nodes and hence fail the write
to the client.
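The condition R + W > N used in this kind of discussion can be checked with a one-line helper. This is a sketch of the arithmetic only; the function name is hypothetical, and it says nothing about what happens when a write reaches fewer than W replicas, which is the case asked about here.

```python
def read_overlaps_write(r: int, w: int, n: int) -> bool:
    """True when any R replicas read must share at least one replica with any W written."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    # Two subsets of sizes r and w drawn from n replicas must intersect when r + w > n.
    return r + w > n


if __name__ == "__main__":
    print(read_overlaps_write(r=2, w=2, n=3))  # quorum reads and quorum writes
    print(read_overlaps_write(r=1, w=1, n=3))  # single-replica reads and writes
```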
thanks,
Ritesh
On Wed, Feb 23, 2011 at 2:48
Well I know the cache is there for a reason, I just can't explain the factor
of 4 when I run my queries on a hot vs cold cache. My queries are actually a
chain of one on an inverted index, which produces a tuple of keys to be used
in the main query. The inverted index query should be downright
On Wed, Feb 23, 2011 at 4:04 PM, buddhasystem potek...@bnl.gov wrote:
Well I know the cache is there for a reason, I just can't explain the factor
of 4 when I run my queries on a hot vs cold cache. My queries are actually a
chain of one on an inverted index, which produces a tuple of keys to
Today it is not possible to change the comparators (compare_with and
compare_subcolumns_with). I went through the discussion on thread
http://comments.gmane.org/gmane.comp.db.cassandra.user/12466.
Does it make sense to at least allow a one-way change, i.e. from specific types
to a generic type? For e.g.
I am reading this again http://wiki.apache.org/cassandra/HintedHandoff and
got a little confused. This is my understanding about how HH should work based
on what I read in Dynamo Paper:
1) Say node A, B, C, D, E are in the cluster in a ring (in that order).
2) For a given key K RF=3.
3) Node B
Remember the simple rule. Column with highest timestamp is the one that will
be considered correct EVENTUALLY. So consider following case:
Cluster size = 3 (say node1, node2 and node3), RF = 3, Read/Write CL =
QUORUM
a. QUORUM in this case requires 2 nodes. Write failed with successful write
to
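The "highest timestamp wins" rule and the quorum size used in this example can be sketched as below. This is a toy model under stated assumptions, not Cassandra's read path: `Column`, `quorum` and `reconcile` are hypothetical names, and ties between equal timestamps are not handled in any particular way.

```python
from typing import Iterable, NamedTuple


class Column(NamedTuple):
    value: str
    timestamp: int


def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a quorum for the given replication factor."""
    return replication_factor // 2 + 1


def reconcile(replica_columns: Iterable[Column]) -> Column:
    """Pick the column version with the highest timestamp among the replicas read."""
    return max(replica_columns, key=lambda column: column.timestamp)


if __name__ == "__main__":
    print(quorum(3))
    print(reconcile([Column("old", 100), Column("new", 200)]))
```

With RF = 3 the quorum is 2, so a read that happens to include the one replica holding the newer timestamp returns the newer value, even if the original write was reported to the client as failed.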
Hi everyone,
Thank you to everyone that has responded to my email. I really
appreciate that. I am sorry for not making it clear in my original
post that what I am looking for is the list of keys in the database
assuming that the client application does not know the keys. From what
I understand,
Thanks Narendra. This is exactly what I was looking for. So the read will
return with old value but at the same time, repair will occur and next reads
will return new value. But the new value was never written successfully in
the first place as Quorum was never achieved. Isn't that semantically
On Thu, Feb 24, 2011 at 6:54 AM, Joshua Partogi joshua.j...@gmail.com wrote:
I am sorry for not making it clear in my original
post that what I am looking for is the list of keys in the database
assuming that the client application does not know the keys. From what
I understand,
Remember the simple rule. Column with highest timestamp is the one that
will be considered correct EVENTUALLY. So consider following case:
I am sorry, that will return inconsistent results even at Q. Timestamps have
nothing to do with this. It is just an application provided artifact and
could be
In this case - N1 will be identified as a discrepancy and the change will
be discarded via read repair
Brilliant. This does sound correct :)
One more related question - how are read repairs protected against a quorum
write that is in-progress? For e.g. say nodes A, B, C and Client C1 intends
to
Apologies : For some reason my response on the original mail keeps bouncing
back, thus this new one!
From the other hand, the same article says:
For conditional writes to work, the condition must be evaluated at all
update
sites before the write can be allowed to succeed.
This means, that
Thanks Aaron.. I was looking at splitting the rows so that I could use
a standard CF instead of super.. but your argument also makes sense.
On Thu, Feb 24, 2011 at 1:19 AM, Aaron Morton aa...@thelastpickle.com wrote:
AFAIK performance in the single row case will better. Multi get may require
I was about to ask what Anthony's latest post below captures - if we don't
have vector clocks and no locking, how does cassandra prevent/detect
conflicts? This is somewhat related to the question I asked in last post -
On Wed, Feb 23, 2011 at 9:28 PM, Ritesh Tijoriwala
tijoriwala.rit...@gmail.com wrote:
I was about to ask what Anthony's latest post below captures - if we don't
have vector clocks and no locking, how does cassandra prevent/detect
conflicts? This is somewhat related to the question I asked in
Hi,
Given that you have always increasing key values (timestamps) and never
delete and hardly ever overwrite data.
If you want to minimize work on rebalancing and statically assign (new)
token ranges to new nodes as you add them so they always get the latest
data
Lets say you add a new
Thanks Roshan,
I think I understand now. The setRowCount() is in the Java Cassandra
driver. I'll try to find the similar method in the Ruby API.
Kind regards,
Joshua
On Thu, Feb 24, 2011 at 1:04 PM, Roshan Dawrani roshandawr...@gmail.com wrote:
On Thu, Feb 24, 2011 at 6:54 AM, Joshua Partogi
On the mailing list and IRC there are many questions about Cassandra
internals. I understand where the questions are coming from because it
took me a while to get a grip on it.
However, if you have a laptop with a decent amount of RAM (2 GB is
enough for 3-5 nodes, 4 GB is better), you can kick up
On Wed, Feb 23, 2011 at 9:39 PM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
Hi,
Given that you have have always increasing key values (timestamps) and never
delete and hardly ever overwrite data.
If you want to minimize work on rebalancing and statically assign (new)
token ranges to
Hi Dave,
Thanks for your reply. I tried using Elastic IPs.
And below is the configuration of the cassandra.yaml in both the nodes.
seeds:
- 50.18.60.117
- 175.41.143.192
Now when I run cassandra I get the following exception
INFO 04:30:56,680 Heap size: 878116864/879165440
INFO
That looks like it's not an issue of communicating between nodes. It
appears that the node can not bind to the address on the localhost that
you're asking for.
java.net.BindException: Cannot assign requested address
I think the issue is that the Elastic IP address is not actually an IP
address
c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data
that was written to node1 will be returned.
In this case - N1 will be identified as a discrepancy and the change will
be discarded via read repair
[Naren] How will Cassandra know this is a discrepancy?
On Wed, Feb 23, 2011
Hi Dave,
I tried with the public IPs. If I put the public IP in the rpc_address field, Cassandra gives the same exception, but if I leave it blank then Cassandra runs; again, though, the nodetool command with the ring option doesn't show the node in the other region.
Thanks,
Himanshi-Dave Viner
Try using the private ipv4 address in the rpc_address field, and the public
ipv4 (NOT the elastic ip) in the listen_address.
If that fails, go back to rpc_address empty, and start up cassandra.
Then from the other node, please telnet to port 7000 on the first node. And
show the output of that
Giving the private IP to rpc_address gives the same exception,
and keeping it blank and providing the public IP to listen_address also fails. I tried keeping both blank and did a telnet on 7000, and I get the following output:
[root@ip-10-166-223-150 bin]# telnet 122.248.193.37 7000
Trying 122.248.193.37...
Connected to