Important Variables for Scaling
Which variables (for instance: throughput, CPU, I/O, connections) are leading in deciding to add a node to a Cassandra setup that is put under strain? We are trying to prove scalability, but when is the right moment to add a node so as to get the optimum scalability result?
Re: Multi data center configuration - A question on read correction
Yes, that's the way to do it.

On Wed, Jun 15, 2011 at 9:43 PM, Selva Kumar wwgse...@yahoo.com wrote: Thanks Jonathan. Can we turn off RR by setting READ_REPAIR_CHANCE = 0? Please advise. Selva

From: Jonathan Ellis jbel...@gmail.com To: user@cassandra.apache.org Sent: Tue, June 14, 2011 8:59:41 PM Subject: Re: Multi data center configuration - A question on read correction

That's just read repair sending MD5s of the data for comparison, so net traffic is light. You can turn off RR, but the downsides can be large. Turning it down to, say, 10% can be reasonable though. But again, if network traffic is your concern you should be fine.

On Tue, Jun 14, 2011 at 8:44 PM, Selva Kumar wwgse...@yahoo.com wrote: I have set up a multiple data center configuration in Cassandra. My primary intention is to minimize the network traffic between DC1 and DC2. I want DC1 read requests to be served without reaching DC2 nodes. After going through the documentation, I felt the following setup would do.

Replica Placement Strategy: NetworkTopologyStrategy
Replication Factor: 3
strategy_options: DC1 : 2, DC2 : 1
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
Read Consistency Level: LOCAL_QUORUM
Write Consistency Level: LOCAL_QUORUM

File: cassandra-topology.properties
# Cassandra Node IP=Data Center:Rack
10.10.10.149=DC1:RAC1
10.10.10.150=DC1:RAC1
10.10.10.151=DC1:RAC1
10.20.10.153=DC2:RAC1
10.20.10.154=DC2:RAC1
# default for unknown nodes
default=DC1:RAC1

Question I have: 1. I created a Java program to test. It was querying with consistency level LOCAL_QUORUM on a DC1 node. The read count (through cfstats) on the DC2 node showed a read happened there too. Is it because of read correction? Is there a way to avoid doing read correction on DC2 nodes when we query DC1 nodes? Thanks, Selva

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
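As a concrete illustration of Jonathan's suggestion, read repair can be turned down per column family from cassandra-cli. This is a sketch only; the column family name is an example, and you should check the attribute name against your Cassandra version:

```
update column family transactions with read_repair_chance=0.1;
```

Setting it to 0 disables read repair for that column family entirely; 0.1 means only roughly 10% of reads trigger the background digest comparison against the other replicas.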
Re: sstable2json2sstable bug with json data stored
On 6/15/11 17:41, Timo Nentwig wrote: (json can likely be boiled down even more...) Any JSON (well, probably anything with quotes...) breaks it:

{ "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] }

[default@foo] set transactions['test']['data']='{"foo":"bar"}';

I feared that storing data in a readable fashion would be a fateful idea. https://issues.apache.org/jira/browse/CASSANDRA-2780
Re: sstable2json2sstable bug with json data stored
The JSON you are showing below is an export from cassandra?

{ "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] }

Does this work?

{ "74657374": [["data", "{\"foo\":\"bar\"}", 1308209845388000]] }

-sd

On Thu, Jun 16, 2011 at 9:49 AM, Timo Nentwig timo.nent...@toptarif.de wrote: On 6/15/11 17:41, Timo Nentwig wrote: (json can likely be boiled down even more...) Any JSON (well, probably anything with quotes...) breaks it: { "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] } [default@foo] set transactions['test']['data']='{"foo":"bar"}'; I feared that storing data in a readable fashion would be a fateful idea. https://issues.apache.org/jira/browse/CASSANDRA-2780
Re: sstable2json2sstable bug with json data stored
On 6/16/11 10:06, Sasha Dolgy wrote: The JSON you are showing below is an export from cassandra?

Yes. Just posted the solution: https://issues.apache.org/jira/browse/CASSANDRA-2780?focusedCommentId=13050274&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13050274 Guess this could simply be done in the quote() method.

{ "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] } Does this work? { "74657374": [["data", "{\"foo\":\"bar\"}", 1308209845388000]] } -sd

On Thu, Jun 16, 2011 at 9:49 AM, Timo Nentwig timo.nent...@toptarif.de wrote: On 6/15/11 17:41, Timo Nentwig wrote: (json can likely be boiled down even more...) Any JSON (well, probably anything with quotes...) breaks it: { "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] } [default@foo] set transactions['test']['data']='{"foo":"bar"}'; I feared that storing data in a readable fashion would be a fateful idea. https://issues.apache.org/jira/browse/CASSANDRA-2780
Re: sstable2json2sstable bug with json data stored
On 6/16/11 10:12, Timo Nentwig wrote: On 6/16/11 10:06, Sasha Dolgy wrote: The JSON you are showing below is an export from cassandra? Yes. Just posted the solution: https://issues.apache.org/jira/browse/CASSANDRA-2780?focusedCommentId=13050274&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13050274 Guess this could simply be done in the quote() method.

Hm, is this the way it's supposed to be?

[default@foo] set transactions['test']['data']='{"foo":"bar"}';
Value inserted.
[default@foo] get transactions['test']['data'];
=> (column=data, value={"foo":"bar"}, timestamp=1308214517443000)
[default@foo] set transactions['test']['data']='{\"foo\":\"bar\"}';
Value inserted.
[default@foo] get transactions['test']['data'];
=> (column=data, value={"foo":"bar"}, timestamp=1308214532484000)

Otherwise, here's a regex that cares about existing backslashes:

private static String quote(final String val) {
    return String.format("\"%s\"", val.replaceAll("(?<!\\\\)\"", "\\\\\""));
}

{ "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] } Does this work? { "74657374": [["data", "{\"foo\":\"bar\"}", 1308209845388000]] } -sd

On Thu, Jun 16, 2011 at 9:49 AM, Timo Nentwig timo.nent...@toptarif.de wrote: On 6/15/11 17:41, Timo Nentwig wrote: (json can likely be boiled down even more...) Any JSON (well, probably anything with quotes...) breaks it: { "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] } [default@foo] set transactions['test']['data']='{"foo":"bar"}'; I feared that storing data in a readable fashion would be a fateful idea. https://issues.apache.org/jira/browse/CASSANDRA-2780
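A quick way to sanity-check the escaping logic Timo proposes is to port it; this is a hedged Python sketch of the same regex (a negative lookbehind so already-escaped quotes are left alone), not the actual patch that went into CASSANDRA-2780:

```python
import re

def quote(val):
    # Escape double quotes that are not already preceded by a backslash,
    # then wrap the whole value in quotes, mirroring sstable2json's quote().
    return '"%s"' % re.sub(r'(?<!\\)"', r'\"', val)
```

With this, a stored value like {"foo":"bar"} round-trips as "{\"foo\":\"bar\"}", which is valid JSON and can be fed back through json2sstable.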
Getting Started website is out of date
Hi, the Getting Started website (http://wiki.apache.org/cassandra/GettingStarted) is out of date: the link to the Twissandra demo is broken, and the new CQL is not mentioned. :-) Besides this, I love Cassandra! Best, Christian
Re: Migration question
Lots of folks use a single disk or RAID-1 for the system and commit log, and RAID-0 for the data volumes: http://wiki.apache.org/cassandra/CassandraHardware Your money is probably better spent on more nodes with more disks and more memory. More nodes is always better. Happy to hear reasons otherwise. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 15 Jun 2011, at 15:50, Marcos Ortiz wrote: On 6/14/2011 1:43 PM, Eric Czech wrote: Thanks Aaron. I'll make sure to copy the system tables. Another thing -- do you have any suggestions on RAID configurations for the main data drives? We're looking at RAID 5 and 10 and I can't seem to find a convincing argument one way or the other. Well, I learned from administering other databases (like PostgreSQL and Oracle) that RAID 10 is the best solution for data. With RAID 5, the disks suffer a lot from the excessive I/O and it can lead to data loss. You can search for the "RAID 5 write hole" to see this. Thanks again for your help.

On Mon, Jun 6, 2011 at 5:45 AM, aaron morton aa...@thelastpickle.com wrote: Sounds like you are OK to turn off the existing cluster first. Assuming so, deliver any hints using JMX, then do a nodetool flush to write out all the memtables and checkpoint the commit logs. You can then copy the data directories. The system data directory contains the node's token and the schema, so you will want to copy this directory. You may also want to copy the cassandra.yaml or create new ones with the correct initial tokens. The nodes will sort themselves out when they start up and get new IPs; the important thing to them is the token. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 6 Jun 2011, at 23:25, Eric Czech wrote: Hi, I have a quick question about migrating a cluster.
We have a cassandra cluster with 10 nodes that we'd like to move to a new DC and what I was hoping to do is just copy the SSTables for each node to a corresponding node in the new DC (the new cluster will also have 10 nodes). Is there any reason that a straight file copy like this wouldn't work? Do any system tables need to be moved as well or is there anything else that needs to be done? Thanks! -- Marcos Luís Ortíz Valmaseda Software Engineer (UCI) http://marcosluis2186.posterous.com http://twitter.com/marcosluis2186
Re: Slowdowns during repair
Look for log messages at the ERROR level first to find out why it's crashing. Check for GC pressure during the repair, either using JConsole or log messages from the GCInspector. Check nodetool tpstats to get an idea of whether the nodes are saturated, i.e. are there tasks in the pending list, or are they just running with high latency. If a node crashes when calculating the Merkle trees for its neighbours, the repair will hang (for 48 hours, I think) on the node that initiated the repair. I don't think this is immediately obvious through tpstats. Start with why it's crashing and what's happening with the GC. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 16 Jun 2011, at 10:20, Aurynn Shaw wrote: Hey all; So, we have Cassandra running on a 5-server ring, with an RF of 3, and we're regularly seeing major slowdowns in read/write performance while running nodetool repair (slowdowns past 10 seconds to perform a single write), as well as the occasional Cassandra crash during the repair window. The repair cycle runs nightly on a different server, so each server has it run once a week. We're running 0.7.0 currently, and we'll be upgrading to 0.7.6 shortly. System load on the Cassandra servers is never more than 10% CPU and utterly minimal IO usage, so I wouldn't think we'd be seeing issues quite like this. What sort of knobs should I be looking at tuning to reduce the impact that nodetool repair has on Cassandra? What questions should I be asking as to why Cassandra slows down to the level that it does, and what should I be optimizing? Additionally, what should I be looking for in the logs when this is happening? There's a lot in the logs, but I'm not sure what to look for. Cassandra is, in this instance, backing a system that supports around a million requests a day, so not terribly heavy traffic. Thanks, Aurynn
Re: Where is my data?
I wrote a blog post about this sort of thing the other day http://thelastpickle.com/2011/06/13/Down-For-Me/ Let me know if you spot any problems. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 16 Jun 2011, at 02:20, AJ wrote: Thanks On 6/15/2011 3:20 AM, Sylvain Lebresne wrote: You can use the thrift call describe_ring(). It will returns a map that associate to each range of the ring who is a replica. Once any range has all it's endpoint unavailable, that range of the data is unavailable. -- Sylvain
Re: What's the best approach to search in Cassandra
Mark, Solandra doesn't use secondary indexes; the functionality is too limited for the Lucene API. It maintains its own indexes in regular column families. I suggest you look at Solr and decide if this is the functionality you need; Solandra offers the same API but on Cassandra's distributed model. -Jake

On Thu, Jun 16, 2011 at 12:56 AM, Mark Kerzner markkerz...@gmail.com wrote: Jake, "You need to maintain a huge number of distinct indexes." Are we talking about secondary indexes? If yes, this sounds like exactly my problem. There is so little documentation! - but I think that if I read all there is on GitHub, I can probably start using it. Thank you, Mark

On Fri, Jun 3, 2011 at 8:07 PM, Jake Luciani jak...@gmail.com wrote: Mark, Check out Solandra. http://github.com/tjake/Solandra On Fri, Jun 3, 2011 at 7:56 PM, Mark Kerzner markkerz...@gmail.com wrote: Hi, I need to store, say, 10M-100M documents, with each document having, say, 100 fields, like author, creation date, access date, etc., and then I want to ask questions like "give me all documents whose author is like abc*, and creation date any time in 2010 and access date in 2010-2011", and so on, perhaps 10-20 conditions, matching a list of some keywords. What's best: Lucene, Katta, Cassandra CF with secondary indices, or plain scan and compare of every record? Thanks a bunch! Mark -- http://twitter.com/tjake
Re: Force a node to form part of quorum
Short answer: No. Medium answer: No, all nodes are equal. It could create a single point of failure if a QUORUM could not be formed without a specific node. Writes are sent to every replica. Reads with read repair enabled are also sent to every replica. For reads, the closest UP node as determined by the snitch (and possibly re-ordered by the dynamic snitch) is asked to return the actual data. This replica must respond for the request to complete. If it's a question about maximising cache hits, see https://github.com/apache/cassandra/blob/cassandra-0.8.0/conf/cassandra.yaml#L308 Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 16 Jun 2011, at 05:58, A J wrote: Is there a way to favor a node to always participate (or never participate) in fulfilling read consistency as well as write consistency? Thanks AJ
Re: Atomicity of batch updates
See http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 16 Jun 2011, at 06:26, chovatia jaydeep wrote: A Cassandra write operation is atomic for all the columns/super columns for a given row key in a column family. So in your case, not all previous operations (assuming each operation was on a separate key) will be reverted. Thank you, Jaydeep

From: Artem Orobets artem.orob...@exigenservices.com To: user@cassandra.apache.org Cc: Andrey Lomakin andrey.loma...@exigenservices.com Sent: Wednesday, 15 June 2011 7:42 AM Subject: Atomicity of batch updates Hi, the wiki says that a write operation is atomic within a ColumnFamily (http://wiki.apache.org/cassandra/ArchitectureOverview, chapter "write properties"). If I use a batch update for a single CF and get an exception in the last mutation operation, does it mean that all previous operations will be reverted? If not, what does atomic mean in this context?
Re: Easy way to overload a single node on purpose?
"DEBUG 14:36:55,546 ... timed out" is logged when the coordinator times out waiting for the replicas to respond; the timeout setting is rpc_timeout in the yaml file. This results in the client getting a TimedOutException. AFAIK there are no global "everything is good / bad" flags to check. E.g. AFAIK a node will not mark itself down if it runs out of disk space, so you need to monitor the free disk space and alert on that. Having a ping column can work if every key is replicated to every node. It would tell you the cluster is working, sort of. Once the number of nodes is greater than the RF, it only tells you a subset of the nodes works. If you google around you'll find discussions about monitoring with Munin, Ganglia, Cloudkick and OpsCenter. If you install mx4j you can access the JMX metrics via HTTP. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 16 Jun 2011, at 10:38, Suan Aik Yeo wrote: Here's a weird one... what's the best way to get a Cassandra node into a half-crashed state? We have a 3-node cluster running 0.7.5. A few days ago this happened organically to node1 - the partition the commitlog was on was 100% full and there was a "No space left on device" error, and after a while, although the cluster and node1 were still up, to the other nodes it was down, and messages like "DEBUG 14:36:55,546 ... timed out" started to show up in its debug logs. We have a tool to indicate to the load balancer that a Cassandra node is down, but it didn't detect it that time. Now I'm having trouble purposefully getting the node back into that state, so that I can try other monitoring methods. I've tried to fill up the commitlog partition with other files, and although I get the "No space left on device" error, the node still doesn't go down and show the other symptoms it showed before. Also, if anyone could recommend a good way for a node itself to detect that it's in such a state, I'd be interested in that too.
Currently what we're doing is making a describe_cluster_name() thrift call, but that still worked when the node was down. I'm thinking of something like reading/writing to a fixed value in a keyspace as a check... Unfortunately Java-based solutions are out of the question. Thanks, Suan
Re: Is there a way from a running Cassandra node to determine whether or not itself is up?
Take a look at mx4j: http://wiki.apache.org/cassandra/Operations#Monitoring_with_MX4J Someone told me once you can call the JMX ops via HTTP; I've not checked though. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 16 Jun 2011, at 14:45, Jake Luciani wrote: To force a node down you can use nodetool disablegossip. On Wed, Jun 15, 2011 at 6:42 PM, Suan Aik Yeo yeosuan...@gmail.com wrote: Thanks, Aaron, but we determined that adding Java into the equation just brings in too much complexity for something that's called out of an Nginx Perl module. Right now I'm having trouble even replicating the above scenario and posted a question here: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Easy-way-to-overload-a-single-node-on-purpose-tt6480958.html - Suan

On Thu, Jun 9, 2011 at 3:58 AM, aaron morton aa...@thelastpickle.com wrote: None via thrift that I can recall, but the StorageService MBean exposes getLiveNodes(); this is what nodetool uses to see which nodes are live. From the code:

/**
 * Retrieve the list of live nodes in the cluster, where "liveness" is
 * determined by the failure detector of the node being queried.
 *
 * @return set of IP addresses, as Strings
 */
public List<String> getLiveNodes();

Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 9 Jun 2011, at 17:56, Suan Aik Yeo wrote: Is there a way (preferably an exposed method accessible through Thrift), from a running Cassandra node, to determine whether or not it itself is up? (Per Cassandra standards, I'm assuming based on the gossip protocol.) Another way to think of what I'm looking for is basically running nodetool ring just on myself, but I'm only interested in knowing whether I'm Up or Down.
I'm currently using the describe_cluster method, but earlier today when the commitlogs for a node filled up and it appeared down to the other nodes, describe_cluster() still worked fine, thus failing the check. Thanks, Suan -- http://twitter.com/tjake
Re: Docs: Token Selection
See this thread for background: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replica-data-distributing-between-racks-td6324819.html In a multi DC environment, if you calculate the initial tokens for the entire cluster, data will not be evenly distributed. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 16 Jun 2011, at 15:51, Vijay wrote: +1 for more documentation (I guess contributions are always welcome). I will try to write it down sometime when we have a bit more time... The 0.8 nodetool ring command adds the DC and RAC information. http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers http://www.datastax.com/products/opscenter Hope this helps... Regards, /VJ

On Wed, Jun 15, 2011 at 7:24 PM, AJ a...@dude.podzone.net wrote: Ok. I understand the reasoning you laid out. But I think it should be documented more thoroughly. I was trying to get an idea as to how flexible Cass lets you be with the various combinations of strategies, snitches, token ranges, etc. It would be instructional to see a graphical representation of what a cluster ring with multiple data centers looks like. Google turned up nothing. I imagine it's a multilayer ring; one layer per data center, with the nodes of one layer slightly offset from the ones in the other (based on the example in the wiki). I would also like to know which node is next in the ring so as to understand replica placement in, for example, the OldNetworkTopologyStrategy, when its doc states, "...It places one replica in a different data center from the first (if there is any such data center), the third replica in a different rack in the first datacenter, and any remaining replicas on the first unused nodes on the ring." I can only assume for now that the ring referred to is the local ring of the first data center. On 6/15/2011 5:51 PM, Vijay wrote: No it won't, it will assume you are doing the right thing...
Regards, /VJ On Wed, Jun 15, 2011 at 2:34 PM, AJ a...@dude.podzone.net wrote: Vijay, thank you for your thoughtful reply. Will Cass complain if I don't set up my tokens like in the examples? On 6/15/2011 2:41 PM, Vijay wrote: All you heard is right... You are not overriding Cassandra's token assignment by saying "here is your token"... The logic is: calculate a token for the given key... find the node in each region independently (if you use NTS and if you set the strategy options which say you want to replicate to the other region)... search for the ranges in each region independently... replicate the data to that node. For multi DC, Cassandra needs the nodes to be equally partitioned within each DC (if you care that the load is equally distributed), and there shouldn't be any collision of tokens within a cluster. The documentation tried to explain the same, along with the example in the documentation. Hope this clarifies... More examples if it helps:

DC1 Node 1 : token 0
DC1 Node 2 : token 8..
DC2 Node 1 : token 4..
DC2 Node 2 : token 12..

or

DC1 Node 1 : token 0
DC1 Node 2 : token 1..
DC2 Node 1 : token 8..
DC2 Node 2 : token 7..
Regards, /VJ On Wed, Jun 15, 2011 at 12:28 PM, AJ a...@dude.podzone.net wrote: On 6/15/2011 12:14 PM, Vijay wrote: Correction: "The problem in the above approach is you have 2 nodes between 12 to 4 in DC1 but from 4 to 12 you just have 1" should be "The problem in the above approach is you have 1 node between 0-4 (25%) and one node covering the rest, which is 4-16, 0-0 (75%)". Regards, /VJ

Ok, I think you are saying that the computed token range intervals are incorrect and that they would be:

DC1
node 1 = 0   Range: (4, 16], (0, 0]
node 2 = 4   Range: (0, 4]
DC2
node 3 = 8   Range: (12, 16], (0, 8]
node 4 = 12  Range: (8, 12]

If so, then yes, this is what I am seeking to confirm, since I haven't found any documentation stating this directly, and the reference that I gave only implies this; that is, that the token ranges are calculated per data center rather than per cluster. I just need someone to confirm that 100% because it doesn't sound right to me based on everything else I've read. SO, the question is: does Cass calculate the consecutive node token ranges A.) per cluster, or B.) per data center? From all I understand, the answer is B. But that documentation (reprinted below) implies A... or something that doesn't make sense to me because of the token placement in the example:

With NetworkTopologyStrategy, you should calculate the tokens of the nodes in each DC independently...
DC1
node 1 = 0
node 2 = 85070591730234615865843651857942052864
DC2
node 3 = 1
node 4 = 85070591730234615865843651857942052865

However, I do see
Querying superColumn
I have a question about querying super column For example: I have a supercolumnFamily DEPARTMENT with dynamic superColumn 'EMPLOYEE'( name, country). Now for rowKey 'DEPT1' I have inserted multiple super column like: Employee1{ Name: Vivek country: India } Employee2{ Name: Vivs country: USA } Now if I want to retrieve a super column whose rowkey is 'DEPT1' and employee name is 'Vivek'. Can I get only 'EMPLOYEE1' ? -Vivek Write to us for a Free Gold Pass to the Cloud Computing Expo, NYC to attend a live session by Head of Impetus Labs on 'Secrets of Building a Cloud Vendor Agnostic PetaByte Scale Real-time Secure Web Application on the Cloud '. Looking to leverage the Cloud for your Big Data Strategy ? Attend Impetus webinar on May 27 by registering at http://www.impetus.com/webinar?eventid=42 . NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
Re: Important Variables for Scaling
It's a difficult question to answer in the abstract. Some thoughts...

Scaling by adding one node at a time is not optimal. The best case scenario is to double the number of nodes, as this means existing nodes only have to stream their data to a new node. Obviously this is not always possible. When adding fewer nodes, keeping a balanced ring may mean existing nodes stream data to other nodes as well as accept data from other nodes. In general, try to keep the data volume at most 50% full, so there is lots of free space to do moves. In general, nodes with a few 100 GBs of data are easiest to manage.

The pending column in nodetool tpstats will let you know how many read or write requests are waiting to be serviced. If this is consistently above concurrent_reads or concurrent_writes it means there is a queue (one slot for each thread). This will add to request latency; once maximum throughput is reached, additional requests will queue. See the SEDA paper. Sometime in the 0.7 dev cycle, client connection pooling was added to better manage those resources; see cassandra.yaml for info.

The o.a.c.db.StorageProxy JMX MBean provides latency trackers for total request time including wait times. And o.a.c.db.ColumnFamiles... provides latency trackers for the local node operations to do the read or write.

If your data set grows quickly, watch the disk space etc. If you do a lot of requests but your data grows slowly, watch the throughput and latency numbers.

Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 16 Jun 2011, at 18:28, Schuilenga, Jan Taeke wrote: Which variables (for instance: throughput, CPU, I/O, connections) are leading in deciding to add a node to a Cassandra setup that is put under strain? We are trying to prove scalability, but when is the right moment to add a node so as to get the optimum scalability result?
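Aaron's "pending above concurrent_reads/concurrent_writes" check can be automated against the output of nodetool tpstats. This is a hedged sketch: the column layout assumed here (Pool Name / Active / Pending / Completed) should be verified against your Cassandra version, and the sample text and threshold are illustrative only:

```python
def pools_with_backlog(tpstats_output, threshold):
    """Return {pool_name: pending} for pools whose pending count exceeds threshold."""
    flagged = {}
    for line in tpstats_output.splitlines():
        parts = line.split()
        # data rows look like: <pool> <active> <pending> <completed>
        if len(parts) >= 4 and parts[1].isdigit() and parts[2].isdigit():
            pending = int(parts[2])
            if pending > threshold:
                flagged[parts[0]] = pending
    return flagged

sample = """Pool Name    Active   Pending      Completed
ReadStage        32       187        8261096
MutationStage     2         0        9184011"""
```

With concurrent_reads at its default of 32, pools_with_backlog(sample, 32) flags ReadStage as saturated, which is exactly the "queue building up" signal described above.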
Upgrading Cassandra cluster from 0.6.3 to 0.7.5
Hi All, We are upgrading Cassandra from 0.6.3 to 0.7.5. We have two nodes in the cluster. I am a bit confused about how to upgrade them; do you have any guide? -- S.Ali Ahsan Senior System Engineer e-Business (Pvt) Ltd 49-C Jail Road, Lahore, P.O. Box 676 Lahore 54000, Pakistan Tel: +92 (0)42 3758 7140 Ext. 128 Mobile: +92 (0)345 831 8769 Fax: +92 (0)42 3758 0027 Email: ali.ah...@panasiangroup.com www.ebusiness-pg.com www.panasiangroup.com Confidentiality: This e-mail and any attachments may be confidential and/or privileged. If you are not a named recipient, please notify the sender immediately and do not disclose the contents to another person, use it for any purpose, or store or copy the information in any medium. Internet communications cannot be guaranteed to be timely, secure, error or virus-free. We do not accept liability for any errors or omissions.
Re: Querying superColumn
Well, you are looking for a secondary index. But for now, AFAIK, supercolumns cannot use secondary indexes. On 16/06/2011 13:55, Vivek Mishra wrote: Now for rowKey 'DEPT1' I have inserted multiple super columns like: Employee1{ Name: Vivek, country: India } Employee2{ Name: Vivs, country: USA } Now if I want to retrieve a super column whose rowkey is 'DEPT1' and employee name is 'Vivek', can I get only 'Employee1'? -- Donal Zang Computing Center, IHEP 19B YuquanLu, Shijingshan District, Beijing, 100049 zan...@ihep.ac.cn 86 010 8823 6018
snitch thrift
Hi all! Assuming a node ends up in GC land for a while, there is a good chance that even though it performs terribly and the dynamic snitching will help you to avoid it on the gossip side, it will not really help you much if thrift still accepts requests and the thrift interface has choppy performance. This makes me wonder whether thrift-only client-mode nodes would be a potential idea. I don't think I have seen that this exists today (or is it possible that I have missed a way to configure it?), but it does not seem like a very hard thing to make, and it could maybe be good in some usage patterns for the data node side as well as the thrift side. Any thoughts? Regards, Terje
Re: Docs: Token Selection
AJ, sorry, I seem to have missed the original email on this thread. As Aaron said, when computing tokens for multiple data centers, you should compute them independently for each data center - as if it were its own Cassandra cluster. You can have overlapping token ranges between multiple data centers, but no two nodes can have the same token, so for subsequent data centers I just increment the tokens. For two data centers with two nodes each using RandomPartitioner, calculate the tokens for the first DC normally, but in the second data center, increment the tokens by one.

In DC 1
node 1 = 0
node 2 = 85070591730234615865843651857942052864
In DC 2
node 1 = 1
node 2 = 85070591730234615865843651857942052865

For RowMutations this will give each data center a local set of nodes that it can write to for complete coverage of the entire token space. If you are using NetworkTopologyStrategy for replication, it will give an offset mirror replication between the two data centers so that your replicas will not get pinned to a node in the remote DC. There are other ways to select the tokens, but the increment method is the simplest to manage and continue to grow with. Hope that helps. -Eric
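Eric's recipe can be written down directly. This is a hedged sketch (the function name is mine, and it assumes RandomPartitioner's 2**127 token space): space the tokens evenly within each DC, then add the DC's index as an offset so no two nodes in the cluster share a token.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space

def tokens_for_dc(num_nodes, dc_index):
    """Evenly spaced initial tokens for one DC, offset by the DC's index."""
    return [(RING_SIZE // num_nodes) * i + dc_index for i in range(num_nodes)]
```

For the two-DC, two-node example above, tokens_for_dc(2, 0) reproduces DC 1's tokens, and tokens_for_dc(2, 1) gives the same tokens incremented by one for DC 2.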
Re: Docs: Token Selection
LOL, I feel Eric's pain. This double-ring thing can throw you for a loop since, like I said, there is only one place it is documented and it is only *implied*, so one is not sure he is interpreting it correctly. Even the source for NTS doesn't mention this. Thanks for everyone's help on this. On 6/16/2011 5:43 AM, aaron morton wrote: See this thread for background http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replica-data-distributing-between-racks-td6324819.html In a multi DC environment, if you calculate the initial tokens for the entire cluster data will not be evenly distributed. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com
Re: Docs: Token Selection
Thanks Eric! I've finally got it! I feel like I've just been initiated or something by discovering this secret. I kid! But I'm thinking about using OldNetworkTopologyStrategy. Do you, or anyone else, know if the same rules for token assignment apply to ONTS? On 6/16/2011 7:21 AM, Eric Tamme wrote: AJ, sorry, I seem to have missed the original email on this thread. As Aaron said, when computing tokens for multiple data centers, you should compute them independently for each data center - as if it were its own Cassandra cluster. You can have overlapping token ranges between multiple data centers, but no two nodes can have the same token, so for subsequent data centers I just increment the tokens. For two data centers with two nodes each using RandomPartitioner, calculate the tokens for the first DC normally, but in the second data center, increment the tokens by one. In DC 1: node 1 = 0, node 2 = 85070591730234615865843651857942052864. In DC 2: node 1 = 1, node 2 = 85070591730234615865843651857942052865. For RowMutations this will give each data center a local set of nodes that it can write to for complete coverage of the entire token space. If you are using NetworkTopologyStrategy for replication, it will give an offset mirror replication between the two data centers so that your replicas will not get pinned to a node in the remote DC. There are other ways to select the tokens, but the increment method is the simplest to manage and continue to grow with. Hope that helps. -Eric
Cassandra JVM GC settings
Hi Everyone, I'm seeing Cassandra GC a lot and I would like to tune the Young and Tenured spaces. Would anyone have recommendations on the NewRatio or NewSize/MaxNewSize to use for an environment where Cassandra has several column families and where we are doing a mixed load of reading and writing? The JVM has 8G of heap space assigned to it and there are 9 nodes in this cluster. Thanks for the comments! Sébastien Coutu
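As a rough, hypothetical starting point for the 8G heap described above (these are standard HotSpot flags, not values from this thread; the exact file and numbers depend on your Cassandra version and must be benchmarked against your own mixed read/write workload), the young generation can be sized explicitly where the startup script builds JVM_OPTS, e.g. conf/cassandra-env.sh or bin/cassandra.in.sh:

```shell
# Hypothetical sketch only -- benchmark before adopting any of these values.
JVM_OPTS="$JVM_OPTS -Xms8G -Xmx8G"        # fixed 8G heap, as in the question
JVM_OPTS="$JVM_OPTS -Xmn800M"             # explicit young gen (~1/10 of heap)
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"   # promote survivors quickly
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
```

Setting -Xmn overrides NewRatio; the usual trade-off is that a larger young gen lets short-lived memtable and read garbage die before promotion, at the cost of longer ParNew pauses.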
client API
I use JDK 1.6 to install and launch Cassandra on a Linux platform, but can I use JDK 1.5 for my Cassandra client?
Re: Querying superColumn
Have 1 row with employee info for country/office/division, each column an employee id and JSON info about the employee, or a reference to another row id for that employee data. No more supercolumn. On Jun 16, 2011 1:56 PM, Vivek Mishra vivek.mis...@impetus.co.in wrote: I have a question about querying a super column. For example: I have a superColumnFamily DEPARTMENT with dynamic superColumn 'EMPLOYEE' (name, country). Now for rowKey 'DEPT1' I have inserted multiple super columns like: Employee1{ Name: Vivek country: India } Employee2{ Name: Vivs country: USA } Now if I want to retrieve a super column whose rowkey is 'DEPT1' and employee name is 'Vivek', can I get only 'EMPLOYEE1'? -Vivek
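The suggested remodel (no super columns) can be sketched like this; a plain dict stands in for the column family, and the row key, column names, and helper function are illustrative, not an actual client API:

```python
import json

# One row per country/office/division; one column per employee id whose
# value is a JSON blob (or a reference to another row for that employee).
department_cf = {}

def add_employee(row_key, emp_id, info):
    department_cf.setdefault(row_key, {})[emp_id] = json.dumps(info)

add_employee("DEPT1", "employee1", {"name": "Vivek", "country": "India"})
add_employee("DEPT1", "employee2", {"name": "Vivs", "country": "USA"})

# Fetching one employee is now a single-column read by column name,
# which is exactly the "get only EMPLOYEE1" access the question asks for.
emp = json.loads(department_cf["DEPT1"]["employee1"])
print(emp["name"])  # Vivek
```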
Propose new ConsistencyLevel.ALL_AVAIL for reads
Good morning all. Hypothetical Setup: 1 data center RF = 3 Total nodes 3 Problem: Suppose I need maximum consistency for one critical operation; thus I specify CL = ALL for reads. However, this will fail if only 1 replica endpoint is down. I don't see why this failure is necessary all of the time, since the data could have been updated after the node became unavailable, in which case its data is outdated anyway. If only one node goes down and it has the key I need, then the app is not 100% available and it could take some time to make the node available again. Proposal: If all of the *available* replica nodes answer the read operation and the latest value timestamp is clearly AFTER the time the down node became unavailable, then this situation can meet the requirements for *near* 100% consistency, since the value on the down node would be outdated anyway. Clearly, the value was updated some time *after* the node went down or became unavailable. This way, you can have maximum availability when using reads with CL.ALL... or some CL close in meaning to ALL. I say near 100% consistency to leave room for a situation where the unavailable node was only unavailable to the coordinating node for some reason, such as a network issue, and thus still received an update by some other route after it appeared unavailable to the current coordinating node. In a situation like this, there is a chance the read will still not return the latest value. So, this will not be truly 100% consistent, which CL.ALL guarantees. However, I think this logic could justify a new consistency level slightly lower than ALL, such as ALL_AVAIL. What do you think? Is my logic correct? Is there a conflict with the architecture or base principles? This fits with the tunable consistency principle for sure. Thanks for listening
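The proposed rule can be stated as a small decision function. This is a hypothetical sketch of the idea only, not a Cassandra API: accept an "all available" read only when the newest timestamp among the live replicas is strictly after the moment the dead replica was last reachable, since in that case the dead replica's copy must be stale anyway:

```python
# Hypothetical ALL_AVAIL read check (names and shapes are illustrative).
# live_values: list of (value, timestamp) from the reachable replicas.
# down_since: the time the unavailable replica was last seen alive.

def all_avail_read(live_values, down_since):
    if not live_values:
        raise RuntimeError("no live replicas")
    value, ts = max(live_values, key=lambda vt: vt[1])
    if ts > down_since:
        # The winning write happened after the node went down, so the
        # down node cannot hold a newer value (modulo the partition
        # caveat the proposal itself raises).
        return value
    raise RuntimeError("cannot rule out a newer value on the down replica")

print(all_avail_read([("v2", 105), ("v2", 105)], down_since=100))  # v2
```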
Re: Docs: Token Selection
So, with ec2 ... 3 regions (DC's), each one is +1 from another? On Jun 16, 2011 3:40 PM, AJ a...@dude.podzone.net wrote: Thanks Eric! I've finally got it! I feel like I've just been initiated or something by discovering this secret. I kid! But, I'm thinking about using OldNetworkTopStrat. Do you, or anyone else, know if the same rules for token assignment apply to ONTS? On 6/16/2011 7:21 AM, Eric tamme wrote: AJ, sorry I seem to have missed the original email on this thread. As Aaron said, when computing tokens for multiple data centers, you should compute them independently for each data center - as if it were its own Cassandra cluster. You can have overlapping token ranges between multiple data centers, but no two nodes can have the same token, so for subsequent data centers I just increment the tokens. For two data centers with two nodes each using RandomPartitioner, calculate the tokens for the first DC normally, but in the second data center, increment the tokens by one. In DC 1 node 1 = 0 node 2 = 85070591730234615865843651857942052864 In DC 2 node 1 = 1 node 2 = 85070591730234615865843651857942052865 For RowMutations this will give each data center a local set of nodes that it can write to for complete coverage of the entire token space. If you are using NetworkTopologyStrategy for replication, it will give an offset mirror replication between the two data centers so that your replicas will not get pinned to a node in the remote DC. There are other ways to select the tokens, but the increment method is the simplest to manage and continue to grow with. Hope that helps. -Eric
Re: Docs: Token Selection
On Thu, Jun 16, 2011 at 11:11 AM, Sasha Dolgy sdo...@gmail.com wrote: So, with ec2 ... 3 regions (DC's), each one is +1 from another? I don't use EC2, so I am not familiar with the specifics of deployment there. That said, if you have 3 data centers with equal nodes in each (so that you would calculate the same tokens for each DC), the first DC you would add 0, the second DC you would add 1, the third DC you would add 2. So it could look like the following: In DC 1 node 1 = 0 node 2 = 85070591730234615865843651857942052864 In DC 2 node 1 = 1 node 2 = 85070591730234615865843651857942052865 In DC 3 node 1 = 2 node 2 = 85070591730234615865843651857942052866 Keep in mind, the only reason you need to offset tokens is if there is another node that would have the exact same token. So if you have different numbers of nodes in different data centers, it is possible you won't need any token offsets. Just calculate tokens normally, as if the DC were the only one, then check for any node in another DC with the same token and add +1 to offset the token. -Eric
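The per-DC calculation Eric describes can be sketched in a few lines (RandomPartitioner's token space is 0 .. 2**127 - 1; tokens are computed independently per data center, then the DC index is added as the offset so no two nodes share a token):

```python
# Evenly spaced RandomPartitioner tokens for one DC, offset by the DC index.
def tokens_for_dc(node_count, dc_index):
    return [(i * 2**127) // node_count + dc_index for i in range(node_count)]

for dc in range(3):
    print(f"DC {dc + 1}:", tokens_for_dc(2, dc))
# DC 1: [0, 85070591730234615865843651857942052864]
# DC 2: [1, 85070591730234615865843651857942052865]
# DC 3: [2, 85070591730234615865843651857942052866]
```

The printed values match the tokens quoted in the thread; with unequal node counts per DC the offsets are only needed where two computed tokens collide.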
Unable to access column family in CLI after building CF in CQL
Hi, I was following the CQL example on the DataStax website and was able to create a new column family and query it. But when I viewed the column family in the CLI, it gave me the following error. # Unable to read column family created from CQL [default@store] list users2; *users2 not found in current keyspace.* Also, when I try to query the user table from CQL, I'm unable to filter on a key. The user table was created in the CLI but is accessible from CQL with a simple select * from users; cqlsh select * from users where key='tyler'; *Bad Request: cannot parse 'tyler' as hex bytes* # In the CLI, the store keyspace displays two column families. [default@store] show keyspaces; Keyspace: store: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Options: [replication_factor:1] Column Families: *ColumnFamily: users* Key Validation Class: org.apache.cassandra.db.marshal.BytesType Default column value validator: org.apache.cassandra.db.marshal.BytesType Columns sorted by: org.apache.cassandra.db.marshal.AsciiType Row cache size / save period in seconds: 0.0/0 Key cache size / save period in seconds: 20.0/14400 Memtable thresholds: 0.267187497/57/1440 (millions of ops/MB/minutes) GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on write: false Built indexes: [] Column Metadata: Column Name: email Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: userName Validation Class: org.apache.cassandra.db.marshal.UTF8Type *ColumnFamily: users2* Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type Default column value validator: org.apache.cassandra.db.marshal.UTF8Type Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type Row cache size / save period in seconds: 0.0/0 Key cache size / save period in seconds: 20.0/14400 Memtable thresholds: 0.267187497/57/1440 (millions of ops/MB/minutes) GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on
write: true Built indexes: [] Column Metadata: Column Name: session_token Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: state Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: password Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: birth_year Validation Class: org.apache.cassandra.db.marshal.LongType Column Name: gender Validation Class: org.apache.cassandra.db.marshal.UTF8Type Keyspace: system: Able to see the list of keys generated within the CLI [default@store] list users; Using default limit of 100 --- RowKey: foo = (column=age, value=3339, timestamp=1308182349595000) = (column=email, value=f...@email.com, timestamp=1308182349594000) = (column=userName, value=foo, timestamp=1308182349591000) --- RowKey: bar = (column=email, value=b...@email.com, timestamp=1308182355297000) = (column=gender, value=66, timestamp=1308182355299000) = (column=userName, value=bar, timestamp=1308182355295000) --- RowKey: tyler = (column=email, value=ty...@email.com, timestamp=1308182355303000) = (column=sports, value=6261736562616c6c, timestamp=1308182355309000) = (column=userName, value=tyler, timestamp=1308182355302000)
Re: Upgrading Cassandra cluster from 0.6.3 to 0.7.5
Read NEWS.txt. 0.7.6 is better than 0.7.5, btw. On Thu, Jun 16, 2011 at 5:03 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote: Hi All, We are upgrading Cassandra from 0.6.3 to 0.7.5. We have two nodes in the cluster. I am a bit confused about how to upgrade them; do you have any guide? -- S.Ali Ahsan Senior System Engineer e-Business (Pvt) Ltd 49-C Jail Road, Lahore, P.O. Box 676 Lahore 54000, Pakistan Tel: +92 (0)42 3758 7140 Ext. 128 Mobile: +92 (0)345 831 8769 Fax: +92 (0)42 3758 0027 Email: ali.ah...@panasiangroup.com www.ebusiness-pg.com www.panasiangroup.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Unable to access column family in CLI after building CF in CQL
If you create CFs outside the cli, you may need to restart it to refresh its internal cache of the schema. On Thu, Jun 16, 2011 at 8:51 AM, yikes bigdata yikes.bigd...@gmail.com wrote: Hi, I was following the CQL example on the DataStax website and was able to create a new column family and query it. But when I viewed the column family in the CLI, it gives me the following error. # Unable to read column family created from CQL [default@store] list users2; users2 not found in current keyspace. Also, when I try to query the user table from CQL, i'm unable to filter on a key. The user table was created in the CLI but accessible by CQL with a simple select * from users; cqlsh select * from users where key='tyler'; Bad Request: cannot parse 'tyler' as hex bytes # In the CLI, the store keyspaces displays two column families . [default@store] show keyspaces; Keyspace: store: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Options: [replication_factor:1] Column Families: ColumnFamily: users Key Validation Class: org.apache.cassandra.db.marshal.BytesType Default column value validator: org.apache.cassandra.db.marshal.BytesType Columns sorted by: org.apache.cassandra.db.marshal.AsciiType Row cache size / save period in seconds: 0.0/0 Key cache size / save period in seconds: 20.0/14400 Memtable thresholds: 0.267187497/57/1440 (millions of ops/MB/minutes) GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on write: false Built indexes: [] Column Metadata: Column Name: email Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: userName Validation Class: org.apache.cassandra.db.marshal.UTF8Type ColumnFamily: users2 Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type Default column value validator: org.apache.cassandra.db.marshal.UTF8Type Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type Row cache size / save period in seconds: 0.0/0 Key cache size / save period in seconds: 
20.0/14400 Memtable thresholds: 0.267187497/57/1440 (millions of ops/MB/minutes) GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on write: true Built indexes: [] Column Metadata: Column Name: session_token Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: state Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: password Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: birth_year Validation Class: org.apache.cassandra.db.marshal.LongType Column Name: gender Validation Class: org.apache.cassandra.db.marshal.UTF8Type Keyspace: system: Able to see the list of keys generate within the CLI [default@store] list users; Using default limit of 100 --- RowKey: foo = (column=age, value=3339, timestamp=1308182349595000) = (column=email, value=f...@email.com, timestamp=1308182349594000) = (column=userName, value=foo, timestamp=1308182349591000) --- RowKey: bar = (column=email, value=b...@email.com, timestamp=1308182355297000) = (column=gender, value=66, timestamp=1308182355299000) = (column=userName, value=bar, timestamp=1308182355295000) --- RowKey: tyler = (column=email, value=ty...@email.com, timestamp=1308182355303000) = (column=sports, value=6261736562616c6c, timestamp=1308182355309000) = (column=userName, value=tyler, timestamp=1308182355302000) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Unable to access column family in CLI after building CF in CQL
The second error (the CQL select) is because you have different Key Validation Class values for your two user column families. users is org.apache.cassandra.db.marshal.BytesType, while users2 is org.apache.cassandra.db.marshal.UTF8Type. The select is failing because you are comparing a String to a bunch of bytes. - Original Message - From: yikes bigdata yikes.bigd...@gmail.com To: user@cassandra.apache.org Sent: Thursday, June 16, 2011 3:51:41 PM Subject: Unable to access column family in CLI after building CF in CQL Hi, I was following the CQL example on the DataStax website and was able to create a new column family and query it. But when I viewed the column family in the CLI, it gives me the following error. # Unable to read column family created from CQL [default@store] list users2; users2 not found in current keyspace. Also, when I try to query the user table from CQL, i'm unable to filter on a key. The user table was created in the CLI but accessible by CQL with a simple select * from users; cqlsh select * from users where key='tyler'; Bad Request: cannot parse 'tyler' as hex bytes # In the CLI, the store keyspaces displays two column families .
[default@store] show keyspaces; Keyspace: store: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Options: [replication_factor:1] Column Families: ColumnFamily: users Key Validation Class: org.apache.cassandra.db.marshal.BytesType Default column value validator: org.apache.cassandra.db.marshal.BytesType Columns sorted by: org.apache.cassandra.db.marshal.AsciiType Row cache size / save period in seconds: 0.0/0 Key cache size / save period in seconds: 20.0/14400 Memtable thresholds: 0.267187497/57/1440 (millions of ops/MB/minutes) GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on write: false Built indexes: [] Column Metadata: Column Name: email Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: userName Validation Class: org.apache.cassandra.db.marshal.UTF8Type ColumnFamily: users2 Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type Default column value validator: org.apache.cassandra.db.marshal.UTF8Type Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type Row cache size / save period in seconds: 0.0/0 Key cache size / save period in seconds: 20.0/14400 Memtable thresholds: 0.267187497/57/1440 (millions of ops/MB/minutes) GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on write: true Built indexes: [] Column Metadata: Column Name: session_token Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: state Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: password Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: birth_year Validation Class: org.apache.cassandra.db.marshal.LongType Column Name: gender Validation Class: org.apache.cassandra.db.marshal.UTF8Type Keyspace: system: Able to see the list of keys generate within the CLI [default@store] list users; Using default limit of 100 --- RowKey: foo = (column=age, value=3339, timestamp=1308182349595000) = 
(column=email, value= f...@email.com , timestamp=1308182349594000) = (column=userName, value=foo, timestamp=1308182349591000) --- RowKey: bar = (column=email, value= b...@email.com , timestamp=1308182355297000) = (column=gender, value=66, timestamp=1308182355299000) = (column=userName, value=bar, timestamp=1308182355295000) --- RowKey: tyler = (column=email, value= ty...@email.com , timestamp=1308182355303000) = (column=sports, value=6261736562616c6c, timestamp=1308182355309000) = (column=userName, value=tyler, timestamp=1308182355302000)
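Because users was created with a BytesType key validator, the key must be supplied as hex rather than as text. The conversion is just an ASCII-to-hex encoding, sketched here (the exact CQL quoting of the hex literal may vary by version):

```python
# Encode the text key the same way the CLI/CQL expect a BytesType key.
key_hex = "tyler".encode("ascii").hex()
print(key_hex)  # 74796c6572  ->  e.g. select * from users where key='74796c6572'

# The same mapping explains the opaque values in `list users`: the sports
# column value shown above decodes back to readable text.
print(bytes.fromhex("6261736562616c6c").decode("ascii"))  # baseball
```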
Re: Propose new ConsistencyLevel.ALL_AVAIL for reads
On Thu, Jun 16, 2011 at 8:18 AM, AJ a...@dude.podzone.net wrote: Good morning all. Hypothetical Setup: 1 data center RF = 3 Total nodes 3 Problem: Suppose I need maximum consistency for one critical operation; thus I specify CL = ALL for reads. However, this will fail if only 1 replica endpoint is down. I don't see why this failure is necessary all of the time, since the data could have been updated after the node became unavailable, in which case its data is outdated anyway. If only one node goes down and it has the key I need, then the app is not 100% available and it could take some time to make the node available again. Proposal: If all of the *available* replica nodes answer the read operation and the latest value timestamp is clearly AFTER the time the down node became unavailable, then this situation can meet the requirements for *near* 100% consistency, since the value on the down node would be outdated anyway. Clearly, the value was updated some time *after* the node went down or became unavailable. This way, you can have maximum availability when using reads with CL.ALL... or some CL close in meaning to ALL. I say near 100% consistency to leave room for a situation where the unavailable node was only unavailable to the coordinating node for some reason, such as a network issue, and thus still received an update by some other route after it appeared unavailable to the current coordinating node. In a situation like this, there is a chance the read will still not return the latest value. So, this will not be truly 100% consistent, which CL.ALL guarantees. However, I think this logic could justify a new consistency level slightly lower than ALL, such as ALL_AVAIL. What do you think? Is my logic correct? Is there a conflict with the architecture or base principles? This fits with the tunable consistency principle for sure. I don't think this buys you anything that you can't get with quorum reads and writes. -ryan
Re: snitch thrift
On Thu, Jun 16, 2011 at 6:11 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: Hi all! Assuming a node ends up in GC land for a while, there is a good chance that even though it performs terribly and the dynamic snitching will help you to avoid it on the gossip side, it will not really help you much if thrift still accepts requests and the thrift interface has choppy performance. This makes me wonder whether thrift-only, client-mode nodes might be a good idea. Those could GC too, albeit to a lesser degree. I don't think I have seen that this exists today (or is it possible that I have missed a way to configure that?), but it does not seem like a very hard thing to make, and could maybe be good in some usage patterns for the data node as well as the thrift side. It might sometimes be useful, but we can't really know without running some tests. -ryan
Re: Unable to access column family in CLI after building CF in CQL
Ah that works. Thanks everyone for the help. On Thu, Jun 16, 2011 at 9:04 AM, Konstantin Naryshkin konstant...@a-bb.netwrote: The second error (the CQL select) is because you have different Key Validation Class values for your two user columns. users is org.apache.cassandra.db.marshal.BytesType, while users2 is org.apache.cassandra.db.marshal.UTF8Type. The select is failing because you are comparing a String to a bunch of bytes. -- *From: *yikes bigdata yikes.bigd...@gmail.com *To: *user@cassandra.apache.org *Sent: *Thursday, June 16, 2011 3:51:41 PM *Subject: *Unable to access column family in CLI after building CF in CQL Hi, I was following the CQL example on the DataStax website and was able to create a new column family and query it. But when I viewed the column family in the CLI, it gives me the following error. # Unable to read column family created from CQL [default@store] list users2; *users2 not found in current keyspace.* Also, when I try to query the user table from CQL, i'm unable to filter on a key. The user table was created in the CLI but accessible by CQL with a simple select * from users; cqlsh select * from users where key='tyler'; *Bad Request: cannot parse 'tyler' as hex bytes* # In the CLI, the store keyspaces displays two column families . 
[default@store] show keyspaces; Keyspace: store: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Options: [replication_factor:1] Column Families: *ColumnFamily: users* Key Validation Class: org.apache.cassandra.db.marshal.BytesType Default column value validator: org.apache.cassandra.db.marshal.BytesType Columns sorted by: org.apache.cassandra.db.marshal.AsciiType Row cache size / save period in seconds: 0.0/0 Key cache size / save period in seconds: 20.0/14400 Memtable thresholds: 0.267187497/57/1440 (millions of ops/MB/minutes) GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on write: false Built indexes: [] Column Metadata: Column Name: email Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: userName Validation Class: org.apache.cassandra.db.marshal.UTF8Type *ColumnFamily: users2* Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type Default column value validator: org.apache.cassandra.db.marshal.UTF8Type Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type Row cache size / save period in seconds: 0.0/0 Key cache size / save period in seconds: 20.0/14400 Memtable thresholds: 0.267187497/57/1440 (millions of ops/MB/minutes) GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on write: true Built indexes: [] Column Metadata: Column Name: session_token Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: state Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: password Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: birth_year Validation Class: org.apache.cassandra.db.marshal.LongType Column Name: gender Validation Class: org.apache.cassandra.db.marshal.UTF8Type Keyspace: system: Able to see the list of keys generate within the CLI [default@store] list users; Using default limit of 100 --- RowKey: foo = (column=age, value=3339, timestamp=1308182349595000) = 
(column=email, value=f...@email.com, timestamp=1308182349594000) = (column=userName, value=foo, timestamp=1308182349591000) --- RowKey: bar = (column=email, value=b...@email.com, timestamp=1308182355297000) = (column=gender, value=66, timestamp=1308182355299000) = (column=userName, value=bar, timestamp=1308182355295000) --- RowKey: tyler = (column=email, value=ty...@email.com, timestamp=1308182355303000) = (column=sports, value=6261736562616c6c, timestamp=1308182355309000) = (column=userName, value=tyler, timestamp=1308182355302000)
Re: Cassandra Statistics and Metrics
There's a possibility of using a command-line JMX client with the standard Zabbix agent to request JMX counters without incorporating zapcat into Cassandra or another Java app. I'm investigating this feature right now, and will post results when finished. 2011/6/15 Viktor Jevdokimov vjevdoki...@gmail.com http://www.kjkoster.org/zapcat/Zapcat_JMX_Zabbix_Bridge.html 2011/6/14 Marcos Ortiz mlor...@uci.cu Where can I find the source code? On 6/14/2011 10:13 AM, Viktor Jevdokimov wrote: We're using the open source monitoring solution Zabbix from http://www.zabbix.com/ using zapcat - not only for Cassandra but for the whole system. As the MX4J tools plugin is supported by Cassandra, support for zapcat in Cassandra by default is welcome - we have to use a wrapper to start the zapcat agent. 2011/6/14 Marcos Ortiz mlor...@uci.cu Regards to all. My team and I here at the University are working on a generic solution for Monitoring and Capacity Planning for Open Source Databases, and one of the NoSQL databases that we chose to support is Cassandra. Where can I find all the metrics and statistics of Cassandra? I'm thinking, for example: - Available space - Number of CFs and all kinds of metrics. We are using for this development: Python + Django + Twisted + Orbited + jQuery. The idea behind it is to build a Comet-based web application on top of these technologies. Any advice is welcome -- Marcos Luís Ortíz Valmaseda Software Engineer (UCI) http://marcosluis2186.posterous.com http://twitter.com/marcosluis2186 -- Marcos Luís Ortíz Valmaseda Software Engineer (UCI) http://marcosluis2186.posterous.com http://twitter.com/marcosluis2186
Re: Propose new ConsistencyLevel.ALL_AVAIL for reads
On 6/16/2011 10:05 AM, Ryan King wrote: I don't think this buys you anything that you can't get with quorum reads and writes. -ryan QUORUM <= ALL_AVAIL <= ALL == RF
RE: Propose new ConsistencyLevel.ALL_AVAIL for reads
I think this would add a lot of complexity behind the scenes and be conceptually confusing, particularly for new users. The Cassandra consistency model is pretty elegant and this type of approach breaks that elegance in many ways. It would also only really be useful when the value has a high probability of being updated between a node going down and the value being read. Perhaps the simpler approach, which is fairly trivial and does not require any Cassandra change, is to simply downgrade your read from ALL to QUORUM when you get an unavailable exception for this particular read. I think the general answer for 'maximum consistency' is QUORUM reads/writes. Based on the fact that you are using CL=ALL for reads, I assume you are using CL=ONE for writes: this itself strikes me as a bad idea if you require 'maximum consistency for one critical operation'. Dan -Original Message- From: Ryan King [mailto:r...@twitter.com] Sent: June-16-11 12:05 To: user@cassandra.apache.org Subject: Re: Propose new ConsistencyLevel.ALL_AVAIL for reads On Thu, Jun 16, 2011 at 8:18 AM, AJ a...@dude.podzone.net wrote: Good morning all. Hypothetical Setup: 1 data center RF = 3 Total nodes 3 Problem: Suppose I need maximum consistency for one critical operation; thus I specify CL = ALL for reads. However, this will fail if only 1 replica endpoint is down. I don't see why this failure is necessary all of the time, since the data could have been updated after the node became unavailable, in which case its data is outdated anyway. If only one node goes down and it has the key I need, then the app is not 100% available and it could take some time to make the node available again. Proposal: If all of the *available* replica nodes answer the read operation and the latest value timestamp is clearly AFTER the time the down node became unavailable, then this situation can meet the requirements for *near* 100% consistency, since the value on the down node would be outdated anyway.
Clearly, the value was updated some time *after* the node went down or became unavailable. This way, you can have maximum availability when using reads with CL.ALL... or some CL close in meaning to ALL. I say near 100% consistency to leave room for a situation where the unavailable node was only unavailable to the coordinating node for some reason, such as a network issue, and thus still received an update by some other route after it appeared unavailable to the current coordinating node. In a situation like this, there is a chance the read will still not return the latest value. So, this will not be truly 100% consistent, which CL.ALL guarantees. However, I think this logic could justify a new consistency level slightly lower than ALL, such as ALL_AVAIL. What do you think? Is my logic correct? Is there a conflict with the architecture or base principles? This fits with the tunable consistency principle for sure. I don't think this buys you anything that you can't get with quorum reads and writes. -ryan
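Dan's fallback suggestion — attempt the read at ALL and, on an unavailable error, retry at QUORUM — can be sketched client-side. The names here are illustrative (do_read stands in for whatever call your driver exposes; this is not a real driver API):

```python
# Hypothetical client-side downgrade: try CL=ALL first, fall back to QUORUM
# only when the cluster reports that not all replicas are available.

class UnavailableException(Exception):
    pass

def read_with_fallback(do_read):
    try:
        return do_read("ALL")
    except UnavailableException:
        return do_read("QUORUM")

# Simulated driver call: ALL fails because one of three replicas is down.
def fake_read(cl):
    if cl == "ALL":
        raise UnavailableException("1 of 3 replicas down")
    return ("value", cl)

print(read_with_fallback(fake_read))  # ('value', 'QUORUM')
```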
Re: Cassandra Statistics and Metrics
This is what I use: http://code.google.com/p/simple-cassandra-monitoring/ Disclaimer: I did it myself, don't expect too much :P On Thu, 16-06-2011 at 19:35 +0300, Viktor Jevdokimov wrote: There's a possibility of using a command-line JMX client with the standard Zabbix agent to request JMX counters without incorporating zapcat into Cassandra or another Java app. I'm investigating this feature right now, and will post results when finished. 2011/6/15 Viktor Jevdokimov vjevdoki...@gmail.com http://www.kjkoster.org/zapcat/Zapcat_JMX_Zabbix_Bridge.html 2011/6/14 Marcos Ortiz mlor...@uci.cu Where can I find the source code? On 6/14/2011 10:13 AM, Viktor Jevdokimov wrote: We're using the open source monitoring solution Zabbix from http://www.zabbix.com/ using zapcat - not only for Cassandra but for the whole system. As the MX4J tools plugin is supported by Cassandra, support for zapcat in Cassandra by default is welcome - we have to use a wrapper to start the zapcat agent. 2011/6/14 Marcos Ortiz mlor...@uci.cu Regards to all. My team and I here at the University are working on a generic solution for Monitoring and Capacity Planning for Open Source Databases, and one of the NoSQL databases that we chose to support is Cassandra. Where can I find all the metrics and statistics of Cassandra? I'm thinking, for example: - Available space - Number of CFs and all kinds of metrics. We are using for this development: Python + Django + Twisted + Orbited + jQuery. The idea behind it is to build a Comet-based web application on top of these technologies. Any advice is welcome -- Marcos Luís Ortíz Valmaseda Software Engineer (UCI) http://marcosluis2186.posterous.com http://twitter.com/marcosluis2186 -- Marcos Luís Ortíz Valmaseda Software Engineer (UCI) http://marcosluis2186.posterous.com http://twitter.com/marcosluis2186
Re: snitch thrift
Seems like a more robust solution would be to implement dynamic-snitch-like behavior in the client. Hector has done this for a few months now. https://github.com/rantav/hector/blob/master/core/src/main/java/me/prettyprint/cassandra/connection/DynamicLoadBalancingPolicy.java On Thu, Jun 16, 2011 at 9:12 AM, Ryan King r...@twitter.com wrote: On Thu, Jun 16, 2011 at 6:11 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: Hi all! Assuming a node ends up in GC land for a while, there is a good chance that even though it performs terribly and the dynamic snitching will help you to avoid it on the gossip side, it will not really help you much if thrift still accepts requests and the thrift interface has choppy performance. This makes me wonder whether thrift-only, client-mode nodes might be a good idea. Those could GC too, albeit to a lesser degree. I don't think I have seen that this exists today (or is it possible that I have missed a way to configure that?), but it does not seem like a very hard thing to make and could maybe be good in some usage patterns on the data-node side as well as the thrift side. It might be sometimes useful, but we can't really know without running some tests. -ryan -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
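The general shape of client-side dynamic-snitch behavior (not Hector's actual algorithm, just an illustration of the idea): track a smoothed latency score per host and route requests to the cheapest one, so a node stuck in GC is naturally avoided.

```python
class LatencyAwareBalancer:
    """Toy latency-aware host selection, loosely in the spirit of
    Hector's DynamicLoadBalancingPolicy (not its actual algorithm):
    keep an exponentially weighted moving average of response time
    per host and route to the lowest-scoring one."""

    def __init__(self, hosts, alpha=0.2):
        self.alpha = alpha                      # smoothing factor
        self.scores = {h: 0.0 for h in hosts}   # EWMA latency per host

    def record(self, host, latency_ms):
        old = self.scores[host]
        self.scores[host] = (1 - self.alpha) * old + self.alpha * latency_ms

    def pick(self):
        # Prefer the host with the lowest smoothed latency.
        return min(self.scores, key=self.scores.get)

balancer = LatencyAwareBalancer(["10.0.0.1", "10.0.0.2"])
balancer.record("10.0.0.1", 5.0)
balancer.record("10.0.0.2", 250.0)  # e.g. a node momentarily stuck in GC
```

Any real implementation also needs decay back toward neutral so a recovered node is retried, which this sketch omits.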
Re: need some help with counters
On Jun 13, 2011, at 5:10 AM, aaron morton wrote: I am wondering how to index on the most recent hour as well. (ie show me top 5 URLs type query).. AFAIK that's not a great application for counters. You would need range support in the secondary indexes so you could get the first X rows ordered by a column value. To be honest, depending on scale, I'd consider a sorted set in redis for that. It does. Thanks Aaron. Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 11 Jun 2011, at 00:36, Ian Holsman wrote: On Jun 9, 2011, at 10:04 PM, aaron morton wrote: I may be missing something but could you use a column for each of the last 48 hours all in the same row for a url ? e.g. { /url.com/hourly : { 20110609T01:00:00 : 456, 20110609T02:00:00 : 4567, } } yes.. that would work better... I was storing all the different times in the same row. { /url.com : { H-20110609T01:00:00 : 456, H-20110609T02:00:00 : 4567, D-20110609 : 5678, } } I am wondering how to index on the most recent hour as well. (ie show me top 5 URLs type query).. Increment the current hour only. Delete the older columns either when a read detects there are old values or as a maintenance job. Or as part of writing values for the first 5 minutes of any hour. yes.. I thought of that. The problem with doing it on read is there may be a case where an old URL never gets read.. so it will just sit there taking up space.. the maintenance job is the route I went down. The row will get spread out over a lot of sstables which may reduce read speed. If this is a problem consider a separate CF with more aggressive GC and compaction settings. Thanks! Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 10 Jun 2011, at 09:28, Ian Holsman wrote: So would doing something like storing it in reverse (so I know what to delete) work? Or is storing a million columns in a supercolumn impossible.
I could always use a logfile and run the archiver off that as a worst case I guess. Would doing so many deletes screw up the db/cause other problems? --- Ian Holsman - 703 879-3128 I saw the angel in the marble and carved until I set him free -- Michelangelo On 09/06/2011, at 4:22 PM, Ryan King r...@twitter.com wrote: On Thu, Jun 9, 2011 at 1:06 PM, Ian Holsman had...@holsman.net wrote: Hi Ryan. you wouldn't have your version of cassandra up on github would you?? No, and the patch isn't in our version yet either. We're still working on it. -ryan
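The hourly-bucket scheme discussed in this thread can be sketched as plain functions: one derives the column name for a given hour, and one lets a maintenance job decide which columns have fallen out of the 48-hour window. Column names follow the `H-20110609T01:00:00` format quoted above; the fixed-width format means lexicographic comparison orders them chronologically.

```python
from datetime import datetime, timedelta

def hour_column(ts):
    """Column name for the hourly counter bucket containing timestamp ts."""
    return "H-" + ts.strftime("%Y%m%dT%H:00:00")

def expired_columns(columns, now, window_hours=48):
    """Columns a maintenance job should delete: hourly buckets older
    than the retention window. Fixed-width names sort chronologically,
    so a string comparison against the cutoff name is enough."""
    cutoff = hour_column(now - timedelta(hours=window_hours))
    return [c for c in columns if c.startswith("H-") and c < cutoff]

now = datetime(2011, 6, 9, 2, 0)
cols = ["H-20110607T01:00:00", "H-20110609T01:00:00", "D-20110609"]
```

The maintenance job would iterate rows, call `expired_columns`, and issue the deletes in batches, which matches the route Ian said he went down.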
Re: Propose new ConsistencyLevel.ALL_AVAIL for reads
On 6/16/2011 10:58 AM, Dan Hendry wrote: I think this would add a lot of complexity behind the scenes and be conceptually confusing, particularly for new users. I'm not so sure about this. Cass is already somewhat sophisticated and I don't see how this could trip up anyone who can already grasp the basics. The only thing I am adding to the CL concept is the concept of available replication nodes, versus total replication nodes. But, don't forget; a competitor to Cass is probably in the works this very minute so constant improvement is a good thing. The Cassandra consistency model is pretty elegant and this type of approach breaks that elegance in many ways. It would also only really be useful when the value has a high probability of being updated between a node going down and the value being read. I'm not sure what you mean. A node can be down for days during which time the value can be updated. The intention is to use the nodes available even if they fall below the RF. If there is only 1 node available for accepting a replica, that should be enough given the conditions I stated and updated below. Perhaps the simpler approach which is fairly trivial and does not require any Cassandra change is to simply downgrade your read from ALL to QUORUM when you get an unavailable exception for this particular read. It's not so trivial, esp since you would have to build that into your client at many levels. I think it would be more appropriate (if this idea survives) to put it into Cass. I think the general answer for 'maximum consistency' is QUORUM reads/writes. Based on the fact you are using CL=ALL for reads I assume you are using CL=ONE for writes: this itself strikes me as a bad idea if you require 'maximum consistency for one critical operation'. Very true. Specifying quorum for BOTH reads/writes provides the 100% consistency because of the overlapping of the availability numbers. But, only if the # of available nodes is not < RF.
Upon further reflection, this idea can be used for any consistency level. The general thrust of my argument is: If a particular value can be overwritten by one process regardless of its prior value, then that implies that the value in the down node is no longer up-to-date and can be disregarded. Just work with the nodes that are available. Actually, now that I think about it... ALL_AVAIL guarantees 100% consistency iff the latest timestamp of the value is later than the latest unavailability time of all unavailable replica nodes for that value's row key. Unavailable is defined as a node's Cass process that is not reachable from ANY node in the cluster in the same data center. If the node in question is available to at least one node, then the read should fail as there is a possibility that the value could have been updated some other way. After looking at the code, it doesn't look like it will be difficult. Instead of skipping the request for values from the nodes when CL nodes aren't available, it would have to go ahead and request the values from the available nodes as usual and then look at the timestamps, which it does anyway, and compare it to the latest unavailability time of the relevant replica nodes. The code that keeps track of what nodes are down simply records the time it went down. But, I've only been looking at the code for a few days so I'm not claiming to know everything by any stretch. Dan Thanks for your reply. I still welcome critiques.
Re: Propose new ConsistencyLevel.ALL_AVAIL for reads
On Thu, Jun 16, 2011 at 1:05 PM, AJ a...@dude.podzone.net wrote: On 6/16/2011 10:58 AM, Dan Hendry wrote: I think this would add a lot of complexity behind the scenes and be conceptually confusing, particularly for new users. I'm not so sure about this. Cass is already somewhat sophisticated and I don't see how this could trip up anyone who can already grasp the basics. The only thing I am adding to the CL concept is the concept of available replication nodes, versus total replication nodes. But, don't forget; a competitor to Cass is probably in the works this very minute so constant improvement is a good thing. There are already many competitors. The Cassandra consistency model is pretty elegant and this type of approach breaks that elegance in many ways. It would also only really be useful when the value has a high probability of being updated between a node going down and the value being read. I'm not sure what you mean. A node can be down for days during which time the value can be updated. The intention is to use the nodes available even if they fall below the RF. If there is only 1 node available for accepting a replica, that should be enough given the conditions I stated and updated below. If this is your constraint, then you should just use CL.ONE. Perhaps the simpler approach which is fairly trivial and does not require any Cassandra change is to simply downgrade your read from ALL to QUORUM when you get an unavailable exception for this particular read. It's not so trivial, esp since you would have to build that into your client at many levels. I think it would be more appropriate (if this idea survives) to put it into Cass. I think the general answer for 'maximum consistency' is QUORUM reads/writes. Based on the fact you are using CL=ALL for reads I assume you are using CL=ONE for writes: this itself strikes me as a bad idea if you require 'maximum consistency for one critical operation'. Very true.
Specifying quorum for BOTH reads/writes provides the 100% consistency because of the overlapping of the availability numbers. But, only if the # of available nodes is not < RF. No, it will work as long as the available nodes is >= RF/2 + 1 Upon further reflection, this idea can be used for any consistency level. The general thrust of my argument is: If a particular value can be overwritten by one process regardless of its prior value, then that implies that the value in the down node is no longer up-to-date and can be disregarded. Just work with the nodes that are available. Actually, now that I think about it... ALL_AVAIL guarantees 100% consistency iff the latest timestamp of the value is later than the latest unavailability time of all unavailable replica nodes for that value's row key. Unavailable is defined as a node's Cass process that is not reachable from ANY node in the cluster in the same data center. If the node in question is available to at least one node, then the read should fail as there is a possibility that the value could have been updated some other way. Node A can't reliably and consistently know whether node B and node C can communicate. After looking at the code, it doesn't look like it will be difficult. Instead of skipping the request for values from the nodes when CL nodes aren't available, it would have to go ahead and request the values from the available nodes as usual and then look at the timestamps, which it does anyway, and compare it to the latest unavailability time of the relevant replica nodes. The code that keeps track of what nodes are down simply records the time it went down. But, I've only been looking at the code for a few days so I'm not claiming to know everything by any stretch. -ryan
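The quorum arithmetic behind this exchange is standard: a QUORUM operation needs floor(RF/2) + 1 replicas, and reads are guaranteed to see the latest write whenever the read and write replica counts overlap, i.e. R + W > RF. A quick sketch:

```python
def quorum(rf):
    """Number of replicas that must respond for a QUORUM operation."""
    return rf // 2 + 1

def overlap_guaranteed(r, w, rf):
    """True when every read set of size r must intersect every write
    set of size w among rf replicas (the R + W > RF rule)."""
    return r + w > rf
```

So with RF=3, QUORUM reads and writes each need 2 replicas and always overlap, while CL.ONE on both sides (1 + 1 = 2, not > 3) gives no such guarantee.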
Visiting Auckland
So long as the Volcanic Ash stays away I'll be visiting Auckland next week on the 23rd and 24th. Drop me an email if you would like to meet to talk about things Cassandra. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com
Re: Propose new ConsistencyLevel.ALL_AVAIL for reads
On 6/16/2011 2:37 PM, Ryan King wrote: On Thu, Jun 16, 2011 at 1:05 PM, AJ a...@dude.podzone.net wrote: snip The Cassandra consistency model is pretty elegant and this type of approach breaks that elegance in many ways. It would also only really be useful when the value has a high probability of being updated between a node going down and the value being read. I'm not sure what you mean. A node can be down for days during which time the value can be updated. The intention is to use the nodes available even if they fall below the RF. If there is only 1 node available for accepting a replica, that should be enough given the conditions I stated and updated below. If this is your constraint, then you should just use CL.ONE. My constraint is a CL = All Available. So, CL.ONE will not work. Perhaps the simpler approach which is fairly trivial and does not require any Cassandra change is to simply downgrade your read from ALL to QUORUM when you get an unavailable exception for this particular read. It's not so trivial, esp since you would have to build that into your client at many levels. I think it would be more appropriate (if this idea survives) to put it into Cass. I think the general answer for 'maximum consistency' is QUORUM reads/writes. Based on the fact you are using CL=ALL for reads I assume you are using CL=ONE for writes: this itself strikes me as a bad idea if you require 'maximum consistency for one critical operation'. Very true. Specifying quorum for BOTH reads/writes provides the 100% consistency because of the overlapping of the availability numbers. But, only if the # of available nodes is not < RF. No, it will work as long as the available nodes is >= RF/2 + 1 Yes, that's what I meant. Sorry for any confusion. Restated: But, only if the # of available nodes is not < RF/2 + 1. Upon further reflection, this idea can be used for any consistency level.
The general thrust of my argument is: If a particular value can be overwritten by one process regardless of its prior value, then that implies that the value in the down node is no longer up-to-date and can be disregarded. Just work with the nodes that are available. Actually, now that I think about it... ALL_AVAIL guarantees 100% consistency iff the latest timestamp of the value is later than the latest unavailability time of all unavailable replica nodes for that value's row key. Unavailable is defined as a node's Cass process that is not reachable from ANY node in the cluster in the same data center. If the node in question is available to at least one node, then the read should fail as there is a possibility that the value could have been updated some other way. Node A can't reliably and consistently know whether node B and node C can communicate. Well, theoretically, of course; that's the nature of distributed systems. But, Cass does indeed make that determination when it counts the number of available replica nodes before it decides if enough replica nodes are available. But, this is obvious to you I'm sure so maybe I don't understand your statement. After looking at the code, it doesn't look like it will be difficult. Instead of skipping the request for values from the nodes when CL nodes aren't available, it would have to go ahead and request the values from the available nodes as usual and then look at the timestamps, which it does anyway, and compare it to the latest unavailability time of the relevant replica nodes. The code that keeps track of what nodes are down simply records the time it went down. But, I've only been looking at the code for a few days so I'm not claiming to know everything by any stretch. -ryan
Re: Force a node to form part of quorum
It would be great if Cassandra put this on their roadmap. There are a lot of durability benefits to incorporating DC awareness into the write consistency equation. MongoDB has this feature in their upcoming release: http://www.mongodb.org/display/DOCS/Data+Center+Awareness#DataCenterAwareness-Tagging%28version1.9.1%29 On Thu, Jun 16, 2011 at 6:57 AM, aaron morton aa...@thelastpickle.com wrote: Short answer: No. Medium answer: No, all nodes are equal. It could create a single point of failure if a QUORUM could not be formed without a specific node. Writes are sent to every replica. Reads with Read Repair enabled are also sent to every replica. For reads the closest UP node as determined by the snitch and possibly re-ordered by the Dynamic Snitch is asked to return the actual data. This replica must respond for the request to complete. If it's a question about maximising cache hits see https://github.com/apache/cassandra/blob/cassandra-0.8.0/conf/cassandra.yaml#L308 Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 16 Jun 2011, at 05:58, A J wrote: Is there a way to favor a node to always participate (or never participate) towards fulfillment of read consistency as well as write consistency ? Thanks AJ
Re: Force a node to form part of quorum
It would be great if Cassandra put this on their roadmap. There are a lot of durability benefits to incorporating DC awareness into the write consistency equation. You may be interested in the discussion here: https://issues.apache.org/jira/browse/CASSANDRA-2338 -- / Peter Schuller
Re: Easy way to overload a single node on purpose?
Having a ping column can work if every key is replicated to every node. It would tell you the cluster is working, sort of. Once the number of nodes is greater than the RF, it tells you a subset of the nodes works. The way our check works is that each node checks itself, so in this context we're not concerned about whether the cluster is up, but that each individual node is up. So the symptoms I saw, the node actually going down etc, were probably due to many different events happening at the time, and will be very hard to recreate? On Thu, Jun 16, 2011 at 6:16 AM, aaron morton aa...@thelastpickle.com wrote: DEBUG 14:36:55,546 ... timed out Is logged when the coordinator times out waiting for the replicas to respond, the timeout setting is rpc_timeout in the yaml file. This results in the client getting a TimedOutException. AFAIK there are no global everything-is-good/bad flags to check. e.g. AFAIK a node will not mark itself down if it runs out of disk space. So you need to monitor the free disk space and alert on that. Having a ping column can work if every key is replicated to every node. It would tell you the cluster is working, sort of. Once the number of nodes is greater than the RF, it tells you a subset of the nodes works. If you google around you'll find discussions about monitoring with munin, ganglia, cloud kick and Ops Centre. If you install mx4j you can access the JMX metrics via HTTP, Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 16 Jun 2011, at 10:38, Suan Aik Yeo wrote: Here's a weird one... what's the best way to get a Cassandra node into a half-crashed state? We have a 3-node cluster running 0.7.5. A few days ago this happened organically to node1 - the partition the commitlog was on was 100% full and there was a No space left on device error, and after a while, although the cluster and node1 was still up, to the other nodes it was down, and messages like: DEBUG 14:36:55,546 ...
timed out started to show up in its debug logs. We have a tool to indicate to the load balancer that a Cassandra node is down, but it didn't detect it that time. Now I'm having trouble purposefully getting the node back to that state, so that I can try other monitoring methods. I've tried to fill up the commitlog partition with other files, and although I get the No space left on device error, the node still doesn't go down and show the other symptoms it showed before. Also, if anyone could recommend a good way for a node itself to detect that it's in such a state I'd be interested in that too. Currently what we're doing is making a describe_cluster_name() thrift call, but that still worked when the node was down. I'm thinking of something like reading/writing to a fixed value in a keyspace as a check... Unfortunately Java-based solutions are out of the question. Thanks, Suan
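A canary check along the lines suggested (write a known value, read it back) can be expressed client-agnostically. Here `write_fn` and `read_fn` are placeholders for whatever non-Java client is in use (e.g. a Thrift or pycassa call against a dedicated health-check column), so this shows only the shape of the check, not a working monitor; the demo uses an in-memory dict as a stand-in:

```python
import time

def node_healthy(write_fn, read_fn, key="__canary__"):
    """Probe a node by writing a fresh timestamp token and reading it
    back. write_fn/read_fn are hypothetical client callables; any
    exception (e.g. no space left on device) or a stale read counts
    as unhealthy."""
    token = str(time.time())
    try:
        write_fn(key, token)
        return read_fn(key) == token
    except Exception:
        return False

# Demo with an in-memory stand-in for a real client:
store = {}
healthy = node_healthy(store.__setitem__, store.__getitem__)
```

Unlike `describe_cluster_name()`, which answers even on a wedged node, this exercises the full write/read path, which is exactly what failed in the commitlog-full scenario above.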
compression for regular column names?
Hi all, As a way of gaining familiarity with Cassandra I am migrating a table that is currently stored in a relational database and mapping it into a Cassandra column family. We add about 700,000 new rows a day to this table, and the average disk space used per row is ~ 300 bytes including indexes. The mapping from table to column family is straight forward - there is a one-one relationship between table columns and column family column names. The relational table has 19 columns. The length of the names of the columns is nearly 200 bytes whereas the average amount of data per row is only 130 bytes. Initially I used the identity map for this translation - i.e. my Cassandra column names were the same as the relational column names. I then found out I could save a lot of disk space by using single letter column names instead of the original relational names. I.e. use 'L' instead of 'LINK_IDENTIFIER' for a column name. The procedure I use to determine space used is: 1. rm -rf the cassandra var-lib directory 2. start cassandra, create keyspace, column families, etc. 3. insert records 4. stop cassandra 5. re-start cassandra 6. measure disk space with du -s the cassandra var-lib directory This seems to replace the commit logs with .db files. My questions are: 1. Is this a common practice (i.e. making the client responsible for shortening the column names) when dealing with a large number of fixed column names and a high volume of inserts? Is there any way that Cassandra can help out here? 2. Is there another way to transform the commit logs into .db files without stopping and starting the server? Thanks, ER
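The client-side name shortening described here is easy to centralize in one mapping so application code never sees the single-letter names. A sketch (only LINK_IDENTIFIER → 'L' comes from the thread; the other column names are invented for illustration):

```python
# Long relational column name -> short Cassandra column name.
# Only LINK_IDENTIFIER is from the thread; the rest are made up.
SHORT_NAMES = {
    "LINK_IDENTIFIER": "L",
    "CREATED_AT": "C",
    "SOURCE_URL": "S",
}
LONG_NAMES = {v: k for k, v in SHORT_NAMES.items()}

def shorten(row):
    """Rewrite a dict keyed by relational names into short names (writes)."""
    return {SHORT_NAMES[k]: v for k, v in row.items()}

def widen(row):
    """Inverse mapping, applied to reads."""
    return {LONG_NAMES[k]: v for k, v in row.items()}

row = {"LINK_IDENTIFIER": "abc123", "CREATED_AT": "2011-06-16"}
```

Keeping both directions in one module makes the translation lossless and keeps the short names a pure storage-layer concern.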
Re: compression for regular column names?
On Thu, Jun 16, 2011 at 3:41 PM, E R pc88m...@gmail.com wrote: Hi all, As a way of gaining familiarity with Cassandra I am migrating a table that is currently stored in a relational database and mapping it into a Cassandra column family. We add about 700,000 new rows a day to this table, and the average disk space used per row is ~ 300 bytes including indexes. The mapping from table to column family is straight forward - there is a one-one relationship between table columns and column family column names. The relational table has 19 columns. The length of the names of the columns is nearly 200 bytes whereas the average amount of data per row is only 130 bytes. Initially I used the identity map for this translation - i.e. my Cassandra column names were the same as the relational column names. I then found out I could save a lot of disk space by using single letter column names instead of the original relational names. I.e. use 'L' instead of 'LINK_IDENTIFIER' for a column name. The procedure I use to determine space used is: 1. rm -rf the cassandra var-lib directory 2. start cassandra, create keyspace, column families, etc. 3. insert records 4. stop cassandra 5. re-start cassandra 6. measure disk space with du -s the cassandra var-lib directory This seems to replace the commit logs with .db files. My questions are: 1. Is this a common practice (i.e. making the client responsible for shortening the column names) when dealing with a large number of fixed column names and a high volume of inserts? Is there any way that Cassandra can help out here? Yes, we're working on a new, compressed format (CASSANDRA-674). 2. Is there another way to transform the commit logs into .db files without stopping and starting the server? nodetool flush. -ryan
Re: Propose new ConsistencyLevel.ALL_AVAIL for reads
On Thu, Jun 16, 2011 at 2:12 PM, AJ a...@dude.podzone.net wrote: On 6/16/2011 2:37 PM, Ryan King wrote: On Thu, Jun 16, 2011 at 1:05 PM, AJ a...@dude.podzone.net wrote: snip The Cassandra consistency model is pretty elegant and this type of approach breaks that elegance in many ways. It would also only really be useful when the value has a high probability of being updated between a node going down and the value being read. I'm not sure what you mean. A node can be down for days during which time the value can be updated. The intention is to use the nodes available even if they fall below the RF. If there is only 1 node available for accepting a replica, that should be enough given the conditions I stated and updated below. If this is your constraint, then you should just use CL.ONE. My constraint is a CL = All Available. So, CL.ONE will not work. That's a solution, not a requirement. What's your requirement? Perhaps the simpler approach which is fairly trivial and does not require any Cassandra change is to simply downgrade your read from ALL to QUORUM when you get an unavailable exception for this particular read. It's not so trivial, esp since you would have to build that into your client at many levels. I think it would be more appropriate (if this idea survives) to put it into Cass. I think the general answer for 'maximum consistency' is QUORUM reads/writes. Based on the fact you are using CL=ALL for reads I assume you are using CL=ONE for writes: this itself strikes me as a bad idea if you require 'maximum consistency for one critical operation'. Very true. Specifying quorum for BOTH reads/writes provides the 100% consistency because of the overlapping of the availability numbers. But, only if the # of available nodes is not < RF. No, it will work as long as the available nodes is >= RF/2 + 1 Yes, that's what I meant. Sorry for any confusion. Restated: But, only if the # of available nodes is not < RF/2 + 1.
Upon further reflection, this idea can be used for any consistency level. The general thrust of my argument is: If a particular value can be overwritten by one process regardless of its prior value, then that implies that the value in the down node is no longer up-to-date and can be disregarded. Just work with the nodes that are available. Actually, now that I think about it... ALL_AVAIL guarantees 100% consistency iff the latest timestamp of the value is later than the latest unavailability time of all unavailable replica nodes for that value's row key. Unavailable is defined as a node's Cass process that is not reachable from ANY node in the cluster in the same data center. If the node in question is available to at least one node, then the read should fail as there is a possibility that the value could have been updated some other way. Node A can't reliably and consistently know whether node B and node C can communicate. Well, theoretically, of course; that's the nature of distributed systems. But, Cass does indeed make that determination when it counts the number of available replica nodes before it decides if enough replica nodes are available. But, this is obvious to you I'm sure so maybe I don't understand your statement. Consider this scenario: given nodes, A, B and C and A thinks C is down but B thinks C is up. What do you do? Remember, A doesn't know that B thinks C is up, it only knows its own state. -ryan
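Ryan's scenario (A thinks C is down while B thinks C is up) is easy to make concrete: each coordinator only has its own failure-detector view, so two coordinators can compute different "available" sets for the same replicas and would apply the proposed rule differently. A toy illustration:

```python
def available_to(coordinator_view, replicas):
    """Replicas a coordinator believes are up, given only its own
    failure-detector view (a dict of node -> believed-up bool)."""
    return {n for n in replicas if coordinator_view.get(n, False)}

replicas = {"A", "B", "C"}
view_a = {"A": True, "B": True, "C": False}  # A thinks C is down
view_b = {"A": True, "B": True, "C": True}   # B thinks C is up

avail_via_a = available_to(view_a, replicas)
avail_via_b = available_to(view_b, replicas)
```

Since the two views disagree, a rule defined in terms of "all available replicas" gives different answers depending on which node coordinates the read, which is the core of the objection.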
Re: jsvc hangs shell
Anton Belyaev anton.belyaev at gmail.com writes: I guess it is not trivial to modify the package to make it use JSW instead of JSVC. I am still not sure the JSVC itself is the culprit. Maybe something is wrong in my setup. I am seeing similar behavior using the Brisk Debian packages for Maverick: http://www.datastax.com/docs/0.8/brisk/install_brisk_packages#installing-the-brisk-packaged-releases Not sure if it's my configuration, but I verified it on two separate installs. -Ken
Brisk .rpm packages for CentOS/RH/Fedora
Regards to all Cassandra users I don't know if Brisk has its own mailing list, so I ask here. Does Brisk have .rpm packages for Red Hat and derived distributions (CentOS/Fedora)? If so, where can I find them? Thanks a lot for your time. -- Marcos Luís Ortíz Valmaseda Software Engineer (Large-Scaled Distributed Systems) http://marcosluis2186.posterous.com
Re: Propose new ConsistencyLevel.ALL_AVAIL for reads
How would your solution deal with complete network partitions? A node being 'down' does not actually mean it is dead, just that it is unreachable from whatever is making the decision to mark it 'down'. Following from Ryan's example, consider nodes A, B, and C but within a fully partitioned network: all of the nodes are up but each thinks all the others are down. Your ALL_AVAILABLE consistency level would boil down to consistency level ONE for clients connecting to any of the nodes. If I connect to A, it thinks it is the last one standing and translates 'ALL_AVALIABLE' into 'ONE'. Based on your logic, two clients connecting to two different nodes could each modify a value then read it, thinking that it's 100% consistent yet it is actually *completely* inconsistent with the value on other node(s). I suggest you review the principles of the infamous CAP theorem. The consistency levels as they stand now allow for an explicit trade-off between 'available and partition tolerant' (ONE read/write) OR 'consistent and available' (QUORUM read/write). Your solution achieves only availability and can guarantee neither consistency nor partition tolerance. On Thu, Jun 16, 2011 at 7:50 PM, Ryan King r...@twitter.com wrote: On Thu, Jun 16, 2011 at 2:12 PM, AJ a...@dude.podzone.net wrote: On 6/16/2011 2:37 PM, Ryan King wrote: On Thu, Jun 16, 2011 at 1:05 PM, AJ a...@dude.podzone.net wrote: snip The Cassandra consistency model is pretty elegant and this type of approach breaks that elegance in many ways. It would also only really be useful when the value has a high probability of being updated between a node going down and the value being read. I'm not sure what you mean. A node can be down for days during which time the value can be updated. The intention is to use the nodes available even if they fall below the RF. If there is only 1 node available for accepting a replica, that should be enough given the conditions I stated and updated below.
If this is your constraint, then you should just use CL.ONE. My constraint is a CL = All Available. So, CL.ONE will not work. That's a solution, not a requirement. What's your requirement? Perhaps the simpler approach which is fairly trivial and does not require any Cassandra change is to simply downgrade your read from ALL to QUORUM when you get an unavailable exception for this particular read. It's not so trivial, esp since you would have to build that into your client at many levels. I think it would be more appropriate (if this idea survives) to put it into Cass. I think the general answer for 'maximum consistency' is QUORUM reads/writes. Based on the fact you are using CL=ALL for reads I assume you are using CL=ONE for writes: this itself strikes me as a bad idea if you require 'maximum consistency for one critical operation'. Very true. Specifying quorum for BOTH reads/writes provides the 100% consistency because of the overlapping of the availability numbers. But, only if the # of available nodes is not < RF. No, it will work as long as the available nodes is >= RF/2 + 1 Yes, that's what I meant. Sorry for any confusion. Restated: But, only if the # of available nodes is not < RF/2 + 1. Upon further reflection, this idea can be used for any consistency level. The general thrust of my argument is: If a particular value can be overwritten by one process regardless of its prior value, then that implies that the value in the down node is no longer up-to-date and can be disregarded. Just work with the nodes that are available. Actually, now that I think about it... ALL_AVAIL guarantees 100% consistency iff the latest timestamp of the value is later than the latest unavailability time of all unavailable replica nodes for that value's row key. Unavailable is defined as a node's Cass process that is not reachable from ANY node in the cluster in the same data center.
If the node in question is available to at least one node, then the read should fail as there is a possibility that the value could have been updated some other way. Node A can't reliably and consistently know whether node B and node C can communicate. Well, theoretically, of course; that's the nature of distributed systems. But, Cass does indeed make that determination when it counts the number of available replica nodes before it decides if enough replica nodes are available. But, this is obvious to you I'm sure so maybe I don't understand your statement. Consider this scenario: given nodes, A, B and C and A thinks C is down but B thinks C is up. What do you do? Remember, A doesn't know that B thinks C is up, it only knows its own state. -ryan
cassandra crash
All: Why did Cassandra crash after printing the following log?

INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-206-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-207-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/VCCCurScheduleTable-137-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-205-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/VCCCurScheduleTable-139-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/VCCCurScheduleTable-138-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-208-Data.db
INFO [GC inspection] 2011-06-16 14:22:59,562 GCInspector.java (line 110) GC for ParNew: 385 ms, 26859800 reclaimed leaving 117789112 used; max is 118784

Best Regards
Donna li
Re: Propose new ConsistencyLevel.ALL_AVAIL for reads
UPDATE to my suggestion is below.

On 6/16/2011 5:50 PM, Ryan King wrote: On Thu, Jun 16, 2011 at 2:12 PM, AJ a...@dude.podzone.net wrote: On 6/16/2011 2:37 PM, Ryan King wrote: On Thu, Jun 16, 2011 at 1:05 PM, AJ a...@dude.podzone.net wrote: snip

The Cassandra consistency model is pretty elegant, and this type of approach breaks that elegance in many ways. It would also only really be useful when the value has a high probability of being updated between a node going down and the value being read. I'm not sure what you mean. A node can be down for days, during which time the value can be updated. The intention is to use the nodes available even if they fall below the RF. If there is only 1 node available for accepting a replica, that should be enough given the conditions I stated and updated below. If this is your constraint, then you should just use CL.ONE. My constraint is CL = ALL AVAILABLE, so CL.ONE will not work. That's a solution, not a requirement. What's your requirement?

Ok. And this updates my suggestion, removing the need for ALL_AVAIL. This adds logic to cope with unavailable nodes and still achieve consistency for a specific situation. The general requirement is to completely eliminate read failures for reads specifying CL = ALL for values that have been subject to a specific data update pattern. The specific data update pattern consists of a value that has been updated (or added) in the face of one or more, but fewer than RF, unavailable replica nodes (at least 1 replica node is available). If a particular data value (column value) is updated after the latest down node, this implies the new value is independent of any replica values that are currently unavailable. Therefore, in this situation, the number of available replicas is irrelevant. After querying all *available* replica nodes, the value with the latest timestamp is consistent if that timestamp is later than the time at which the last replica node became unavailable.
snip Well, theoretically, of course; that's the nature of distributed systems. But Cassandra does indeed make that determination when it counts the number of available replica nodes before it decides whether enough replica nodes are available. But this is obvious to you, I'm sure, so maybe I don't understand your statement. Consider this scenario: given nodes A, B and C, A thinks C is down but B thinks C is up. What do you do? Remember, A doesn't know that B thinks C is up; it only knows its own state. What kind of network configuration would have this kind of scenario? This method only applies within a data center, which should be OK since replication across data centers seems to be mostly for fault tolerance... but I will have to think about this. -ryan
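To make the updated proposal concrete, here is a toy sketch of the check being described (plain Python; the `Replica` type, `down_since` field, and function name are hypothetical illustrations, not Cassandra internals): read all *available* replicas, take the value with the latest timestamp, and accept it only if that timestamp is later than the time the last replica became unavailable.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Replica:
    value: Optional[str]         # None if the node is unreachable
    timestamp: float             # write timestamp of this node's copy
    down_since: Optional[float]  # when the node was marked down, else None

def all_avail_read(replicas):
    """Return the freshest value iff it is provably newer than any copy
    that could be hiding on a down node, per the proposal above."""
    up = [r for r in replicas if r.down_since is None]
    if not up:
        raise RuntimeError("no replicas available")
    latest = max(up, key=lambda r: r.timestamp)
    last_down = max((r.down_since for r in replicas if r.down_since is not None),
                    default=float("-inf"))
    if latest.timestamp > last_down:
        return latest.value
    raise RuntimeError("cannot guarantee consistency: a down replica may be fresher")
```

Note this sketch inherits exactly the weakness Ryan and Dan raise: `down_since` is only one node's local opinion, so under a partition two coordinators can disagree about who is "unavailable".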
Re: Brisk .rpm packages for CentOS/RH/Fedora
Yes, there is a Brisk list: brisk-us...@googlegroups.com Packages are available via rpm.datastax.com On Thu, Jun 16, 2011 at 8:21 PM, Marcos Ortiz Valmaseda mlor...@uci.cu wrote: Regards to all Cassandra users. I don't know if Brisk has its own mailing list, so I ask here. Does Brisk have .rpm packages for Red Hat and related distributions (CentOS/Fedora)? If so, where can I find them? Thanks a lot for your time. -- Marcos Luís Ortíz Valmaseda Software Engineer (Large-Scaled Distributed Systems) http://marcosluis2186.posterous.com
Re: Propose new ConsistencyLevel.ALL_AVAIL for reads
On 6/16/2011 7:56 PM, Dan Hendry wrote: How would your solution deal with complete network partitions? A node being 'down' does not actually mean it is dead, just that it is unreachable from whatever is making the decision to mark it 'down'. Following from Ryan's example, consider nodes A, B, and C within a fully partitioned network: all of the nodes are up, but each thinks all the others are down. Your ALL_AVAILABLE consistency level would boil down to consistency level ONE for clients connecting to any of the nodes. If I connect to A, it thinks it is the last one standing and translates ALL_AVAILABLE into ONE. Based on your logic, two clients connecting to two different nodes could each modify a value and then read it, thinking that it's 100% consistent, yet it is actually *completely* inconsistent with the value on the other node(s).

Help me out here. I'm trying to visualize a situation where the clients can access all the C* nodes but the nodes can't access each other. I don't see how that can happen on a regular ethernet subnet in one data center. Well, I'm sure there is a case that you can point out. Ok, I will concede that this is an issue for some network configurations.

I suggest you review the principles of the infamous CAP theorem. The consistency levels, as they stand now, allow for an explicit trade-off between 'available and partition tolerant' (ONE read/write) OR 'consistent and available' (QUORUM read/write). Your solution achieves only availability and can guarantee neither consistency nor partition tolerance. It looks like CAP may triumph again. Thanks for the exercise, Dan and Ryan.
Re: Cassandra JVM GC settings
It would help if you can provide some log messages from the GCInspector so people can see how much GC is going on. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 17 Jun 2011, at 02:46, Sebastien Coutu wrote: Hi Everyone, I'm seeing Cassandra GC a lot and I would like to tune the young space and the tenured space. Would anyone have recommendations on the NewRatio or NewSize/MaxNewSize to use for an environment where Cassandra has several column families and in which we are doing a mixed load of reading and writing? The JVM has 8G of heap space assigned to it and there are 9 nodes in this cluster. Thanks for the comments! Sébastien Coutu
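For reference, the knobs being discussed live in the JVM options (typically set in cassandra-env.sh). The fragment below is an illustrative starting point only, assumed rather than taken from this thread; the right values depend on the GCInspector output Aaron asks for, and the flags shown are standard HotSpot options of that era.

```shell
# Hedged sketch of cassandra-env.sh-style GC settings for an 8G heap.
# All numbers are illustrative starting points, not recommendations.
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"   # young generation; tune against ParNew pause times

JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE} -Xmx${MAX_HEAP_SIZE} -Xmn${HEAP_NEWSIZE}"
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1"
# Observe before tuning further:
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
```

Setting an explicit -Xmn overrides -XX:NewRatio, so pick one mechanism or the other, not both.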
Re: Propose new ConsistencyLevel.ALL_AVAIL for reads
Help me out here. I'm trying to visualize a situation where the clients can access all the C* nodes but the nodes can't access each other. I don't see how that can happen on a regular ethernet subnet in one data center. Well, I'm sure there is a case that you can point out. Ok, I will concede that this is an issue for some network configurations.

First rule of designing/developing/operating distributed systems: assume anything and everything can and will happen, regardless of network configuration or hardware. This specific situation actually HAS happened to me. Our Cassandra nodes accept client connections on one ethernet interface on one network (the production network) yet communicate with each other on a separate ethernet interface on a separate network which is Cassandra-specific. This was done mainly due to the relatively large inter-node Cassandra bandwidth requirements in comparison to client bandwidth requirements. At one point, the switch for the Cassandra network went down, so clients could connect yet the Cassandra nodes could not talk to each other. (We write at ONE and read at ALL, so everything behaved as expected.)

On Thu, Jun 16, 2011 at 11:00 PM, AJ a...@dude.podzone.net wrote: On 6/16/2011 7:56 PM, Dan Hendry wrote: How would your solution deal with complete network partitions? A node being 'down' does not actually mean it is dead, just that it is unreachable from whatever is making the decision to mark it 'down'. Following from Ryan's example, consider nodes A, B, and C within a fully partitioned network: all of the nodes are up, but each thinks all the others are down. Your ALL_AVAILABLE consistency level would boil down to consistency level ONE for clients connecting to any of the nodes. If I connect to A, it thinks it is the last one standing and translates ALL_AVAILABLE into ONE.
Based on your logic, two clients connecting to two different nodes could each modify a value and then read it, thinking that it's 100% consistent, yet it is actually *completely* inconsistent with the value on the other node(s). Help me out here. I'm trying to visualize a situation where the clients can access all the C* nodes but the nodes can't access each other. I don't see how that can happen on a regular ethernet subnet in one data center. Well, I'm sure there is a case that you can point out. Ok, I will concede that this is an issue for some network configurations. I suggest you review the principles of the infamous CAP theorem. The consistency levels, as they stand now, allow for an explicit trade-off between 'available and partition tolerant' (ONE read/write) OR 'consistent and available' (QUORUM read/write). Your solution achieves only availability and can guarantee neither consistency nor partition tolerance. It looks like CAP may triumph again. Thanks for the exercise, Dan and Ryan.
Re: client API
The Thrift Java compiler creates code that is not compliant with Java 5. https://issues.apache.org/jira/browse/THRIFT-1170 So you may have trouble getting the Thrift API to run. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 17 Jun 2011, at 03:14, karim abbouh wrote: I use JDK 1.6 to install and launch Cassandra on a Linux platform, but can I use JDK 1.5 for my Cassandra client?
Re: Docs: Token Selection
But, I'm thinking about using OldNetworkTopStrat. NetworkTopologyStrategy is where it's at. A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 17 Jun 2011, at 01:39, AJ wrote: Thanks Eric! I've finally got it! I feel like I've just been initiated or something by discovering this secret. I kid! But, I'm thinking about using OldNetworkTopStrat. Do you, or anyone else, know if the same rules for token assignment apply to ONTS?

On 6/16/2011 7:21 AM, Eric tamme wrote: AJ, sorry I seemed to miss the original email on this thread. As Aaron said, when computing tokens for multiple data centers, you should compute them independently for each data center, as if it were its own Cassandra cluster. You can have overlapping token ranges between multiple data centers, but no two nodes can have the same token, so for subsequent data centers I just increment the tokens. For two data centers with two nodes each using RandomPartitioner, calculate the tokens for the first DC normally, but in the second data center, increment the tokens by one.

In DC 1: node 1 = 0, node 2 = 85070591730234615865843651857942052864
In DC 2: node 1 = 1, node 2 = 85070591730234615865843651857942052865

For RowMutations this will give each data center a local set of nodes that it can write to for complete coverage of the entire token space. If you are using NetworkTopologyStrategy for replication, it will give an offset mirror replication between the two data centers so that your replicas will not get pinned to a node in the remote DC. There are other ways to select the tokens, but the increment method is the simplest to manage and continue to grow with. Hope that helps. -Eric
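Eric's increment scheme can be sketched in a few lines (plain Python, illustrative only): compute the evenly spaced RandomPartitioner tokens for each DC as if it were its own ring, then offset each subsequent DC by its index so no two nodes share a token.

```python
# RandomPartitioner tokens span [0, 2**127). Evenly spaced tokens for a DC
# of n nodes are i * 2**127 // n; offsetting by the DC index keeps tokens
# unique across data centers while preserving the mirrored layout.
RING = 2 ** 127

def dc_tokens(nodes_in_dc, dc_index):
    return [i * RING // nodes_in_dc + dc_index for i in range(nodes_in_dc)]

print("DC1:", dc_tokens(2, 0))  # matches Eric's DC 1 tokens above
print("DC2:", dc_tokens(2, 1))  # matches Eric's DC 2 tokens above
```

Running this reproduces the two-node, two-DC assignment quoted in the thread.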
Re: Propose new ConsistencyLevel.ALL_AVAIL for reads
On 6/16/2011 9:36 PM, Dan Hendry wrote: Help me out here. I'm trying to visualize a situation where the clients can access all the C* nodes but the nodes can't access each other. I don't see how that can happen on a regular ethernet subnet in one data center. Well, I'm sure there is a case that you can point out. Ok, I will concede that this is an issue for some network configurations.

First rule of designing/developing/operating distributed systems: assume anything and everything can and will happen, regardless of network configuration or hardware. This specific situation actually HAS happened to me. Our Cassandra nodes accept client connections on one ethernet interface on one network (the production network) yet communicate with each other on a separate ethernet interface on a separate network which is Cassandra-specific. This was done mainly due to the relatively large inter-node Cassandra bandwidth requirements in comparison to client bandwidth requirements. At one point, the switch for the Cassandra network went down, so clients could connect yet the Cassandra nodes could not talk to each other. (We write at ONE and read at ALL, so everything behaved as expected.)

Funny, but that's the exact same setup I'm running. But I'm not a network guy and kind of assumed it wasn't so typical. Plus, lately I've had my mind on a cloud setup.
Re: Docs: Token Selection
On 6/16/2011 9:45 PM, aaron morton wrote: But, I'm thinking about using OldNetworkTopStrat. NetworkTopologyStrategy is where it's at. Oh yeah? It didn't look like it would serve my requirements. I want 2 full production geo-diverse data centers, with each serving as a failover for the other. Random Partitioner. Each DC holds 2 replicas from the local clients and 1 replica goes to the other DC. It doesn't look like I can do a yin-yang setup like that with NTS. Am I wrong? A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com
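For context, a symmetric NTS layout fixes the replica count per data center in the keyspace definition, regardless of which DC a write originates in. A hedged sketch (0.8-era cassandra-cli syntax, keyspace name hypothetical; verify against your version):

```
create keyspace MyKS
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = [{DC1:2, DC2:2}];
```

This is the crux of AJ's question: because NTS pins per-DC counts globally, an origin-dependent "2 replicas local, 1 remote" split is not directly expressible; the nearest NTS equivalents are symmetric layouts such as 2+2 or the 2+1 from the thread at the top of this digest, which favors one DC.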