Important Variables for Scaling

2011-06-16 Thread Schuilenga, Jan Taeke
Which variables (for instance throughput, CPU, I/O, connections) should drive the decision to add a node to a Cassandra setup that is under strain? We are trying to prove scalability, but when is the right time to add a node to get the optimum scalability result?

Re: Multi data center configuration - A question on read correction

2011-06-16 Thread Sylvain Lebresne
Yes, that's the way to do it. On Wed, Jun 15, 2011 at 9:43 PM, Selva Kumar wwgse...@yahoo.com wrote: Thanks Jonathan. Can we turn off RR by setting READ_REPAIR_CHANCE = 0? Please advise. Selva From: Jonathan Ellis jbel...@gmail.com To: user@cassandra.apache.org

Re: sstable2json2sstable bug with json data stored

2011-06-16 Thread Timo Nentwig
On 6/15/11 17:41, Timo Nentwig wrote: (json can likely be boiled down even more...) Any JSON (well, probably anything with quotes...) breaks it: { 74657374: [[data, {foo:bar}, 1308209845388000]] } [default@foo] set transactions[test][data]='{foo:bar}'; I feared that storing data in a

Re: sstable2json2sstable bug with json data stored

2011-06-16 Thread Sasha Dolgy
The JSON you are showing below is an export from cassandra? { 74657374: [[data, {foo:bar}, 1308209845388000]] } Does this work? { 74657374: [[data, {foo:bar}, 1308209845388000]] } -sd On Thu, Jun 16, 2011 at 9:49 AM, Timo Nentwig timo.nent...@toptarif.de wrote: On 6/15/11 17:41, Timo Nentwig

Re: sstable2json2sstable bug with json data stored

2011-06-16 Thread Timo Nentwig
On 6/16/11 10:06, Sasha Dolgy wrote: The JSON you are showing below is an export from cassandra? Yes. Just posted the solution: https://issues.apache.org/jira/browse/CASSANDRA-2780?focusedCommentId=13050274&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13050274

Re: sstable2json2sstable bug with json data stored

2011-06-16 Thread Timo Nentwig
On 6/16/11 10:12, Timo Nentwig wrote: On 6/16/11 10:06, Sasha Dolgy wrote: The JSON you are showing below is an export from cassandra? Yes. Just posted the solution:

Getting Started website is out of date

2011-06-16 Thread Christian Straube
Hi, the Getting Started website (http://wiki.apache.org/cassandra/GettingStarted) is out of date: the link to the Twissandra demo is broken, and the new CQL is not mentioned :-) Besides this, I love Cassandra! Best Christian

Re: Migration question

2011-06-16 Thread aaron morton
Lots of folk use a single disk or raid-1 for the system and commit log and raid-0 for the data volumes http://wiki.apache.org/cassandra/CassandraHardware Your money is probably better spent on more nodes with more disks and more memory. More nodes is always better. Happy to hear reasons

Re: Slowdowns during repair

2011-06-16 Thread aaron morton
Look for log messages at the ERROR level first to find out why it's crashing. Check for GC pressure during the repair, either using JConsole or log messages from the GCInspector. Check the nodetool tpstats to get an idea if the nodes are saturated, i.e. whether there are tasks in the pending list.

Re: Where is my data?

2011-06-16 Thread aaron morton
I wrote a blog post about this sort of thing the other day http://thelastpickle.com/2011/06/13/Down-For-Me/ Let me know if you spot any problems. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 16 Jun 2011, at 02:20, AJ wrote:

Re: What's the best approach to search in Cassandra

2011-06-16 Thread Jake Luciani
Mark, Solandra doesn't use secondary indexes; the functionality is too limited for the lucene api. It maintains its own indexes in regular column families. I suggest you look at Solr and decide if this is the functionality you need, Solandra offers the same api but on Cassandra's distributed

Re: Force a node to form part of quorum

2011-06-16 Thread aaron morton
Short answer: No. Medium answer: No, all nodes are equal. It could create a single point of failure if a QUORUM could not be formed without a specific node. Writes are sent to every replica. Reads with Read Repair enabled are also sent to every replica. For reads the closest UP node as

Re: Atomicity of batch updates

2011-06-16 Thread aaron morton
See http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 16 Jun 2011, at 06:26, chovatia jaydeep wrote: Cassandra write operation is atomic for all the columns/super columns

Re: Easy way to overload a single node on purpose?

2011-06-16 Thread aaron morton
DEBUG 14:36:55,546 ... timed out Is logged when the coordinator times out waiting for the replicas to respond; the timeout setting is rpc_timeout in the yaml file. This results in the client getting a TimedOutException. AFAIK there is no global "everything is good / bad" flag to check.

Re: Is there a way from a running Cassandra node to determine whether or not itself is up?

2011-06-16 Thread aaron morton
Take a look at mx4j http://wiki.apache.org/cassandra/Operations#Monitoring_with_MX4J Someone told me once you can call the JMX ops via http; I've not checked though. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 16 Jun 2011,

Re: Docs: Token Selection

2011-06-16 Thread aaron morton
See this thread for background http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replica-data-distributing-between-racks-td6324819.html In a multi DC environment, if you calculate the initial tokens for the entire cluster data will not be evenly distributed. Cheers

Querying superColumn

2011-06-16 Thread Vivek Mishra
I have a question about querying super column For example: I have a supercolumnFamily DEPARTMENT with dynamic superColumn 'EMPLOYEE'( name, country). Now for rowKey 'DEPT1' I have inserted multiple super column like: Employee1{ Name: Vivek country: India } Employee2{ Name: Vivs country:

Re: Important Variables for Scaling

2011-06-16 Thread aaron morton
It's a difficult question to answer in the abstract. Some thoughts... Scaling by adding one node at a time is not optimal. The best case scenario is to double the number of nodes, as this means existing nodes only have to stream their data to a new node. Obviously this is not always possible.
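The doubling advice above can be illustrated with a small Python sketch. It assumes RandomPartitioner's token space of 0 to 2**127 and evenly spaced initial tokens; the function names are hypothetical and only illustrate the arithmetic, not any Cassandra tool.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def balanced_tokens(node_count):
    """Evenly spaced initial tokens for a ring of node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def doubled_tokens(existing):
    """Tokens for new nodes, each placed at the midpoint of one existing range."""
    ordered = sorted(existing)
    new_tokens = []
    for i, token in enumerate(ordered):
        next_token = ordered[(i + 1) % len(ordered)]
        # Width of the range between this token and the next, wrapping around the ring.
        span = (next_token - token) % RING_SIZE or RING_SIZE
        new_tokens.append((token + span // 2) % RING_SIZE)
    return new_tokens


old = balanced_tokens(4)
new = doubled_tokens(old)
# After doubling, the combined ring is again evenly balanced.
print(sorted(old + new) == balanced_tokens(8))
```

Because every new token bisects exactly one existing range, each range is split between one old node and one new node, and the ring stays balanced without moving any existing token.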

Upgrading Cassandra cluster from 0.6.3 to 0.7.5

2011-06-16 Thread Ali Ahsan
Hi All, We are upgrading cassandra from 0.6.3 to 0.7.5. We have two nodes in the cluster. I am a bit confused about how to upgrade them; do you have any guide? -- S.Ali Ahsan Senior System Engineer e-Business (Pvt) Ltd 49-C Jail Road, Lahore, P.O. Box 676 Lahore 54000, Pakistan Tel: +92 (0)42 3758 7140

Re: Querying superColumn

2011-06-16 Thread Donal Zang
Well, you are looking for the secondary index. But for now, AFAIK, the supercolumn cannot use a secondary index. On 16/06/2011 13:55, Vivek Mishra wrote: Now for rowKey 'DEPT1' I have inserted multiple super column like: Employee1{ Name: Vivek country: India } Employee2{

snitch thrift

2011-06-16 Thread Terje Marthinussen
Hi all! Assuming a node ends up in GC land for a while, there is a good chance that even though it performs terribly and the dynamic snitching will help you to avoid it on the gossip side, it will not really help you much if thrift still accepts requests and the thrift interface has choppy

Re: Docs: Token Selection

2011-06-16 Thread Eric tamme
AJ, sorry I seemed to miss the original email on this thread. As Aaron said, when computing tokens for multiple data centers, you should compute them independently for each data center - as if it were its own Cassandra cluster. You can have overlapping token ranges between multiple data
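A minimal Python sketch of the per-data-center calculation described above, assuming RandomPartitioner's 0 to 2**127 token space. The data center names are hypothetical, and the small per-DC offset is an assumption taken from the "+1" remark later in this thread, used here so that no two nodes share the same token.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def tokens_per_dc(nodes_by_dc):
    """Compute initial tokens for each data center as if it were its own ring.

    nodes_by_dc maps a data center name to its node count. Each data center
    gets evenly spaced tokens, shifted by its index (0, 1, 2, ...) so that
    tokens stay unique across the whole cluster.
    """
    layout = {}
    for offset, (dc, count) in enumerate(sorted(nodes_by_dc.items())):
        layout[dc] = [
            (i * RING_SIZE // count + offset) % RING_SIZE for i in range(count)
        ]
    return layout


# Hypothetical two-DC cluster with three nodes each.
layout = tokens_per_dc({"DC1": 3, "DC2": 3})
for dc, tokens in layout.items():
    print(dc, tokens)
```

Each data center ends up with its own evenly balanced ring, which is the point made in the message; computing tokens once for the whole cluster would instead interleave the data centers unevenly.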

Re: Docs: Token Selection

2011-06-16 Thread AJ
LOL, I feel Eric's pain. This double-ring thing can throw you for a loop since, like I said, there is only one place it is documented and it is only *implied*, so one is not sure he is interpreting it correctly. Even the source for NTS doesn't mention this. Thanks for everyone's help on

Re: Docs: Token Selection

2011-06-16 Thread AJ
Thanks Eric! I've finally got it! I feel like I've just been initiated or something by discovering this secret. I kid! But, I'm thinking about using OldNetworkTopStrat. Do you, or anyone else, know if the same rules for token assignment apply to ONTS? On 6/16/2011 7:21 AM, Eric tamme

Cassandra JVM GC settings

2011-06-16 Thread Sebastien Coutu
Hi Everyone, I'm seeing Cassandra GC a lot and I would like to tune the Young space and the Tenured space. Would anyone have recommendations on the NewRatio or NewSize/MaxNewSize to use for an environment where Cassandra has several column families and in which we are doing a mixed load of

client API

2011-06-16 Thread karim abbouh
I use JDK 1.6 to install and launch Cassandra on a Linux platform, but can I use JDK 1.5 for my Cassandra client?

Re: Querying superColumn

2011-06-16 Thread Sasha Dolgy
Have 1 row with employee info for country/office/division, each column an employee id and json info about the employee or a reference to another row id for that employee data. No more supercolumn. On Jun 16, 2011 1:56 PM, Vivek Mishra vivek.mis...@impetus.co.in wrote: I have a question about

Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread AJ
Good morning all. Hypothetical Setup: 1 data center RF = 3 Total nodes 3 Problem: Suppose I need maximum consistency for one critical operation; thus I specify CL = ALL for reads. However, this will fail if only 1 replica endpoint is down. I don't see why this fail is necessary all of the

Re: Docs: Token Selection

2011-06-16 Thread Sasha Dolgy
So, with ec2 ... 3 regions (DC's), each one is +1 from another? On Jun 16, 2011 3:40 PM, AJ a...@dude.podzone.net wrote: Thanks Eric! I've finally got it! I feel like I've just been initiated or something by discovering this secret. I kid! But, I'm thinking about using OldNetworkTopStrat.

Re: Docs: Token Selection

2011-06-16 Thread Eric tamme
On Thu, Jun 16, 2011 at 11:11 AM, Sasha Dolgy sdo...@gmail.com wrote: So, with ec2 ... 3 regions (DC's), each one is +1 from another? I don't use ec2, so I am not familiar with the specifics of deployment there. That said, if you have 3 data centers with equal nodes in each (so that you

Unable to access column family in CLI after building CF in CQL

2011-06-16 Thread yikes bigdata
Hi, I was following the CQL example on the DataStax website and was able to create a new column family and query it. But when I viewed the column family in the CLI, it gives me the following error. # Unable to read column family created from CQL [default@store] list users2; *users2 not found in

Re: Upgrading Cassandra cluster from 0.6.3 to 0.7.5

2011-06-16 Thread Jonathan Ellis
Read NEWS.txt. 0.7.6 is better than 0.7.5, btw. On Thu, Jun 16, 2011 at 5:03 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote: Hi All, We are upgrading cassandra from 0.6.3 to 0.7.5. We have two nodes in the cluster. I am a bit confused about how to upgrade them; do you have any guide? -- S.Ali Ahsan

Re: Unable to access column family in CLI after building CF in CQL

2011-06-16 Thread Jonathan Ellis
If you create CFs outside the cli, you may need to restart it to refresh its internal cache of the schema. On Thu, Jun 16, 2011 at 8:51 AM, yikes bigdata yikes.bigd...@gmail.com wrote: Hi, I was following the CQL example on the DataStax website and was able to create a new column family and

Re: Unable to access column family in CLI after building CF in CQL

2011-06-16 Thread Konstantin Naryshkin
The second error (the CQL select) is because you have different Key Validation Class values for your two user columns. users is org.apache.cassandra.db.marshal.BytesType, while users2 is org.apache.cassandra.db.marshal.UTF8Type. The select is failing because you are comparing a String to a

Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread Ryan King
On Thu, Jun 16, 2011 at 8:18 AM, AJ a...@dude.podzone.net wrote: Good morning all. Hypothetical Setup: 1 data center RF = 3 Total nodes 3 Problem: Suppose I need maximum consistency for one critical operation; thus I specify CL = ALL for reads.  However, this will fail if only 1 replica

Re: snitch thrift

2011-06-16 Thread Ryan King
On Thu, Jun 16, 2011 at 6:11 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: Hi all! Assuming a node ends up in GC land for a while, there is a good chance that even though it performs terribly and the dynamic snitching will help you to avoid it on the gossip side, it will not really

Re: Unable to access column family in CLI after building CF in CQL

2011-06-16 Thread yikes bigdata
Ah that works. Thanks everyone for the help. On Thu, Jun 16, 2011 at 9:04 AM, Konstantin Naryshkin konstant...@a-bb.net wrote: The second error (the CQL select) is because you have different Key Validation Class values for your two user columns. users is

Re: Cassandra Statistics and Metrics

2011-06-16 Thread Viktor Jevdokimov
There's a possibility to use a command-line JMX client with the standard Zabbix agent to request JMX counters without incorporating zapcat into Cassandra or another Java app. I'm investigating this feature right now, will post results when finished. 2011/6/15 Viktor Jevdokimov vjevdoki...@gmail.com

Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread AJ
On 6/16/2011 10:05 AM, Ryan King wrote: I don't think this buys you anything that you can't get with quorum reads and writes. -ryan QUORUM <= ALL_AVAIL <= ALL == RF
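The quorum point quoted above rests on simple arithmetic, sketched here in Python. This is an illustration only: the function names are hypothetical, and it assumes the usual definition of a quorum as a strict majority of the replication factor (RF).

```python
def quorum(rf):
    """Number of replicas in a quorum: a strict majority of RF."""
    return rf // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, rf):
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > rf


rf = 3  # replication factor from the hypothetical setup in this thread
print(quorum(rf))                                          # replicas needed for QUORUM
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))  # QUORUM reads + QUORUM writes
print(reads_see_latest_write(1, 1, rf))                    # ONE reads + ONE writes
```

With RF = 3, a quorum is 2 replicas, so quorum reads combined with quorum writes always overlap on at least one replica while still tolerating one replica being down, which CL = ALL cannot do.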

RE: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread Dan Hendry
I think this would add a lot of complexity behind the scenes and be conceptually confusing, particularly for new users. The Cassandra consistency model is pretty elegant and this type of approach breaks that elegance in many ways. It would also only really be useful when the value has a high

Re: Cassandra Statistics and Metrics

2011-06-16 Thread Héctor Izquierdo Seliva
This is what I use: http://code.google.com/p/simple-cassandra-monitoring/ Disclaimer: I did it myself, don't expect too much :P El jue, 16-06-2011 a las 19:35 +0300, Viktor Jevdokimov escribió: There's possibility to use command line JMX client with standard Zabbix agent to request JMX

Re: snitch thrift

2011-06-16 Thread Jonathan Ellis
Seems like a more robust solution would be to implement dynamic-snitch-like behavior in the client. Hector has done this for a few months now. https://github.com/rantav/hector/blob/master/core/src/main/java/me/prettyprint/cassandra/connection/DynamicLoadBalancingPolicy.java On Thu, Jun 16, 2011

Re: need some help with counters

2011-06-16 Thread Ian Holsman
On Jun 13, 2011, at 5:10 AM, aaron morton wrote: I am wondering how to index on the most recent hour as well. (ie show me top 5 URLs type query).. AFAIK that's not a great application for counters. You would need range support in the secondary indexes so you could get the first X rows

Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread AJ
On 6/16/2011 10:58 AM, Dan Hendry wrote: I think this would add a lot of complexity behind the scenes and be conceptually confusing, particularly for new users. I'm not so sure about this. Cass is already somewhat sophisticated and I don't see how this could trip-up anyone who can already

Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread Ryan King
On Thu, Jun 16, 2011 at 1:05 PM, AJ a...@dude.podzone.net wrote: On 6/16/2011 10:58 AM, Dan Hendry wrote: I think this would add a lot of complexity behind the scenes and be conceptually confusing, particularly for new users. I'm not so sure about this.  Cass is already somewhat

Visiting Auckland

2011-06-16 Thread aaron morton
So long as the Volcanic Ash stays away I'll be visiting Auckland next week on the 23rd and 24th. Drop me an email if you would like to meet to talk about things Cassandra. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread AJ
On 6/16/2011 2:37 PM, Ryan King wrote: On Thu, Jun 16, 2011 at 1:05 PM, AJ a...@dude.podzone.net wrote: snip The Cassandra consistency model is pretty elegant and this type of approach breaks that elegance in many ways. It would also only really be useful when the value has a high

Re: Force a node to form part of quorum

2011-06-16 Thread A J
It would be great if Cassandra puts this on their roadmap. There are a lot of durability benefits from incorporating dc awareness into the write consistency equation. MongoDB has this feature in their upcoming release:

Re: Force a node to form part of quorum

2011-06-16 Thread Peter Schuller
It would be great if Cassandra puts this on their roadmap. There are a lot of durability benefits from incorporating dc awareness into the write consistency equation. You may be interested in the discussion here: https://issues.apache.org/jira/browse/CASSANDRA-2338 -- / Peter Schuller

Re: Easy way to overload a single node on purpose?

2011-06-16 Thread Suan Aik Yeo
Having a ping column can work if every key is replicated to every node. It would tell you the cluster is working, sort of. Once the number of nodes is greater than the RF, it tells you a subset of the nodes works. The way our check works is that each node checks itself, so in this context we're

compression for regular column names?

2011-06-16 Thread E R
Hi all, As a way of gaining familiarity with Cassandra I am migrating a table that is currently stored in a relational database and mapping it into a Cassandra column family. We add about 700,000 new rows a day to this table, and the average disk space used per row is ~ 300 bytes including

Re: compression for regular column names?

2011-06-16 Thread Ryan King
On Thu, Jun 16, 2011 at 3:41 PM, E R pc88m...@gmail.com wrote: Hi all, As a way of gaining familiarity with Cassandra I am migrating a table that is currently stored in a relational database and mapping it into a Cassandra column family. We add about 700,000 new rows a day to this table, and

Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread Ryan King
On Thu, Jun 16, 2011 at 2:12 PM, AJ a...@dude.podzone.net wrote: On 6/16/2011 2:37 PM, Ryan King wrote: On Thu, Jun 16, 2011 at 1:05 PM, AJ a...@dude.podzone.net wrote: snip The Cassandra consistency model is pretty elegant and this type of approach breaks that elegance in many ways. It

Re: jsvc hangs shell

2011-06-16 Thread Ken Brumer
Anton Belyaev anton.belyaev at gmail.com writes: I guess it is not trivial to modify the package to make it use JSW instead of JSVC. I am still not sure the JSVC itself is a culprit. Maybe something is wrong in my setup. I am seeing similar behavior using the Brisk Debian packages

Brisk .rpm packages for CentOS/RH/Fedora

2011-06-16 Thread Marcos Ortiz Valmaseda
Regards to all Cassandra users. I don't know if Brisk has its own mailing list, so I ask here. Does Brisk have .rpm packages for Red Hat-based distributions (CentOS/Fedora)? If so, where can I find them? Thanks a lot for your time. -- Marcos Luís Ortíz Valmaseda Software Engineer

Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread Dan Hendry
How would your solution deal with complete network partitions? A node being 'down' does not actually mean it is dead, just that it is unreachable from whatever is making the decision to mark it 'down'. Following from Ryan's example, consider nodes A, B, and C but within a fully partitioned

cassandra crash

2011-06-16 Thread Donna Li
All: Why does Cassandra crash after printing the following log? INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-206-Data.db INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020

Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread AJ
UPDATE to my suggestion is below. On 6/16/2011 5:50 PM, Ryan King wrote: On Thu, Jun 16, 2011 at 2:12 PM, AJ a...@dude.podzone.net wrote: On 6/16/2011 2:37 PM, Ryan King wrote: On Thu, Jun 16, 2011 at 1:05 PM, AJ a...@dude.podzone.net wrote: snip The Cassandra consistency model is pretty

Re: Brisk .rpm packages for CentOS/RH/Fedora

2011-06-16 Thread Nate McCall
Yes, there is a brisk list: brisk-us...@googlegroups.com Packages are available via rpm.datastax.com On Thu, Jun 16, 2011 at 8:21 PM, Marcos Ortiz Valmaseda mlor...@uci.cu wrote: Regards to all Cassandra users. I don't know if Brisk has its own mailing list, so I ask here. Does Brisk have .rpm

Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread AJ
On 6/16/2011 7:56 PM, Dan Hendry wrote: How would your solution deal with complete network partitions? A node being 'down' does not actually mean it is dead, just that it is unreachable from whatever is making the decision to mark it 'down'. Following from Ryan's example, consider nodes A, B,

Re: Cassandra JVM GC settings

2011-06-16 Thread aaron morton
It would help if you can provide some log messages from the GCInspector so people can see how much GC is going on. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 17 Jun 2011, at 02:46, Sebastien Coutu wrote: Hi Everyone,

Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread Dan Hendry
Help me out here. I'm trying to visualize a situation where the clients can access all the C* nodes but the nodes can't access each other. I don't see how that can happen on a regular ethernet subnet in one data center. Well, I'm sure there is a case that you can point out. Ok, I will concede

Re: client API

2011-06-16 Thread aaron morton
The Thrift Java compiler creates code that is not compliant with Java 5. https://issues.apache.org/jira/browse/THRIFT-1170 So you may have trouble getting the thrift API to run. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On

Re: Docs: Token Selection

2011-06-16 Thread aaron morton
But, I'm thinking about using OldNetworkTopStrat. NetworkTopologyStrategy is where it's at. A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 17 Jun 2011, at 01:39, AJ wrote: Thanks Eric! I've finally got it! I feel like I've just

Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread AJ
On 6/16/2011 9:36 PM, Dan Hendry wrote: Help me out here. I'm trying to visualize a situation where the clients can access all the C* nodes but the nodes can't access each other. I don't see how that can happen on a regular ethernet subnet in one data center. Well, I'm sure there is a case

Re: Docs: Token Selection

2011-06-16 Thread AJ
On 6/16/2011 9:45 PM, aaron morton wrote: But, I'm thinking about using OldNetworkTopStrat. NetworkTopologyStrategy is where it's at. Oh yeah? It didn't look like it would serve my requirements. I want 2 full production geo-diverse data centers with each serving as a failover for the