How to store a list of values?
I have a profile column family and want to store a list of skills in each profile. In BigTable I could store a Protocol Buffer (http://code.google.com/apis/protocolbuffers/docs/overview.html) with a repeated field, but I'm not sure how this is typically accomplished in Cassandra. One option would be to store a serialized Thrift (http://thrift.apache.org/) or protobuf object, but I'd prefer not to do this, as I believe Cassandra has no knowledge of these formats, so the data in the datastore would not be human-readable in CQL queries from the command line.

The other solution I thought of would be to use a super column and put a random UUID as the key for each skill:

    skills: { '4b27c2b3ac48e8df': 'java', '84bf94ea7bc92018': 'c++', '9103b9a93ce9d18': 'cobol' }

Is this a good way of handling lists in Cassandra? I imagine there's some idiom I'm not aware of. I'm using the Astyanax (https://github.com/Netflix/astyanax/wiki) client library, which only supports composite columns instead of super columns, so the solution I proposed above would seem quite awkward in that case. I'm also still having some trouble understanding composite columns, as they do not seem to be completely documented yet. Would this solution work with composite columns?

Thanks, Ben
Re: Error in FAQ?
If you want to modify a column family, just open the command line interface (cassandra-cli) and connect to a node (probably: connect localhost/9160;).

To create your first keyspace, type: create keyspace MyKeyspace;
To switch to an existing keyspace so you can modify it, type: use MyKeyspace;
If you need more information, just type: help;

Good luck!

2012/3/26 Ben McCann b...@benmccann.com:
> Hmmm, I don't see anything regarding column families in cassandra.yaml. It seems like the answer for that question in the FAQ is very outdated.
>
> On Sun, Mar 25, 2012 at 4:04 PM, Serge Fonville serge.fonvi...@gmail.com wrote:
>> Hi,
>>
>> 2012/3/26 Ben McCann b...@benmccann.com:
>>> There's a line that says "Make necessary changes to your storage-conf.xml". I can't find this file. Does it still exist? If so, where should I look? I installed the packaged version of Cassandra available in the Datastax community edition.
>>> Thanks, Ben
>>
>> From http://wiki.apache.org/cassandra/StorageConfiguration:
>> "Prior to the 0.7 release, Cassandra storage configuration is described by the conf/storage-conf.xml file. As of 0.7, it is described by the conf/cassandra.yaml file."
>> Found by googling "cassandra storage-conf.xml".
>>
>> Kind regards, Serge Fonville http://www.sergefonville.nl
Re: unbalanced ring
> How can I fix this?

Add more data. 1.5M is not enough to get reliable reports.
problem in create column family
It is giving errors like:

    Unable to find abstract-type class 'org.apache.cassandra.db.marshal.utf8'

and

    java.lang.RuntimeException: org.apache.cassandra.db.marshal.MarshalException: cannot parse 'catalogueId' as hex bytes

where catalogueId is a column that has utf8 as its data type. They may just be syntactical errors. Please suggest a fix if you can help me out with this.
Re: How to store a list of values?
I would take a simple approach: create another CF, UserSkill, with the same row key as the profile CF. In the UserSkill CF, add each skill as a column name with a null (empty) value. Columns can be added or removed.

    UserProfile = {
      'ben': { blah: blah, blah: blah, blah: blah }
    }
    UserSkill = {
      'ben': { 'java': '', 'cassandra': '', ..., 'linux': '', 'skill': 'infinity' }
    }
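The layout above can be sketched in a few lines of Python. This is only an in-memory model of "skill as column name, empty value", assuming nothing about any real client library; the helper names (`add_skill`, `remove_skill`, `list_skills`) are made up for illustration.

```python
# In-memory stand-in for a column family: row key -> {column name: value}.
# This is NOT a Cassandra client; it only models the layout described above.
user_skill = {}

def add_skill(cf, row_key, skill):
    """Store the skill as a column name with an empty value."""
    cf.setdefault(row_key, {})[skill] = ""

def remove_skill(cf, row_key, skill):
    """Delete the column; a missing column is silently ignored."""
    cf.get(row_key, {}).pop(skill, None)

def list_skills(cf, row_key):
    """Return column names in sorted order, roughly as a text comparator would."""
    return sorted(cf.get(row_key, {}))

add_skill(user_skill, "ben", "java")
add_skill(user_skill, "ben", "cassandra")
add_skill(user_skill, "ben", "linux")
add_skill(user_skill, "ben", "java")      # re-adding is a harmless overwrite
remove_skill(user_skill, "ben", "linux")
print(list_skills(user_skill, "ben"))
```

Because the skill is the column name, adding the same skill twice cannot create a duplicate, and removing a skill touches one column rather than rewriting a whole list.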
Re: problem in create column family
You should use the full type names, e.g.:

    create column family MyColumnFamily with comparator=UTF8Type;

--
With kind regards, Robin Verlangen www.robinverlangen.nl
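One plausible reading of the second error ("cannot parse 'catalogueId' as hex bytes") is that the column name was being interpreted as raw bytes written in hex, rather than as UTF-8 text. The Python sketch below only mimics that distinction; it is an assumption about the cause, not a reproduction of what cassandra-cli does internally.

```python
name = "catalogueId"

def parse_as_hex(text):
    """What a bytes-typed name would need: a string of hex digit pairs."""
    return bytes.fromhex(text)

def parse_as_utf8(text):
    """What a UTF-8 typed name needs: any text at all."""
    return text.encode("utf-8")

try:
    parse_as_hex(name)
    hex_ok = True
except ValueError:
    # 'catalogueId' contains characters that are not hex digits.
    hex_ok = False

print(hex_ok)
print(parse_as_utf8(name))
print(parse_as_hex("636174"))  # a valid hex string decodes fine
```

Declaring the comparator with its full type name, as suggested above, is what tells the server to treat names like catalogueId as text.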
Re: How to store a list of values?
Thanks for the reply Samal. I did not realize that you could store a column with null value. Do you know if this solution would work with composite columns? It seems super columns are being phased out in favor of composites, but I do not understand composites very well yet. I'm trying to figure out if there's any way to accomplish what you've suggested using Astyanax (https://github.com/Netflix/astyanax).

Thanks for the help, Ben
Re: How to store a list of values?
Plus, it is fully compatible with CQL:

    SELECT * FROM UserSkill WHERE KEY='ben';
Re: How to store a list of values?
On Mon, Mar 26, 2012 at 9:20 PM, Ben McCann b...@benmccann.com wrote:
> Thanks for the reply Samal. I did not realize that you could store a column with null value.

Values can be null or any value, like:

    [default@node] set hus['test']['wowq']='\{de\'.de\;\}\+\^anything';
    Value inserted.
    Elapsed time: 4 msec(s).
    [default@node] get hus['test'];
    => (column=wow, value={de.de;}, timestamp=133222503000)
    => (column=wowq, value={de'.de;}+^anything, timestamp=133267425000)
    Returned 2 results.
    Elapsed time: 65 msec(s).

> Do you know if this solution would work with composite columns? It seems super columns are being phased out in favor of composites, but I do not understand composites very well yet.

Personally, I phased out super columns a year back. I haven't dug much into composite columns, but I know that both the key and the column name can be composite:

    'ben'+'task1' = { utf8+ascii: '' }

> I'm trying to figure out if there's any way to accomplish what you've suggested using Astyanax (https://github.com/Netflix/astyanax).

This is the simplest approach and should work with every available client, since it is an independent CF. Note that two calls are required here.
Re: Counters and replication factor
Can you describe the situations where counter updates are lost or go backwards? Do you ever get TimedOutExceptions when performing counter updates?

Cheers
Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 24/03/2012, at 6:34 PM, Radim Kolar wrote:
>> I still have wrong results (I simulated an event 5 times and it was counted 3 times by some counters, 4 or 5 times by others).
>
> I also have wrong results with counters in 1.0.8. Many times updates to a counter column are just lost, and sometimes counters go backwards even though our app uses only increments. Don't rely on counters for anything important; they are still beta quality. We are now using ZooKeeper for important counters and Cassandra for junk like statistics.
Re: Adding Long type rows to a CF containing Integer(32) type row keys, without overlapping ?
> without them overlapping/disturbing each other (assuming that keys lie in above domains) ?

Not sure what you mean by overlapping. 42 as an int and 42 as a long are the same key.

Cheers
Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 25/03/2012, at 9:47 PM, Ertio Lew wrote:
> I have been writing rows to a CF, all with integer (4-byte) keys. So my CF contains rows with keys in the entire range from Integer.MIN_VALUE to Integer.MAX_VALUE. Now I want to store Long type keys as well in this CF, without disturbing the integer keys. The Long type keys would exclude the integer range, i.e. they would lie in (-2^63 to -2^31) and (2^31 to 2^63).
>
> Would it be safe to mix the integer and long keys in a single CF without them overlapping/disturbing each other (assuming that keys lie in the above domains)?
Re: How to store a list of values?
Save the skills in a single column in JSON format. Job done.

On Mar 26, 2012 7:04 PM, Ben McCann b...@benmccann.com wrote:
> True. But I don't need the skills to be searchable, so I'd rather embed them in the user than add another top-level CF. I was thinking of doing something along the lines of adding a skills super column to the User table:
>
>     skills: { 'java': null, 'c++': null, 'cobol': null }
>
> However, I'm still not sure how to accomplish this with Astyanax. I've only figured out how to make composite columns with predefined column names with it, not dynamic column names like this.
>
> On Mon, Mar 26, 2012 at 9:08 AM, R. Verlangen ro...@us2.nl wrote:
>> In this case you only need the columns for values. You don't need the column values to hold multiple columns (the super-column principle). So a normal CF would work.
>>
>> --
>> With kind regards, Robin Verlangen www.robinverlangen.nl
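As a rough illustration of the JSON suggestion above, the sketch below serializes the skill list into one string that could be stored as a single column value. It uses only Python's standard `json` module; how the value is actually written to Cassandra is left out, and the variable names are illustrative.

```python
import json

skills = ["java", "c++", "cobol"]

# Serialize the whole list into one human-readable column value.
column_value = json.dumps(skills)
print(column_value)

# Reading it back requires parsing the whole value again.
restored = json.loads(column_value)

# Adding a skill means read, modify, then rewrite the entire value.
restored.append("cassandra")
updated_value = json.dumps(restored)
print(updated_value)
```

The value stays readable from the command line, which was the original concern with Thrift or protobuf. The trade-off is that every change rewrites the whole list, whereas the column-per-skill layout changes one column at a time.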
Re: smart client proxy for cassandra
I've heard of people using HAProxy (http://haproxy.1wt.eu/) with PHP as a connection pool. Note that detecting failure in Cassandra can only be done as part of a request, so HAProxy cannot tell whether a node is actually functional, only that it allows a socket to be opened.

There is some work being done on creating a proxy server with Hector: https://github.com/rantav/hector/tree/lcp-first-cut. Not sure of its progress.

Cheers
Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 25/03/2012, at 10:46 PM, Piavlo wrote:
> Hi,
>
> Is there any smart client proxy implementation for Cassandra? I'd like to proxy short-lived phpcassa connections through a smart proxy that will manage a pool of connections and be aware of the current cluster state, bad/slow nodes, etc.
>
> The Java libraries https://github.com/s7/scale7-pelops and https://github.com/Netflix/astyanax look like good choices. But since I'm not a Java programmer, I'd first like to check whether someone has already done this, or whether someone could give guidelines on how to extend one of the above Java clients to also proxy Thrift connections.
>
> Thanks, Alex
Re: Regarding nodetool tpstats
Cheers :)

Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 26/03/2012, at 1:55 PM, Watanabe Maki wrote:
> - InternalResponseStage: handles responses to non-client-initiated messages, including bootstrap, schema check, etc.
>
> maki
>
> On 2012/03/26, at 2:18, aaron morton aa...@thelastpickle.com wrote:
>> Work is broken up into a series of stages:
>>
>> - ReadStage - performing a local read.
>> - RequestResponseStage - handling responses from other nodes.
>> - MutationStage - performing a local write.
>> - ReplicateOnWriteStage - for counter writes, replicates after a local write.
>> - GossipStage - handles gossip rounds (every second).
>> - AntiEntropyStage - repairs consistency (nodetool repair).
>> - MigrationStage - schema changes.
>> - MemtablePostFlusher - flushes the commit log (and other things) after flushing a memtable.
>> - StreamStage - streams data between nodes during repair.
>> - FlushWriter - flushes a memtable to disk.
>> - MiscStage - does misc stuff.
>> - InternalResponseStage - not sure.
>> - HintedHandoff - sends missed mutations to other nodes.
>>
>> Cheers
>> Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com
>>
>> On 23/03/2012, at 9:49 PM, Rishabh Agrawal wrote:
>>> Hello,
>>>
>>> I am new to Cassandra, and when I run tpstats on my node (Cassandra 1.0.7) I get the following output:
>>>
>>>     Pool Name                Active  Pending  Completed  Blocked  All time blocked
>>>     ReadStage                     0        0         12        0                 0
>>>     RequestResponseStage          0        0         20        0                 0
>>>     MutationStage                 0        0         14        0                 0
>>>     ReadRepairStage               0        0         14        0                 0
>>>     ReplicateOnWriteStage         0        0          0        0                 0
>>>     GossipStage                   0        0     273665        0                 0
>>>     AntiEntropyStage              0        0          0        0                 0
>>>     MigrationStage                0        0        119        0                 0
>>>     MemtablePostFlusher           0        0         13        0                 0
>>>     StreamStage                   0        0          0        0                 0
>>>     FlushWriter                   0        0         13        0                 0
>>>     MiscStage                     0        0          0        0                 0
>>>     InternalResponseStage         0        0        318        0                 0
>>>     HintedHandoff                 0        0          2        0                 0
>>>
>>> Can anyone help with what each pool name stands for?
>>>
>>> Thanks and Regards, Rishabh Agrawal
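For anyone who wants to watch these pools from a script, here is a minimal sketch that parses tpstats-style text into a dictionary and flags pools with pending or blocked tasks. It assumes the six-column layout shown in the question (a pool name followed by five integers); the sample text and the function names are illustrative, and other Cassandra versions may print a different layout.

```python
SAMPLE = """\
Pool Name                Active  Pending  Completed  Blocked  All time blocked
ReadStage                     0        0         12        0                 0
MutationStage                 0        0         14        0                 0
GossipStage                   0        0     273665        0                 0
"""

FIELDS = ("active", "pending", "completed", "blocked", "all_time_blocked")

def parse_tpstats(text):
    """Map pool name -> counters for every row that looks like a data row."""
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows are a pool name followed by exactly five integers;
        # the header and any blank lines fail this check and are skipped.
        if len(parts) != 6 or not all(p.isdigit() for p in parts[1:]):
            continue
        pools[parts[0]] = dict(zip(FIELDS, map(int, parts[1:])))
    return pools

def backed_up(pools):
    """Names of pools that currently have pending or blocked tasks."""
    return sorted(n for n, s in pools.items() if s["pending"] or s["blocked"])

pools = parse_tpstats(SAMPLE)
print(pools["GossipStage"]["completed"])
print(backed_up(pools))
```

On the idle node from the question every pool shows zero pending and zero blocked, so the second call returns an empty list.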
Re: Sample Data
The 'stress' tool that you can find in a source checkout of Cassandra sounds like what you're looking for. It's designed to write data to (or read data from) a cluster as fast as possible, and has plenty of options for tweaking the type of data it inserts. You can read more about it here: http://www.datastax.com/docs/1.0/references/stress_java

On Mon, Mar 26, 2012 at 6:59 AM, Rishabh Agrawal rishabh.agra...@impetus.co.in wrote:
> Thanks for the prompt response. I am looking for a solution which can generate scripts of statements that I can run, or tweak to run, in a Linux environment.
>
> -----Original Message-----
> From: Benoit Perroud [mailto:ben...@noisette.ch]
> Sent: Monday, March 26, 2012 4:14 PM
> To: user@cassandra.apache.org
> Subject: Re: Sample Data
>
>> Cassandra unit (https://github.com/jsevellec/cassandra-unit) could help you.
>>
>> On 26 March 2012 12:34, Rishabh Agrawal rishabh.agra...@impetus.co.in wrote:
>>> Hello, I wish to test certain things in Cassandra, so can someone help me with a sample database or sample data generator which can help me flood Cassandra nodes with a large amount of data?
>>>
>>> Thanks and Regards, Rishabh Agrawal
>>
>> -- sent from my Nokia 3210
-- Tyler Hobbs DataStax http://datastax.com/
Re: Sample Data
> I wish to test certain things in Cassandra so can someone help me with sample database or sample database data generator which can help me flood Cassandra nodes with large amount of data.

I would recommend YCSB: https://github.com/brianfrankcooper/YCSB/wiki/

Thanks, Tom
Re: Adding Long type rows to a CF containing Integer(32) type row keys, without overlapping ?
I need to use keys beyond the int32 range, so I am using Long to write those keys. I am afraid this might lead to collisions with the previously stored integer keys in the same CF, even if I leave out the int32 range.

On Mon, Mar 26, 2012 at 10:51 PM, aaron morton aa...@thelastpickle.com wrote:
> Not sure what you mean by overlapping. 42 as an int and 42 as a long are the same key.
Re: Estimation of memtable size are wrong
> Yes, I noticed that. It's not too often, about once per week.

The assumption would be that the workload stabilises over time.

> INFO [MemoryMeter:1] 2012-03-23 00:00:18,407 Memtable.java (line 186) CFS(Keyspace='whois', ColumnFamily='ipbans') liveRatio is 64.0 (just-counted was 16.354632747474547). calculation took 611ms for 8287 columns

Duh, I forgot about the 25% fudge factor. 64 * 1.25 = 80. It's working as intended. The serialised bytes figure is the total throughput, which includes overwrites.

Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 26/03/2012, at 9:11 PM, Radim Kolar wrote:
> On 26.3.2012 0:36, aaron morton wrote:
>>> 1. Is it not possible to run them more often? There should be some limit: run the live/serialized calculation at least once per hour. They took just a few seconds.
>>
>> The live ratio is updated every time the operation count (since startup) for the CF doubles.
>
> Yes, I noticed that. It's not too often, about once per week.
>
>> The ratio here is a strange 105363280 (100.48 MB) / 1317041 (1.26 MB) = 80. The live ratio is capped at 64. Can you see any log messages about the live ratio for this CF?
>
> Last report from the problematic CF:
>
> INFO [MemoryMeter:1] 2012-03-23 00:00:18,407 Memtable.java (line 186) CFS(Keyspace='whois', ColumnFamily='ipbans') liveRatio is 64.0 (just-counted was 16.354632747474547). calculation took 611ms for 8287 columns
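The arithmetic in this exchange can be checked directly. The sketch below uses only the numbers quoted in the thread (the two byte counts, the cap of 64 and the 25% fudge factor); the variable names are illustrative and are not taken from the Cassandra source.

```python
live_bytes = 105363280        # estimated live memtable size, in bytes
serialized_bytes = 1317041    # serialized bytes written, in bytes

# The "strange" ratio that prompted the question.
observed_ratio = live_bytes / serialized_bytes

# The explanation: the capped live ratio times the 25% fudge factor.
LIVE_RATIO_CAP = 64.0
FUDGE_FACTOR = 1.25
expected_ratio = LIVE_RATIO_CAP * FUDGE_FACTOR

print(observed_ratio, expected_ratio)
print(round(live_bytes / 2**20, 2), "MB live")
print(round(serialized_bytes / 2**20, 2), "MB serialized")
```

Both ratios come out to exactly 80, which matches the conclusion that the estimate is working as intended.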
Re: One or Two clusters?
Use one cluster. Use lots-o-machines.

The read and write paths do not directly interfere with each other like they do in an RDBMS. Compaction created by writes can suck up disk IO, but this is throttled, so in practice it is not such a big problem. Excessive GC created by reads or compaction may slow down the server, but you will want to avoid that anyway.

The one caveat is: it depends on how you are transforming the data. If you are using Hadoop, consider creating a single cluster with multiple DCs (like DataStax do): one for OLTP and one for OLAP. Do the Hadoop work in the OLAP DC and have the online app read and write to the OLTP one.

Cheers
Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 27/03/2012, at 3:22 AM, Oleg Proudnikov wrote:
> Hi,
>
> Could someone please help me understand the benefits of having a single large cluster vs. having two smaller clusters separated by the pattern of use?
>
> One MOSTLY WRITE cluster could incrementally accumulate large amounts of data throughout the day. The daily increment would be processed, summarized and stored into the second, READ cluster at night. Users would only need to interact with the READ portion of the overall system, mostly during the day.
>
> Writes would be spread throughout the day and would be a function of user activity, with some bulk load activity from time to time. The WRITE portion of the database would be an order of magnitude larger than the READ portion. The READ portion would have an order of magnitude higher traffic, except during periodic bulk loads.
>
> On one hand, if I were to have a single cluster I would have more resources for the users and potentially better scalability. A single cluster may need fewer servers overall, provided write activity does not affect reads. On the other hand, write activity and the associated memory consumption, GC, and maintenance routines may affect the READ system.
>
> The system will be hosted on EC2. I would appreciate any thoughts.
>
> Regards, Oleg
Re: Performance overhead when using start and end columns
See the tests in the article. The code I used for profiling is also available.

Cheers
Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 27/03/2012, at 6:21 AM, Mohit Anchlia wrote:
> Thanks, but if I do have to specify start and end columns, then roughly how much overhead would that translate to, since reading metadata should be constant overall?
>
> On Mon, Mar 26, 2012 at 10:18 AM, aaron morton aa...@thelastpickle.com wrote:
>> Some information on query plans: http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
>>
>> TL;DR: select columns with no start, in the natural comparator order.
>>
>> On 25/03/2012, at 2:25 PM, Mohit Anchlia wrote:
>>> I have rows with around 2K-50K columns, but when I do a query I only need to fetch a few columns between start and end columns. I was wondering what performance overhead is caused by using a slice query with start and end columns. Looking at the code, it looks like when you give a start and end column it goes into the IndexSliceReader logic, but it's hard to tell how much overhead one would see on average. Or is it even worth worrying about?
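To make the two kinds of query in this thread concrete, here is a small in-memory model of a slice over sorted column names, with and without start and end bounds. It only illustrates what each query returns; it says nothing about on-disk cost, which is what the linked article measures. The column names and the `slice_columns` helper are made up for this sketch.

```python
from bisect import bisect_left, bisect_right

# A wide row modelled as a sorted list of 2000 column names.
columns = sorted(f"col{n:05d}" for n in range(2000))

def slice_columns(sorted_names, start="", end=None, count=100):
    """Return up to `count` names in [start, end], in comparator order."""
    lo = bisect_left(sorted_names, start) if start else 0
    hi = bisect_right(sorted_names, end) if end is not None else len(sorted_names)
    return sorted_names[lo:hi][:count]

# No start column: read from the beginning of the row.
first_three = slice_columns(columns, count=3)

# Start and end columns: a bounded window somewhere inside the row.
window = slice_columns(columns, start="col00100", end="col00103")

print(first_three)
print(window)
```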
Re: Adding Long type rows to a CF containing Integer(32) type row keys, without overlapping ?
Only if you reuse a row key.

Cheers
Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 27/03/2012, at 6:38 AM, Ertio Lew wrote:
> I need to use the range beyond the integer32 type range, so I am using Long to write those keys. I am afraid if this might lead to collisions with the previously stored integer keys in the same CF even if I leave out the int32 type range.
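Whether the same number written as an int and as a long really lands on the same row depends on how the key is turned into bytes, which the thread does not spell out. The sketch below contrasts two encodings as an illustration only: a fixed-width one (4 or 8 bytes) and a compact, value-based one. Both helper functions are hypothetical and are not the serializers of any particular client.

```python
def fixed_width(n, size):
    """Big-endian two's complement in exactly `size` bytes."""
    return n.to_bytes(size, "big", signed=True)

def compact(n):
    """Shortest big-endian two's complement form of n (value-based)."""
    magnitude = n if n >= 0 else ~n
    size = magnitude.bit_length() // 8 + 1
    return n.to_bytes(size, "big", signed=True)

as_int = fixed_width(42, 4)    # 4 bytes
as_long = fixed_width(42, 8)   # 8 bytes, so different raw bytes

print(as_int, as_long, as_int == as_long)

# With a value-based encoding, 42 is one and the same key however it was typed.
print(compact(42))

# A value outside the int32 range cannot equal any int32 key either way.
print(compact(2**31))
```

Under a value-based encoding, the statement that 42 as an int and 42 as a long are the same key holds, and keys taken from outside the int32 range never coincide with existing ones unless a value is reused.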
Re: Performance overhead when using start and end columns
Hi Aaron, Thanks for the benchmark. The matrix is valuable. Thanks, Charlie (@mujiang) 一个 木匠 === Data Architect Developer http://mujiang.blogspot.com
On Mon, Mar 26, 2012 at 10:53 AM, aaron morton aa...@thelastpickle.com wrote: See the tests in the article. The code I used for profiling is also available. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com
Re: How to store a list of values?
Save the skills in a single column in JSON format. Job done. This is good if the profile has a fixed set of skills; any add or delete then has to be handled in the app: read the column first, reformat the JSON, update the column (2 Thrift calls). skill~Java: null, skill~Cassandra: null is also a good option, but any schema change will break it.
On Mar 26, 2012 7:04 PM, Ben McCann b...@benmccann.com wrote: True. But I don't need the skills to be searchable, so I'd rather embed them in the user than add another top-level CF. I was thinking of doing something along the lines of adding a skills super column to the User table: skills: { 'java': null, 'c++': null, 'cobol': null } However, I'm still not sure how to accomplish this with Astyanax. I've only figured out how to make composite columns with predefined column names with it, not dynamic column names like this.
On Mon, Mar 26, 2012 at 9:08 AM, R. Verlangen ro...@us2.nl wrote: In this case you only need the columns for values. You don't need the column values to hold multiple columns (the super-column principle). So a normal CF would work.
2012/3/26 Ben McCann b...@benmccann.com: Thanks for the reply Samal. I did not realize that you could store a column with a null value. Do you know if this solution would work with composite columns? It seems super columns are being phased out in favor of composites, but I do not understand composites very well yet. I'm trying to figure out if there's any way to accomplish what you've suggested using Astyanax https://github.com/Netflix/astyanax. Thanks for the help, Ben
On Mon, Mar 26, 2012 at 8:46 AM, samal samalgo...@gmail.com wrote: Plus it is fully compatible with CQL. SELECT * FROM UserSkill WHERE KEY='ben';
On Mon, Mar 26, 2012 at 9:13 PM, samal samalgo...@gmail.com wrote: I would take a simple approach. Create one other CF, UserSkill, with the same row key as the profile CF. In the UserSkill CF, add each skill as a column name with a null value. Columns can be added or removed.
UserProfile = { 'ben': { blah: blah, blah: blah, blah: blah } }
UserSkill = { 'ben': { 'java': '', 'cassandra': '', ..., 'linux': '', 'skill': 'infinity' } }
On Mon, Mar 26, 2012 at 12:34 PM, Ben McCann b...@benmccann.com wrote: I have a profile column family and want to store a list of skills in each profile. In BigTable I could store a Protocol Buffer (http://code.google.com/apis/protocolbuffers/docs/overview.html) with a repeated field, but I'm not sure how this is typically accomplished in Cassandra. One option would be to store a serialized Thrift (http://thrift.apache.org/) or protobuf, but I'd prefer not to do this as I believe Cassandra doesn't have knowledge of these formats, and so the data in the datastore would not be human readable in CQL queries from the command line. The other solution I thought of would be to use a super column and put a random UUID as the key for each skill: skills: { '4b27c2b3ac48e8df': 'java', '84bf94ea7bc92018': 'c++', '9103b9a93ce9d18': 'cobol' } Is this a good way of handling lists in Cassandra? I imagine there's some idiom I'm not aware of. I'm using the Astyanax (https://github.com/Netflix/astyanax/wiki) client library, which only supports composite columns instead of super columns, and so the solution I proposed above would seem quite awkward in that case. Though I'm still having some trouble understanding composite columns as they seem not to be completely documented yet. Would this solution work with composite columns? Thanks, Ben
-- With kind regards, Robin Verlangen www.robinverlangen.nl
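The "skill as column name, empty value" layout suggested in this thread can be sketched in plain Python. This is only an in-memory model, assuming a column family behaves like a map from row key to a sorted map of column names; the function names are hypothetical and this is not Astyanax or Thrift code:

```python
# A column family modelled as: row key -> {column name: column value}.
user_skill = {}


def add_skill(cf, user, skill):
    """Adding a skill is a single column write; the value stays empty."""
    cf.setdefault(user, {})[skill] = b""


def remove_skill(cf, user, skill):
    """Removing a skill is a single column delete; no read is needed first."""
    cf.get(user, {}).pop(skill, None)


def list_skills(cf, user):
    """Columns come back ordered by name, as a comparator would order them."""
    return sorted(cf.get(user, {}))


for skill in ("java", "c++", "cobol"):
    add_skill(user_skill, "ben", skill)
remove_skill(user_skill, "ben", "cobol")
skills = list_skills(user_skill, "ben")
```

Compared with the single JSON column, each add or remove here touches one column and avoids the read, reformat, update cycle described above.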
Server side scripting support in Cassandra - go Python !
Howdy, Some polyglot persistence (NoSQL) products have started to support server-side scripting, similar to stored procedures in an RDBMS, e.g. Redis Lua scripting. I hope it will be Python when Cassandra gets a server-side scripting feature. FYI, http://antirez.com/post/250 http://nosql.mypopescu.com/post/19949274021/alchemydb-an-integrated-graphdb-rdbms-kv-store "server side scripting support is an extremely powerful tool. Having processing close to data (i.e. data locality) is a well known advantage, ..., it can open the doors to completely new features." Thanks, Charlie (@mujiang) 一个 木匠 === Data Architect Developer http://mujiang.blogspot.com
Re: Performance overhead when using start and end columns
Thanks!
On Mon, Mar 26, 2012 at 10:53 AM, aaron morton aa...@thelastpickle.com wrote: See the tests in the article. The code I used for profiling is also available. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com
Re: CQL Reversed and Comparator reversed=true
Thank you Aaron!
On Mon, Mar 26, 2012 at 10:44 PM, aaron morton aa...@thelastpickle.com wrote: "create column family Comments with comparator = 'CompositeType(UTF8Type(reversed=True), UTF8Type)' and key_validation_class = 'UTF8Type' and default_validation_class = 'UTF8Type';" Looks ok. "SELECT FIRST 100 REVERSED 'z'..'0' from Comments where key = 'xyz';" Try: SELECT FIRST 100 REVERSED * from Comments where key = 'xyz'; Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com
On 24/03/2012, at 9:41 AM, Praveen Baratam wrote: Hello, I am a bit confused about how to store and retrieve columns in reversed order. Currently I store comments for every blog post in a wide row per post. I want to store and retrieve comments for each blog post in reversed/descending order for efficiency, as we display comments in descending order by time. Each comment gets a time-based sortable id which is stored as part of the first component of the composite type. Below is the create statement for the column family that stores comments for posts. create column family Comments with comparator = 'CompositeType(UTF8Type(reversed=True), UTF8Type)' and key_validation_class = 'UTF8Type' and default_validation_class = 'UTF8Type'; and the CQL I use to retrieve is as follows: SELECT FIRST 100 REVERSED 'z'..'0' from Comments where key = 'xyz'; Am I doing the right thing? Are the comments stored in descending time order in the CF, and with this CQL query am I retrieving the columns in their natural sort order without any additional sorting overhead? Thank you.
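The effect of reversing only the first component of a composite comparator can be illustrated with a small Python sketch. It models column names as (comment_id, field) pairs and sorts them the way such a comparator is described in the question; the sample ids and field names are made up, and the sketch does not settle which CQL form is correct:

```python
from functools import cmp_to_key


def reversed_first_component(a, b):
    """Order (comment_id, field) names: comment_id descending, field ascending."""
    if a[0] != b[0]:
        return -1 if a[0] > b[0] else 1
    return (a[1] > b[1]) - (a[1] < b[1])


# Hypothetical column names for one blog post row.
columns = [
    ("2012-03-24T09:41", "body"),
    ("2012-03-26T22:44", "body"),
    ("2012-03-25T10:00", "body"),
    ("2012-03-26T22:44", "author"),
]

# "Stored" order under the reversed-first-component comparator.
stored = sorted(columns, key=cmp_to_key(reversed_first_component))

# Reading the first columns in stored order already yields the newest comment.
newest_first = stored[:2]
```

With this ordering, a forward read from the start of the row returns the most recent comments without any extra sorting step.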
multi region EC2
all, we're just about ready to push our app live and just have some cassandra tuning left. i've been running a 4 node cluster (rep factor 3, simple) in EC2 using the datastax AMIs (thanks datastax). after reading through a bunch of docs i have a few questions.
- what is the min and recommended number of nodes to use in a multiple region cluster? we only have a single app server right now.
- can i migrate the replication strategy one node at a time or do i need to shut down the whole cluster to do this?
- what type of performance hit am i going to take having my app server cross regions to get to a node? coming from the SQL world, this is usually not a good thing.
if i was to stick to a single region, are there any best practices for backing up a whole cluster? from the docs it looks like i need to snapshot each node one by one and then copy the snapshot to somewhere offsite. thanks, deno
copy data for dev
all, is there an easy way to take a 4 node snapshot and restore it on my single node dev cluster? thanks, deno
what other ports than 7199 need to be open for nodetool to work?
Hi, We opened port 7199 on a cassandra node, but were unable to get nodetool to talk to it remotely unless we turned off the firewall entirely. So what other ports should be opened for this? Online posts all indicate that JMX uses a random dynamic port, which would make it difficult to create a firewall exception without writing a custom java agent. So we are just wondering if cassandra nodetool uses a specific port or port range. Thanks. -- Y.
Re: what other ports than 7199 need to be open for nodetool to work?
You are correct about the second random dynamic port. There is a ticket open to fix that as well as some other jmx issues: https://issues.apache.org/jira/browse/CASSANDRA-2967 Regarding nodetool, it doesn't do anything special. Nodetool is often used to connect to 'localhost' which generally does not have any firewall rules at all so it usually works. It is still connecting to a random second port though. On Mon, Mar 26, 2012 at 2:42 PM, Yiming Sun yiming@gmail.com wrote: Hi, We opened port 7199 on a cassandra node, but were unable to get a nodetool to talk to it remotely unless we turn off the firewall entirely. So what other ports should be opened for this -- online posts all indicate that JMX uses a random dynamic port, which would be difficult to create a firewall exception unless writing a custom java agent. So we just wondering if cassandra nodetool uses a specific port/port range. Thanks. -- Y.
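One way to see the symptom from the client side is to probe TCP reachability directly. The sketch below is a generic Python check, not something nodetool does; the helper name and the example host are hypothetical. It can confirm that 7199 is open through the firewall, but it says nothing about the second, dynamically chosen port that the JMX connection also needs:

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (hypothetical host name): the registry port may be reachable
# even though the dynamically chosen second port is blocked.
# port_is_reachable("cassandra-node-1.example.com", 7199)
```

If this returns True for 7199 while a remote nodetool call still fails, the blocked second port described above is a likely cause.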
Re: what other ports than 7199 need to be open for nodetool to work?
Thanks Nick -- I didn't know about this ticket. Good to know. Yes, nodetool doesn't do anything special - but I still wish I could use nodetool to examine other nodes, instead of having to ssh to other nodes first and then nodetool each one (i am lazy :-). -- Y.
On Mon, Mar 26, 2012 at 3:50 PM, Nick Bailey n...@datastax.com wrote: You are correct about the second random dynamic port. There is a ticket open to fix that as well as some other jmx issues: https://issues.apache.org/jira/browse/CASSANDRA-2967 Regarding nodetool, it doesn't do anything special. Nodetool is often used to connect to 'localhost' which generally does not have any firewall rules at all so it usually works. It is still connecting to a random second port though.
Re: How to store a list of values?
"but any schema change will break it" How do you mean? You don't have to specify the columns in Cassandra, so it should work perfectly, except that the skill~ prefix is reserved for your list.
2012/3/26 samal samalgo...@gmail.com: Save the skills in a single column in JSON format. Job done. This is good if the profile has a fixed set of skills; any add or delete then has to be handled in the app: read the column first, reformat the JSON, update the column (2 Thrift calls). skill~Java: null, skill~Cassandra: null is also a good option, but any schema change will break it.
-- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: Performance overhead when using start and end columns
@Aaron: Very interesting article! Mentioned it on my Dutch blog.
2012/3/26 Mohit Anchlia mohitanch...@gmail.com: Thanks!
-- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: what other ports than 7199 need to be open for nodetool to work?
I have documented some of the things you can do to make the random port nature of JMX happy. http://www.jointhegrid.com/highperfcassandra/?p=140 Other options are setting up mx4j or using jmxterm, or setting up a SOCKS proxy and telling jconsole to use your proxy. Also there is the xwindows over vnc over ssh route. Edward
On Mon, Mar 26, 2012 at 3:54 PM, Yiming Sun yiming@gmail.com wrote: Thanks Nick -- I didn't know about this ticket. Good to know. Yes, nodetool doesn't do anything special - but I still wish I could use nodetool to examine other nodes, instead of having to ssh to other nodes first and then nodetool each one (i am lazy :-). -- Y.
Re: multi region EC2
"(rep factor 3, simple)" If this means you are using the SimpleStrategy, I would recommend using the NetworkTopologyStrategy (NTS).
- what is the min and recommended number of nodes to use in multiple region cluster. we only have a single app server right now.
It depends on how exciting you want your life to be. You probably want at least 3 nodes in each cassandra DC / EC2 region. These could be spread across 3 AZs in an EC2 region. Some background on availability: http://thelastpickle.com/2011/06/13/Down-For-Me/
- can i migrate the replication strategy one node at a time or do i need to shut to the whole cluster to do this?
Just use the NTS from the start.
- what type of performance hit am i going to take having my app server cross regions to get to a node. coming from the SQL world, this is usually not a good thing.
You will need to do some tests. Using LOCAL_QUORUM requests will only block on nodes in the local DC.
"if i was to stick in a single region is the any best practices for backing up a whole cluster? from the docs it looks like i need to snapshot each node one by one and then copy off the snapshot to somewhere offsite." Yes. Good luck. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com
On 27/03/2012, at 8:12 AM, Deno Vichas wrote: all, we're just about ready to push our app live and just have some cassandra tuning left. i've been running a 4 node cluster (rep factor 3, simple) in EC2 using the datastax AMIs (thanks datastax). after reading through a bunch of docs i have a few questions.
- what is the min and recommended number of nodes to use in a multiple region cluster? we only have a single app server right now.
- can i migrate the replication strategy one node at a time or do i need to shut down the whole cluster to do this?
- what type of performance hit am i going to take having my app server cross regions to get to a node? coming from the SQL world, this is usually not a good thing.
if i was to stick to a single region, are there any best practices for backing up a whole cluster? from the docs it looks like i need to snapshot each node one by one and then copy the snapshot to somewhere offsite. thanks, deno
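The remark that LOCAL_QUORUM only blocks on nodes in the local DC can be made concrete with the usual quorum arithmetic (half the replicas, rounded down, plus one). The sketch below is plain Python; the data center names and the replication factor of 3 per DC are assumptions chosen to match the "at least 3 nodes in each region" advice, not settings taken from the thread:

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a quorum."""
    return replication_factor // 2 + 1


# Hypothetical NetworkTopologyStrategy settings: replicas per data center.
replication = {"us-east": 3, "us-west": 3}

# LOCAL_QUORUM waits only for a quorum of the replicas in the coordinator's DC.
local_quorum = {dc: quorum(rf) for dc, rf in replication.items()}

# A plain QUORUM counts replicas across all DCs, so it must cross regions.
global_quorum = quorum(sum(replication.values()))
```

Under these assumptions a LOCAL_QUORUM request waits for 2 replicas in the same region, while a QUORUM request needs 4 of the 6 replicas and therefore always waits on at least one cross-region response.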
Schema advice/help
I need to store activities by each user on 5 item types. I always want to read the last 10 activities on each item type by a user (i.e., total activities to read at a time = 50). I want to store these activities in a single row per user so that they can be retrieved with a single row query, since I want to read the last 10 activities on every item type. I am thinking of creating composite column names of the form itemtype:activityId (activityId is just a timestamp value), but then I don't see how to read the last 10 activities from all item types. Any ideas for a better schema?
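The access pattern in the question can be modelled in Python to show what "last 10 per item type" means for a row keyed by (itemtype, activityId) composite names. This is an in-memory sketch only; the item type names are hypothetical, and against Cassandra the same result would presumably take one reversed, count-limited slice per item type (five slices per user) rather than a single contiguous slice:

```python
from collections import defaultdict


def last_n_per_item_type(row, n=10):
    """Group columns by item type and keep the n highest activity ids of each."""
    by_type = defaultdict(list)
    for (item_type, activity_id), value in row.items():
        by_type[item_type].append((activity_id, value))
    return {
        item_type: sorted(cols, key=lambda c: c[0], reverse=True)[:n]
        for item_type, cols in by_type.items()
    }


# One user's row: composite column name (item_type, activity_id) -> payload.
row = {
    (item_type, ts): f"{item_type}-activity-{ts}"
    for item_type in ("photo", "post")  # hypothetical item types
    for ts in range(1, 13)              # activity ids are timestamps
}
latest = last_n_per_item_type(row)
```

Because the columns of each item type sit next to each other in the row, the newest entries of one type form a contiguous range, but the newest entries of all types together do not.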
cassandra 1.08 on java7 and win7
I think I have the cassandra server started. In another window:
cassandra-cli.bat -h localhost -p 9160
Starting Cassandra Client
Connected to: Test Cluster on localhost/9160
Welcome to Cassandra CLI version 1.0.8
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.
[default@unknown] create keyspace DEMO;
log4j:WARN No appenders could be found for logger (org.apache.cassandra.config.DatabaseDescriptor).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Cannot locate cassandra.yaml
Fatal configuration error; unable to start server. See log for stacktrace.
C:\Workspace\cassandra\apache-cassandra-1.0.8\bin
anybody seen this before ? -- Frank Hsueh | frank.hs...@gmail.com
Re: cassandra 1.08 on java7 and win7
Ben Coverston wrote earlier today: Use a version of the Java 6 runtime, Cassandra hasn't been tested at all with the Java 7 runtime So I think that might be a good way to start. 2012/3/26 Frank Hsueh frank.hs...@gmail.com I think I have cassandra the server started In another window: cassandra-cli.bat -h localhost -p 9160 Starting Cassandra Client Connected to: Test Cluster on localhost/9160 Welcome to Cassandra CLI version 1.0.8 Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit. [default@unknown] create keyspace DEMO; log4j:WARN No appenders could be found for logger (org.apache.cassandra.config.DatabaseDescriptor). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Cannot locate cassandra.yaml Fatal configuration error; unable to start server. See log for stacktrace. C:\Workspace\cassandra\apache-cassandra-1.0.8\bin anybody seen this before ? -- Frank Hsueh | frank.hs...@gmail.com -- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: cassandra 1.08 on java7 and win7
interesting. that behaviour _does_ happen in 1.0.8, but doesn't in 1.0.6 on windows 7 with Java 7. looks to be a problem with the CLI and not the actual Cassandra service. just tried it now. -sd On Mon, Mar 26, 2012 at 11:29 PM, R. Verlangen ro...@us2.nl wrote: Ben Coverston wrote earlier today: Use a version of the Java 6 runtime, Cassandra hasn't been tested at all with the Java 7 runtime So I think that might be a good way to start.
Re: cassandra 1.08 on java7 and win7
I'm using the latest of Java 1.6 from Oracle.
On Mon, Mar 26, 2012 at 2:29 PM, R. Verlangen ro...@us2.nl wrote: Ben Coverston wrote earlier today: Use a version of the Java 6 runtime, Cassandra hasn't been tested at all with the Java 7 runtime So I think that might be a good way to start.
-- Frank Hsueh | frank.hs...@gmail.com
Re: cassandra 1.08 on java7 and win7
err ... same thing happens with Java 1.6
On Mon, Mar 26, 2012 at 2:35 PM, Frank Hsueh frank.hs...@gmail.com wrote: I'm using the latest of Java 1.6 from Oracle.
-- Frank Hsueh | frank.hs...@gmail.com
Re: cassandra 1.08 on java7 and win7
best to open an issue: https://issues.apache.org/jira/browse/CASSANDRA
On Mon, Mar 26, 2012 at 11:35 PM, Frank Hsueh frank.hs...@gmail.com wrote: err ... same thing happens with Java 1.6
-- Sasha Dolgy sasha.do...@gmail.com
The cassandra gem on rubygems.org needs to be updated
If you're not a maintainer of the cassandra gem on rubygems.org, you can stop reading. And if you are ... I'm just bringing this to your attention: https://github.com/twitter/cassandra/issues/142 Thanks! -- Ilya
Re: cassandra 1.08 on java7 and win7
create keyspace via cassandra cli fails https://issues.apache.org/jira/browse/CASSANDRA-4085
On Mon, Mar 26, 2012 at 2:44 PM, Sasha Dolgy sdo...@gmail.com wrote: best to open an issue: https://issues.apache.org/jira/browse/CASSANDRA
-- Frank Hsueh | frank.hs...@gmail.com
Re: multi region EC2
On 3/26/2012 2:15 PM, aaron morton wrote: - can i migrate the replication strategy one node at a time or do i need to shut down the whole cluster to do this? Just use the NTS (NetworkTopologyStrategy) from the start. But what if I already have a bunch of data (8g per node) that I need and have no way to re-create? thanks, deno
Internal error processing get_slice (NullPointerException)
Has anyone seen this particular NPE before from Cassandra? This is on 1.0.8. It seems to happen transiently on multiple nodes in my cluster, every so often, and goes away.
ERROR [Thrift:45] 2012-03-26 19:59:12,024 Cassandra.java (line 3041) Internal error processing get_slice
java.lang.NullPointerException
    at org.apache.cassandra.db.SliceFromReadCommand.maybeGenerateRetryCommand(SliceFromReadCommand.java:76)
    at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:724)
    at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:564)
    at org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:128)
    at org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:283)
    at org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:365)
    at org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:326)
    at org.apache.cassandra.thrift.Cassandra$Processor$get_slice.process(Cassandra.java:3033)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
The line in question is (I think) the one below, so it looks like the column family reference for a row can sometimes be null?
int liveColumnsInRow = row != null ? row.cf.getLiveColumnCount() : 0;
Thanks, John
Re: Fwd: information on cassandra
auto_bootstrap has been removed from cassandra.yaml and has always been enabled since 1.0. FYI. maki
2012/3/26 R. Verlangen ro...@us2.nl: Yes, you can add nodes to a running cluster. It's very simple: configure the cluster name and seed node(s) in cassandra.yaml, set auto_bootstrap to true and start the node.
2012/3/26 puneet loya puneetl...@gmail.com 5n.. consider I'm starting on a single node. Can I add nodes later?? plz reply :)
On Sun, Mar 25, 2012 at 7:41 PM, Ertio Lew ertio...@gmail.com wrote: I guess a 2-node cluster with RF=2 might also be a starting point, isn't it? Are there any issues with this?
On Sun, Mar 25, 2012 at 12:20 AM, samal samalgo...@gmail.com wrote: Cassandra has a distributed architecture, so 1 node does not really fit it. It can be used that way, but you lose its benefits. That's OK if you are just playing around; use VMs to learn how a cluster communicates and handles requests. To get full tolerance, redundancy and consistency, a minimum of 3 nodes is required. Important reading: http://wiki.apache.org/cassandra/ http://www.datastax.com/docs/1.0/index http://thelastpickle.com/ http://www.acunu.com/blogs/all/
On Sat, Mar 24, 2012 at 11:37 PM, Garvita Mehta garvita.me...@tcs.com wrote: It's not advisable to use Cassandra on a single node. As its basic definition says, if a node fails, data still remains in the system, so at least 3 nodes must be there when setting up a Cassandra cluster. Garvita Mehta CEG - Open Source Technology Group Tata Consultancy Services Ph:- +91 22 67324756 Mailto: garvita.me...@tcs.com Website: http://www.tcs.com Experience certainty. IT Services Business Solutions Outsourcing
-puneet loya wrote: - To: user@cassandra.apache.org From: puneet loya puneetl...@gmail.com Date: 03/24/2012 06:36PM Subject: Fwd: information on cassandra
hi, I'm puneet, an engineering student. I would like to know whether cassandra is useful considering we just have a single node (rather, a single system) holding all the information. I'm looking for decent response time for the database.
can you please respond? Thank you , Regards, Puneet Loya -- With kind regards, Robin Verlangen www.robinverlangen.nl
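For a rough feel for why three nodes keep coming up in this thread, here is a small sketch of the quorum arithmetic. It assumes reads and writes at consistency level QUORUM, where a quorum is floor(RF / 2) + 1 replicas; the function names are made up for illustration and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"can lose {tolerated_failures(rf)} replica(s)")
```

Under that assumption, RF=2 needs both replicas for a quorum, so a 2-node cluster with RF=2 only survives a node failure at a weaker consistency level such as ONE, while RF=3 on three nodes can lose one replica and still serve QUORUM requests.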
Re: unbalanced ring
What version are you using? Anyway try nodetool repair compact. maki
2012/3/26 Tamar Fraenkel ta...@tok-media.com Hi! I created Amazon ring using datastax image and started filling the db. The cluster seems un-balanced. nodetool ring returns:
Address         DC       Rack  Status  State   Load       Owns    Token
                                                                  113427455640312821154458202477256070485
10.34.158.33    us-east  1c    Up      Normal  514.29 KB  33.33%  0
10.38.175.131   us-east  1c    Up      Normal  1.5 MB     33.33%  56713727820156410577229101238628035242
10.116.83.10    us-east  1c    Up      Normal  1.5 MB     33.33%  113427455640312821154458202477256070485
[default@tok] describe;
Keyspace: tok:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
  Options: [replication_factor:2]
[default@tok] describe cluster;
Cluster Information:
  Snitch: org.apache.cassandra.locator.Ec2Snitch
  Partitioner: org.apache.cassandra.dht.RandomPartitioner
  Schema versions:
    4687d620-7664-11e1--1bcb936807ff: [10.38.175.131, 10.34.158.33, 10.116.83.10]
Any idea what is the cause? I am running similar code on local ring and it is balanced. How can I fix this? Thanks, *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956
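As a quick sanity check on the ring output above, the sketch below computes evenly spaced tokens, assuming RandomPartitioner's token space of 0 to 2**127. `balanced_tokens` is a hypothetical helper, not a Cassandra tool.

```python
from typing import List

# Assumption: RandomPartitioner tokens fall in the range [0, 2**127).
TOKEN_SPACE = 2 ** 127


def balanced_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial tokens for a ring of node_count nodes."""
    return [i * TOKEN_SPACE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for token in balanced_tokens(3):
        print(token)
```

For three nodes this gives 0, 56713727820156410577229101238628035242 and 113427455640312821154458202477256070485, the same tokens the ring reports, which fits the 33.33% ownership on every node. So the ring looks balanced by token; the difference in Load may simply reflect how little data is stored so far, though that is a guess and not something confirmed in this thread.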
Re: Error in FAQ?
I updated the FAQ to the best of my ability: http://wiki.apache.org/cassandra/FAQ#modify_cf_config On Mon, Mar 26, 2012 at 12:25 AM, R. Verlangen ro...@us2.nl wrote: If you want to modify a column family, just open the command line interface (cassandra-cli), connect to a node (probably: connect localhost/9160;). When you have to create your first keyspace type: create keyspace MyKeyspace; For modifying an existing keyspace type: use MyKeyspace; If you need more information you can just type help; Good luck! 2012/3/26 Ben McCann b...@benmccann.com Hmmm, I don't see anything regarding column families in cassandra.yaml. It seems like the answer for that question in the FAQ is very outdated. On Sun, Mar 25, 2012 at 4:04 PM, Serge Fonville serge.fonvi...@gmail.com wrote: Hi, 2012/3/26 Ben McCann b...@benmccann.com: There's a line that says Make necessary changes to your storage-conf.xml. I can't find this file. Does it still exist? If so, where should I look? I installed the packaged version of Cassandra available in the Datastax community edition. From http://wiki.apache.org/cassandra/StorageConfiguration Prior to the 0.7 release, Cassandra storage configuration is described by the conf/storage-conf.xml file. As of 0.7, it is described by the conf/cassandra.yaml file. After googling cassandra storage-conf.xml Kind regards/met vriendelijke groet, Serge Fonville http://www.sergefonville.nl Convince Google!! They need to add GAL support on Android (star to agree) http://code.google.com/p/android/issues/detail?id=4602 2012/3/26 Ben McCann b...@benmccann.com: There's a line that says Make necessary changes to your storage-conf.xml. I can't find this file. Does it still exist? If so, where should I look? I installed the packaged version of Cassandra available in the Datastax community edition. Thanks, Ben
Re: How to store a list of values?
On Tue, Mar 27, 2012 at 1:47 AM, R. Verlangen ro...@us2.nl wrote: but any schema change will break it How do you mean? You don't have to specify the columns in Cassandra so it should work perfectly. Except that the skill~ prefix is reserved for your list. If skill~ is ever changed to skill::, that has to be handled at the app level. Otherwise you have to update every row: read it first, modify it, insert the new version and delete the old version.
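To make the prefix idea concrete, here is a minimal sketch that models one row as a plain Python dict of column name to value. It assumes each skill is stored as its own column named skill~ plus the skill, with an empty value; no Cassandra client is involved, and every function name here is hypothetical.

```python
SKILL_PREFIX = "skill~"  # assumed naming convention for list columns


def add_skill(row, skill, prefix=SKILL_PREFIX):
    """Store one list element as its own column; the value is unused."""
    row[prefix + skill] = ""


def list_skills(row, prefix=SKILL_PREFIX):
    """Recover the list by scanning for columns that carry the prefix."""
    return sorted(name[len(prefix):] for name in row if name.startswith(prefix))


def migrate_prefix(row, old_prefix, new_prefix):
    """Rename every prefixed column: insert the new name, delete the old."""
    for name in [n for n in row if n.startswith(old_prefix)]:
        row[new_prefix + name[len(old_prefix):]] = row.pop(name)


if __name__ == "__main__":
    profile = {"name": "Ben"}
    for skill in ("java", "c++", "cobol"):
        add_skill(profile, skill)
    print(list_skills(profile))

    # Changing the prefix means rewriting every affected column in every row.
    migrate_prefix(profile, "skill~", "skill::")
    print(list_skills(profile, prefix="skill::"))
```

The `migrate_prefix` step is the cost being described above: the convention lives only in the application, so a change to it has to be applied row by row or tolerated in the reading code.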